Lecture 5: Unit Roots, Cointegration and Error Correction Models
The Spurious Regression Problem
Prof. Massimo Guidolin
20192 Financial Econometrics, Winter/Spring 2018

Overview
Stochastic vs. deterministic trends
The random walk process
Isolating and removing trends and the associated perils
Spurious regressions
Unit root tests

Trends in Time Series
Can the methods of lectures 2-4 be applied to nonstationary series?
  o No: the inference would be invalid, and the costs are considerable
  o If nonstationarity is simply ignored, it can produce misleading inferences because they will be spurious
Nonstationarity may be linked to the presence of trends in time series processes
Time series often contain a trend, a possibly random component with a permanent effect on the series
1. Deterministic trends: functions (linear or non-linear) of time, $t$, for instance polynomials in $t$ plus a shock $\epsilon_t$, where $\epsilon_t$ is a white noise process
  o The trending effect caused by functions of $t$ is permanent and impresses a trend because time is obviously irreversible
  o The solution to the difference equation is [equation not recovered], written in terms of the shocks $\eta_\tau$

Trends in Time Series
The long-term forecast of $y_t$ will converge to the trend line, $\delta t$, so that this type of model is said to be trend stationary
2. Stochastic trends, which characterize all processes that can be written as $y_t = y_0 + \mu t + \sum_{\tau=1}^{t} \epsilon_\tau$
  o The presence of a stochastic trend implies a random walk (RW) with drift: $y_{t+1} = \mu + y_t + \epsilon_{t+1}$
Therefore a stochastic trend is not a RW, but it implies its presence
  o A RW is the non-stationary variant of an AR(1) with $\mu = \phi_0$ and $\phi_1 = 1$
A deterministic trend also implies the presence of a stochastic trend, but stochastic trends may arise on their own
Since all values of $\epsilon_\tau$ carry a coefficient of unity, the effect of each shock on the intercept term is permanent, which is indeed the intrinsic nature of a trend
If shocks are never forgotten, the time series has infinite memory: series with deterministic or stochastic trends are non-stationary, and integrated series are denoted as I(d), $d \ge 1$

Trends in Time Series
Because unit root tests are sensitive to the presence of deterministic trends, this distinction is not just a matter of classification
Useful unit root tests will manage to tell deterministic time trends apart from stochastic ones, to generate decompositions of a series into its trend components and a stationary remainder
[Figure: six simulated series, all generated from the same sequence of IID N(0,1) shocks]

The Random Walk Process
The RW is the key (but not the only) type of non-stationary process
  o While its conditional mean is well-defined, its unconditional mean explodes:
    $E_t[y_{t+1}] = \mu + E_t[y_t] + E_t[\epsilon_{t+1}] = \mu + y_t$
    $E_t[y_{t+s}] = \mu + E_t[y_{t+s-1}] + E_t[\epsilon_{t+s}] = 2\mu + E_t[y_{t+s-2}] + E_t[\epsilon_{t+s} + \epsilon_{t+s-1}] = \dots = s\mu + y_t \quad (s \ge 1)$
    so it depends on $t$, and if there is no drift ($\mu = 0$), then $E_t[y_{t+1}] = E_t[y_{t+s}] = y_t$
  o The unconditional variance also depends on time, and it explodes as $t \to \infty$
  o Autocovariances and autocorrelations also display pathological patterns: as $t \to \infty$, there is perfect memory
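
The moment results on this slide can be checked by simulation. The sketch below is illustrative only: the drift, the number of paths and the horizon are assumed values, and it relies on the standard results for a RW started at a fixed $y_0 = 0$ with unit-variance shocks, namely mean $\mu t$, variance $t$, and correlation $\sqrt{(t-s)/t}$ between $y_t$ and $y_{t-s}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, T, mu = 5000, 200, 0.1  # illustrative values, not taken from the lecture

# y_t = mu + y_{t-1} + eps_t with y_0 = 0, i.e. y_t = mu*t + eps_1 + ... + eps_t
eps = rng.standard_normal((n_paths, T))
y = np.cumsum(mu + eps, axis=1)

mean_t = y.mean(axis=0)  # cross-path mean at each date; theory: mu * t
var_t = y.var(axis=0)    # cross-path variance at each date; theory: t


def corr_with_lag(t, s):
    """Cross-path correlation between y_t and y_{t-s} (dates are 1-based)."""
    return float(np.corrcoef(y[:, t - 1], y[:, t - s - 1])[0, 1])


for t in (10, 50, 200):
    print(t, round(float(mean_t[t - 1]), 2), round(float(var_t[t - 1]), 1))

# Correlation at lag 10 measured at date 200, next to sqrt((t - s) / t)
print(corr_with_lag(200, 10), np.sqrt(190 / 200))
```

Both the mean and the variance grow with $t$, and the lag-10 correlation is already close to 1 at $t = 200$, which is the "perfect memory" pattern described above.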

The Random Walk Process
  o The ACF of a RW shows, especially in small samples, a slight tendency to decay that would make one think of a stationary AR(p) process with a sum of the AR coefficients close to 1
In realistic samples, it is not possible to use the ACF to distinguish between a unit root process and a stationary, near-I(1) process
Suppose you have already established that a time series contains a trend: how do you estimate (and remove) it in order to apply the decomposition?

De-Trending a Series: Deterministic vs. Stochastic
In the trend-stationary case, de-trending is simply done by OLS
For stochastic trends, consider the RW with drift process and take its first difference: $\Delta y_{t+1} = y_{t+1} - y_t = \mu + \epsilon_{t+1}$
The result is a white noise plus a constant intercept (the drift)
This approach is more general: an I(d) series is made stationary by differencing it d times
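
A minimal sketch of the two de-trending methods, each applied to the kind of series it suits. The sample size, the trend slope and drift (both set to 0.2) and the helper `acf1` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T, mu, delta = 2000, 0.2, 0.2  # illustrative values
eps = rng.standard_normal(T)
t = np.arange(1, T + 1, dtype=float)

# Trend-stationary series: y_t = delta*t + eps_t, de-trended by OLS on (1, t)
y_ts = delta * t + eps
X = np.column_stack([np.ones(T), t])
beta, *_ = np.linalg.lstsq(X, y_ts, rcond=None)
resid_ts = y_ts - X @ beta

# RW with drift: y_t = mu + y_{t-1} + eps_t, de-trended by first differencing
y_rw = np.cumsum(mu + eps)
d_rw = np.diff(y_rw)  # equals mu + eps_t for t = 2, ..., T


def acf1(x):
    """First-order sample autocorrelation."""
    x = x - x.mean()
    return float(x[1:] @ x[:-1] / (x @ x))


print("estimated trend slope:", beta[1])
print("mean of differenced RW (the drift):", d_rw.mean())
print("lag-1 autocorrelations:", acf1(resid_ts), acf1(d_rw))
```

In both cases the de-trended series shows essentially no first-order autocorrelation, as expected of white noise.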

De-Trending a Series: Deterministic vs. Stochastic
  o If you want to see what an I(2) process may look like, see Appendix A
[Figure: two panels, each showing a simulated trending series (one with a quadratic trend) and its de-trended series]
  o The two de-trended series are identical in the two plots because they had been generated to be identical, and they are white noise

Pitfalls in De-Trending Applications
Serious damage in a statistical sense can be done when an inappropriate method is used to eliminate a trend
1. When a time series is I(d) but an attempt is made to remove its stochastic trend by fitting deterministic time trend functions, the OLS residuals will still contain one or more unit roots
  o Deterministic de-trending does not remove the stochastic trends
  o For instance, subtracting a linear trend $\delta t$ from $y_t = y_0 + \mu t + \sum_{\tau=1}^{t} \epsilon_\tau$ leaves $y_0 + (\mu - \delta) t + \sum_{\tau=1}^{t} \epsilon_\tau$
  o Even when $\mu = \delta$, the stochastic trend remains
[Figure: linear trend]
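
Pitfall 1 can be illustrated numerically. In this sketch (the sample size, drift and seed are assumed values) a RW with drift is de-trended by an OLS linear trend; the residuals keep an autocorrelation function that stays near 1, while the first difference of the same series does not.

```python
import numpy as np

rng = np.random.default_rng(2)
T, mu = 2000, 0.2  # illustrative values
y = np.cumsum(mu + rng.standard_normal(T))  # RW with drift, an I(1) series

# Inappropriate method: fit and remove a deterministic linear trend by OLS
t = np.arange(1, T + 1, dtype=float)
X = np.column_stack([np.ones(T), t])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta


def acf(x, k):
    """Sample autocorrelation at lag k."""
    x = x - x.mean()
    return float(x[k:] @ x[:-k] / (x @ x))


acf_resid = [acf(resid, k) for k in (1, 5, 10)]      # residuals of the trend fit
acf_diff = [acf(np.diff(y), k) for k in (1, 5, 10)]  # first-differenced series

print("OLS de-trended residuals:", np.round(acf_resid, 3))
print("first differences:       ", np.round(acf_diff, 3))
```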

Pitfalls in De-Trending Applications
2. When a time series contains a deterministic trend but is otherwise I(0) (trend-stationary) and an attempt is made to remove the trend by differencing the series d times, the resulting differenced series will contain d unit roots in its MA components
  o It will therefore not be invertible
  o Differencing a trend-stationary series creates new stochastic trends that are shifted inside the shocks of the series
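
A short worked case of pitfall 2 for $d = 1$. The intercept symbol $a$ and the lag operator $L$ are notation assumed here for illustration; $\delta$ and $\epsilon_t$ are as in the earlier slides.

```latex
\begin{aligned}
y_t &= a + \delta t + \epsilon_t
  && \text{(trend-stationary)} \\
\Delta y_t &= y_t - y_{t-1}
  = \delta + \epsilon_t - \epsilon_{t-1}
  = \delta + (1 - L)\,\epsilon_t
  && \text{(MA(1) with } \theta = -1\text{)}
\end{aligned}
```

The MA polynomial $1 - z$ has its root at $z = 1$, so the differenced series is a non-invertible MA(1). Its first-order autocorrelation is $\theta / (1 + \theta^2) = -1/2$, a typical sign of over-differencing.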

Pitfalls in De-Trending Applications
  o Even when the deterministic trend component is absent, if the time series is I(0) but is incorrectly differenced d times, the resulting differenced series will contain d unit roots in its MA components
What if $y_t \sim I(d)$ but by mistake we difference it $d + r$ times?
3a. If $r > 0$, we are over-differencing the series and, as in case 2, the resulting over-differenced series will contain $r$ unit roots in its MA components and will therefore not be invertible
3b. If $r < 0$, we are not differencing the series enough: the resulting series will still contain $|r|$ unit roots and will remain nonstationary
Why is it that we care so much about isolating and removing trends?
It turns out that using I(d) series with $d > 0$ in standard regression analysis generally exposes us to the peril of invalid inferences
We speak of spurious regressions
Suppose that $y_t \sim I(1)$ and $x_t \sim I(1)$, e.g., stock prices and GDP

The Spurious Regression Problem
You estimate a regression of $y_t$ on $x_t$, expecting the errors (say, $\eta_t$) to be white noise, as required by OLS, but instead, with $y_t \sim I(1)$ and $x_t \sim I(1)$:
  o The very error terms of the regression are I(1)!
  o This occurs unless very special conditions hold, see below
A spurious regression has the following features:
1. The residuals are I(1), and as such any shock is a permanent change of the intercept of the regression, and in no way news

The Spurious Regression Problem
2. Standard OLS estimators are inconsistent, and the associated inferential procedures are invalid and statistically meaningless
3. The regression has a high $R^2$ and t-statistics that appear to be significant, but the results are void of any economic meaning
  o Do not fall into the spurious regression trap: do not just boast about huge R-squares; in finance they are more often symptoms of problems
  o This is not a small-sample problem; in fact, these issues worsen as the sample size grows
  o These ideas generalize, at the cost of technical complexity, to the case in which one regresses an I(d) series on another I(d) series
  o Or when we regress a deterministic trend on another trend
The cure for the problem is to work with stationary first-differenced (or d-times differenced) series
  o E.g., we generate two independent sets of IID white noise variables and use them to simulate 1000 observations from two driftless RWs
  o The two RWs are expected to be unrelated

The Spurious Regression Problem
  o If you object that the series seem to be trending in similar directions, know that this is just chance and a good dose of visual illusion
  o The estimated regression of one RW on the other gives: [regression output not recovered]
[Figure: regression residuals]
  o When the series are differenced, a regression provides no explanatory power: [regression output not recovered]

The Spurious Regression Problem
  o The sample ACFs of the regression residuals are:
[Figure: sample ACFs of the residuals, one panel labelled "I(1)!" and one labelled "White noise"]
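
The experiment on these slides can be repeated many times to see how often a levels regression of one independent driftless RW on another looks "significant". This is a sketch under assumed settings: 300 replications, a nominal 5% two-sided test with the normal critical value 1.96, and a hypothetical helper `ols_slope_t_r2`. Only the 1000 observations per series come from the slides.

```python
import numpy as np


def ols_slope_t_r2(y, x):
    """OLS of y on a constant and x; returns (slope, t-stat of slope, R^2)."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - 2)
    se_slope = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    dev = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (dev @ dev)
    return beta[1], beta[1] / se_slope, r2


rng = np.random.default_rng(3)
T, n_rep = 1000, 300  # T as in the slides; n_rep is an assumed value
t_levels, t_diffs, r2_levels, r2_diffs = [], [], [], []
for _ in range(n_rep):
    # Two independent driftless random walks
    y = np.cumsum(rng.standard_normal(T))
    x = np.cumsum(rng.standard_normal(T))
    _, t_lv, r2_lv = ols_slope_t_r2(y, x)                    # levels
    _, t_df, r2_df = ols_slope_t_r2(np.diff(y), np.diff(x))  # first differences
    t_levels.append(t_lv)
    r2_levels.append(r2_lv)
    t_diffs.append(t_df)
    r2_diffs.append(r2_df)

reject_levels = float(np.mean(np.abs(t_levels) > 1.96))
reject_diffs = float(np.mean(np.abs(t_diffs) > 1.96))
print("share of |t| > 1.96 in levels:     ", reject_levels)
print("share of |t| > 1.96 in differences:", reject_diffs)
print("median R^2, levels vs differences: ",
      float(np.median(r2_levels)), float(np.median(r2_diffs)))
```

The levels regressions reject the (true) hypothesis of no relationship far more often than the nominal 5%, while the regressions in first differences behave roughly as standard theory predicts.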

Testing for Unit Roots: the Dickey-Fuller Test
SACFs are downward biased, so Box-Jenkins cannot be used
  o Dickey and Fuller (1979) offer a test procedure that takes the bias into due account, based on a Monte Carlo design, under the null of a RW
Their method boils down to estimating by OLS the regression $\Delta y_t = \phi_0 + \alpha y_{t-1} + \epsilon_t$, with $\alpha \equiv \phi_1 - 1$
  o $\hat{\alpha} / se(\hat{\alpha})$ = number of standard deviations away from 0
  o The one-sided t-statistic of the OLS estimate of $\alpha$ is then compared to critical values found by DF by simulations under the null of a RW
E.g., if the estimated $\phi_1$ is 0.962 with a standard error of 0.013, then the estimated $\alpha$ is -0.038 and the t-statistic is -0.038/0.013 = -2.923
According to DF's simulations, this happens less than 5% of the time under the null of a RW, but in more than 1% of the simulations
This is a rather unlucky event under the null of a RW, and this may lead to a rejection of the hypothesis, with a p-value between 0.01 and 0.05
We therefore use a standard t-ratio, taking into account that under the null of a RW its distribution is nonstandard and cannot be analytically evaluated
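
The logic of the test can be sketched by simulating the t-ratio under the null of a driftless RW and reading off its lower quantiles. The assumptions here are mine: the test regression includes an intercept, each simulated sample has 250 observations, and 10,000 replications are used. `df_tstat` is a hypothetical helper, not a library function. The asymptotic critical values commonly tabulated for the intercept case are about -3.43 (1%) and -2.86 (5%).

```python
import numpy as np


def df_tstat(y):
    """t-ratio of alpha in dy_t = c + alpha*y_{t-1} + e_t, for each row of y."""
    y = np.atleast_2d(np.asarray(y, dtype=float))
    dy = np.diff(y, axis=1)
    ylag = y[:, :-1]
    n = dy.shape[1]
    # Demeaning both sides is equivalent to including the intercept c
    xd = ylag - ylag.mean(axis=1, keepdims=True)
    yd = dy - dy.mean(axis=1, keepdims=True)
    sxx = (xd * xd).sum(axis=1)
    alpha = (xd * yd).sum(axis=1) / sxx
    resid = yd - alpha[:, None] * xd
    s2 = (resid * resid).sum(axis=1) / (n - 2)
    return alpha / np.sqrt(s2 / sxx)


rng = np.random.default_rng(4)
n_rep, T = 10_000, 250  # assumed simulation design
paths = np.cumsum(rng.standard_normal((n_rep, T)), axis=1)  # RWs under the null
t_null = df_tstat(paths)
q01, q05 = np.quantile(t_null, [0.01, 0.05])
print("simulated 1% and 5% critical values:", round(q01, 2), round(q05, 2))

# Worked example from the slide: phi_hat = 0.962, standard error 0.013
alpha_hat = 0.962 - 1.0
t_example = alpha_hat / 0.013
print("t-statistic:", round(t_example, 3))  # -2.923
print("rejects at 5% but not at 1%:", -3.43 < t_example < -2.86)
```

The simulated quantiles sit well below the usual normal critical value of -1.645, which is why standard t tables cannot be used here.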

Testing for Unit Roots: Augmented DF Test
The classical DF test suffers from one rigidity: given the null hypothesis of a RW, the alternative hypothesis is specified as a stationary AR(1) process
It is possible to use the DF tests in more general cases
  o Appendix B shows that, through a sequence of add-and-subtract operations, it is possible to re-write a general AR(p) process as a regression of $\Delta y_t$ on $y_{t-1}$ (with coefficient $\alpha$) and lagged first differences
We can test for the presence of a unit root using the DF test, although this is called the augmented Dickey-Fuller (ADF) test
  o ADF implements a parametric correction for high-order correlation
  o DF show that the asymptotic distribution of the t-ratio for $\alpha$ is independent of the number of lagged first differences included
In fact, the appropriate tables (simulated statistics) to use depend on the deterministic components included in the regression
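
The add-and-subtract step can be seen in the simplest case, an AR(2). This is a worked illustration in notation assumed here ($\phi_0, \phi_1, \phi_2$ for the AR coefficients), not a reproduction of Appendix B.

```latex
\begin{aligned}
y_t &= \phi_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \epsilon_t \\
    &= \phi_0 + (\phi_1 + \phi_2)\, y_{t-1} - \phi_2\,(y_{t-1} - y_{t-2}) + \epsilon_t
       && \text{(add and subtract } \phi_2 y_{t-1}\text{)} \\
\Delta y_t &= \phi_0 + \underbrace{(\phi_1 + \phi_2 - 1)}_{\alpha}\, y_{t-1}
       - \phi_2\, \Delta y_{t-1} + \epsilon_t
       && \text{(subtract } y_{t-1} \text{ from both sides)}
\end{aligned}
```

A unit root corresponds to $\phi_1 + \phi_2 = 1$, that is $\alpha = 0$, so the test is still a t-ratio on the coefficient of $y_{t-1}$, with one lagged first difference added to the regression.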

Unit Root Tests on US Real Stock Prices and Earnings
Monthly, 1871-2016 US S&P real stock prices, dividends & earnings
  o When the number of lags is selected with SBIC we have p = 7 and [regression output not recovered]
  o A positive estimate of $\alpha$ with a t-statistic of 0.565 implies a failure to reject the null of a unit root (the ADF 5% critical value is negative)
  o But $\Delta P_t$ is then stationary ($\alpha$ is estimated at -0.56)
  o US real stock prices contain a unit root, in line with the efficient markets hypothesis

Other Unit Root Tests
  o In the case of real earnings, an ADF test that includes an intercept gives an estimate of $\alpha$ of 0 with a t-ratio of 10.196, which leads to a failure to reject the null of a unit root
  o The presence of a time trend cannot be ruled out on theoretical grounds: an ADF test that also includes a linear time trend gives an estimate of $\alpha$ of -0.002, which is -1.900 standard deviations away from 0 and does not allow us to reject the null of a unit root
Phillips and Perron (1988) propose a nonparametric method of controlling for serial correlation when testing for a unit root, as an alternative to the ADF test
  o Classical DF test, with the t-ratio of $\alpha$ modified so that serial correlation in the residuals does not affect the asymptotic distribution of the test
  o See the lecture notes for the PP test statistic
  o The null hypothesis remains a unit root
Kwiatkowski, Phillips, Schmidt, and Shin (1992) have proposed a testing strategy under the null of (trend-) stationarity

Other Unit Root Tests
The KPSS statistic is based on the residuals from a regression of the series on exogenous, deterministic factors (e.g., an intercept, or an intercept and a time trend)
Re-examine whether S&P real stock prices, aggregate earnings, and aggregate dividends give evidence of a unit root, with PP tests:
[Table: PP test results for the three series, specification with intercept and trend; values not recovered]
All series contain a unit root, and this should be taken into account
KPSS tests lead to the same conclusion even though the null differs
Although rejecting the null of a unit root does not imply accepting the alternative hypothesis of stationarity, ADF-type and KPSS tests are sufficiently different to occasionally contradict each other
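
A minimal sketch of how a KPSS-type statistic can be computed. It uses the textbook form of the statistic, which may differ in notation from the slide: the sum of squared partial sums of the residuals, divided by $T^2$ times an estimate of their long-run variance. The Bartlett-kernel estimator, the fixed lag truncation of 8 and the helper name `kpss_stat` are assumptions for illustration. The commonly tabulated critical values for the intercept-only case are about 0.463 (5%) and 0.739 (1%), with large values rejecting stationarity.

```python
import numpy as np


def kpss_stat(y, lags, trend=False):
    """KPSS-type statistic from residuals on a constant (and a trend if asked)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    cols = [np.ones(T)]
    if trend:
        cols.append(np.arange(1, T + 1, dtype=float))
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta   # residuals from the deterministic regression
    S = np.cumsum(u)   # partial sums of the residuals
    # Long-run variance of u with Bartlett weights
    lrv = u @ u / T
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)
        lrv += 2.0 * w * (u[j:] @ u[:-j]) / T
    return float(S @ S / (T**2 * lrv))


rng = np.random.default_rng(5)
T = 2000  # illustrative sample size
white_noise = rng.standard_normal(T)
random_walk = np.cumsum(rng.standard_normal(T))

stat_wn = kpss_stat(white_noise, lags=8)
stat_rw = kpss_stat(random_walk, lags=8)
print("KPSS, white noise:", round(stat_wn, 3))
print("KPSS, random walk:", round(stat_rw, 3))
```

Because the null here is stationarity, a large statistic for the random walk is the KPSS counterpart of a failure to reject the unit root null in an ADF or PP test.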