A New Solution to Spurious Regressions *

Shin-Huei Wang (a), Carlo Rosa (b)

Abstract

This paper develops a new estimator for cointegrating and spurious regressions by applying a two-stage generalized Cochrane-Orcutt transformation based on an autoregressive approximation framework, which remains applicable even though the exact form of the error term is unknown in practice. We prove that our estimator is consistent and that its rate of convergence is substantially improved compared to existing estimation methods. Using our new estimator, we further show that a convergent usual t statistic can be constructed for the spurious regression cases analyzed by Granger and Newbold (1974) and Granger et al. (2001). Our methodology is easy to implement because it does not rely on the long-run variance framework. The simulation results indicate that the finite sample performance of our methodology is promising even in samples as small as 50 observations.

JEL classification: C10; C15

Keywords: Spurious regression; Cointegration; Generalized Cochrane-Orcutt estimation; Autoregressive approximation

* We appreciate the useful comments from Richard Baillie, George Dotsis, Cheng Hsiao, Wolfgang Härdle, Hong Yong Miao, John Nankervis, Hashem Pesaran, Giovanni Verga, Mark Watson, Qiwei Yao and the seminar participants at CORE, University of Nottingham and Humboldt University.
(a) Corresponding author. CORE, Université Catholique de Louvain and CeReFim, FUNDP, Belgium. Email: shinhuei.wang@uclouvain.be.
(b) University of Essex, Colchester, United Kingdom. Email: crosa@essex.ac.uk. Webpage: http://sites.google.com/site/carlorosa1.

1. Introduction

The long-run relation between economic time series often plays an important role in macroeconomics and finance. In particular, most macroeconomic and financial models imply that certain variables are cointegrated as defined by Engle and Granger (1987). Empirical tests, however, often fail to reject the null hypothesis of no cointegration for these variables. One possible explanation of these test results is that the error term has a unit root.¹ In these cases, when the error term is a non-stationary I(1) process but the structural parameters can still be recovered, the regression is called a structural spurious regression. Another related issue that is pervasive in the time series literature is the danger of obtaining spurious correlation findings. A spurious correlation occurs when a pair of independent series, each of them nonstationary or strongly autoregressive, appear to be related according to standard inference in an OLS regression.²

The purpose of this work is to propose a new estimator for cointegrating and spurious regressions that can successfully address both problems described above. More specifically, we propose a two-stage generalized Cochrane-Orcutt transformation estimator based on an autoregressive AR(k) approximation framework (henceforth CO-AR estimator) that covers a wide class of regression models. The CO-AR estimator turns out to be robust with respect to the error specification even when the regressors and regressand are highly persistent (or possibly unit-root) processes.

We now briefly describe the implementation of the two-stage CO-AR estimator. First, we take the first difference of both the dependent and explanatory variables of our regression model, and use standard OLS to estimate the slope coefficients.
Then, we construct fitted residuals as the difference between the level of the dependent variable and the level of the explanatory variables multiplied by the estimated regression coefficients. Second, we fit an AR(k) model to these residuals, where the order k of the approximation is determined by minimizing an information criterion. Berk (1974) and Bauer and Wagner (2008) show that an AR model can approximate an I(0) and an I(1) process, respectively. Hence we can filter the error term by an AR(k) model whatever the properties of the error are, even if some I(0) or I(1) omitted variables are included in the error term. Third, we conduct a generalized Cochrane-Orcutt transformation (of order k) of both the dependent and independent variables. Finally, we run a standard OLS regression on the transformed variables. The spirit of our methodology follows the suggestion of Granger (2001), who states that the proper reaction to a possible spurious relationship is to add lagged dependent and independent variables until the errors appear to be white noise.

The main findings of the paper can be summarized as follows. First, we derive the consistency and the limit distribution of the CO-AR estimator when a) the dependent variable, the explanatory variables and the innovations are stationary but potentially highly persistent (or near I(1)); b) the dependent variable, the explanatory variables and the innovations are unit-root non-stationary series; and c) the dependent and explanatory variables are unit-root non-stationary series and the innovations are stationary. As a corollary, we show that the t statistic of the slope coefficients is convergent and follows the standard Gaussian N(0,1) distribution, even when the regressors and regressand are highly persistent (or possibly unit-root) processes. In sum, our proposed CO-AR estimator represents a new solution to the spurious regression problem, as examined in the influential articles of Granger and Newbold (1974) and Granger et al. (2001).³ Second, we investigate the finite sample performance of our CO-AR estimator in a simulation exercise. The Monte Carlo experiments confirm our theoretical findings.

¹ For instance, the error term may contain a unit root because of a nonstationary measurement error in one variable or nonstationary omitted variables (see Choi et al., 2008).
² Since the seminal contribution by Yule (1926) on nonsense correlations between time series, it has been shown that spurious regression results may occur not only for pairs of independent unit root processes (see Granger and Newbold, 1974, for a simulation study and Phillips, 1986, for a theoretical explanation) but also for other persistent processes, such as I(2) processes (Haldrup, 1994), or even positively autocorrelated (stationary) autoregressive series (Granger et al., 2001). Some important applications of spurious regressions in economics and finance, though this list is by no means exhaustive, include Plosser et al. (1982), Plosser and Schwert (1978), Ferson et al. (2003), Hendry (1980), and Valkanov (2003).
We find that in spurious regressions the conventional significance tests based on standard least squares estimation and inference are seriously biased towards rejection of the null hypothesis of no relationship (and hence acceptance of a spurious relationship), even when the series are generated as statistically independent. The t test based on our CO-AR estimator, however, remains approximately distributed as a standard normal N(0,1) without using the long-run variance framework, even in samples as small as 50 observations. Hence, the simulation results indicate that the size control of our methodology is excellent even in small samples. We also find that in cointegrating relationships the CO-AR estimator is as good as three conventional estimators, dynamic OLS (Saikkonen, 1991, and Stock and Watson, 1993), corrected GLS, and feasible GLS (Choi et al., 2008, hereafter CHO-FGLS), when the innovation follows either a white noise or an AR(1) process, and that it significantly outperforms the CHO-FGLS when the innovation follows a stationary ARMA process or nonstationary ARIMA dynamics.

³ Interestingly, our paper provides an asymptotic theory for McCallum's (2010) intuition that, by properly taking into account the autocorrelation of the residuals, it is possible to uncover spurious relationships.

Several papers have studied the problem of spurious regressions. Two recent contributions analyze approaches to correcting spurious regressions involving nonstationary regressors and error terms. Choi et al. (2008) propose the CHO-FGLS estimator, which is consistent not only when the regression error is stationary but also when it is unit-root nonstationary. Our CO-AR estimator generalizes the CHO-FGLS estimator along two dimensions. First, the slope coefficients of the preliminary step are recovered using the corrected GLS estimator rather than the standard OLS regression. By doing so, the CO-AR estimator remains consistent but is more robust than the CHO-FGLS when the innovations are highly persistent. Second, we approximate the unknown error term using an AR(k) model. This second generalization implies that the CO-AR estimator can be applied to a much wider class of regression models. In particular, whereas the CHO-FGLS is only valid when the error term follows an AR(1) process, the CO-AR estimator provides consistent and accurate estimation results when the error term follows a stationary (but potentially highly persistent) ARMA(p,q) process as well as a nonstationary ARIMA(p,1,q) process. The fact that the CO-AR estimator covers general regression models is especially appealing in empirical work, where the exact form of the error term is often unknown and the errors rarely follow a simple AR(1) process.
Deng (2005) considers the spurious regression issue when the regressors follow an ARMA(1,1) process in which both the AR root and the MA root are close to unity, and derives a t test with a well-defined distribution under the null by employing the fixed-bandwidth long-run variance framework. Since the limiting distribution of Deng's t statistic depends critically on the long-run variance estimator, its implementation requires the selection of a suitable bandwidth parameter and kernel function. Instead, the CO-AR version of the usual t test follows the standard Gaussian N(0,1) distribution and does not rely on a long-run variance estimator. Therefore, we can dispense entirely with the choice of a bandwidth parameter and a kernel function, and in applied work the CO-AR t test can be implemented straightforwardly.

The rest of the paper is organized as follows. Section 2 establishes the asymptotic properties of the new CO-AR estimator for cointegrating and spurious regressions, and constructs a convergent usual t statistic in spurious regressions. Section 3 investigates the finite sample performance of the CO-AR estimator through Monte Carlo experiments. Section 4 contains some concluding comments. An Appendix provides proofs of the results given in the paper.

2. The statistics and the main results

The objective of this section is to establish the asymptotic properties of the CO-AR estimator in both spurious and cointegrating regressions, and to construct a convergent usual t statistic in spurious regressions as analyzed by Granger and Newbold (1974) and Granger et al. (2001).

2.1. Spurious regressions with highly persistent regressors and errors

Consider the regression

Model I: $y_t = \beta' x_t + u_t$,

where the regressors $x_t$ are highly persistent with positive autocorrelations, and the error term $u_t$ in Model I satisfies the following Assumption 1.

Assumption 1. $u_t$ is generated as
$$\phi(L)\, u_t = \theta(L)\, e_t, \qquad (1)$$
where (i) the autoregressive (AR) and moving average (MA) polynomials $\phi(L)$ and $\theta(L)$ have all roots outside the unit circle; (ii) $\phi(L)$ and $\theta(L)$ have no common roots; (iii) $e_t$ is an i.i.d. process with $E(e_t) = 0$, $E(e_t^2) = \sigma_e^2$, and $E(e_t^4) < \infty$.
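To make Assumption 1 concrete, the following Python sketch simulates an ARMA(p,q) error $u_t$ by direct recursion. It assumes the sign conventions $\phi(L) = 1 - \phi_1 L - \dots - \phi_p L^p$ and $\theta(L) = 1 + \theta_1 L + \dots + \theta_q L^q$, Gaussian innovations, zero pre-sample values and a burn-in period to wash them out; the function name `simulate_arma` and its parameters are illustrative choices, not taken from the paper.

```python
import numpy as np


def simulate_arma(phi, theta, T, burn=100, sigma=1.0, rng=None):
    """Simulate T draws of u_t from phi(L) u_t = theta(L) e_t.

    Assumed conventions: u_t = sum_i phi[i-1] u_{t-i} + e_t + sum_j theta[j-1] e_{t-j},
    with e_t i.i.d. N(0, sigma^2); pre-sample values of u and e are set to zero.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = T + burn
    e = rng.normal(0.0, sigma, size=n)
    u = np.zeros(n)
    for t in range(n):
        ar = sum(phi[i] * u[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[j] * e[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
        u[t] = ar + e[t] + ma
    return u[burn:]  # drop the burn-in so the zero start-up values matter little


# Example: a persistent ARMA(1,1) error with roots outside the unit circle
u = simulate_arma(phi=[0.9], theta=[0.3], T=200, rng=np.random.default_rng(0))
```

Setting `phi=[0.99]` gives the kind of near-I(1) but stationary error that Model I is designed to handle.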

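Assumption 1 guarantees that the conditions in Theorem 2 of Berk (1974) hold and allows us to represent the ARMA process $u_t$ as an AR(∞) process,
$$u_t = \sum_{j=1}^{\infty} \pi_j u_{t-j} + e_t, \qquad \sum_{j=1}^{\infty} |\pi_j| < \infty,$$
and to approximate $u_t$ by a fitted AR(k) model as follows:
$$u_t = \sum_{j=1}^{k} \pi_{kj} u_{t-j} + e_{tk}. \qquad (2)$$

We are now in a position to describe our two-stage Cochrane-Orcutt GLS estimator.

Stage I. We take the first difference of the dependent and explanatory variables, and compute the standard OLS estimate of the coefficients β, denoted $\hat\beta_{\Delta}$, in the equation
$$\Delta y_t = \beta' \Delta x_t + \Delta u_t,$$
where $\Delta = 1 - L$ and L is the lag operator. This procedure can be viewed as the corrected GLS estimation described in Choi et al. (2008). Note that when $x_t$ and $u_t$ are uncorrelated, the estimator $\hat\beta_{\Delta}$ is consistent.

Stage II. Construct the series of fitted residuals $\hat u_t = y_t - \hat\beta_{\Delta}' x_t$. Then approximate the errors by an AR(k) model, i.e.
$$\hat u_t = \sum_{j=1}^{k} \hat\pi_{kj} \hat u_{t-j} + \hat e_{tk}.$$
Subsequently, conduct the following Cochrane-Orcutt transformation of $y_t$ and $x_t$:
$$\tilde y_t = y_t - \sum_{j=1}^{k} \hat\pi_{kj} y_{t-j}, \qquad \tilde x_t = x_t - \sum_{j=1}^{k} \hat\pi_{kj} x_{t-j}.$$
Consider OLS estimation of the regression $\tilde y_t = \beta' \tilde x_t + \tilde e_t$, where the OLS estimator of β is computed as
$$\hat\beta_{CO\text{-}AR} = \Big(\sum_{t=k+1}^{T} \tilde x_t \tilde x_t'\Big)^{-1} \sum_{t=k+1}^{T} \tilde x_t \tilde y_t.$$
When $x_t$ and $u_t$ are uncorrelated, the asymptotic properties of the CO-AR estimate of β are given in Theorem 1.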
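The two stages above can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the authors' code: the AR(k) approximation is fitted by OLS without an intercept, when k is not supplied it is chosen by a Gaussian BIC over 0, ..., k_max on a common sample, no intercept enters the transformed regression, and the helper names (`lag_matrix`, `select_k_bic`, `co_ar`) are hypothetical.

```python
import numpy as np


def lag_matrix(z, k, start):
    """Columns z_{t-1}, ..., z_{t-k} for t = start, ..., len(z) - 1 (0-based)."""
    return np.column_stack([z[start - j:len(z) - j] for j in range(1, k + 1)])


def select_k_bic(u, k_max):
    """Gaussian BIC choice of the AR order on the common sample t = k_max, ..., T - 1."""
    Y, n = u[k_max:], len(u) - k_max
    bics = []
    for k in range(k_max + 1):
        resid = Y
        if k > 0:
            X = lag_matrix(u, k, k_max)
            resid = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
        bics.append(n * np.log(resid @ resid / n) + k * np.log(n))
    return int(np.argmin(bics))


def co_ar(y, x, k=None, k_max=4):
    """Two-stage CO-AR sketch: returns (beta_hat, t_stats, k)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(x, dtype=float).reshape(len(y), -1)   # T x m regressors

    # Stage I: OLS on first differences (corrected GLS), then level residuals
    b_diff = np.linalg.lstsq(np.diff(X, axis=0), np.diff(y), rcond=None)[0]
    u_hat = y - X @ b_diff

    # Stage II: AR(k) approximation of u_hat, fitted by OLS
    if k is None:
        k = select_k_bic(u_hat, k_max)
    pi_hat = (np.linalg.lstsq(lag_matrix(u_hat, k, k), u_hat[k:], rcond=None)[0]
              if k > 0 else np.zeros(0))

    # Generalized Cochrane-Orcutt transformation of order k
    def co_filter(z):
        out = z[k:].copy()
        for j, p in enumerate(pi_hat, start=1):
            out -= p * z[k - j:len(z) - j]
        return out

    y_tr, X_tr = co_filter(y), co_filter(X)
    beta = np.linalg.lstsq(X_tr, y_tr, rcond=None)[0]
    resid = y_tr - X_tr @ beta
    s2 = resid @ resid / (len(y_tr) - X_tr.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X_tr.T @ X_tr)))
    return beta, beta / se, k   # t statistics test H0: beta = 0


# Usage: I(1) regressor, stationary AR(1) error, true beta = 2
rng = np.random.default_rng(42)
T = 2000
x = np.cumsum(rng.normal(size=T))
e = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + e[t]
y = 2.0 * x + u
beta, tstat, k = co_ar(y, x, k_max=8)
```

Because the same filter is applied to $y_t$ and $x_t$, the transformed regression keeps the coefficient β unchanged; only the error term is whitened.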

THEOREM 1. If the data generating process satisfies Model I and Assumption 1, and $x_t$ and $u_t$ are uncorrelated, then as $T \to \infty$ and $k \to \infty$ sufficiently slowly relative to T, the CO-AR estimate of β for Model I is consistent at the convergence rate $T^{1/2}$.

The following Theorem 2 pertains to the regression where a highly persistent (or near I(1)) variable $y_t$ is regressed on an independent highly persistent (or near I(1)) variable $x_t$, so that the coefficient β equals zero.

THEOREM 2. Assume that $x_t$ and $y_t$ are a pair of highly persistent (or near I(1)) variables that are independent of each other. Then, as $T \to \infty$, $\hat\beta_{CO\text{-}AR} = O_p(T^{-1/2})$.

Theorem 1 shows that the CO-AR estimate of β is consistent and converges at rate $T^{1/2}$ when the Data Generating Process (DGP) satisfies Model I. Theorem 2 indicates that when a highly persistent (or near I(1)) variable is regressed on another independent highly persistent (or near I(1)) variable, the CO-AR estimate of the coefficient β is consistent at the rate $T^{1/2}$. This asymptotic result is a significant improvement over the previous studies of Phillips (1998) and Deng (2005).

2.2. A convergent t test for the spurious regression between two highly persistent variables

In this section, we construct the convergent usual t statistic from the CO-AR slope estimate of β and its standard error,
$$t_{\hat\beta} = \frac{\hat\beta_{CO\text{-}AR}}{\mathrm{s.e.}(\hat\beta_{CO\text{-}AR})}, \qquad \mathrm{s.e.}(\hat\beta_{CO\text{-}AR}) = \Big(s^2 \Big(\sum_{t=k+1}^{T} \tilde x_t^2\Big)^{-1}\Big)^{1/2},$$
where $s^2$ is the residual variance estimate computed from $\hat\varepsilon_t = \tilde y_t - \hat\beta_{CO\text{-}AR}\, \tilde x_t$, and the null hypothesis tested is $H_0: \beta = 0$. Theorem 3 summarizes our findings.

THEOREM 3. Assume that $x_t$ and $y_t$ are a pair of highly persistent (or near I(1)) variables that are independent of each other. Then, as $T \to \infty$ and $k \to \infty$ sufficiently slowly relative to T, $t_{\hat\beta} \xrightarrow{d} N(0,1)$.

The most important result of Theorem 3 is a convergent t statistic that is asymptotically normally distributed without using an estimate of the long-run variance. Its finite sample performance is analyzed in Section 3.

2.3. Spurious regressions with I(1) regressors and I(1) errors

Most macroeconomic and financial models imply that certain variables are cointegrated as defined by Engle and Granger (1987).⁴ However, cointegration tests often fail to reject the null hypothesis of no cointegration for these variables. Choi et al. (2008) point out that one possible explanation of these empirical results is that the error is unit-root nonstationary because of a nonstationary measurement error in one variable or nonstationary omitted variables. Recall that a regression with nonstationary stochastic errors is called spurious in the time series literature. Consider the following

Model II: $y_t = \beta' x_t + u_t$,

where $x_t$ and $u_t$ are nonstationary I(1) processes satisfying the following Assumption 2.

Assumption 2. $x_t$ and $u_t$ are generated as
$$\phi(L)(1 - L)\, u_t = \theta(L)\, e_t, \qquad (3)$$
where (i) the autoregressive (AR) and moving average (MA) polynomials $\phi(L)$ and $\theta(L)$ have all roots outside the unit circle; (ii) $\phi(L)$ and $\theta(L)$ have no common roots; (iii) $e_t$ is an i.i.d. process with $E(e_t) = 0$, $E(e_t^2) = \sigma_e^2$, and $E(e_t^4) < \infty$.

⁴ There are many examples where cointegration is expected to hold, including the long-run relationship linking real money balances to real income and the interest rate, the relation between relative prices and the exchange rate (i.e. purchasing power parity), the relation between spot and futures prices, and that between equity prices and dividends.
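Bauer and Wagner (2008) show that an ARIMA(p,1,q) process such as the error of Assumption 2 can also be approximated by an AR model. As a purely numerical illustration (not a proof), the sketch below simulates an ARIMA(1,1,1) error with illustrative parameters φ = 0.5 and θ = 0.3, fits an AR(8) by OLS without an intercept, and checks that the fitted coefficients sum to approximately one, i.e. the approximating AR polynomial has a root close to unity. The parameter values, sample size and function names are assumptions chosen for the example.

```python
import numpy as np


def simulate_arima_111(phi, theta, T, burn=100, rng=None):
    """u_t with (1 - phi L)(1 - L) u_t = (1 + theta L) e_t, e_t ~ i.i.d. N(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    n = T + burn
    e = rng.normal(size=n)
    w = np.zeros(n)                      # w_t = Delta u_t is a stationary ARMA(1,1)
    for t in range(1, n):
        w[t] = phi * w[t - 1] + e[t] + theta * e[t - 1]
    return np.cumsum(w)[burn:]           # integrate once to obtain the I(1) series


def fit_long_ar(u, k):
    """OLS coefficients of u_t on u_{t-1}, ..., u_{t-k} (no intercept)."""
    Y = u[k:]
    X = np.column_stack([u[k - j:len(u) - j] for j in range(1, k + 1)])
    return np.linalg.lstsq(X, Y, rcond=None)[0]


u = simulate_arima_111(phi=0.5, theta=0.3, T=2000, rng=np.random.default_rng(0))
coef = fit_long_ar(u, k=8)
print("sum of AR(8) coefficients:", coef.sum())   # close to 1 for an I(1) series
```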

By applying Theorem 3 of Bauer and Wagner (2008), which shows that an ARIMA(p,1,q) process can also be approximated by an AR model as in (2), we summarize the asymptotic properties of the CO-AR estimate of β in Model II in Theorem 4.

THEOREM 4. If the data generating process satisfies Model II and Assumption 2, and $x_t$ and $u_t$ are uncorrelated, then as $T \to \infty$ and $k \to \infty$ sufficiently slowly relative to T, the CO-AR estimate of β for Model II is consistent, with a rate of convergence faster than $T^{1/2}$.

Theorem 4 implies that the CO-AR estimator is not only consistent (whereas the OLS estimator is inconsistent), but also that its rate of convergence is substantially higher than the $T^{1/2}$ rate of both the GLS and CHO-FGLS estimators (see Choi et al., 2008). Furthermore, the CO-AR estimator is generally more accurate than the GLS and CHO-FGLS estimators. On the one hand, the corrected GLS is the ideal cure for spurious regression when the error term follows a random walk; however, if the error term does not follow a pure random walk, differencing the data can result in a misspecified model. On the other hand, the CHO-FGLS estimator performs well in regressions with a stationary AR(1) error term; however, in empirical work the exact form of the error term is often unknown, and the errors rarely follow a simple AR(1) process. Therefore, in practice both the GLS and the CHO-FGLS estimators should be used with extreme care. By contrast, the CO-AR estimator performs well not only for stationary AR(1) or random walk error terms but also for much more general innovations, such as stationary ARMA(p,q) processes and nonstationary ARIMA(p,1,q) processes.

2.4. A convergent t test for the spurious regression of Granger and Newbold (1974)

Since the Monte Carlo study by Granger and Newbold (1974) and the asymptotic theory developed by Phillips (1986), it has been well known that the usual t statistic for a regression between independent I(1) processes does not have a limiting distribution but diverges at the rate $T^{1/2}$ as the sample size T increases.
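The Granger and Newbold (1974) phenomenon is easy to reproduce. The following sketch, assuming Gaussian shocks, a regression of y on a constant and x, and a nominal 5% two-sided test (|t| > 1.96), estimates how often the OLS t test rejects β = 0 when x and y are independent. The function names and replication counts are illustrative choices, and the exact frequencies depend on the random seed.

```python
import numpy as np


def ols_slope_t(y, x):
    """Usual OLS t statistic for the slope in y_t = a + b x_t + error."""
    X = np.column_stack([np.ones_like(x), x])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - 2)
    return coef[1] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])


def rejection_rate(T, reps, random_walk, seed=0):
    """Share of replications with |t| > 1.96 for independent x and y."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        ex, ey = rng.normal(size=(2, T))
        x, y = (np.cumsum(ex), np.cumsum(ey)) if random_walk else (ex, ey)
        hits += int(abs(ols_slope_t(y, x)) > 1.96)
    return hits / reps


for T in (50, 100, 400):
    print(T, "white noise:", rejection_rate(T, 500, False),
          "random walks:", rejection_rate(T, 500, True))
```

For white noise the rejection frequency stays near 5%, while for independent random walks it is far above 5% and grows with T, consistent with the $T^{1/2}$ divergence of the t statistic.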

In this section we establish a convergent standard t statistic by using CO-AR estimation for a regression between two independent I(1) processes. We allow $y_t$ and $x_t$ to be general integrated processes (of order one) whose differences are weakly dependent and possibly heterogeneously distributed. This includes a wide variety of data-generating mechanisms, such as the ARIMA(p,1,q) model. Theorem 5 shows that, by using the CO-AR estimator, a convergent t statistic can be constructed that is asymptotically normally distributed for a regression between two independent I(1) processes. Moreover, using the CO-AR methodology to construct the convergent standard t statistic avoids the difficult choice of a bandwidth parameter and a kernel function.

THEOREM 5. Assume that $x_t$ and $y_t$ are a pair of I(1) variables that are independent of each other. Then, as $T \to \infty$ and $k \to \infty$ sufficiently slowly relative to T, $t_{\hat\beta} \xrightarrow{d} N(0,1)$.

2.5. Regressions with I(1) regressors and I(0) errors

In this section we consider the asymptotic distribution of the CO-AR estimator under the assumption of cointegration, i.e., when the error term $u_t$ in Model II is an I(0) process.

THEOREM 6. If the data generating process follows Model II but $u_t$ is an I(0) process satisfying Assumption 1, and $x_t$ and $u_t$ are uncorrelated, then as $T \to \infty$ and $k \to \infty$ sufficiently slowly relative to T, the CO-AR estimate of β is consistent at the convergence rate T, i.e. $\hat\beta_{CO\text{-}AR} - \beta = O_p(T^{-1})$.

Theorem 6 shows that the CO-AR estimation method is also useful for cointegration analysis. For instance, in a cointegration model the CHO-FGLS estimator performs very well when the error term follows an AR(1) process, but it is consistent only at the rate $T^{1/2}$. The OLS estimator, in turn, is consistent at the rate T, but when the error term follows an AR(1) or an ARMA(p,q) process it can easily produce inaccurate estimates.

3. Simulation results

The analysis above shows that CO-AR estimation is robust with respect to the error specification even when the regressors and regressand are highly persistent (or possibly unit-root) processes. We also derived a convergent t statistic for the spurious regression cases analyzed by Granger and Newbold (1974) and Granger et al. (2001). In this section we use simulations to show, for several typical cases, that the rejection probabilities of the t test based on the CO-AR estimator are very close to the nominal size even for samples as small as 50 observations. Moreover, we analyze the finite-sample properties of our CO-AR estimator in cointegrating relationships with serially correlated errors and in spurious regressions, and compare its performance with the other estimators (OLS, GLS and CHO-FGLS) discussed in Choi et al. (2008).

3.1 Spurious regressions

In this section we use simulations to study the finite sample performance of the t statistic of our CO-AR estimator compared with the t statistic from standard least squares estimation and inference (OLS). Adopting the experimental design of Granger et al. (2001), suppose that $X_t$ and $Y_t$ are generated by the following independent processes:

DGP 1: $x_t = e_t$
DGP 2: $x_t = 0.95\, x_{t-1} + e_t$
DGP 3: $x_t = 0.99\, x_{t-1} + e_t$
DGP 4: $x_t = x_{t-1} + e_t$

where $x_t$ stands for either $X_t$ or $Y_t$, and $e_t$ is drawn from independent N(0,1) populations; as shown later, normality is not a particularly relevant feature. In words, each of $X_t$ and $Y_t$ is generated by a white noise process (DGP 1), a strongly positively autocorrelated autoregressive series (DGPs 2 and 3), or a random walk (DGP 4). The number of iterations in each simulation is 20,000. To avoid the problem of fixing $X_0$ and $Y_0$, we set $X_{-100} = Y_{-100} = 0$ and discard the first 100 observations in each replication. We consider sample sizes of 50, 100 and 500 observations.
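The data-generating step of this design can be written compactly, since DGPs 1 to 4 differ only in the autoregressive coefficient (0, 0.95, 0.99 and 1). The sketch below, assuming Gaussian shocks, starts each series at $x_{-100} = 0$ and discards the first 100 observations as described above; `generate_dgp` is a hypothetical helper name.

```python
import numpy as np

AR_COEF = {1: 0.0, 2: 0.95, 3: 0.99, 4: 1.0}   # DGP number -> coefficient on x_{t-1}


def generate_dgp(dgp, T, burn=100, rng=None):
    """Return x_1, ..., x_T from x_t = rho x_{t-1} + e_t, started at x_{-burn} = 0."""
    rng = np.random.default_rng() if rng is None else rng
    rho = AR_COEF[dgp]
    e = rng.normal(size=T + burn)
    x = np.zeros(T + burn)
    prev = 0.0                                   # start-up value x_{-100} = 0
    for t in range(T + burn):
        x[t] = rho * prev + e[t]
        prev = x[t]
    return x[burn:]                              # discard the first 100 observations


# Independent draws for one replication, e.g. X from DGP 3 and Y from DGP 4
rng = np.random.default_rng(123)
X, Y = generate_dgp(3, 50, rng=rng), generate_dgp(4, 50, rng=rng)
```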
The lag length of the AR(k) approximation of the error term is determined by applying the Bayesian information criterion (BIC) to each replication, with k ranging from 0 to some maximal order $k_{max}$.⁵ According to Theorems 3 and 5, $k_{max}$ is allowed to increase with the sample size, subject to the rate conditions of these theorems. Accordingly, the maximum number of lags in the selection procedure is set at 3 when the sample consists of 50 observations, 4 when it consists of 100, and 8 when it consists of 500.

Table 1 shows the percentage of rejections of the null hypothesis of no linear relationship between $Y_t$ and $X_t$ at the 5% level, i.e. an absolute t statistic greater than 1.96, using both OLS (left) and the CO-AR estimator (right). The series $X_t$ and $Y_t$ are generated independently from DGPs 1 to 4. For brevity, we summarize the most interesting aspects of the results. First, the OLS t test is well behaved as long as either $X_t$ or $Y_t$ is a white noise process. Second, as soon as both $X_t$ and $Y_t$ display serial correlation, the OLS t test is spuriously biased towards rejection. This issue becomes very serious when both series are strongly persistent. Moreover, the percentage of spurious relationships tends to increase with the sample size. Third, and this is the novel aspect of Table 1, the empirical size of the t test based on the CO-AR estimator is very close to the nominal size of 5%. Put differently, our estimator controls the spurious regression problem almost perfectly, especially for a sample size of 500 observations.

Table 1 here

To assess the robustness of these conclusions, we also use different distributions for the error terms. In particular, each pair of shocks remains i.i.d. and mutually independent, but the shocks are drawn from N(0,3), Student t with five degrees of freedom, and Laplace (with mean zero and variance two) populations, respectively. All give results very similar to those in Table 1.
More specifically, the t test based on OLS is seriously biased towards rejection, while the t test based on the CO-AR estimator largely controls the spurious regression problem. Table 2 illustrates these findings for the Student t distribution with 5 degrees of freedom (additional results are available upon request).

Table 2 here

⁵ The BIC is given by $BIC(k) = -2\hat\ell_k + k \ln T$, where $\hat\ell_k$ stands for the value of the log-likelihood function of the AR(k) model with parameters estimated using T observations.
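To illustrate how rejection frequencies like those in Tables 1 and 2 are computed, the following self-contained sketch compares the OLS and CO-AR t tests for two independent random walks (DGP 4). It is a simplified, hypothetical re-implementation under our own assumptions (Gaussian shocks, an intercept in the OLS regression only, no intercept in the AR approximation or in the transformed regression, a Gaussian BIC in the spirit of footnote 5, and far fewer replications than the 20,000 used in the paper); it is not intended to reproduce the published figures.

```python
import numpy as np


def lag_matrix(z, k, start):
    """Columns z_{t-1}, ..., z_{t-k} for t = start, ..., len(z) - 1."""
    return np.column_stack([z[start - j:len(z) - j] for j in range(1, k + 1)])


def bic_order(u, k_max):
    """Gaussian BIC, n*log(sigma2_hat) + k*log(n), on a common sample."""
    Y, n = u[k_max:], len(u) - k_max
    bics = []
    for k in range(k_max + 1):
        resid = Y
        if k > 0:
            X = lag_matrix(u, k, k_max)
            resid = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
        bics.append(n * np.log(resid @ resid / n) + k * np.log(n))
    return int(np.argmin(bics))


def last_coef_t(y, X):
    """Usual OLS t statistic of the last column of X."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - X.shape[1])
    return coef[-1] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[-1, -1])


def co_ar_t(y, x, k_max):
    """t statistic of beta = 0 from a scalar CO-AR regression."""
    dx, dy = np.diff(x), np.diff(y)
    u = y - ((dx @ dy) / (dx @ dx)) * x            # Stage I residuals in levels
    k = bic_order(u, k_max)
    if k == 0:
        return last_coef_t(y, x[:, None])
    pi = np.linalg.lstsq(lag_matrix(u, k, k), u[k:], rcond=None)[0]
    y_tr = y[k:] - lag_matrix(y, k, k) @ pi        # Cochrane-Orcutt filter of order k
    x_tr = x[k:] - lag_matrix(x, k, k) @ pi
    return last_coef_t(y_tr, x_tr[:, None])


def table1_dgp4(T=100, reps=400, k_max=4, seed=0):
    """Rejection rates (OLS, CO-AR) at |t| > 1.96 for independent random walks."""
    rng = np.random.default_rng(seed)
    rej_ols = rej_co = 0
    for _ in range(reps):
        y, x = np.cumsum(rng.normal(size=(2, T + 100)), axis=1)[:, 100:]
        rej_ols += int(abs(last_coef_t(y, np.column_stack([np.ones(T), x]))) > 1.96)
        rej_co += int(abs(co_ar_t(y, x, k_max)) > 1.96)
    return rej_ols / reps, rej_co / reps


print(table1_dgp4())   # (OLS rejection rate, CO-AR rejection rate)
```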

Finally, it is interesting to notice that even in finite samples, for all combinations of DGPs 1 to 4 and all error distributions, the t statistic of the CO-AR estimator is almost exactly distributed as N(0,1). For instance, Figure 1 plots the empirical distribution of the t statistics based on the OLS and CO-AR estimators for four different cases. First, when both the Y and X series are serially uncorrelated, the finite sample distributions of both t statistics are very close to the asymptotic N(0,1) (top panel of Figure 1). Second, when both X and Y are positively autocorrelated autoregressive series (middle panels), the OLS t statistic is still centered at zero but displays fat tails, whereas the CO-AR t statistic remains distributed as N(0,1). Finally, the OLS t statistic does not converge to a standard normal distribution as the sample size increases (from 50 to 500 observations; see the bottom panel). We therefore emphasize that the t test based on our CO-AR estimator tends to have the appropriate rejection rate for all critical values, not only for 1.96.

Figure 1 here

3.2 Finite sample performance of the four estimators

Consider the regression model
$$y_t = \beta' x_t + \gamma' v_t + e_t, \qquad (4)$$
where $v_t = (\Delta x_{1,t-k}, \ldots, \Delta x_{1,t}, \ldots, \Delta x_{1,t+k}, \ldots, \Delta x_{m,t-k}, \ldots, \Delta x_{m,t}, \ldots, \Delta x_{m,t+k})'$, $\gamma = (\gamma_{1,-k}, \ldots, \gamma_{1,0}, \ldots, \gamma_{1,k}, \ldots, \gamma_{m,-k}, \ldots, \gamma_{m,0}, \ldots, \gamma_{m,k})'$, and $x_t$ is an m-vector I(1) process. Note that inference about the coefficients β differs substantially according to the assumptions on the error term $e_t$. When $e_t$ is stationary, Equation (4) is a cointegrating regression with potentially serially correlated errors. When $e_t$ is a unit-root process, Equation (4) is a spurious regression. Choi et al. (2008) discuss three methods to estimate the structural parameters β:

1. Dynamic Ordinary Least Squares (DOLS): regress $y_t$ on $x_t$ and $v_t$ to get $\hat\beta_{DOLS}$.
2. Corrected Generalized Least Squares (GLS): regress $\Delta y_t$ on $\Delta x_t$ and $\Delta v_t$ to get $\hat\beta_{GLS}$.
3. Feasible Generalized Least Squares (CHO-FGLS): let the residual from the DOLS regression be denoted by $\hat e_t$. Run the AR(1) regression $\hat e_t = \rho\, \hat e_{t-1} + \eta_t$ and compute the OLS coefficient $\hat\rho$. Apply the Cochrane-Orcutt transformation to the data: $y_t^{*} = y_t - \hat\rho\, y_{t-1}$, $x_t^{*} = x_t - \hat\rho\, x_{t-1}$, $v_t^{*} = v_t - \hat\rho\, v_{t-1}$. Finally, regress $y_t^{*}$ on $x_t^{*}$ and $v_t^{*}$ to get $\hat\beta_{FGLS}$.

This section presents results from a Monte Carlo study of the finite sample properties of the CO-AR estimator $\hat\beta_{CO\text{-}AR}$ compared with the other three estimators, $\hat\beta_{DOLS}$, $\hat\beta_{GLS}$ and $\hat\beta_{FGLS}$. For comparability, we adopt the same experimental design as Choi et al. (2008). More specifically, we consider the case in which $x_t$ is an I(1) scalar variable, and generate $v_t$ and $u_t$ from two independent standard normal distributions. The structural parameter is set to β = 2 and $\gamma' v_t = 0.5\, v_t$. The error process $e_t$ is specified as follows:

DGP a: $e_t = u_t$
DGP b: $e_t = 0.95\, e_{t-1} + u_t$
DGP c: $e_t = e_{t-1} + u_t$
DGP d: $e_t = 0.5\, e_{t-1} + 0.5\, e_{t-2} + u_t$
DGP e: $\Delta e_t = 0.95\, \Delta e_{t-1} + u_t$

where $u_t \sim N(0,1)$. In words, the error term is a white noise (DGP a), an AR(1) process with autoregressive coefficient 0.95 (DGP b), a random walk (DGP c), an AR(2) process with a unit root (DGP d), and an ARIMA(1,1,0) process (DGP e). Note that the first two cases correspond to a cointegrating relationship. The number of iterations in each simulation is 5,000, and in each replication 100+n observations are generated (n = 50, 100 and 500), of which the first 100 are discarded. The lag length of the AR(k) approximation of the error term is determined by applying the Bayesian information criterion to each replication. The maximum number of lags in the selection procedure is set at 3 (n = 50), 4 (n = 100) and 8 (n = 500).

Table 3 shows the bias and the root mean square error (RMSE) of all four estimators. As expected, the DOLS estimator is the best one when the error process is DGP a, while the GLS is the best estimator when the error process is DGP c. As shown by Choi et al. (2008), the CHO-FGLS estimator is almost as good as the DOLS estimator in cointegrating relationships and significantly outperforms the DOLS estimator in spurious regressions.
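For comparison, the three estimators of Choi et al. (2008) listed above can be sketched for a scalar I(1) regressor. This is our own simplified illustration, not code from Choi et al. (2008): it assumes no intercept, takes $v_t$ to be the leads and lags $\Delta x_{t-p}, \ldots, \Delta x_{t+p}$ with a user-chosen p, and uses hypothetical function names.

```python
import numpy as np


def ols(y, X):
    return np.linalg.lstsq(X, y, rcond=None)[0]


def choi_estimators(y, x, p=1):
    """Return (beta_DOLS, beta_GLS, beta_FGLS) for a scalar I(1) regressor x."""
    T = len(y)
    dx = np.r_[np.nan, np.diff(x)]                # dx[t] = x_t - x_{t-1}
    ts = np.arange(p + 1, T - p)                  # dates with all leads/lags available
    V = np.column_stack([dx[ts + j] for j in range(-p, p + 1)])   # v_t
    Z, yt = np.column_stack([x[ts], V]), y[ts]

    # 1. DOLS: regress y_t on x_t and v_t
    b_dols = ols(yt, Z)
    # 2. Corrected GLS: regress Delta y_t on Delta x_t and Delta v_t
    b_gls = ols(np.diff(yt), np.diff(Z, axis=0))
    # 3. CHO-FGLS: AR(1) on DOLS residuals, then a first-order Cochrane-Orcutt transform
    e = yt - Z @ b_dols
    rho = (e[:-1] @ e[1:]) / (e[:-1] @ e[:-1])
    b_fgls = ols(yt[1:] - rho * yt[:-1], Z[1:] - rho * Z[:-1])
    return b_dols[0], b_gls[0], b_fgls[0]


# Usage: beta = 2, Delta x_t = v_t, white-noise error (in the spirit of DGP a)
rng = np.random.default_rng(7)
T = 4000
v = rng.normal(size=T)
x = np.cumsum(v)
y = 2.0 * x + 0.5 * v + rng.normal(size=T)
est = choi_estimators(y, x, p=1)
```

Comparing this with a first-order filter makes the difference with CO-AR explicit: CHO-FGLS uses one autoregressive lag estimated from DOLS residuals, whereas CO-AR uses k lags estimated from first-difference (corrected GLS) residuals.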
Table 3 thus confirms that the CHO-FGLS estimation is a robust procedure with respect to error specifications. The novel feature of Table 3, however, is the finite sample performance of the CO-AR estimator (see column heading CO-AR). This new estimator is extremely robust with respect to the order of integration of the error terms. If the error follows a white noise process, the RMSE of the CO-AR is similar to that of the CHO-FGLS and DOLS estimators. If the error follows a very persistent AR(1) process (DGP b) or a unit-root process (DGP c), the RMSE of the CO-AR tends to be slightly smaller than that of the CHO-FGLS and GLS estimators, and much smaller than that of the DOLS procedure. Not surprisingly, when the error process follows more complex unit-root dynamics (DGPs d and e), the CO-AR estimator performs very well compared with the three other estimators discussed by Choi et al. (2008), especially for large sample sizes.

Table 3 here

The adequacy of an approximating model for the DGP of the error term depends on the choice of the order of the AR approximation. In particular, different lag selection methods may lead to drastically different conclusions regarding the finite sample performance of the CO-AR estimator. Therefore, we also investigate the sensitivity of our Monte Carlo results to the choice of the lag selection criterion by applying the Akaike information criterion (AIC) and the Hannan-Quinn information criterion (HQIC),⁶ and by using the true order of the approximation (i.e. zero lags for the error process specified in DGP a, one lag for DGPs b and c, and two lags for DGPs d and e). Table 4 reports the bias and the RMSE of the CO-AR estimator when the error process $e_t$ is specified as in DGPs a to e and the lag length of the AR approximation is selected by AIC or HQIC, or set equal to the true lag length. For brevity, we only discuss the most interesting aspects of Table 4. Overall, the results are broadly in line with those displayed in Table 3. First, the performance of the CO-AR estimator is remarkably robust with respect to the different information criteria employed.
However, when the error term follows a white noise process, the RMSE of the CO-AR estimator using the true lag length is half of that obtained with BIC lag selection. Second, the performance of the BIC is broadly similar to that of the HQIC, and both criteria slightly dominate AIC lag selection in terms of RMSE.

Table 4 here

⁶ See Ng and Perron (2005) for further details.
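The three lag-selection rules compared in Table 4 differ only in the penalty attached to each additional lag. A minimal sketch, assuming Gaussian AR(k) fits by OLS without an intercept on a common sample of n = T − k_max observations and the textbook penalties (2 for AIC, 2 ln ln n for HQIC, ln n for BIC), is given below; `select_order` is a hypothetical helper. Because the criteria share the same goodness-of-fit term, a heavier penalty can never select a longer lag, so BIC ≤ HQIC ≤ AIC whenever ln n > 2 ln ln n > 2, which holds for the sample sizes used here.

```python
import numpy as np


def select_order(u, k_max, criterion="bic"):
    """AR order in 0..k_max minimizing n*log(sigma2_hat) + penalty*k."""
    T = len(u)
    n, Y = T - k_max, u[k_max:]
    penalty = {"aic": 2.0, "hqic": 2.0 * np.log(np.log(n)), "bic": np.log(n)}[criterion]
    values = []
    for k in range(k_max + 1):
        resid = Y
        if k > 0:
            X = np.column_stack([u[k_max - j:T - j] for j in range(1, k + 1)])
            resid = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
        values.append(n * np.log(resid @ resid / n) + penalty * k)
    return int(np.argmin(values))


# Example: a stationary AR(2) series, for which the true order is 2
rng = np.random.default_rng(5)
e = rng.normal(size=2100)
u = np.zeros(2100)
for t in range(2, 2100):
    u[t] = 0.5 * u[t - 1] + 0.3 * u[t - 2] + e[t]
u = u[100:]
orders = {c: select_order(u, 8, c) for c in ("aic", "hqic", "bic")}
print(orders)
```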

To sum up, this paper develops a new robust estimator for structural parameters in dynamic regressions. The simulation study indicates that the CO-AR estimator is particularly useful in applied situations in macroeconomics and finance where the researcher is not sure whether the error term follows a persistent stationary ARMA process (cointegrating relationship) or unit-root dynamics (spurious regression).

4. Concluding remarks

This paper proposes a new estimator for cointegrating and spurious regressions by applying a two-stage generalized Cochrane-Orcutt transformation based on the autoregressive approximation framework developed by Berk (1974) and extended by Bauer and Wagner (2008). We prove that our estimator is consistent and that its rate of convergence is substantially improved compared to existing estimation methods. We further show that a convergent usual t statistic, asymptotically distributed as N(0,1), can be constructed for the spurious regression cases analyzed by Granger and Newbold (1974) and Granger et al. (2001). Importantly, the CO-AR version of the t test does not rely on a long-run variance estimator and can therefore be easily implemented. Moreover, the CO-AR estimation method turns out to be robust with respect to the error specification even when the regressors and regressand are highly persistent (or possibly unit-root) processes. The fact that the CO-AR estimator covers general regression models is especially appealing in empirical work, where the exact form of the error term is often unknown and the errors rarely follow a simple AR(1) process. Finally, the simulation results indicate that the finite sample performance of the CO-AR methodology is promising even for samples as small as 50 observations.

Of course, several important issues are not considered in this paper and deserve further study.
For instance, future work should investigate the choice of k, the order of the AR model used to approximate I(1) processes, by allowing alternative selection criteria, such as the modified AIC and BIC developed in Bauer and Wagner (2008), or a different maximum number of lags in the selection procedure. Moreover, we believe that the AR approximation framework could be applied to spurious regressions involving other persistent processes, such as I(2) series.
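The concluding remarks describe the CO-AR procedure only in words. The block below is a minimal sketch of a two-stage generalized Cochrane-Orcutt estimator in that spirit, under stated assumptions. Stage 1 (OLS on first differences) follows the description in the Introduction. The stage-2 details are illustrative choices of ours and may differ from the paper's formal procedure: demeaned level residuals, an AR(p) fitted by OLS with p fixed by the user, filtering of the constant, and conventional OLS standard errors. The names ols, lags and co_ar_sketch are hypothetical.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients and conventional (non-HAC) standard errors."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, q = X.shape
    s2 = resid @ resid / (n - q)
    return coef, np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))


def lags(z, p):
    """Matrix with columns z_{t-1}, ..., z_{t-p} for t = p, ..., n-1."""
    n = z.size
    return np.column_stack([z[p - j : n - j] for j in range(1, p + 1)])


def co_ar_sketch(y, x, p):
    """Hypothetical two-stage Cochrane-Orcutt-type slope estimate and t statistic.

    Stage 1: OLS of dy on a constant and dx gives a preliminary slope.
    Stage 2: an AR(p) fitted by OLS to the demeaned level residuals defines the
    filter 1 - phi_1 L - ... - phi_p L^p, which is applied to y, x and the
    constant before OLS is rerun.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    dy, dx = np.diff(y), np.diff(x)
    b1, _ = ols(dy, np.column_stack([np.ones_like(dx), dx]))
    u = y - b1[1] * x
    u = u - u.mean()                     # the intercept is absorbed by demeaning
    phi, _ = ols(u[p:], lags(u, p))      # AR(p) approximation of the error

    def ar_filter(z):
        return z[p:] - lags(z, p) @ phi

    const = np.full(y.size - p, 1.0 - phi.sum())   # filtered constant term
    coef, se = ols(ar_filter(y), np.column_stack([const, ar_filter(x)]))
    return coef[1], coef[1] / se[1]      # slope and its usual t statistic


# Example: structural spurious regression y = 1 + 2x + u, with x and u random walks
rng = np.random.default_rng(1)
x = np.cumsum(rng.standard_normal(500))
y = 1.0 + 2.0 * x + np.cumsum(rng.standard_normal(500))
slope, t_stat = co_ar_sketch(y, x, p=1)
print(round(slope, 3), round(t_stat, 1))
```

The t statistic here is the conventional OLS one, with no long-run variance correction, which mirrors the property emphasized above. In practice p would be chosen by an information criterion rather than fixed.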

References

Bauer, D., Wagner, M., 2008. Autoregressive approximations of multiple frequency I(1) processes. Working paper, University of Wien.
Berk, K.N., 1974. Consistent autoregressive spectral estimates. The Annals of Statistics 2, 489-502.
Brockwell, P.J., Davis, R.A., 1991. Time Series: Theory and Methods, 2nd edn. Springer-Verlag, New York.
Choi, C.-Y., Hu, L., Ogaki, M., 2008. Robust estimation for structural spurious regressions and a Hausman-type cointegration test. Journal of Econometrics 142, 327-351.
Cochrane, D., Orcutt, G.H., 1949. Application of least squares regression to relationships containing autocorrelated error terms. Journal of the American Statistical Association 44, 32-61.
Deng, A., 2005. Understanding spurious regression in financial economics. Working paper, Boston University.
Engle, R.F., Granger, C.W.J., 1987. Cointegration and error correction: representation, estimation, and testing. Econometrica 55 (2), 251-276.
Ferson, W.E., Sarkissian, S., Simin, T., 2003. Spurious regressions in financial economics? Journal of Finance 58 (4), 1393-1414.
Granger, C.W.J., 2001. Spurious regressions. In: Baltagi, B. (Ed.), A Companion to Theoretical Econometrics. Blackwell Publishers.
Granger, C.W.J., Hyung, N., Jeon, H., 2001. Spurious regressions with stationary series. Applied Economics 33, 899-904.
Granger, C.W.J., Newbold, P., 1974. Spurious regressions in econometrics. Journal of Econometrics 2, 111-120.
Haldrup, N., 1994. The asymptotics of single-equation cointegration regressions with I(1) and I(2) variables. Journal of Econometrics 63, 153-181.
Hamilton, J.D., 1994. Time Series Analysis. Princeton University Press, New Jersey.
Hendry, D.F., 1980. Econometrics: alchemy or science? Economica 47, 387-406.
McCallum, B.T., 2010. Is the spurious regression problem spurious? Economics Letters 107, 321-323.
Nabeya, S., Perron, P., 1994. Local asymptotic distributions related to the AR(1) model with dependent errors. Journal of Econometrics 62, 229-264.
Ng, S., Perron, P., 2005. A note on the selection of time series models. Oxford Bulletin of Economics and Statistics 67 (1), 115-134.
Phillips, P.C.B., 1986. Understanding spurious regressions in econometrics. Journal of Econometrics 33, 311-340.
Phillips, P.C.B., 1998. New tools for understanding spurious regressions. Econometrica 66 (6), 1299-1325.
Plosser, C.I., Schwert, G.W., 1978. Money, income, and sunspots: measuring economic relationships and the effects of differencing. Journal of Monetary Economics 4, 637-660.
Plosser, C.I., Schwert, G.W., White, H., 1982. Differencing as a test of specification. International Economic Review 23, 535-552.
Robinson, P.M., 1998. Inference-without-smoothing in the presence of nonparametric autocorrelation. Econometrica 66, 1163-1182.
Saikkonen, P., 1991. Asymptotically efficient estimation of cointegration regressions. Econometric Theory 7, 1-21.
Stock, J.H., Watson, M.W., 1993. A simple estimator of cointegrating vectors in higher order integrated systems. Econometrica 61, 783-820.
Valkanov, R., 2003. Long-horizon regressions: theoretical results and applications. Journal of Financial Economics 68, 201-232.
Yule, G.U., 1926. Why do we sometimes get nonsense correlations between time series? A study in sampling and the nature of time series. Journal of the Royal Statistical Society 89, 1-64.

Figure 1
Empirical distribution of the OLS and CO-AR t-statistics compared to the N(0,1) probability density function

[Figure: kernel density estimates of the t-statistic for the slope coefficient (horizontal axis: t-statistics; vertical axis: density), each overlaid with the N(0,1) density; left column OLS, right column CO-AR. Panels, by row: Y1 on a constant and X1 (n = 50); Y2 on a constant and X2 (n = 50); Y4 on a constant and X4 (n = 50); Y4 on a constant and X4 (n = 500). The OLS horizontal axes span about ±4, ±12, ±15 and ±40, respectively; every CO-AR horizontal axis spans ±3.]

NOTE: Finite sample distributions versus asymptotic distributions: t-statistics for the slope coefficient. The panels in the first three rows use a sample size of 50 observations, while the bottom panels use 500 observations. The series X and Y are generated independently. Their DGPs (respectively 1, 2, and 4) are specified in Section 3.1, where each pair of shocks is drawn from independent N(0,1) populations. The number of replications is 20,000. To avoid fixing $X_0$ and $Y_0$, 100 pre-sample observations are generated starting from $X_{-100} = Y_{-100} = 0$. The order of the AR approximation is selected using BIC. The maximum number of lags used in the selection procedure is set at 3 (when n = 50), 4 (when n = 100), and 8 (when n = 500), respectively. The densities are calculated using kernel estimates.

Table 1
Spurious regression with normal distributions: percentage of |t| > 1.96

              OLS                        CO-AR
        Y1    Y2    Y3    Y4            Y1    Y2    Y3    Y4
Sample size = 50
X1     5.2   5.4   5.7   5.5           5.3   6.3   6.8   6.9
X2     5.7  56.5  60.5  61.6           5.8   6.3   7.0   6.6
X3     5.8  60.1  65.0  65.5           5.7   6.1   6.0   6.7
X4     5.5  61.2  65.5  67.2           5.8   6.1   6.0   6.3
Sample size = 100
X1     5.2   5.2   5.5   5.5           5.1   5.5   5.6   6.1
X2     5.4  61.5  66.9  67.8           4.9   5.9   6.0   6.1
X3     5.3  67.3  72.9  74.6           5.2   5.7   5.6   6.0
X4     5.4  67.3  75.3  76.8           5.6   5.6   5.6   5.6
Sample size = 500
X1     4.8   5.1   5.0   5.0           4.9   4.9   3.9   4.1
X2     5.2  65.5  72.4  73.8           5.1   5.0   5.5   5.6
X3     4.9  72.2  82.4  85.8           4.7   5.2   5.4   5.1
X4     4.8  73.7  85.5  89.6           4.9   4.8   4.9   5.2

NOTE: The table displays the percentage of rejections, i.e. the percentage of replications in which the absolute value of the t-statistic exceeds 1.96. Rows give the DGP of the regressor X and columns the DGP of the regressand Y (DGPs 1-4 as specified in Section 3.1). The left-hand columns employ standard least squares estimation and inference; the right-hand columns employ our CO-AR estimation and inference. The top panel uses a sample size of 50 observations, the middle panel 100 observations, and the bottom panel 500 observations. The error terms are drawn from independent N(0,1) populations. The number of replications is 20,000. To avoid fixing $X_0$ and $Y_0$, 100 pre-sample observations are generated starting from $X_{-100} = Y_{-100} = 0$. The order of the AR approximation is selected using BIC. The maximum number of lags used in the selection procedure is set at 3 (when n = 50), 4 (when n = 100), and 8 (when n = 500), respectively.
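The OLS columns of Tables 1 and 2 document the classic spurious-regression problem. The block below is a minimal Monte Carlo sketch of that experiment for the OLS t test only. Because DGPs 1-4 are specified in Section 3.1 and not reproduced here, it assumes two stand-in designs for illustration: independent Gaussian random walks, and independent i.i.d. N(0,1) series, with the random walks started at zero 100 periods before the sample. The names ols_slope_t and rejection_rate are hypothetical.

```python
import numpy as np


def ols_slope_t(y, x):
    """Conventional OLS t statistic for the slope in y = a + b x + e."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (y.size - 2)
    se_b = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1] / se_b


def rejection_rate(n, reps, integrated, burn=100, seed=0):
    """Share of replications with |t| > 1.96 when y and x are independent.

    If integrated is True both series are random walks started at zero
    `burn` periods before the sample; otherwise they are i.i.d. N(0, 1).
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        e = rng.standard_normal((2, n + burn))
        if integrated:
            e = np.cumsum(e, axis=1)      # random walks from a zero start
        y, x = e[0, burn:], e[1, burn:]   # discard the pre-sample observations
        hits += int(abs(ols_slope_t(y, x)) > 1.96)
    return hits / reps


print(rejection_rate(50, 2000, integrated=True), rejection_rate(50, 2000, integrated=False))
```

With independent random walks the rejection frequency of the nominal 5% test should lie far above 5%, in line with the OLS entries for the persistent DGPs, while with i.i.d. series it should stay close to 5%.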

Table 2
Spurious regression with Student t distributions (5 degrees of freedom): percentage of |t| > 1.96

              OLS                        CO-AR
        Y1    Y2    Y3    Y4            Y1    Y2    Y3    Y4
Sample size = 50
X1     5.8   5.5   5.4   5.6           5.3   6.3   6.9   6.9
X2     5.7  56.9  60.1  61.3           6.0   6.5   6.3   6.5
X3     5.4  60.1  64.0  65.4           6.1   6.1   6.2   6.2
X4     5.5  61.4  66.0  67.6           5.7   6.0   6.6   6.4
Sample size = 100
X1     5.2   5.3   5.5   5.2           5.1   5.3   5.7   6.1
X2     5.5  61.3  66.8  68.1           5.6   5.8   6.3   6.0
X3     5.4  68.2  73.2  74.8           5.4   5.4   5.7   5.9
X4     5.5  68.2  75.0  77.2           5.2   5.7   5.4   5.8
Sample size = 500
X1     4.9   5.1   4.7   5.3           5.0   4.6   3.8   4.2
X2     4.9  64.6  71.9  73.7           5.0   5.0   5.3   5.8
X3     4.9  72.4  82.3  85.1           5.2   5.0   5.2   5.1
X4     5.0  74.2  85.8  89.5           5.2   5.0   4.8   5.0

NOTE: The table displays the percentage of rejections, i.e. the percentage of replications in which the absolute value of the t-statistic exceeds 1.96. Rows give the DGP of the regressor X and columns the DGP of the regressand Y (DGPs 1-4 as specified in Section 3.1). The left-hand columns employ standard least squares estimation and inference; the right-hand columns employ our CO-AR estimation and inference. The top panel uses a sample size of 50 observations, the middle panel 100 observations, and the bottom panel 500 observations. The error terms are drawn from independent Student t populations with 5 degrees of freedom. The number of replications is 20,000. To avoid fixing $X_0$ and $Y_0$, 100 pre-sample observations are generated starting from $X_{-100} = Y_{-100} = 0$. The order of the AR approximation is selected using BIC. The maximum number of lags used in the selection procedure is set at 3 (when n = 50), 4 (when n = 100), and 8 (when n = 500), respectively.

Table 3
The bias and RMSE of four estimators

                      DOLS                GLSC            CHO-FGLS               CO-AR
DGP   n      Bias      RMSE      Bias      RMSE      Bias      RMSE      Bias      RMSE
a    50    0.0005    0.0473   -0.0019    0.2104    0.0007    0.0485   -0.0002    0.0837
    100         0    0.0237   -0.0036    0.1443    0.0001    0.0239   -0.0008    0.0452
    500         0    0.0048   -0.0003    0.0632         0    0.0048         0    0.0128
b    50   -0.0034    0.5121   -0.0044    0.1801   -0.0043    0.2326   -0.0034    0.1816
    100    0.0016    0.3163   -0.0049     0.128    -0.004    0.1425   -0.0043    0.1245
    500   -0.0008    0.0839   -0.0006    0.0612   -0.0006    0.0483   -0.0006    0.0495
c    50     0.012    2.1758   -0.0045    0.1802   -0.0095    0.6394   -0.0047    0.1853
    100    0.0078    1.6748   -0.0052    0.1279   -0.0068    0.3664   -0.0051    0.1308
    500   -0.0237    1.1293   -0.0009     0.061    -0.002    0.1139   -0.0007    0.0616
d    50    0.0086    1.4469   -0.0036    0.1706    -0.008    0.5008    -0.006    0.2078
    100    0.0058    1.1142   -0.0046    0.1168   -0.0037    0.3233   -0.0045    0.1085
    500   -0.0161    0.7522   -0.0006    0.0517    -0.003    0.1388   -0.0002    0.0479
e    50    0.3054    38.928    -0.006    0.7436   -0.0647   10.4763   -0.0021    0.1981
    100    0.1248   30.7221   -0.0112    0.5539   -0.0152    5.6082   -0.0014    0.1352
    500   -0.4597   21.9171   -0.0059    0.2678   -0.0234     1.088   -0.0008    0.0626

Note: The order of the AR approximation is selected using BIC. The maximum number of lags to be used in the selection procedure is set respectively at 3 (when n = 50), 4 (when n = 100), 8 (when n = 500). The number of replications is 5,000.
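Tables 3 and 4 report bias and RMSE without restating their definitions. Assuming the usual Monte Carlo definitions over the R = 5,000 replications, with $\hat\beta_r$ the slope estimate in replication r and $\beta$ the true slope, the two measures and the standard decomposition linking them are:

$$
\mathrm{Bias} = \frac{1}{R}\sum_{r=1}^{R}\bigl(\hat\beta_r - \beta\bigr) = \bar\beta - \beta,
\qquad
\mathrm{RMSE} = \Bigl[\frac{1}{R}\sum_{r=1}^{R}\bigl(\hat\beta_r - \beta\bigr)^2\Bigr]^{1/2},
$$

$$
\mathrm{RMSE}^2 = \mathrm{Bias}^2 + \frac{1}{R}\sum_{r=1}^{R}\bigl(\hat\beta_r - \bar\beta\bigr)^2,
\qquad
\bar\beta = \frac{1}{R}\sum_{r=1}^{R}\hat\beta_r .
$$

On the reported figures, the absolute bias is below 5 percent of the RMSE in every cell of Tables 3 and 4, so the squared bias accounts for less than 0.25 percent of the mean squared error and the RMSE comparisons are essentially comparisons of sampling dispersion.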

Table 4
The bias and RMSE of the CO-AR estimator

              CO-AR (True)         CO-AR (AIC)        CO-AR (HQIC)
DGP   n      Bias      RMSE      Bias      RMSE      Bias      RMSE
a    50    0.0005    0.0473   -0.0003    0.0851   -0.0007    0.0823
    100         0    0.0237   -0.0008    0.0471    0.0004    0.0468
    500         0    0.0048    0.0001    0.0132   -0.0001    0.0131
b    50   -0.0038    0.1801   -0.0039    0.1853    0.0032    0.1831
    100   -0.0041     0.124   -0.0034    0.1272    0.0027     0.129
    500   -0.0006    0.0494   -0.0006      0.05    0.0011    0.0496
c    50    -0.005    0.1835   -0.0043    0.1904    0.0027    0.1886
    100   -0.0052      0.13    -0.005    0.1346    0.0019    0.1358
    500   -0.0007    0.0615   -0.0006    0.0624    0.0019    0.0608
d    50   -0.0044    0.1553    -0.005    0.1676    0.0006    0.1696
    100   -0.0045    0.1082   -0.0049    0.1093    0.0004    0.1112
    500   -0.0003    0.0479   -0.0004    0.0481    0.0011    0.0472
e    50   -0.0019    0.1911   -0.0019    0.2029    0.0004    0.2033
    100    -0.001    0.1322   -0.0003    0.1402    0.0012    0.1371
    500   -0.0009    0.0625   -0.0011    0.0637    0.0018    0.0631

Note: This table compares three ways of choosing the order of the AR(k) approximation: the true order, the Akaike information criterion (AIC), and the Hannan-Quinn information criterion (HQIC). The true order of the approximation is 0 (for DGP a), 1 (for DGPs b and c), and 2 (for DGPs d and e). The maximum number of lags used in the AIC and HQIC selection procedures is set at 3 (when n = 50), 4 (when n = 100), and 8 (when n = 500), respectively. The number of replications is 5,000.
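As a rough arithmetic check of the lag-selection comparison drawn in the text, the CO-AR RMSE columns can be transcribed and compared cell by cell. The dictionary below simply copies the CO-AR RMSEs from Table 3 (BIC selection) and Table 4 (true order, AIC, HQIC); the helper name wins is ours.

```python
# CO-AR RMSEs transcribed from Table 3 (BIC) and Table 4 (true order, AIC, HQIC).
# Rows: DGPs a-e; columns: n = 50, 100, 500.
RMSE = {
    "true": [[0.0473, 0.0237, 0.0048], [0.1801, 0.1240, 0.0494], [0.1835, 0.1300, 0.0615],
             [0.1553, 0.1082, 0.0479], [0.1911, 0.1322, 0.0625]],
    "bic":  [[0.0837, 0.0452, 0.0128], [0.1816, 0.1245, 0.0495], [0.1853, 0.1308, 0.0616],
             [0.2078, 0.1085, 0.0479], [0.1981, 0.1352, 0.0626]],
    "aic":  [[0.0851, 0.0471, 0.0132], [0.1853, 0.1272, 0.0500], [0.1904, 0.1346, 0.0624],
             [0.1676, 0.1093, 0.0481], [0.2029, 0.1402, 0.0637]],
    "hqic": [[0.0823, 0.0468, 0.0131], [0.1831, 0.1290, 0.0496], [0.1886, 0.1358, 0.0608],
             [0.1696, 0.1112, 0.0472], [0.2033, 0.1371, 0.0631]],
}


def wins(a, b):
    """Number of (DGP, n) cells in which selection rule a has a smaller RMSE than b."""
    return sum(ra < rb
               for rows_a, rows_b in zip(RMSE[a], RMSE[b])
               for ra, rb in zip(rows_a, rows_b))


# Ratio of the true-order RMSE to the BIC RMSE for DGP a (white-noise error, true order 0)
ratio_a = [t / b for t, b in zip(RMSE["true"][0], RMSE["bic"][0])]

print(wins("bic", "aic"), wins("hqic", "aic"))   # 14 10 (out of 15 cells)
print([round(r, 2) for r in ratio_a])
```

On these transcribed values BIC has the lower RMSE than AIC in 14 of the 15 cells (the exception is DGP d with n = 50) and HQIC in 10, consistent with the slight dominance described in the text; for DGP a the true-order RMSE is about 0.57, 0.52 and 0.38 times the BIC value for n = 50, 100 and 500.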