THE GAUSS-MARKOV THEOREM AND SPURIOUS REGRESSIONS

Masao Ogaki
Department of Economics, Ohio State University

and

Chi-Young Choi*
Department of Economics, University of New Hampshire

Ohio State University Department of Economics Working Paper #01-13

September 27, 2001

Keywords: Conditional Gauss-Markov Theorem, Spurious regression, Generalized Least Squares

JEL Classification: C10, C15

* We thank seminar participants at Northwestern, Ohio State, Tokyo Universities and the 2001 Far Eastern Econometric Society Meeting for helpful comments. Special thanks go to In Choi, Fumio Hayashi, Donggyu Sul, and Sam Yoo. All remaining errors are ours.

Abstract

The purpose of this paper is to study the exact finite sample properties of estimators and test statistics for regression coefficients of spurious regressions with unit root nonstationary variables. The conditional probability version of the Gauss-Markov theorem is used to find efficient estimators. Then with an additional assumption that the error is normally distributed conditional on the regressors, we show that the usual test statistics have the usual unconditional distributions when the efficient estimator is used.

1 Introduction

Monte Carlo simulations have often been used to show that the spurious regression phenomenon occurs with regressions involving unit root nonstationary variables (see, e.g., Granger and Newbold (1974), Nelson and Kang (1981, 1983)). Asymptotic properties of estimators and test statistics for regression coefficients of these spurious regressions have been studied by Phillips (1986, 1998) and Durlauf and Phillips (1988), among others. However, no extensive studies of the exact finite sample properties based on the Gauss-Markov theorem have been conducted in the literature on unit root nonstationarity.[1]

The purpose of this paper is to study the exact finite sample properties of estimators and test statistics for regression coefficients of spurious regressions with unit root nonstationary variables. The conditional probability version of the Gauss-Markov theorem is used to find efficient estimators under the assumption that the error is strictly exogenous. Then with an additional assumption that the error is normally distributed conditional on the regressors, we show that the usual test statistics have the usual unconditional distributions when the efficient estimator is used.
For the classical linear regression model with nonstochastic regressors, the Gauss-Markov theorem is used to show conditions under which the Ordinary Least Squares (OLS) and Generalized Least Squares (GLS) estimators are the Best Linear Unbiased Estimator (BLUE). Then with an additional assumption of normality, the exact small sample properties are studied. Even though the assumption of nonstochastic regressors and the normality assumption are too stringent in many applications, these results are useful for many purposes. For example, Monte Carlo simulations are useful for studying small sample properties, but they obviously cannot prove that the GLS estimator is BLUE under any set of conditions. The result that the t-ratio follows a t-distribution provides a useful comparison between the true finite sample distribution and the asymptotic standard normal distribution, which is obtained under general conditions for stationary variables. The comparison tells us how close the true critical value and the nominal critical value are in the case of a normally distributed error term.

For spurious regressions with two independent random walks, we show that the only assumption of the conditional Gauss-Markov theorem they violate is the assumption about the error covariance. Thus the OLS estimator is unbiased, but is not as efficient as the GLS estimator. The spurious regression problem of rejecting the true null hypothesis too frequently with the t-test can be solved by applying GLS estimation when the error is normally distributed.

Spurious regressions which do not satisfy the strict exogeneity assumption will be analyzed with a method similar to those used by Phillips and Loretan (1991), Saikkonen (1991), and Stock and Watson (1993) for cointegrating regressions. We consider a new regression with leads and lags of first differences of the regressors. Assuming that the strict exogeneity assumption is satisfied by the new regression, we apply the conditional version of the Gauss-Markov theorem. As in the case of two independent random walks, the OLS estimator is unbiased, but is not as efficient as the GLS estimator, and the over-rejection of the true null hypothesis by the t-test can again be solved by applying GLS estimation when the error is normally distributed.

[1] A companion paper, Ogaki and Choi (2001), studies exact small sample properties for cointegrating regressions.
2 Stochastic Regressors

This section derives results for OLS estimators and test statistics when regressors are stochastic. These results can be applied to any regression with stochastic regressors when certain assumptions are satisfied. We will apply them to spurious regressions. In Sections 2.1 and 2.2, we will formally state the conditional Gauss-Markov theorem. The theorem is well known, but can be stated in different versions depending on how we define conditional expectation. The textbook version of the theorem (see, e.g., Greene, 1993) uses the standard measure theory definition of conditional expectation. It turns out that this version is not convenient for our purpose, so we use a definition of conditional expectation based on the conditional probability. In Section 2.3, we will derive distributions of some usual test statistics with an additional assumption that the error is normally distributed conditional on the regressors.

2.1 Definitions of Conditional Expectation

Intuitively, the conditional Gauss-Markov theorem is obtained by stating all assumptions and results of the Gauss-Markov theorem conditional on the stochastic regressors. Formally, it is necessary to make sure that the conditional expectations of the relevant variables are well defined.

Let $S$ be a probability space, $F$ be a $\sigma$-field of $S$, and $\Pr$ be a probability measure defined on $F$. The random variables we will consider in this section are defined on this probability space. Let $X = (X_1, X_2, \ldots, X_T)'$ be a $T \times K$ matrix of random variables, which will be the regressor matrix of the regression to be considered. Let $y = (y_1, y_2, \ldots, y_T)'$ and $e = (e_1, e_2, \ldots, e_T)'$ be $T \times 1$ vectors of random variables. We are concerned with a linear model of the form $y = Xb_0 + e$, where $b_0$ is a $K \times 1$ vector of real numbers.

For $s$ such that $X(s)'X(s)$ is nonsingular, the OLS estimator is

$$ b_T = (X'X)^{-1}X'y. \tag{1} $$

In order to apply a conditional version of the Gauss-Markov theorem, it is necessary to define the expectation and variance of $b_T$ conditional on $X$.

Let $Z$ be an integrable random variable (namely, $E(|Z|) < \infty$), and $\sigma(X)$ be the smallest $\sigma$-field with respect to which the random variables in $X$ are measurable.
The standard definition of the expectation of $Z$ given $X$ is obtained by applying the Radon-Nikodym theorem (see, e.g., Billingsley (1986)). Throughout this paper, we use the notation $E[Z \mid \sigma(X)]$ to denote the usual conditional expectation of $Z$ conditional on $X$ as defined by Billingsley (1986) for a random variable $Z$.[2] $E[Z \mid \sigma(X)]$ is a random variable, and $E[Z \mid \sigma(X)]_s$ denotes the value of the random variable at $s$ in $S$. It satisfies the following two properties:

(i) $E(Z \mid \sigma(X))$ is measurable $\sigma(X)$ and integrable.

(ii) $E(Z \mid \sigma(X))$ satisfies the functional equation

$$ \int_G E(Z \mid \sigma(X))\, d\Pr = \int_G Z\, d\Pr, \qquad G \in \sigma(X). \tag{2} $$

There will in general be many such random variables which satisfy these two properties; any one of them is called a version of $E(Z \mid \sigma(X))$. Any two versions are equal with probability 1.

It should be noted that this definition is given under the condition that $Z$ is integrable, namely $E(|Z|) < \infty$. This condition is too restrictive when we define the conditional expectation and variance of the OLS estimator in many applications[3] because the moments of $(X'X)^{-1}$ may not be finite even when $X$ has many finite moments. For this reason, it is difficult to confirm that $E(b_T \mid \sigma(X))$ can be defined in each application even if $X$ is normally distributed. Thus, Judge et al. (1985) conclude that the Gauss-Markov theorem based on $E(\cdot \mid \sigma(X))$ is not very useful.

We avoid this problem by adopting a different definition of the conditional expectation based on the conditional distribution. For this purpose, we first define conditional probabilities following Billingsley (1986). Given $A$ in $F$, define a finite measure $v$ on $\sigma(X)$ by $v(G) = \Pr(A \cap G)$ for $G$ in $\sigma(X)$. Then $\Pr(G) = 0$ implies that $v(G) = 0$. The Radon-Nikodym theorem can be applied to the measures $v$ and $\Pr$, and there exists a random variable $f$, measurable $\sigma(X)$ and integrable with respect to $\Pr$, such that $\Pr(A \cap G) = \int_G f\, d\Pr$ for all $G$ in $\sigma(X)$. Denote this random variable by $\Pr(A \mid \sigma(X))$. This random variable satisfies these two properties:

(i) $\Pr(A \mid \sigma(X))$ is measurable $\sigma(X)$ and integrable.

(ii) $\Pr(A \mid \sigma(X))$ satisfies the functional equation

$$ \int_G \Pr(A \mid \sigma(X))\, d\Pr = \Pr(A \cap G), \qquad G \in \sigma(X). \tag{3} $$

There will in general be many such random variables, but any two of them are equal with probability 1. A specific such random variable is called a version of the conditional probability.

Given a random variable $Z$, which may not be integrable, we define a conditional distribution $\mu(\cdot, s)$ given $X$ for each $s$ in $S$. Let $\mathcal{R}^1$ be the $\sigma$-field of the Borel sets in $R^1$. By Theorem 33.3 in Billingsley (1986, p.460), there exists a function $\mu(H, s)$, defined for $H$ in $\mathcal{R}^1$ and $s$ in $S$, with these two properties:

(i) For each $s$ in $S$, $\mu(H, s)$ is, as a function of $H$, a probability measure on $\mathcal{R}^1$.

(ii) For each $H$ in $\mathcal{R}^1$, $\mu(H, s)$ is, as a function of $s$, a version of $\Pr(Z \in H \mid \sigma(X))_s$.

For each $s$ in $S$, we define $E(Z \mid X)_s$ to be $\int_{R^1} z\, \mu(dz, s)$. It should be noted that $E(Z \mid X)_s$ does not necessarily satisfy the usual properties of conditional expectation such as the law of iterated expectations. In general, $E(Z \mid X)_s$ may not even exist for some $s$. If $\int_{R^1} |z|\, \mu(dz, s)$ is finite, then $E(Z \mid X)_s$ is said to exist and be finite.

Given a $T \times K$ matrix of real numbers $x$, $E(Z \mid X)_s$ is identical for all $s$ in $X^{-1}(x)$. Therefore, we define $E(Z \mid X = x)$ as $E(Z \mid X)_s$ for $s$ in $X^{-1}(x)$. This is the definition of the conditional expectation of $Z$ given $X = x$ in this paper.

[2] If $Z$ is a vector, the conditional expectation is taken for each element in $Z$.

[3] Loeve (1978) slightly relaxes this restriction by defining the conditional expectation for any random variable whose expectation exists (but may not be finite) with an extension of the Radon-Nikodym theorem. This definition can be used for $E(\cdot \mid \sigma(X))$, but this slight relaxation does not solve our problem.
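To make the integrability problem concrete, the following worked case may help. It is an illustration under assumptions that are not in the original text: a single regressor ($K = 1$) with $X_1, \ldots, X_T$ i.i.d. standard normal, so that $X'X$ is chi-square with $T$ degrees of freedom, and an arbitrary positive power $m$.

```latex
% Illustrative assumption (not from the paper): K = 1 and X_t ~ i.i.d. N(0,1),
% so X'X = \sum_{t=1}^{T} X_t^2 is chi-square with T degrees of freedom.
\[
E\big[(X'X)^{-m}\big]
  = \int_0^\infty u^{-m}\,
    \frac{u^{T/2-1}e^{-u/2}}{2^{T/2}\,\Gamma(T/2)}\,du
  = \frac{\Gamma(T/2-m)}{2^{m}\,\Gamma(T/2)}
  \quad\text{if } T > 2m,
\qquad
E\big[(X'X)^{-m}\big] = \infty
  \quad\text{if } T \le 2m .
\]
% With m = 1 this gives E[(X'X)^{-1}] = 1/(T-2), which is finite only for T > 2.
```

Even in this well-behaved case, the number of finite moments of $(X'X)^{-1}$ depends on the sample size. The definition $E(Z \mid X = x)$ sidesteps the issue because it only looks at the conditional distribution given one realization $x$.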

2.2 The Conditional Gauss-Markov Theorem

We are concerned with a linear model of the form:

Assumption (A1): $y = Xb_0 + e$, where $b_0$ is a $K \times 1$ vector of real numbers.

Given a $T \times K$ matrix of real numbers $x$, we assume that the conditional expectation of $e$ given $X = x$ is zero:

Assumption (A2): $E[e \mid X = x] = 0$.

Next, we assume that $e$ is homoskedastic and $e_t$ is not serially correlated given $X = x$:

Assumption (A3): $E[ee' \mid X = x] = \sigma^2 I_T$.

The OLS estimator can be expressed by (1) for all $s$ in $X^{-1}(x)$ when the next assumption is satisfied:

Assumption (A4): $x'x$ is nonsingular.

Under Assumptions (A1)-(A4), $E[b_T \mid X = x] = b_0$ and $E[(b_T - b_0)(b_T - b_0)' \mid X = x] = \sigma^2 (x'x)^{-1}$.

The conditional version of the Best Linear Unbiased Estimator (BLUE) given $X = x$ can be defined as follows. An estimator $b_T$ for $b_0$ is BLUE conditional on $X = x$ if (1) $b_T$ is linear conditional on $X = x$, namely, $b_T$ can be written as $b_T = Ay$ for all $s$ in $X^{-1}(x)$, where $A$ is a $K \times T$ matrix of real numbers; (2) $b_T$ is unbiased conditional on $X = x$, namely, $E(b_T \mid X = x) = b_0$; (3) for any linear unbiased estimator $b^*$ conditional on $X = x$, $E[(b_T - b_0)(b_T - b_0)' \mid X = x] \le E[(b^* - b_0)(b^* - b_0)' \mid X = x]$, namely, $E[(b^* - b_0)(b^* - b_0)' \mid X = x] - E[(b_T - b_0)(b_T - b_0)' \mid X = x]$ is a positive semidefinite matrix.

With these preparations, the following theorem can be stated:

The Conditional Gauss-Markov Theorem: Under Assumptions (A1)-(A4), the OLS estimator is BLUE conditional on $X = x$.
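The two conditional moments stated under Assumptions (A1)-(A4) can be checked numerically. The sketch below is only an illustration: it assumes a single regressor ($K = 1$), normal errors, and one hypothetical realization `x`; the function names and the numbers are not from the paper. It holds `x` fixed, redraws `e`, and compares the simulated mean and variance of $b_T$ with $b_0$ and $\sigma^2 (x'x)^{-1}$.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS estimator b_T = (x'x)^{-1} x'y for a single regressor (K = 1)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def conditional_moments(x, b0, sigma, n_rep, seed=0):
    """Hold the realization x fixed and redraw e ~ N(0, sigma^2 I_T).

    Returns the simulated mean and variance of b_T given X = x.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_rep):
        y = [b0 * xi + rng.gauss(0.0, sigma) for xi in x]
        draws.append(ols_slope(x, y))
    return statistics.fmean(draws), statistics.pvariance(draws)


# One fixed realization x of the regressor (hypothetical numbers).
x = [0.5, -1.2, 2.0, 0.3, -0.7, 1.5, -2.1, 0.9]
b0, sigma = 1.0, 2.0
mean_b, var_b = conditional_moments(x, b0, sigma, n_rep=20000)
theory_var = sigma ** 2 / sum(xi * xi for xi in x)  # sigma^2 (x'x)^{-1}
print(mean_b, var_b, theory_var)
```

Normality is used here only to generate draws; the theorem itself needs only (A2) and (A3).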

This theorem can be proved by applying any of the standard proofs of the (unconditional) Gauss-Markov theorem, with the unconditional expectation replaced by $E(\cdot \mid X = x)$.

Modifying some assumptions and adding another yields the textbook version of the conditional Gauss-Markov theorem based on $E(\cdot \mid \sigma(X))$.

Assumption (A2'): $E[e \mid \sigma(X)] = 0$.

Since $E[e \mid \sigma(X)]$ is defined only when each element of $e$ is integrable, Assumption (A2') implicitly assumes that $E(e)$ exists and is finite. It also implies $E(e) = 0$ because of the law of iterated expectations. Given $E(e) = 0$, a sufficient condition for (A2') is that $X$ is statistically independent of $e$. Because Assumption (A2') does not imply that $X$ is statistically independent of $e$, Assumption (A2') is weaker than the assumption of independent stochastic regressors. With the next assumption, we assume that $e$ is conditionally homoskedastic and $e_t$ is not serially correlated:

Assumption (A3'): $E[ee' \mid \sigma(X)] = \sigma^2 I_T$.

The next assumption replaces Assumption (A4).

Assumption (A4'): $X'X$ is nonsingular with probability one.

From Assumption (A1), $b_T = b_0 + (X'X)^{-1}X'e$. Hence we can prove a version of the conditional Gauss-Markov theorem based on $E(\cdot \mid \sigma(X))$ when the expectations of $(X'X)^{-1}X'e$ and $(X'X)^{-1}X'ee'X(X'X)^{-1}$ exist and are finite. For this purpose, we consider the following assumption:

Assumption (A5): $E[\mathrm{trace}((X'X)^{-1}X'ee'X(X'X)^{-1})]$ exists and is finite.

The problem with Assumption (A5) is that it is not easy to verify for many distributions of $X$ and $e$ which are often used in applications and Monte Carlo studies. However, a sufficient condition for Assumption (A5) is that the distributions of $X$ and $e$ have finite supports.

Under Assumptions (A1), (A2')-(A4'), and (A5), $E(b_T \mid \sigma(X)) = b_0 + E[(X'X)^{-1}X'e \mid \sigma(X)] = b_0$. Moreover, $E[(b_T - b_0)(b_T - b_0)' \mid \sigma(X)]$ can be defined, and

$$ E[(b_T - b_0)(b_T - b_0)' \mid \sigma(X)] = E[(X'X)^{-1}X'ee'X(X'X)^{-1} \mid \sigma(X)] = (X'X)^{-1}X'E[ee' \mid \sigma(X)]X(X'X)^{-1} = \sigma^2 (X'X)^{-1}. $$

We now consider a different definition of the conditional version of the Best Linear Unbiased Estimator (BLUE). An estimator $b_T$ for $b_0$ is BLUE conditional on $\sigma(X)$ if (1) $b_T$ is linear conditional on $\sigma(X)$, namely, $b_T$ can be written as $b_T = Ay$, where $A$ is a $K \times T$ matrix and each element of $A$ is measurable $\sigma(X)$; (2) $b_T$ is unbiased conditional on $\sigma(X)$, that is, $E(b_T \mid \sigma(X)) = b_0$; (3) for any linear unbiased estimator $b^*$ conditional on $\sigma(X)$ for which $E(b^* b^{*\prime})$ exists and is finite, $E[(b_T - b_0)(b_T - b_0)' \mid \sigma(X)] \le E[(b^* - b_0)(b^* - b_0)' \mid \sigma(X)]$ with probability 1, namely, $E[(b^* - b_0)(b^* - b_0)' \mid \sigma(X)] - E[(b_T - b_0)(b_T - b_0)' \mid \sigma(X)]$ is a positive semidefinite matrix with probability 1.

Proposition 1: Under Assumptions (A1), (A2')-(A4'), and (A5), the OLS estimator is BLUE conditional on $\sigma(X)$. Moreover, it is unconditionally unbiased and has the minimum unconditional covariance matrix among all linear unbiased estimators conditional on $\sigma(X)$.

Proof: The proof of this proposition is given in Greene (1993, chapter 6.4).

In this proposition, the covariance matrix of $b_T$ is $\sigma^2 E[(X'X)^{-1}]$, which is different from $\sigma^2 [E(X'X)]^{-1}$. This may seem to contradict the standard asymptotic theory, but it does not. Asymptotically, $(1/T)X'X$ converges almost surely to $E[X_t'X_t]$ if $X_t$ is stationary and ergodic. Hence the limit of the covariance matrix of $\sqrt{T}(b_T - b_0)$, $\sigma^2 E[\{(1/T)(X'X)\}^{-1}]$, is equal to the asymptotic covariance matrix, $\sigma^2 [E(X_t'X_t)]^{-1}$.

2.3 Unconditional Distributions of Test Statistics

In order to study the distributions of the t ratios and F test statistics, we need an additional assumption:

Assumption (A6): Conditional on $X$, $e$ follows a multivariate normal distribution.

Given a $1 \times K$ vector of real numbers $R$, consider a random variable

$$ N_R = \frac{R(b_T - b_0)}{\sigma [R(X'X)^{-1}R']^{1/2}} \tag{4} $$

and the usual t ratio for $Rb_0$

$$ t_R = \frac{R(b_T - b_0)}{\hat{\sigma} [R(X'X)^{-1}R']^{1/2}}. \tag{5} $$

Here $\hat{\sigma}$ is the positive square root of $\hat{\sigma}^2 = (y - Xb_T)'(y - Xb_T)/(T - K)$. With the standard argument, $N_R$ and $t_R$ can be shown to follow the standard normal distribution and Student's t distribution with $T - K$ degrees of freedom with appropriate conditioning, respectively, under either Assumptions (A1)-(A6) or Assumptions (A1), (A2'), (A3'), and (A5)-(A6). The following proposition is useful in order to derive the unconditional distributions of these statistics.

Proposition 2: If the probability density function of a random variable $Z$ conditional on a random vector $Q$ does not depend on the values of $Q$, then the marginal probability density function of $Z$ is equal to the probability density function of $Z$ conditional on $Q$.

This proposition is obtained by integrating the probability density function conditional on $Q$ over all possible values of the random variables in $Q$. Because $N_R$ and $t_R$ follow a standard normal distribution and a t distribution conditional on $X$, respectively, Proposition 2 implies the following proposition:

Proposition 3: Suppose that Assumptions (A1), (A5), and (A6) are satisfied and that Assumptions (A2) and (A3) are satisfied for all $x$ in a set $H$ such that $\Pr(X^{-1}(H)) = 1$. Then $N_R$ is a standard normal random variable and $t_R$ is a t random variable with $T - K$ degrees of freedom.
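Proposition 3 can be illustrated with a small simulation. The sketch below is not taken from the paper. It assumes $K = 1$, $T = 25$, a regressor that is itself a random walk redrawn in every replication, and an i.i.d. standard normal error independent of the regressor, so that (A2), (A3) and (A6) hold for every realization. The function names, seed and replication count are arbitrary choices. If the t ratio has an exact t distribution with 24 degrees of freedom, the two-sided 5% test should reject in about 5% of replications.

```python
import math
import random

# Two-sided 5% critical value of Student's t with T - K = 24 degrees of freedom.
T_CRIT_5PCT_DF24 = 2.0639


def t_ratio(x, y):
    """Usual t ratio for H0: b0 = 0 in y = x*b0 + e (K = 1, no intercept)."""
    sxx = sum(xi * xi for xi in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    ssr = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(ssr / (len(x) - 1))
    return b / (sigma_hat / math.sqrt(sxx))


def rejection_rate(n_rep=20000, T=25, seed=1):
    """Share of replications in which |t| exceeds the t(24) critical value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        # Stochastic regressor: a random walk, redrawn in every replication.
        x, level = [], 0.0
        for _ in range(T):
            level += rng.gauss(0.0, 1.0)
            x.append(level)
        # True coefficient is zero; the error is i.i.d. N(0,1), independent of x.
        y = [rng.gauss(0.0, 1.0) for _ in range(T)]
        if abs(t_ratio(x, y)) > T_CRIT_5PCT_DF24:
            rejections += 1
    return rejections / n_rep


rate = rejection_rate()
print(rate)
```

The regressor is strongly nonstationary here, yet the rejection frequency stays near the nominal level because the conditional distribution of the t ratio does not depend on the realization of the regressor.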

Alternatively, the assumptions for Proposition 1 with Assumption (A6) can be used to obtain a similar result:

Proposition 3': Suppose that Assumptions (A1), (A2')-(A3'), (A5), and (A6) are satisfied and that Assumptions (A2) and (A3) are satisfied for all $x$ in a set $H$ such that $\Pr(X^{-1}(H)) = 1$. Then $N_R$ is a standard normal random variable and $t_R$ is a t random variable with $T - K$ degrees of freedom.

Similarly, the usual F test statistics also follow (unconditional) F distributions. These results are sometimes not well understood by econometricians. For example, a standard textbook, Judge et al. (1985, p.164), states that "our usual test statistics do not hold in finite samples" on the ground that the (unconditional) distribution of the $b_T$'s is not normal. It is true that $b_T$ is a nonlinear function of $X$ and $e$, so it does not follow a normal distribution even if $X$ and $e$ are both normally distributed. However, the usual t and F test statistics have the usual (unconditional) distributions as a result of Proposition 2.

3 Spurious Regressions

The conditional Gauss-Markov theorem and Proposition 3 in Section 2 provide us with the tools to understand the exact small sample properties of the spurious regression of a random walk onto an independent random walk, as discussed in the Introduction. The Gauss-Markov theorem indicates a simple GLS solution to the problem. The asymptotic theories of Phillips (1986, 1998) have been used to understand the spurious regression problem, but have not been used to provide the GLS solution to the problem.

3.1 Spurious Regression with Two Random Walks

Let $Y_t$ be a random walk which is generated from

$$ \Delta Y_t = \epsilon_t \tag{6} $$

with the initial condition that $Y_0 = 0$,[4] where $\epsilon_t$ is i.i.d. with $E(\epsilon_t) = 0$ and $E(\epsilon_t^2) = \sigma^2$. Let $X_t$ be another random walk which is generated from

$$ \Delta X_t = v_t \tag{7} $$

with an initial random variable $X_0$, where $v_t$ is i.i.d. with $E(v_t) = 0$. We assume that $\{\epsilon_t\}_{t=1}^T$ are independent of $\{v_t\}_{t=1}^T$ and $X_0$, so that $X_t$ and $Y_t$ are independent random walks.

The regressor matrix $X$ is given by $X = (X_1, X_2, \ldots, X_T)'$. Let $y = (Y_t)_{t=1}^T$ and $e = (e_t)_{t=1}^T$, and consider the OLS estimator for $y = Xb_0 + e$. Then the true value of the regression coefficient is zero: $b_0 = 0$. Assumptions (A1) and (A2) hold for the spurious regression. However, Assumption (A3) is violated because

$$ E(ee' \mid X) = \sigma^2 \Phi \tag{8} $$

where $\sigma^2 = E(\epsilon_t^2)$, and

$$ \Phi = \begin{pmatrix} 1 & 1 & \cdots & 1 & 1 \\ 1 & 2 & \cdots & 2 & 2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & 2 & \cdots & T-1 & T-1 \\ 1 & 2 & \cdots & T-1 & T \end{pmatrix}, \tag{9} $$

so that the $(s,t)$ element of $\Phi$ is $\min(s,t)$. Thus the spurious regression violates Assumption (A3). However, when a realization of $X$ satisfies Assumption (A4), all the other assumptions are satisfied. Therefore, the OLS estimator is still unbiased conditional on $X = x$. One can apply a GLS correction and obtain a more efficient estimator.

If we assume that $e_t$ is normally distributed, Assumption (A6) is satisfied. In this case, $X'X = \sum_{t=1}^T X_t^2$, thus Assumption (A4') is satisfied because the probability that $X_1 = X_2 = \cdots = X_T = 0$ is zero. Hence Proposition 3 can be applied to test statistics formed by the GLS estimator. By applying GLS to the spurious regression, we can therefore solve the spurious regression problem and obtain the exact (unconditional) t distribution for the usual t statistic.

[4] This assumption is innocuous. If the original series does not satisfy this condition, then $Y_0$ can be subtracted from the series.

3.2 Spurious Regressions with Time Trends

Another type of spurious regression is the regression of a random walk onto a polynomial of a time trend. In this case, the regressors are nonstochastic; hence the standard Gauss-Markov theorem applies. The covariance matrix of the error term has the form (9). Thus the GLS estimator is more efficient than the OLS estimator. As a consequence, the spurious regression problem can be solved by applying GLS.

3.3 The Case with Endogeneity

In this section, we consider a data generation process with particular short-run and long-run properties. We show that the data generation process leads to a spurious regression without the strict exogeneity assumption. Then we study the exact small sample properties of spurious regression estimators.

Let $e_t$ and $Z_t$ be two time series of dimensions 1 and $k$, respectively, that are generated from

$$ \Delta e_t = \epsilon_t, \qquad t = 1, 2, 3, \ldots \tag{10} $$

$$ \Delta Z_t = c + v_t, \qquad t = -q-1, -q, \ldots, -1, 0, 1, \ldots \tag{11} $$

where $(\epsilon_t, v_t')'$ is a covariance stationary series, $c$ is a $k$-dimensional vector of real numbers, $e_0 = 0$, and $Z_{-q}$ is a given random vector. We assume that the long-run covariance matrix of $v_t$,

$$ \Psi = \lim_{J \to \infty} \sum_{j=-J}^{J} E(v_t v_{t-j}'), \tag{12} $$

is nonsingular.

We assume that $Z_t$ is strictly exogenous in the time series sense with respect to $e_t$: that is, $E(e_t \mid Z_{-q-1}, Z_{-q}, \ldots, Z_{t-2}, Z_{t-1}, Z_t, Z_{t+1}, Z_{t+2}, \ldots) = 0$. Similarly, $Z_t$ is said to be strictly exogenous with respect to $e_t$ if $e_t$ is independent of $\{Z_{-q-1}, Z_{-q}, \ldots, Z_{t-2}, Z_{t-1}, Z_t, Z_{t+1}, Z_{t+2}, \ldots\}$. Given $E(e_t) = 0$, if $Z_t$ is strictly exogenous, then $Z_t$ is strictly exogenous in the time series sense, but not vice versa. Thus strict exogeneity in the time series sense is a weaker concept, which we employ in this paper. We call this the strict exogeneity assumption throughout this paper.

An implication of the strict exogeneity assumption is that $e_t$ and $Z_t$ are not stochastically cointegrated (see Ogaki and Park (1998) for the definition of stochastic cointegration): that is, there is no nonzero vector $\beta$ such that $e_t - \beta' Z_t$ is trend stationary. This is because the assumption implies that

$$ \lim_{J \to \infty} \sum_{j=-J}^{J} E(\epsilon_t v_{t-j}') = 0. \tag{13} $$

Consider a series $y_t$ that is generated from

$$ y_t = \theta_0' d_t + \alpha_0' Z_t + \gamma(L^{-1}) \Delta Z_t + \delta(L) \Delta Z_t + e_t \qquad (t = 1, 2, 3, \ldots), \tag{14} $$

where $\gamma(L^{-1}) = \gamma_1 L^{-1} + \gamma_2 L^{-2} + \cdots + \gamma_p L^{-p}$, $\delta(L) = \delta_0 + \delta_1 L + \delta_2 L^2 + \cdots + \delta_q L^q$, and $d_t$ is a vector of deterministic variables; for example, $d_t = (1, t)'$ or $d_t = 1$. Here $\gamma_1, \ldots, \gamma_p, \delta_0, \ldots, \delta_q$ are $k \times K$ matrices, and we assume that at least one of them is nonzero.

Under these assumptions, consider a regression of $y_t$ onto $d_t$ and $Z_t$:

$$ y_t = \theta_0^{*\prime} d_t + \alpha_0^{*\prime} Z_t + f_t. \tag{15} $$

This regression is a spurious regression: that is, for any vector $\alpha_0^*$, $f_t$ is unit root nonstationary. To see this, assume that $f_t$ is stationary for a vector $\alpha_0^*$. Then (14) implies that $e_t - (\alpha_0^* - \alpha_0)' Z_t$ is trend stationary. This implies a stochastic cointegrating relationship between $e_t$ and $Z_t$, and contradicts the strict exogeneity assumption.

Given that $e_t$ in (14) satisfies the exogeneity condition, $\alpha_0$ can be considered the true value of the spurious regression coefficient $\alpha_0^*$ in (15). With this interpretation, one problem with (15) is that the strict exogeneity assumption is violated.

Let $X$ be a matrix whose $t$-th row is given by $(d_t', Z_t', \Delta Z_{t+p}', \Delta Z_{t+p-1}', \ldots, \Delta Z_t', \Delta Z_{t-1}', \ldots, \Delta Z_{t-q}')$, $y = (y_t)_{t=1}^T$, and $e = (e_t)_{t=1}^T$. When

$$ E(ee' \mid X) = \sigma^2 \Phi \tag{16} $$

with a known matrix $\Phi$ and a possibly unknown number $\sigma$, GLS can be applied to (14). For example, if $e_t$ is a random walk, then $\Phi$ is given by (9) and $\sigma^2 = E(\epsilon_t^2)$. Just as in Section 3.1, the finite sample properties of the GLS estimators and test statistics based on GLS can be analyzed with the results in Section 2.

4 Applications

This section explains possible applications of the GLS correction for spurious regressions proposed in the previous section. We consider a simplified version of Cooley and Ogaki's (1996) model of consumption and leisure as an illustration. In the model, the representative household maximizes

$$ U = E_0 \Big[ \sum_{t=0}^{\infty} \beta^t u(t) \Big], \tag{17} $$

where $E_t$ denotes the expectation conditional on the information available at $t$. Consider a simple intra-period utility function of the form

$$ u(t) = \frac{C(t)^{1-\alpha} - 1}{1 - \alpha} + v(l(t)) \tag{18} $$

where $v(\cdot)$ is a continuously differentiable concave function, $C(t)$ is nondurable consumption, and $l(t)$ is leisure.

The usual first order condition that equates the real wage rate with the marginal rate of substitution between leisure and consumption is:

$$ W(t) = \frac{v'(l(t))}{C(t)^{-\alpha}} \tag{19} $$

where $W(t)$ is the real wage rate. We assume that the stochastic process of leisure is strictly stationary in the equilibrium as in Eichenbaum, Hansen, and Singleton (1988). Then an implication of the first order condition is that $\ln(W(t)) - \alpha \ln(C(t)) = \ln(v'(l(t)))$ is stationary. When we assume that the log of consumption is difference stationary, this implies that the log of the real wage rate and the log of consumption are cointegrated with a cointegrating vector $(1, -\alpha)'$.

Now assume that $\ln(W(t))$ and $\ln(C(t))$ are measured with errors. Imagine that $\ln(C(t))$ is measured with a stationary measurement error, $\xi(t)$, and that $\ln(W(t))$ is measured with a difference stationary measurement error, $\epsilon(t)$ (perhaps because of the difficulty in measuring fringe benefits). Assume that $\epsilon(t)$ is independent of $\ln(C(t))$ and $\xi(t)$ at all leads and lags. Consider a regression

$$ \ln(W^m(t)) = a + \alpha \ln(C^m(t)) + e(t) \tag{20} $$

where $W^m(t)$ is the measured real wage rate, $C^m(t)$ is measured consumption, and $e(t) = \epsilon(t) + \alpha \xi(t) + \ln(v'(l(t))) - a$. If $\epsilon(t)$ is stationary, then $e(t)$ is stationary, and Regression (20) is a cointegrating regression. If $\epsilon(t)$ is unit root nonstationary, then Regression (20) is a spurious regression because $e(t)$ is nonstationary in this case. Hence the standard methods for cointegrating regressions cannot be used.

The preference parameter $\alpha$ can still be estimated by the GLS correction method. Suppose that $\ln(C^m(t))$ is strictly exogenous with respect to $e(t)$. In this case, the GLS correction is very similar to taking the first difference of both variables. However, it is not realistic to assume that consumption is strictly exogenous with respect to the marginal utility of leisure.
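The link between the GLS correction and first differencing can be verified directly for the random walk covariance pattern in (9). The sketch below is an illustration with hypothetical helper names and a small $T$. It builds $\Phi$ with elements $\min(s,t)$ and a matrix `D` that keeps the first observation and takes first differences of the rest, then checks that $D \Phi D' = I_T$. Premultiplying the regression by `D` therefore yields an error covariance of $\sigma^2 I_T$, which is what GLS requires.

```python
def phi(T):
    """Covariance pattern of a random walk started at zero: Phi[s][t] = min(s, t)."""
    return [[min(s, t) for t in range(1, T + 1)] for s in range(1, T + 1)]


def difference_matrix(T):
    """D maps (z_1, ..., z_T) to (z_1, z_2 - z_1, ..., z_T - z_{T-1})."""
    return [[1 if s == t else -1 if s == t + 1 else 0 for t in range(T)]
            for s in range(T)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


T = 5
D = difference_matrix(T)
whitened = matmul(matmul(D, phi(T)), transpose(D))  # D Phi D'
identity = [[1 if i == j else 0 for j in range(T)] for i in range(T)]
print(whitened == identity)  # prints True
```

The only difference from plain first differencing is the first row: because the error starts from zero, the first observation is kept in levels rather than dropped.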
Under a more realistic assumption that the strict exogeneity assumption is violated, one can add leads and lags of the regressor to correct for endogeneity. First differencing does not work in this case, but the GLS correction can still be used to estimate $\alpha$. If $e(t)$ is a random walk, then the GLS correction formula given in the previous section can be used. In the more realistic case of a difference stationary $e(t)$ with unknown serial correlation, feasible GLS must be used. For this case, asymptotic theory needs to be developed for feasible GLS, but it is beyond the scope of this paper.

5 Monte Carlo Simulations

In order to give quantitative illustrations of our main results, we carry out two different types of Monte Carlo experiments. The first is concerned with conventional spurious regressions with exogenous regressors, and the second with spurious regressions with endogenous regressors. More extensive Monte Carlo results are presented in Choi and Ogaki (2001).

In the conventional spurious regression with exogenous regressors, two Data Generating Processes (DGPs) are considered, depending on the inclusion of the time trend in the regressor. Model 1 is a spurious regression of a random walk onto an independent random walk, and Model 2 is a spurious regression of a random walk onto a constant and a time trend. With data generated from the corresponding DGPs, we estimate the following two spurious regression models.

$$ y_t = X_{t+1}\beta + e_t \quad \text{for Model 1} \tag{21} $$

$$ X_t = X_{t-1} + \nu_t \tag{22} $$

$$ e_t = e_{t-1} + \epsilon_t, \quad e_0 = 0 \tag{23} $$

$$ y_t = \mu t + e_t \quad \text{for Model 2} \tag{24} $$

$$ e_t = e_{t-1} + \nu_t, \quad e_0 = 0 \tag{25} $$

where $\epsilon_t$ is i.i.d. $N(0, \sigma_\epsilon^2)$. Here $X_t$ is a random walk and $\nu_t$ is i.i.d. and independent of $\epsilon_t$. For the distribution of $\nu_t$, we have used the standard normal distribution and the t-distributions with 3, 4, 5, 6, 7, 8, and 9 degrees of freedom. For each series generated, $y_0$, the initial value of $y$, is taken to be zero. From the nature of the spurious regressions, the true values of the regression coefficients in both models should be zero.
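A scaled-down version of the Model 1 experiment can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' code: both innovations are standard normal, $X_0 = 0$, the regression uses the contemporaneous regressor $X_t$ with no intercept, $T = 25$, and only the 5% t-test rejection frequency is recorded. The GLS estimator is computed by applying the transformation implied by (9) (keep the first observation, then first differences) to both series and running OLS on the transformed data.

```python
import math
import random

# Two-sided 5% critical value of Student's t with T - K = 24 degrees of freedom.
T_CRIT_5PCT_DF24 = 2.0639


def t_ratio(x, y):
    """Usual t ratio for H0: beta = 0 in y = x*beta + e (K = 1, no intercept)."""
    sxx = sum(xi * xi for xi in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    ssr = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    return b / math.sqrt(ssr / (len(x) - 1) / sxx)


def difference(z):
    """GLS transformation implied by (9): keep z_1, then take first differences."""
    return [z[0]] + [z[t] - z[t - 1] for t in range(1, len(z))]


def random_walk(rng, T):
    """Random walk with standard normal innovations, started at zero."""
    level, path = 0.0, []
    for _ in range(T):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def model1_rejection_rates(n_rep=10000, T=25, seed=2):
    """Rejection frequencies of the 5% t-test for OLS and for GLS."""
    rng = random.Random(seed)
    ols_rej = gls_rej = 0
    for _ in range(n_rep):
        x = random_walk(rng, T)
        y = random_walk(rng, T)  # independent of x, so the true beta is zero
        if abs(t_ratio(x, y)) > T_CRIT_5PCT_DF24:
            ols_rej += 1
        if abs(t_ratio(difference(x), difference(y))) > T_CRIT_5PCT_DF24:
            gls_rej += 1
    return ols_rej / n_rep, gls_rej / n_rep


ols_rate, gls_rate = model1_rejection_rates()
print(ols_rate, gls_rate)
```

Under these assumptions the GLS rejection frequency should be close to 0.05 up to simulation error, while the OLS frequency should be far above it.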

In the second simulation, data are generated from the following DGP, into which endogeneity is incorporated by adding leads and lags of the first differences of the regressor to the regression as explained in Section 3.3.

$$ y_t = \gamma_1 \Delta X_{t+1} + \gamma_2 \Delta X_{t-1} + \gamma_3 \Delta X_{t-2} + e_t \tag{26} $$

where $X_t$ and the error term, $e_t$, are generated from the random walks $X_t = X_{t-1} + \nu_t$ and $e_t = e_{t-1} + \epsilon_t$, with $\nu_t \sim N(0,1)$ or $\nu_t \sim t_n$, and $\epsilon_t \sim N(0,1)$. For the distribution of $\nu_t$, we have used the standard normal distribution and the t-distributions with 3, 4, 5, 6, 7, 8, 9, and 10 degrees of freedom. For each series generated, the initial value of $y$, $y_0$, is taken to be zero. Here we use 0.3, -0.5, and 0.1 for $\gamma_1$, $\gamma_2$ and $\gamma_3$, respectively, in equation (26). With the generated data, the following regression is estimated in the framework of a spurious regression.

$$ y_t = \gamma_1 \Delta X_{t+1} + \beta X_t + \gamma_2 \Delta X_{t-1} + \gamma_3 \Delta X_{t-2} + e_t \tag{27} $$

We consider the sample size of $T = 25$ with 20,000 replications. In each case, we report (i) the standard deviation of the OLS and GLS estimators, (ii) kurtosis, (iii) QQ-plot, and (iv) t-ratios.

The simulation results of the spurious regression are summarized in Tables 1 and 2, where the OLS and GLS estimators are evaluated. Table 1 reports the small sample properties of least squares estimators in the spurious regression under Model 1 and Model 2, while Table 2 shows the corresponding results of a spurious regression in the presence of endogenous regressors.

The true values of some of the statistics reported in the tables are known from the theory in Section 3, as described below. Any deviation from the true value is caused by the Monte Carlo simulation error. These results are reported here to show the magnitude of the error. In all these cases, the deviation is negligible. From the theory in Section 3, both the OLS and the GLS estimators are unbiased.
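The design in (26) and (27) can be sketched in the same spirit. Again this is only an illustration under stated assumptions: normal innovations, presample values of $X_t$ generated from a walk started at zero, 4,000 replications instead of 20,000, and hypothetical helper names. The OLS t ratio on $\beta$ comes from (27) in levels; the GLS t ratio comes from the same regression after every column and $y_t$ are transformed as implied by (9). With four regressors and $T = 25$, the reference distribution is t with 21 degrees of freedom.

```python
import math
import random

T_CRIT_5PCT_DF21 = 2.0796  # two-sided 5% critical value, Student's t, 21 d.f.


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def t_ratio(rows, y, j):
    """t ratio for H0: coefficient j equals zero in an OLS regression of y on rows."""
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    coef = solve(xtx, xty)
    ssr = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, yi in zip(rows, y))
    inv_jj = solve(xtx, [1.0 if a == j else 0.0 for a in range(k)])[j]
    return coef[j] / math.sqrt(ssr / (len(y) - k) * inv_jj)


def gls_transform(rows):
    """Keep the first row, then take first differences of every column."""
    return [rows[0][:]] + [[a - b for a, b in zip(rows[t], rows[t - 1])]
                           for t in range(1, len(rows))]


def simulate(n_rep=4000, T=25, seed=3):
    """Rejection frequencies of the 5% t-test on beta in (27), OLS versus GLS."""
    rng = random.Random(seed)
    g1, g2, g3 = 0.3, -0.5, 0.1
    ols_rej = gls_rej = 0
    for _ in range(n_rep):
        # x[i] holds X_t for t = -3, ..., T + 1 (index i = t + 3).
        x, level = [], 0.0
        for _ in range(T + 5):
            level += rng.gauss(0.0, 1.0)
            x.append(level)

        def dx(t):
            return x[t + 3] - x[t + 2]  # first difference of X at date t

        rows, y, e = [], [], 0.0
        for t in range(1, T + 1):
            e += rng.gauss(0.0, 1.0)  # random walk error with e_0 = 0
            rows.append([dx(t + 1), x[t + 3], dx(t - 1), dx(t - 2)])
            y.append(g1 * dx(t + 1) + g2 * dx(t - 1) + g3 * dx(t - 2) + e)
        if abs(t_ratio(rows, y, 1)) > T_CRIT_5PCT_DF21:
            ols_rej += 1
        y_gls = [row[0] for row in gls_transform([[yi] for yi in y])]
        if abs(t_ratio(gls_transform(rows), y_gls, 1)) > T_CRIT_5PCT_DF21:
            gls_rej += 1
    return ols_rej / n_rep, gls_rej / n_rep


ols_rate, gls_rate = simulate()
print(ols_rate, gls_rate)
```

The second column of each row is $X_t$, so index 1 selects $\beta$, whose true value is zero in this design.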
For Model 1 in Table 1, the GLS estimator is much more efficient than the OLS estimator on the basis of standard deviations. The efficiency gain from the GLS correction is much smaller for Model 2. From the theory, the OLS and GLS estimators are not normally distributed for Model 1, while they are normally distributed for Model 2. In each column for Model 1 in Table 1, the OLS estimator departs considerably from the normal distribution in terms of kurtosis and QQ-plots. In contrast, the GLS estimator is quite close to the normal distribution, especially when the regressor is generated from the normal distribution or the t-distribution with high degrees of freedom. The kurtosis of the OLS estimator is much larger than 3, indicating that its distribution has fat tails. This can also be seen in the QQ-plot. On the other hand, the GLS estimator's kurtosis is close to 3. Thus we can see that the nonnormality of the OLS estimator is mitigated when the GLS transformation is applied to the data.

In the last panel of Table 1, we report the rejection frequency for t-ratios of OLS and GLS estimators. The row labeled GLS(T) reports the results when the true value of $\sigma_\epsilon^2$ is used to calculate the t-ratio. The row labeled GLS(E) reports the results when the estimated value of $\sigma_\epsilon^2$ is used. The rows labeled t-5% and t-10% use the t-distribution critical values for the 5% and 10% significance levels, respectively. The rows labeled n-5% and n-10% use the standard normal distribution critical values. From the theory, the t-ratios of GLS(T) follow the exact standard normal distribution and those of GLS(E) follow the exact t-distribution. Hence the true values of the n-5% and n-10% rows for GLS(T) are 0.05 and 0.10, respectively. Similarly, those of the t-5% and t-10% rows for GLS(E) are 0.05 and 0.10, respectively. For all cases, the deviation from the true value is negligible. The t-ratios of the OLS estimator substantially over-reject.
The qualitative results for the endogenous regressors in Table 2 are similar to those for Model 1 in Table 1: the GLS estimator is much more efficient than the OLS estimator, the GLS correction mitigates the nonnormality of the OLS estimator, and the t-ratios of the OLS estimator substantially over-reject.
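The rejection frequencies and kurtosis figures described above can be illustrated with a small experiment that uses only the Python standard library. The setup is a hypothetical one chosen for this sketch and may differ from the paper's Model 1: two independent random walks of length T = 25, a no-intercept regression with true coefficient zero, $e_0 = 0$, and GLS taken to be first-differencing with the first observation kept. The hard-coded critical values are the two-sided 5% points of the t-distribution with 24 degrees of freedom and of the standard normal.

```python
import math
import random

T_CRIT_5PCT_24DF = 2.064   # two-sided 5% critical value, Student's t, 24 d.f.
N_CRIT_5PCT = 1.960        # two-sided 5% critical value, standard normal


def random_walk(rng, T):
    """Cumulative sum of T standard normal shocks, starting from zero."""
    level, path = 0.0, []
    for _ in range(T):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def difference(series):
    """First differences, keeping the first observation (initial value zero)."""
    return [series[0]] + [b - a for a, b in zip(series, series[1:])]


def slope_and_t(x, y, sigma2=None):
    """No-intercept regression of y on x: slope and t-ratio for H0: slope = 0.

    If sigma2 is None, the error variance is estimated with len(x) - 1 d.f.
    """
    sxx = sum(a * a for a in x)
    b = sum(a * c for a, c in zip(x, y)) / sxx
    if sigma2 is None:
        rss = sum((c - b * a) ** 2 for a, c in zip(x, y))
        sigma2 = rss / (len(x) - 1)
    return b, b / math.sqrt(sigma2 / sxx)


def kurtosis(values):
    """Fourth central moment divided by the squared second central moment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2


def rejection_experiment(reps=5000, T=25, seed=0):
    """Rejection frequencies at the 5% level and kurtosis of the estimators."""
    rng = random.Random(seed)
    counts = {"OLS": 0, "GLS(T)": 0, "GLS(E)": 0}
    estimates = {"OLS": [], "GLS": []}
    for _ in range(reps):
        x, y = random_walk(rng, T), random_walk(rng, T)
        b_ols, t_ols = slope_and_t(x, y)               # levels regression
        dx, dy = difference(x), difference(y)
        b_gls, t_true = slope_and_t(dx, dy, sigma2=1.0)  # true error variance
        _, t_est = slope_and_t(dx, dy)                   # estimated error variance
        counts["OLS"] += int(abs(t_ols) > T_CRIT_5PCT_24DF)
        counts["GLS(T)"] += int(abs(t_true) > N_CRIT_5PCT)
        counts["GLS(E)"] += int(abs(t_est) > T_CRIT_5PCT_24DF)
        estimates["OLS"].append(b_ols)
        estimates["GLS"].append(b_gls)
    freq = {name: c / reps for name, c in counts.items()}
    kurt = {name: kurtosis(v) for name, v in estimates.items()}
    return freq, kurt


if __name__ == "__main__":
    print(rejection_experiment())
```

In this setup the GLS(T) and GLS(E) frequencies should sit near the nominal 0.05 up to simulation error, while the OLS t-ratio rejects far more often.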

6 Concluding Remarks

This paper studied the exact finite sample properties of spurious regression estimators and of the test statistics based on them. The first step was to find an efficient estimator by applying a conditional version of the Gauss-Markov Theorem. For this purpose, the textbook version of the theorem with the usual definition of conditional expectations was found to be problematic. That definition requires the OLS estimator to have finite unconditional second moments, and this requirement is too restrictive because it is often very difficult to verify how many moments $(X'X)^{-1}$ has from assumptions on the distribution of X. We used a definition of the conditional expectation based on the conditional probability to derive the conditional probability version of the Gauss-Markov Theorem. The theorem indicates that the GLS estimator can be used for spurious regressions. The GLS estimator can be applied to the case with endogenous regressors by adding leads and lags of first differences of the regressors to the spurious regression. When the error is normally distributed, the usual t statistics based on GLS estimators have exact t distributions. The GLS correction is basically the same as taking first differences in the case of strictly exogenous regressors. The GLS correction, however, can be useful in applications for which the strict exogeneity assumption is violated.
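The remark that the GLS correction is basically first-differencing can be made concrete with a short derivation. This is a sketch under assumptions stated here rather than taken from the paper: the error is a random walk started at $e_0 = 0$, the innovations $\varepsilon_t$ have conditional variance $\sigma_\varepsilon^2$ and are conditionally uncorrelated given the regressors, and the symbols $L$, $P$, and $\Omega$ are notation introduced only for this illustration.

```latex
% Sketch: e_0 = 0 and Var(\varepsilon \mid X) = \sigma_\varepsilon^2 I are assumed.
% L (cumulative-sum matrix), P (differencing matrix), and \Omega are illustrative notation.
\begin{align*}
e &= L\varepsilon, \qquad L_{ts} = \mathbf{1}\{s \le t\}, \\
\operatorname{Var}(e \mid X) &= \sigma_\varepsilon^2\,\Omega, \qquad
  \Omega = LL', \qquad \Omega_{ts} = \min(t,s), \\
P &= L^{-1} =
  \begin{pmatrix}
    1  &        &        &   \\
    -1 & 1      &        &   \\
       & \ddots & \ddots &   \\
       &        & -1     & 1
  \end{pmatrix},
  \qquad \Omega^{-1} = P'P, \\
\hat{\beta}_{GLS} &= (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y
  = \bigl((PX)'(PX)\bigr)^{-1}(PX)'(Py).
\end{align*}
```

Since $Py$ stacks $y_1$ on top of the differences $y_t - y_{t-1}$, the GLS estimator in this setting is OLS applied to the differenced data, with the first observation left in levels.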

References

Billingsley, P. (1986): Probability and Measure, Second Edition, New York: Wiley.
Choi, C.Y. and M. Ogaki (2001): "A GLS Solution to the Spurious Regression Problem," manuscript in progress, Ohio State University.
Cooley, T.F. and M. Ogaki (1996): "A Time Series Analysis of Real Wages, Consumption, and Asset Returns," Journal of Applied Econometrics, 11.
Durlauf, S.N. and P.C.B. Phillips (1988): "Trends versus Random Walks in Time Series Analysis," Econometrica, 56.
Eichenbaum, M., L.P. Hansen, and K.J. Singleton (1988): "A Time Series Analysis of Representative Agent Models of Consumption and Leisure Choice under Uncertainty," Quarterly Journal of Economics, 103.
Engle, R.F. and C.W.J. Granger (1987): "Co-Integration and Error Correction: Representation, Estimation, and Testing," Econometrica, 55.
Engle, R.F., D.F. Hendry, and J.-F. Richard (1983): "Exogeneity," Econometrica, 51.
Granger, C.W.J. and P. Newbold (1974): "Spurious Regressions in Econometrics," Journal of Econometrics, 2.
Greene, W.H. (1993): Econometric Analysis, Second Edition, Macmillan.
Judge, G.G., W.E. Griffiths, R.C. Hill, H. Lütkepohl, and T.-C. Lee (1985): The Theory and Practice of Econometrics, Second Edition, New York: Wiley.
Loeve, M. (1978): Probability Theory II, 4th Edition, New York: Springer-Verlag.
Nelson, C.R. and H. Kang (1981): "Spurious Periodicity in Inappropriately Detrended Time Series," Econometrica, 49.
Nelson, C.R. and H. Kang (1983): "Pitfalls in the Use of Time as an Explanatory Variable in Regression," Journal of Business and Economic Statistics, 2.
Ogaki, M. and C.Y. Choi (2001): "The Gauss-Markov Theorem and Cointegrating Regressions," manuscript in progress, Ohio State University.
Ogaki, M. and J.Y. Park (1998): "A Cointegration Approach to Estimating Preference Parameters," Journal of Econometrics, 82.
Phillips, P.C.B. (1986): "Understanding Spurious Regressions in Econometrics," Journal of Econometrics, 33.
Phillips, P.C.B. (1998): "New Tools for Understanding Spurious Regression," Econometrica, 66.
Phillips, P.C.B. and S.N. Durlauf (1986): "Multiple Time Series Regression with Integrated Processes," Review of Economic Studies, 53.
Phillips, P.C.B. and M. Loretan (1991): "Estimating Long-run Economic Equilibria," Review of Economic Studies, 58.
Saikkonen, P. (1991): "Asymptotically Efficient Estimation of Cointegrating Regression," Econometric Theory, 7.
Stock, J.H. and M.W. Watson (1993): "A Simple Estimator of Cointegrating Vectors in Higher Order Integrated Systems," Econometrica, 61.

Table 1: Spurious Regressions with Exogenous Regressors

Columns: Model 1 (regressors generated from t-3, t-4, t-5, t-6, t-7, t-8, t-9, and Normal); Model 2.
Rows: s.d.(b) for OLS and GLS; kurtosis for OLS and GLS; QQ-plot for OLS and for GLS at each probability level; rejection frequencies t-5%, t-10%, n-5%, and n-10% for OLS, GLS (T), and GLS (E).

Notes:
1) t-n means that the regressors are generated from the t-distribution with n degrees of freedom.
2) s.d.(b) represents the standard deviation of the LS estimators.
3) The QQ-plot is obtained from Pr(b < c), where b is the estimator and c is the critical value from the normal distribution at the levels of 5%, 10%, 25%, 50%, 75%, 90%, and 95%.
4) GLS (T) means that the test statistic is calculated from the true value of $\sigma_\varepsilon^2$, and GLS (E) means that it is calculated from the estimated value of $\sigma_\varepsilon^2$.

Table 2: Spurious Regressions with Endogeneity Corrections

Columns: regressors generated from t-3, t-4, t-5, t-6, t-7, t-8, t-9, t-10, and Normal.
Rows: s.d.(b) for OLS and GLS; kurtosis for OLS and GLS; QQ-plot for OLS and for GLS at each probability level; rejection frequencies t-5%, t-10%, n-5%, and n-10% for OLS, GLS (T), and GLS (E).

Notes:
1) t-n means that the regressors are generated from the t-distribution with n degrees of freedom.
2) s.d.(b) represents the standard deviation of the LS estimators.
3) The QQ-plot is obtained from Pr(b < c), where b is the estimator and c is the critical value from the normal distribution at the levels of 5%, 10%, 25%, 50%, 75%, 90%, and 95%.
4) GLS (T) means that the test statistic is calculated from the true value of $\sigma_\varepsilon^2$, and GLS (E) means that it is calculated from the estimated value of $\sigma_\varepsilon^2$.
