Predictive Regressions with Imperfect Predictors


Raymond Kan and Shengzhe Tang

JEL classification: C1; C13
Keywords: Predictive regression; Stochastic regressor; Model misspecification.

Current version: February 2017

Kan and Tang are from the University of Toronto. Corresponding author: Raymond Kan, Joseph L. Rotman School of Management, University of Toronto, 105 St. George Street, Toronto, Ontario, Canada M5S 3E6. We thank Peter Christoffersen, Tom McCurdy, Chayawat Ornthanalai, Mikhail Simutin and Liyan Yang for their valuable comments. Kan gratefully acknowledges financial support from the Social Sciences and Humanities Research Council of Canada. Tang gratefully acknowledges financial support from the James C. Hickman Scholar program at the Society of Actuaries.

ABSTRACT

Standard predictive regressions assume that the conditional expected return of a financial asset is linear in the predictive variable. In most real-world situations, this is a rather unreasonable assumption. In this paper, we consider a predictive regression with an imperfect predictor, in the sense that the predictor does not fully explain the conditional expected return. Both asymptotic and finite sample analyses of this predictive regression are provided. Our analyses show that an imperfect predictor has an important impact on the inference of predictive regressions. Compared with the case of a perfect predictor, the estimated slope coefficient of the predictive regression exhibits a larger bias and higher volatility in the presence of an imperfect predictor. By assuming that the predictor is perfect, one can draw erroneous inferences even after correcting for the finite sample bias of the predictive regression. Empirically, we find that it is important to allow for potential misspecification in the predictor: accounting for model misspecification, many of the popular predictors do not have statistically significant predictive power for market returns.

Stock return predictability has long been a major research subject in financial economics. From a theoretical viewpoint, predictable stock returns have important implications for asset pricing and portfolio management. A substantial empirical literature has identified numerous variables that appear to have some forecasting ability for future stock returns, lending support to the assumption of predictable stock returns in theory.¹ However, the reliability of statistical inferences from these predictive regressions has often been called into question when the candidate predictors are highly persistent. This paper studies the question of making inferences about return predictability based on stochastic regressors. In particular, we lay out a predictive model framework that affords a comprehensive analysis of the finite sample properties of predictive regressions, allowing for true predictors, imperfect predictors, as well as useless predictors. Understanding these finite sample behaviours is important. The highly persistent stochastic regressors in these regressions, together with their dependence structure in relation to that of the regressand, substantively alter the standard results of classical regression inference. Under a general variance-covariance structure of the returns and the predictor, we conduct a complete analysis of the finite sample distributions and moments of the slope coefficient estimate, its t-ratio, and the coefficient of determination in these predictive regressions. An important issue when making inferences about return predictability in finite samples is spurious regression bias, a phenomenon in financial economics first studied by Ferson, Sarkissian, and Simin (2003) (FSS hereafter). Specifically, they find that standard inference tools can lead to erroneous inferences when the predictor is in fact totally independent of the return.
Under the model setting of FSS, a necessary condition for the spurious regression bias to be important is high persistence of the conditional expected return. Although the conditional expected return is unobservable, its level of persistence can be inferred from the return data. Calibration suggests that the conditional expected return can be weakly autocorrelated, i.e., not as persistent as one may think. Therefore, under the setting of FSS, the deviation from standard inferences is not too serious. However, with a sizable contemporaneous correlation between the candidate predictor and the return, there is a substantial departure from standard inferences even when the conditional expected return is not persistent, as long as the predictor exhibits high persistence. This new finding is empirically relevant because many popular price-scaled variables such as the dividend-price ratio,

¹ See Welch and Goyal (2008) for a comprehensive study of the empirical evidence on stock return predictability. Bollerslev, Tauchen, and Zhou (2008) and Drechsler and Yaron (2011) study the variance risk premium. Li, Ng, and Swaminathan (2013) study the aggregate implied cost of capital.

the earnings-price ratio and the book-to-market ratio have high persistence and a sizable correlation with contemporaneous returns. This paper has two main contributions. First, it contributes to a vast literature that studies the sampling properties of predictive regressions. Goetzmann and Jorion (1993), Nelson and Kim (1993) and Stambaugh (1999) study small sample biases due to persistent stochastic regressors. Cavanagh, Elliott, and Stock (1995) and Lanne (2002) develop inference methods in the presence of near unit roots. Torous, Valkanov, and Yan (2004) study the effects of uncertainty about the order of integration on inferences drawn in predictive regressions. Phillips (2015) provides an excellent overview of recent developments in the finite sample and asymptotic inference for predictive regressions. A maintained assumption in this literature is that the predictor is perfectly correlated with the conditional expected return. In contrast, we lay out a general framework that enables a comprehensive analysis of predictive regressions, allowing for true predictors, imperfect predictors, as well as useless predictors. Second, this paper complements the spurious regression bias literature. FSS first point out this problem in financial economics. Deng (2014) provides an asymptotic theory for spurious predictive regressions when expected returns are nearly integrated. Assuming stationarity of the conditional expected return, we show that even with weakly autocorrelated expected returns, the finite sample departure from standard inferences can still be substantial. The remainder of the paper is organized as follows. Section I provides a finite sample analysis of ordinary least squares regression with stochastic regressors. Section II presents a model for stock returns and potentially imperfect predictors, and characterizes the exact distributions of the key regression statistics.
In Section III, the predictive model for stock returns is calibrated to real data, followed by a discussion of the potential spurious regression bias. Section IV concludes.

I. Finite Sample Analysis of Simple Regression with Stochastic Regressors

To prepare for our analysis of predictive regressions, we first provide a finite sample analysis of ordinary least squares (OLS) regression with stochastic regressors. Our analysis allows for a very general covariance structure between the regressor and the regressand, and contains the predictive regression as a special case. Most of the finite sample analysis in this section is

not available in the literature, and it is of independent interest in its own right. We consider two time series $x_t$ and $y_t$, for $t = 1, \ldots, T$, and run the following linear regression:

    y_t = \alpha + \beta x_t + \epsilon_t,  t = 1, \ldots, T.    (1)

The OLS estimate of the regression slope coefficient is given by

    \hat{\beta} = \frac{\sum_{t=1}^T (x_t - \bar{x})(y_t - \bar{y})}{\sum_{t=1}^T (x_t - \bar{x})^2},    (2)

where $\bar{x} = \sum_{t=1}^T x_t/T$ and $\bar{y} = \sum_{t=1}^T y_t/T$. For such a regression, we are often interested in computing the sample correlation coefficient between $x_t$ and $y_t$, denoted as

    \hat{R} = \frac{\sum_{t=1}^T (x_t - \bar{x})(y_t - \bar{y})}{\left[\sum_{t=1}^T (x_t - \bar{x})^2\right]^{1/2} \left[\sum_{t=1}^T (y_t - \bar{y})^2\right]^{1/2}},    (3)

and $\hat{R}^2$ is often used as a measure of goodness-of-fit for the regression. In addition, one is often interested in computing the t-ratio of $\hat{\beta}$, which is given by

    t(\hat{\beta}) = \frac{\hat{\beta}}{s(\hat{\beta})},    (4)

where

    s(\hat{\beta})^2 = \frac{\sum_{t=1}^T (y_t - \hat{\alpha} - \hat{\beta} x_t)^2}{(T-2) \sum_{t=1}^T (x_t - \bar{x})^2}.    (5)

Note that $t(\hat{\beta})$ and $\hat{R}$ are monotonic transformations of each other based on the following relations:

    t(\hat{\beta}) = \frac{(T-2)^{1/2} \hat{R}}{(1 - \hat{R}^2)^{1/2}},    (6)

    \hat{R} = \frac{t(\hat{\beta})}{[T - 2 + t(\hat{\beta})^2]^{1/2}}.    (7)

In order to conduct finite sample analysis, we need to make some distributional assumptions on $x = (x_1, \ldots, x_T)'$ and $y = (y_1, \ldots, y_T)'$. We assume $(x', y')'$ follows a multivariate normal distribution

    \begin{bmatrix} x \\ y \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_x 1_T \\ \mu_y 1_T \end{bmatrix}, \begin{bmatrix} \Sigma_x & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_y \end{bmatrix} \right),    (8)

where $\mu_x$ and $\mu_y$ are the unconditional means of $x_t$ and $y_t$, and $1_T$ stands for a $T$-vector of ones. In our analysis of the finite sample properties of $\hat{\beta}$, $\hat{R}$ and $t(\hat{\beta})$, we only assume that the covariance matrix of $(x', y')'$, denoted $\Sigma$, is positive definite, and we do not make any additional assumption on the structure of the covariance matrix. However, in many situations, researchers often assume that the time series $\{x_t, y_t\}$ are jointly covariance stationary. Under such an assumption, each block of $\Sigma$ will be a Toeplitz matrix, i.e., elements on each descending diagonal from left to right are constant. As a result, we can define $\sigma_x^2 = \mathrm{Var}[x_t]$, $\sigma_y^2 = \mathrm{Var}[y_t]$, $\sigma_{xy} = \mathrm{Cov}[x_t, y_t]$, and we can also define the population counterparts of $\hat{\beta}$ and $\hat{R}$ as

    \beta = \frac{\sigma_{xy}}{\sigma_x^2},    (9)

    R = \frac{\sigma_{xy}}{\sigma_x \sigma_y}.    (10)

It is important to emphasize that we do not make the usual OLS assumption that $\mathrm{Var}[y|x]$ is proportional to an identity matrix, so the residuals in the OLS regression can have time-varying variance and can also be autocorrelated. Finite sample analysis of OLS regression under this general setting is not available in the literature. The analysis of the distribution and moments of $\hat{\beta}$ is rather straightforward, because $\hat{\beta}$ can be written as a ratio of two quadratic forms in normal random variables, and existing results can be easily applied to obtain the exact distribution and moments of $\hat{\beta}$. For $\hat{R}$ and $t(\hat{\beta})$, the only case for which we have finite sample results is when $(x_t, y_t)$ are independent and identically distributed as bivariate normal, i.e., $\Sigma_x = \sigma_x^2 I_T$, $\Sigma_y = \sigma_y^2 I_T$, and $\Sigma_{xy} = \Sigma_{yx} = \sigma_{xy} I_T$. The finite sample distribution of $\hat{R}$ for this case was first derived by Fisher (1915).² In particular, when $\sigma_{xy} = 0$, it is well known that $\hat{R}^2$ has a beta distribution with parameters $1/2$ and $(T-2)/2$, with $\mathrm{E}[\hat{R}^2] = 1/(T-1)$, and $t(\hat{\beta})$ follows a t-distribution with $T-2$ degrees of freedom.³ This null distribution is often used by researchers to test the null hypothesis $H_0: \beta = 0$.
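The monotone relations (6)–(7) and the null moment $\mathrm{E}[\hat{R}^2] = 1/(T-1)$ are easy to verify numerically. A minimal sketch (the helper function and parameter values are ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 25

def ols_stats(x, y):
    """Slope, sample correlation, and t-ratio of the regression y = a + b*x + e."""
    xc, yc = x - x.mean(), y - y.mean()
    beta = xc @ yc / (xc @ xc)
    resid = yc - beta * xc                        # same residuals as with an intercept
    s2 = resid @ resid / ((T - 2) * (xc @ xc))    # squared standard error, eq. (5)
    R = xc @ yc / np.sqrt((xc @ xc) * (yc @ yc))
    return beta, R, beta / np.sqrt(s2)

x, y = rng.standard_normal(T), rng.standard_normal(T)
beta, R, t = ols_stats(x, y)
assert abs(t - np.sqrt(T - 2) * R / np.sqrt(1 - R**2)) < 1e-10   # eq. (6)
assert abs(R - t / np.sqrt(T - 2 + t**2)) < 1e-10                # eq. (7)

# Under independence, E[R^2] = 1/(T-1): check by Monte Carlo.
R2 = [ols_stats(rng.standard_normal(T), rng.standard_normal(T))[1] ** 2
      for _ in range(50_000)]
print(np.mean(R2), 1 / (T - 1))   # the two numbers should be close
```

The identity checks are exact up to floating-point error, while the Monte Carlo average carries sampling noise.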
However, in regressions that involve stochastic regressors, the assumptions $\Sigma_x = \sigma_x^2 I_T$, $\Sigma_y = \sigma_y^2 I_T$, and $\Sigma_{xy} = \Sigma_{yx} = \sigma_{xy} I_T$ rarely hold, so it is important for us to derive the finite sample distributions of $\hat{R}$ and $t(\hat{\beta})$ under a more general assumption on $\Sigma$.

² See Johnson, Kotz, and Balakrishnan (1995, Chapter 3) for a review and summary of this result.
³ When $x$ and $y$ are independent of each other, we do not need to assume multivariate normality for these results to hold. Kariya and Eaton (1977) show that as long as either one of $x$ or $y$ follows a spherical distribution, the null distributions of $\hat{R}$ and $t(\hat{\beta})$ remain the same.
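To illustrate why the general-$\Sigma$ case matters, consider the following simulation (our code; the parameter values are illustrative): two mutually independent but persistent AR(1) series are regressed on one another, and a nominal 5% test based on the $t_{T-2}$ critical value rejects far too often.

```python
import numpy as np

rng = np.random.default_rng(1)
T, phi, n_sim = 100, 0.95, 2000

def ar1(n, phi):
    """Simulate a stationary Gaussian AR(1) path of length n."""
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0] / np.sqrt(1 - phi**2)   # draw from the stationary distribution
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

def t_ratio(x, y):
    xc, yc = x - x.mean(), y - y.mean()
    beta = xc @ yc / (xc @ xc)
    resid = yc - beta * xc
    s2 = resid @ resid / ((T - 2) * (xc @ xc))
    return beta / np.sqrt(s2)

crit = 1.984   # two-sided 5% critical value of the t-distribution with 98 df
rej = np.mean([abs(t_ratio(ar1(T, phi), ar1(T, phi))) > crit
               for _ in range(n_sim)])
print(rej)     # far above the nominal 0.05
```

The severe over-rejection comes entirely from the serial dependence, i.e., from $\Sigma$ departing from the i.i.d. structure assumed by the classical null distribution.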

A. Simplification of the Problem

In this subsection, we simplify the problem so that we can write the various statistics in terms of ratios of quadratic forms in independent standard normal random variables. Let $M = I_T - \frac{1}{T} 1_T 1_T'$ and let $P$ be a $T \times (T-1)$ orthonormal matrix with its columns orthogonal to $1_T$, so that $PP' = M$. Let $u_1 = P'x$ and $u_2 = P'y$. We have

    u = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \sim N\left( 0_{2(T-1)}, \begin{bmatrix} \tilde{\Sigma}_x & \tilde{\Sigma}_{xy} \\ \tilde{\Sigma}_{yx} & \tilde{\Sigma}_y \end{bmatrix} \right),    (11)

where $\tilde{\Sigma}_x = P'\Sigma_x P$, $\tilde{\Sigma}_y = P'\Sigma_y P$, and $\tilde{\Sigma}_{xy} = P'\Sigma_{xy} P$. Using $u$, we can write

    \sum_{t=1}^T (x_t - \bar{x})(y_t - \bar{y}) = u'A_1 u,    (12)

    \sum_{t=1}^T (y_t - \bar{y})^2 = u'A_2 u,    (13)

    \sum_{t=1}^T (x_t - \bar{x})^2 = u'B u,    (14)

where

    A_1 = \begin{bmatrix} 0_{(T-1)\times(T-1)} & I_{T-1}/2 \\ I_{T-1}/2 & 0_{(T-1)\times(T-1)} \end{bmatrix},    (15)

    A_2 = \begin{bmatrix} 0_{(T-1)\times(T-1)} & 0_{(T-1)\times(T-1)} \\ 0_{(T-1)\times(T-1)} & I_{T-1} \end{bmatrix},    (16)

    B = \begin{bmatrix} I_{T-1} & 0_{(T-1)\times(T-1)} \\ 0_{(T-1)\times(T-1)} & 0_{(T-1)\times(T-1)} \end{bmatrix}.    (17)

We now consider a transformation of $u$ to $\tilde{u}$, where

    \tilde{u} = \begin{bmatrix} \tilde{u}_1 \\ \tilde{u}_2 \end{bmatrix} \sim N(0_{2(T-1)}, I_{2(T-1)}).    (18)

Let $Q_1 D_1 Q_1'$ be the spectral decomposition of $\tilde{\Sigma}_x$, where $D_1$ is a diagonal matrix of the eigenvalues of $\tilde{\Sigma}_x$, and $Q_1$ is a matrix of the corresponding eigenvectors of $\tilde{\Sigma}_x$. Similarly,

we let $Q_2 D_2 Q_2'$ be the spectral decomposition of $\tilde{\Sigma}_{y \cdot x}$, where $\tilde{\Sigma}_{y \cdot x} = \tilde{\Sigma}_y - \tilde{\Sigma}_{yx} \tilde{\Sigma}_x^{-1} \tilde{\Sigma}_{xy}$. Consider the following transformation:

    u_1 = Q_1 D_1^{1/2} \tilde{u}_1,    (19)

    u_2 = \tilde{\Sigma}_{yx} Q_1 D_1^{-1/2} \tilde{u}_1 + Q_2 D_2^{1/2} \tilde{u}_2,    (20)

which implies $\mathrm{E}[u_1 u_1'] = \tilde{\Sigma}_x$, $\mathrm{E}[u_2 u_2'] = \tilde{\Sigma}_y$ and $\mathrm{E}[u_1 u_2'] = \tilde{\Sigma}_{xy}$, as desired. With this transformation, we can then write

    u'A_1 u = \tilde{u}'\tilde{A}_1 \tilde{u},    (21)

    u'A_2 u = \tilde{u}'\tilde{A}_2 \tilde{u},    (22)

    u'B u = \tilde{u}'\tilde{B} \tilde{u},    (23)

where

    \tilde{A}_1 = \begin{bmatrix} \frac{1}{2}\left( D_1^{1/2} Q_1' \tilde{\Sigma}_{yx} Q_1 D_1^{-1/2} + D_1^{-1/2} Q_1' \tilde{\Sigma}_{xy} Q_1 D_1^{1/2} \right) & \frac{1}{2} D_1^{1/2} Q_1' Q_2 D_2^{1/2} \\ \frac{1}{2} D_2^{1/2} Q_2' Q_1 D_1^{1/2} & 0_{(T-1)\times(T-1)} \end{bmatrix},    (24)

    \tilde{A}_2 = \begin{bmatrix} D_1^{-1/2} Q_1' \tilde{\Sigma}_{xy} \tilde{\Sigma}_{yx} Q_1 D_1^{-1/2} & D_1^{-1/2} Q_1' \tilde{\Sigma}_{xy} Q_2 D_2^{1/2} \\ D_2^{1/2} Q_2' \tilde{\Sigma}_{yx} Q_1 D_1^{-1/2} & D_2 \end{bmatrix},    (25)

    \tilde{B} = \begin{bmatrix} D_1 & 0_{(T-1)\times(T-1)} \\ 0_{(T-1)\times(T-1)} & 0_{(T-1)\times(T-1)} \end{bmatrix}.    (26)

It is then straightforward to verify that $\hat{\beta}$, $\hat{R}$ and $t(\hat{\beta})$ can be written as

    \hat{\beta} = \frac{\tilde{u}'\tilde{A}_1 \tilde{u}}{\tilde{u}'\tilde{B} \tilde{u}},    (27)

    \hat{R} = \frac{\tilde{u}'\tilde{A}_1 \tilde{u}}{(\tilde{u}'\tilde{A}_2 \tilde{u})^{1/2} (\tilde{u}'\tilde{B} \tilde{u})^{1/2}},    (28)

    t(\hat{\beta}) = \frac{(T-2)^{1/2}\, \tilde{u}'\tilde{A}_1 \tilde{u}}{\left[ (\tilde{u}'\tilde{A}_2 \tilde{u})(\tilde{u}'\tilde{B} \tilde{u}) - (\tilde{u}'\tilde{A}_1 \tilde{u})^2 \right]^{1/2}}.    (29)

By defining two ratios of quadratic forms in $\tilde{u}$ as

    R_1 = \frac{\tilde{u}'\tilde{A}_1 \tilde{u}}{\tilde{u}'\tilde{B} \tilde{u}}    (30)

and

    R_2 = \frac{\tilde{u}'\tilde{A}_2 \tilde{u}}{\tilde{u}'\tilde{B} \tilde{u}},    (31)

we can express all three statistics in terms of $R_1$ and $R_2$:

    \hat{\beta} = R_1,    (32)

    \hat{R} = \frac{R_1}{R_2^{1/2}},    (33)

    t(\hat{\beta}) = \frac{(T-2)^{1/2} R_1}{(R_2 - R_1^2)^{1/2}}.    (34)

B. Finite Sample Distribution and Moments of $\hat{\beta}$

The exact distribution of $\hat{\beta}$ is relatively easy to obtain, as it involves only a single ratio of quadratic forms in $\tilde{u}$. The cumulative distribution function of $R_1$ is available from Gil-Pelaez (1951) and Imhof (1961):

    \mathrm{P}[R_1 < r_1] = \mathrm{P}[\tilde{u}'(\tilde{A}_1 - r_1 \tilde{B})\tilde{u} < 0] = \frac{1}{2} - \frac{1}{\pi} \int_0^\infty \frac{\mathrm{Im}(\phi_{X_1}(t))}{t}\, dt,    (35)

where $\mathrm{Im}(x)$ stands for the imaginary part of $x$, and $\phi_{X_1}(t)$ is the characteristic function of $X_1 = \tilde{u}'(\tilde{A}_1 - r_1 \tilde{B})\tilde{u}$, which is given by

    \phi_{X_1}(t) = |I_{2(T-1)} - 2it(\tilde{A}_1 - r_1 \tilde{B})|^{-1/2},    (36)

and $i = \sqrt{-1}$. Numerical methods for evaluating this integral are given by Imhof (1961) and Lu and King (2002). For the density function of $R_1$, we can use the result from Geary (1944) to show that

    f_{R_1}(r_1) = \frac{1}{\pi} \int_0^\infty \mathrm{Re}\left( \phi_{X_1}(t)\, \mathrm{tr}\!\left( (I_{2(T-1)} - 2it(\tilde{A}_1 - r_1 \tilde{B}))^{-1} \tilde{B} \right) \right) dt,    (37)

where $\mathrm{Re}(x)$ stands for the real part of $x$. Broda and Paolella (2009) provide an algorithm for computing this integral using real arithmetic. The moments of $\hat{\beta}$ are also easy to obtain. Using Magnus (1986, Theorem 7), we can easily show that the $p$-th moment of $\hat{\beta}$ exists if and only if $p < T-1$. When $p < T-1$, we

can use Theorem 6 of Magnus to show that⁴

    \mathrm{E}[R_1^p] = \frac{1}{\Gamma(p)} \int_0^\infty t^{p-1}\, |I_{2(T-1)} + 2t\tilde{B}|^{-1/2}\, \mathrm{E}[(w'Hw)^p]\, dt,    (38)

where $\Gamma(\cdot)$ is the gamma function, $w \sim N(0_{2(T-1)}, I_{2(T-1)})$ and $H = L'\tilde{A}_1 L$, with $L$ being a lower triangular matrix such that $LL' = (I_{2(T-1)} + 2t\tilde{B})^{-1}$. In evaluating this integral, we need to compute $\mathrm{E}[(w'Hw)^p]$. When $p$ is small, we have explicit expressions for $\mathrm{E}[(w'Hw)^p]$. For example, when $p = 1$, we have $\mathrm{E}[w'Hw] = \mathrm{tr}(H)$, where $\mathrm{tr}(H)$ stands for the trace of $H$. However, when $p$ is large, it is very tedious and computationally expensive to use the explicit formula. Based on an algorithm by Brown (1986), Bao and Kan (2013) propose a fast recursive algorithm for computing $\mathrm{E}[(w'Hw)^p]$. Let $\lambda_i$, $i = 1, \ldots, 2(T-1)$, be the eigenvalues of $H$, and define

    d_k = \frac{\mathrm{E}[(w'Hw)^k]}{2^k k!}.    (39)

Bao and Kan (2013) show that

    d_k = \frac{1}{2k} \sum_{i=1}^{2(T-1)} u_{i,k},    (40)

where $u_{i,k}$ is obtained by using the following recursive relation:

    u_{i,k} = \lambda_i (u_{i,k-1} + d_{k-1}),  i = 1, \ldots, 2(T-1),    (41)

with the initial conditions $d_0 = 1$ and $u_{i,0} = 0$.

C. Finite Sample Distributions of $\hat{R}$, $\hat{R}^2$, and $t(\hat{\beta})$

Computing the exact distributions of $\hat{R}$ and $t(\hat{\beta})$ is challenging because they involve two ratios of quadratic forms in multivariate normal random variables, $R_1$ and $R_2$. Our first task is to obtain the joint cumulative distribution and density functions of $(R_1, R_2)$. For the joint cumulative distribution function of $(R_1, R_2)$, we can write

    F_{R_1,R_2}(r_1, r_2) = \mathrm{P}[R_1 < r_1;\, R_2 < r_2] = \mathrm{P}[\tilde{u}'(\tilde{A}_1 - r_1\tilde{B})\tilde{u} < 0;\, \tilde{u}'(\tilde{A}_2 - r_2\tilde{B})\tilde{u} < 0] = \mathrm{P}[X_1 < 0;\, X_2 < 0],    (42)

⁴ See Stambaugh (1999) for a similar analysis.

where $X_1 = \tilde{u}'(\tilde{A}_1 - r_1\tilde{B})\tilde{u}$ and $X_2 = \tilde{u}'(\tilde{A}_2 - r_2\tilde{B})\tilde{u}$. Then, using Theorem 5 of Shephard (1991), we can show that

    F_{R_1,R_2}(r_1, r_2) = \frac{F_{R_1}(r_1)}{2} + \frac{F_{R_2}(r_2)}{2} - \frac{1}{4} - \frac{1}{2\pi^2} \int_0^\infty \!\! \int_0^\infty \frac{\mathrm{Re}(\phi_{X_1,X_2}(s,t) - \phi_{X_1,X_2}(s,-t))}{st}\, ds\, dt
                        = \frac{1}{4} - \frac{1}{2\pi} \int_0^\infty \frac{\mathrm{Im}(\phi_{X_1}(t) + \phi_{X_2}(t))}{t}\, dt - \frac{1}{2\pi^2} \int_0^\infty \!\! \int_0^\infty \frac{\mathrm{Re}(\phi_{X_1,X_2}(s,t) - \phi_{X_1,X_2}(s,-t))}{st}\, ds\, dt,    (43)

where $\phi_{X_1}(t)$ and $\phi_{X_2}(t)$ are the characteristic functions of $X_1$ and $X_2$, respectively, and $\phi_{X_1,X_2}(s,t)$ is the joint characteristic function of $(X_1, X_2)$. The explicit expression for $\phi_{X_1}(t)$ is given in (36), and $\phi_{X_2}(t)$ is similarly defined by replacing $\tilde{A}_1 - r_1\tilde{B}$ in (36) with $\tilde{A}_2 - r_2\tilde{B}$. For the joint characteristic function of $X_1$ and $X_2$, its explicit expression is given by (see, for example, Magnus (1986, Lemma 5))

    \phi_{X_1,X_2}(s,t) = |C|^{1/2},    (44)

where

    C \equiv C(s,t) = \left( I_{2(T-1)} - 2is(\tilde{A}_1 - r_1\tilde{B}) - 2it(\tilde{A}_2 - r_2\tilde{B}) \right)^{-1}.    (45)

It is straightforward to show that

    \frac{\partial \phi_{X_1,X_2}(s,t)}{\partial r_1} = -is\, \phi_{X_1,X_2}(s,t)\, \mathrm{tr}(C\tilde{B}),    (46)

    \frac{\partial \phi_{X_1,X_2}(s,t)}{\partial r_2} = -it\, \phi_{X_1,X_2}(s,t)\, \mathrm{tr}(C\tilde{B}),    (47)

    \frac{\partial\, \mathrm{tr}(C\tilde{B})}{\partial r_2} = -2it\, \mathrm{tr}(C\tilde{B}C\tilde{B}).    (48)

Using these identities, we obtain

    \frac{\partial^2 \phi_{X_1,X_2}(s,t)}{\partial r_1 \partial r_2} = -st\, |C|^{1/2} \left[ \mathrm{tr}(C\tilde{B})^2 + 2\, \mathrm{tr}(C\tilde{B}C\tilde{B}) \right].    (49)

Differentiating $F_{R_1,R_2}(r_1, r_2)$ and using (49), we obtain the joint density function of $(R_1, R_2)$ as

    f_{R_1,R_2}(r_1, r_2) = \frac{1}{2\pi^2} \int_0^\infty \!\! \int_0^\infty \mathrm{Re}(h(s,t) + h(s,-t))\, ds\, dt,    (50)

for $-r_2^{1/2} \le r_1 \le r_2^{1/2}$ and $r_2 \ge 0$, where

    h(s,t) = |C|^{1/2} \left[ \mathrm{tr}(C\tilde{B})^2 + 2\, \mathrm{tr}(C\tilde{B}C\tilde{B}) \right].    (51)

In addition, we can make the polar transformation $s = q\sin\theta$ and $t = q\cos\theta$ to obtain

    f_{R_1,R_2}(r_1, r_2) = \frac{1}{2\pi^2} \int_0^\infty \!\! \int_0^{\pi/2} \mathrm{Re}(h(q\sin\theta, q\cos\theta) + h(q\sin\theta, -q\cos\theta))\, q\, d\theta\, dq,    (52)

which can speed up the numerical computation of the joint density. In order to obtain the density function of $\hat{R}$, we note that since $R_2 \ge 0$, we have

    \mathrm{P}[\hat{R} < r] = \int_0^\infty \!\! \int_{-r_2^{1/2}}^{r\, r_2^{1/2}} f_{R_1,R_2}(r_1, r_2)\, dr_1\, dr_2  for -1 \le r \le 1.    (53)

Differentiating this expression with respect to $r$ and applying the Leibniz integral rule, we obtain the density function of $\hat{R}$ as

    f_{\hat{R}}(r) = \int_0^\infty r_2^{1/2}\, f_{R_1,R_2}(r\, r_2^{1/2}, r_2)\, dr_2  for -1 \le r \le 1.    (54)

Similarly, for the density function of $\hat{R}^2$, we note that

    \mathrm{P}[\hat{R}^2 < c] = \int_0^\infty \!\! \int_{-(c r_2)^{1/2}}^{(c r_2)^{1/2}} f_{R_1,R_2}(r_1, r_2)\, dr_1\, dr_2  for 0 \le c \le 1.    (55)

Differentiating this expression with respect to $c$ and applying the Leibniz integral rule, we obtain the density function of $\hat{R}^2$ as

    f_{\hat{R}^2}(c) = \frac{1}{2 c^{1/2}} \int_0^\infty r_2^{1/2} \left[ f_{R_1,R_2}((c r_2)^{1/2}, r_2) + f_{R_1,R_2}(-(c r_2)^{1/2}, r_2) \right] dr_2  for 0 \le c \le 1.    (56)

For the density function of $t(\hat{\beta})$, we note that

    \mathrm{P}[t(\hat{\beta}) < c] = \int_0^\infty \!\! \int_{-r_2^{1/2}}^{c\, r_2^{1/2}/(T-2+c^2)^{1/2}} f_{R_1,R_2}(r_1, r_2)\, dr_1\, dr_2.    (57)

Then, differentiating this expression with respect to $c$, we obtain the density function of $t(\hat{\beta})$ as

    f_{t(\hat{\beta})}(c) = \frac{T-2}{(T-2+c^2)^{3/2}} \int_0^\infty r_2^{1/2}\, f_{R_1,R_2}\!\left( \frac{c\, r_2^{1/2}}{(T-2+c^2)^{1/2}},\, r_2 \right) dr_2.    (58)

Therefore, all we need is a single numerical integration of the joint density $f_{R_1,R_2}(r_1, r_2)$ to obtain the density functions of $\hat{R}$, $\hat{R}^2$, and $t(\hat{\beta})$. While we can in principle use (53) to compute the cumulative distribution function of $\hat{R}$, doing so is very time consuming because it involves a double integral of $f_{R_1,R_2}(r_1, r_2)$. To overcome this problem, we develop a more efficient approach for computing the cumulative distribution function of $\hat{R}$. For this purpose, we define

    G(r_1, r_2) = \frac{\partial F_{R_1,R_2}(r_1, r_2)}{\partial r_2} = \frac{1}{2} f_{R_2}(r_2) - \frac{1}{2\pi^2} \int_0^\infty \!\! \int_0^\infty \frac{\mathrm{Im}(g(s,t) + g(s,-t))}{s}\, ds\, dt,    (59)

where

    g(s,t) = |C|^{1/2}\, \mathrm{tr}(C\tilde{B}),    (60)

and this explicit expression is obtained by using (47).⁵ Note that using the Leibniz integral rule, we can show that

    G(r_1, r_2) = \int_{-r_2^{1/2}}^{r_1} f_{R_1,R_2}(u, r_2)\, du = \mathrm{P}[R_1 < r_1 \mid R_2 = r_2]\, f_{R_2}(r_2).    (62)

Using $G(r_1, r_2)$, we can obtain the cumulative distribution function of $\hat{R} = R_1/R_2^{1/2}$ using

    F_{\hat{R}}(r) = \int_0^\infty \mathrm{P}[R_1 < r\, r_2^{1/2} \mid R_2 = r_2]\, f_{R_2}(r_2)\, dr_2 = \int_0^\infty G(r\, r_2^{1/2}, r_2)\, dr_2.    (63)

By making the polar transformation $s = q\sin\theta$ and $t = q\cos\theta$ in (59), we can then write

    F_{\hat{R}}(r) = \frac{1}{2} - \frac{1}{2\pi^2} \int_0^\infty \!\! \int_0^\infty \!\! \int_0^{\pi/2} \frac{\mathrm{Im}(g(q\sin\theta, q\cos\theta) + g(q\sin\theta, -q\cos\theta))}{\sin\theta}\, d\theta\, dq\, dr_2,    (64)

⁵ Using L'Hôpital's rule, we can show that as $s \to 0$, the integrand in the double integral converges to

    2\, \mathrm{Re}\left( |C_2|^{1/2} \left[ \mathrm{tr}(C_2\tilde{B})\, \mathrm{tr}(C_2(\tilde{A}_1 - r_1\tilde{B})) + 2\, \mathrm{tr}(C_2\tilde{B}C_2(\tilde{A}_1 - r_1\tilde{B})) \right] \right),    (61)

where $C_2 = (I_{2(T-1)} - 2it(\tilde{A}_2 - r_2\tilde{B}))^{-1}$.

where the $C$ matrix in $g(s,t)$ is given by

    C = \left( I_{2(T-1)} - 2is(\tilde{A}_1 - r\, r_2^{1/2}\tilde{B}) - 2it(\tilde{A}_2 - r_2\tilde{B}) \right)^{-1}.    (65)

As $t(\hat{\beta})$ is a monotonic transformation of $\hat{R}$, we can easily obtain the cumulative distribution function of $t(\hat{\beta})$ using

    \mathrm{P}[t(\hat{\beta}) < c] = \mathrm{P}\left[ \hat{R} < \frac{c}{(T-2+c^2)^{1/2}} \right].    (66)

Finally, using a similar derivation, we can also obtain the cumulative distribution function of $\hat{R}^2$ using

    \mathrm{P}[\hat{R}^2 < c] = \int_0^\infty \left[ G((c r_2)^{1/2}, r_2) - G(-(c r_2)^{1/2}, r_2) \right] dr_2.    (67)

D. Moments of $\hat{R}$ and $t(\hat{\beta})$

The moments of $\hat{R}$ are harder to obtain than the moments of $\hat{\beta}$. By writing

    \hat{R}^p = \frac{R_1^p}{R_2^{p/2}} = \frac{(\tilde{u}'\tilde{A}_1\tilde{u})^p}{(\tilde{u}'\tilde{A}_2\tilde{u})^{p/2} (\tilde{u}'\tilde{B}\tilde{u})^{p/2}},    (68)

we can see that $\mathrm{E}[\hat{R}^p]$ is the moment of a ratio of multiple quadratic forms in $\tilde{u}$. Using Theorem 1 of Meng (2005) and following a proof similar to that in Bao and Kan (2013), we can show that

    \mathrm{E}[\hat{R}^p] = \frac{1}{\Gamma(p/2)^2} \int_0^\infty \!\! \int_0^\infty s^{p/2-1} t^{p/2-1}\, |\breve{C}|^{1/2}\, \mathrm{E}[(w'Hw)^p]\, ds\, dt,    (69)

where $w \sim N(0_{2(T-1)}, I_{2(T-1)})$, $\breve{C} = (I_{2(T-1)} + 2s\tilde{A}_2 + 2t\tilde{B})^{-1}$ and $H = \breve{L}'\tilde{A}_1\breve{L}$, with $\breve{L}$ being a lower triangular matrix such that $\breve{L}\breve{L}' = \breve{C}$. The numerical evaluation of this integral can be facilitated by making the polar transformation $s = q\sin\theta$, $t = q\cos\theta$, as well as by adopting the fast recursive algorithm discussed in (39)-(41). For the moments of $t(\hat{\beta})$, we can use Proposition 1 of Bao and Kan (2013) to show that $\mathrm{E}[t(\hat{\beta})^p]$ exists if and only if $p < T-2$. When $p < T-2$, we can use (6) to obtain

    t(\hat{\beta})^p = \frac{(T-2)^{p/2}\, \hat{R}^p}{(1 - \hat{R}^2)^{p/2}} = (T-2)^{p/2}\, \hat{R}^p \sum_{k=0}^\infty \frac{(p/2)_k}{k!}\, \hat{R}^{2k}.    (70)

It follows that for $p < T-2$, we have

    \mathrm{E}[t(\hat{\beta})^p] = (T-2)^{p/2} \sum_{k=0}^\infty \frac{(p/2)_k}{k!}\, \mathrm{E}[\hat{R}^{2k+p}],    (71)

and we can sum this infinite series until it converges.⁶

II. Predictive Regression Model for Stock Returns

In this section, the finite sample analytical tools derived in the previous section are used to study predictive regression models for stock returns. Consider a situation in which an analyst investigates whether the aggregate stock return is predictable based on a lagged predictor. Let $y_t$ be the stock return in excess of the risk-free rate in period $t$, and let $x_t$ be the candidate predictor under investigation. The time-series regression commonly used is a simple univariate one:

    y_t = \alpha + \beta x_{t-1} + e_t,  t = 1, \ldots, T.    (72)

Suppose the excess return is actually determined by the following equation:

    y_t = \mu_y + x^*_{t-1} + u_t,    (73)

where the return in period $t$ is the sum of the conditional expected return, based on some market information set at time $t-1$, and the unexpected shock $u_t$. We assume that $u_t$ is i.i.d. normal with mean zero and variance $\sigma_u^2$. The conditional expected return has two parts. The first is the unconditional expected return $\mu_y$. The second, denoted by $x^*_{t-1}$, is the time-varying part of the conditional expected return, which is assumed to follow a stationary AR(1) process:

    x^*_t = \phi^* x^*_{t-1} + v_t,  |\phi^*| < 1,    (74)

where $v_t$ is the innovation term with mean zero and variance $\sigma_v^2$. The candidate predictor is also assumed to have stationary AR(1) dynamics. Let $\mu_x = \mathrm{E}[x_t]$, and let $x^d_t$ be the

⁶ When $\tilde{\Sigma}_{xy} = 0_{(T-1)\times(T-1)}$, we can easily show that $\mathrm{E}[t(\hat{\beta})^p] = 0$ for odd $p$, and a faster algorithm is available to directly evaluate $\mathrm{E}[t(\hat{\beta})^p]$ for even $p$.

demeaned predictor, i.e., $x^d_t = x_t - \mu_x$. Then,

    x^d_t = \phi x^d_{t-1} + w_t,  |\phi| < 1,    (75)

where $w_t$ is the innovation term with mean zero and variance $\sigma_w^2$. Let $\sigma_{x^*}^2 = \mathrm{Var}[x^*_t]$ and $\sigma_x^2 = \mathrm{Var}[x_t]$. Then $\sigma_{x^*}^2 = \sigma_v^2/(1-\phi^{*2})$ and $\sigma_x^2 = \sigma_w^2/(1-\phi^2)$. It is further assumed that the innovation triplet $(u_t, v_t, w_t)$ is i.i.d. trivariate normal with contemporaneous covariances $\sigma_{uv}$, $\sigma_{uw}$ and $\sigma_{vw}$, and correlation coefficients $\rho_{uv}$, $\rho_{uw}$ and $\rho_{vw}$. Hence, the data-generating process (DGP) is characterized by the trivariate equation system (73)-(75) with 10 free parameters:

    \mu_y,\ \sigma_u^2,\ \sigma_{x^*}^2,\ \phi^*,\ \sigma_{uv},\ \sigma_x^2,\ \phi,\ \sigma_{uw},\ \sigma_{vw},\ \mu_x.    (76)

The current setup generalizes the one considered by Stambaugh (1999). In that model, $x^*_t$ is linear in $x_t$; i.e., it is the case of a true predictor with $\rho_{vw} = 1$ (and therefore $\rho_{uv} = \rho_{uw}$). Another nested model is the one with a useless predictor, i.e., $\rho_{vw} = 0$. In their simulation-based analysis, FSS address a special case of this nested model by imposing the restriction $\rho_{uw} = 0$. That is, their candidate predictor under the null hypothesis is independent of the return process.⁷ The generality of the current trivariate model (73)-(75) opens the gate to a spectrum of candidate predictors, including imperfect predictors with intermediate values of correlation as well as the aforementioned polar cases. In order to make finite sample inferences on the predictive regression in (72), it is important to derive the sampling properties of the slope estimate, its standard error and associated t-ratio, and the sample R-squared. In what follows, we focus on the OLS slope estimate $\hat{\beta}$, its t-ratio $t(\hat{\beta})$, and the coefficient of determination $\hat{R}^2$ based on the OLS estimates. But first, it is helpful to introduce some useful parameterizations of the trivariate model.

⁷ FSS further restrict $\rho_{uv}$ to be zero in their study.
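The spectrum of candidate predictors can be illustrated by simulating the system (73)-(75). In the sketch below (our code; `simulate_dgp` and the parameter values are illustrative, not from the paper), the sample correlation between the latent component $x^*_t$ and the predictor is close to 1 for a true predictor, intermediate for an imperfect one, and close to 0 for a useless one:

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_dgp(T, phi_star, phi, rho_vw, sig_u=1.0, sig_v=0.2, sig_w=1.0):
    """Simulate the trivariate system (73)-(75) with rho_uv = rho_uw = 0.

    Returns (y, x, x_star), where y[k] is the return y_{k+1},
    so y pairs with the lagged predictor x[:-1].
    """
    cov = np.array([[sig_u**2, 0.0, 0.0],
                    [0.0, sig_v**2, rho_vw * sig_v * sig_w],
                    [0.0, rho_vw * sig_v * sig_w, sig_w**2]])
    u, v, w = rng.multivariate_normal(np.zeros(3), cov, size=T).T
    x_star = np.empty(T)
    x_dem = np.empty(T)                            # demeaned predictor (mu_x = 0)
    x_star[0] = v[0] / np.sqrt(1 - phi_star**2)    # stationary starting values
    x_dem[0] = w[0] / np.sqrt(1 - phi**2)
    for t in range(1, T):
        x_star[t] = phi_star * x_star[t - 1] + v[t]
        x_dem[t] = phi * x_dem[t - 1] + w[t]
    y = x_star[:-1] + u[1:]                        # y_t = mu_y + x*_{t-1} + u_t, mu_y = 0
    return y, x_dem, x_star

corrs = {}
for rho_vw, label in [(1.0, "true"), (0.5, "imperfect"), (0.0, "useless")]:
    y, x, x_star = simulate_dgp(200_000, phi_star=0.9, phi=0.9, rho_vw=rho_vw)
    corrs[label] = np.corrcoef(x_star, x)[0, 1]
print(corrs)   # roughly 1, 0.5 and 0, respectively
```

With $\phi^* = \phi$, the population correlation between $x^*_t$ and $x_t$ equals $\rho_{vw}$ exactly, which is what the three cases display.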

A. Some Useful Parameterizations

We define a series of quantities as functions of the base parameters in (76).⁸ Let $\sigma_y^2$ denote the unconditional variance of the stock return. Then,

    \sigma_y^2 = \sigma_{x^*}^2 + \sigma_u^2.    (77)

Let $R^{*2} = \sigma_{x^*}^2/\sigma_y^2$. Then $R^{*2}$, whose value lies between 0 and 1, captures the extent of true predictability of the stock return. Let $c_k = \mathrm{Cov}[y_t, x_{t+k}]$ for any integer $k$. Then,

    c_{-1} = \frac{\sigma_{vw}}{1 - \phi\phi^*},    (78)

    c_0 = \sigma_{uw} + \phi c_{-1}.    (79)

In addition, for any positive integer $j$,

    c_j = \phi^j c_0,    (80)

    c_{-j} = \phi^{*(j-1)} c_{-1}.    (81)

Let $R = c_{-1}/(\sigma_x \sigma_y)$ be the coefficient of correlation between $x_{t-1}$ and $y_t$. Then $R^2$ is the population R-squared of the predictive regression in (72). It follows that the population slope and intercept in this regression are, respectively, given by

    \beta = \frac{c_{-1}}{\sigma_x^2} = \frac{R \sigma_y}{\sigma_x},    (82)

    \alpha = \mu_y - \beta\mu_x.    (83)

Let $\rho_k$ denote the $k$-th order autocorrelation of the excess return process. Then,

    \rho_1 = \frac{\phi^* \sigma_{x^*}^2 + \sigma_{uv}}{\sigma_y^2},    (84)

    \rho_k = \phi^{*(k-1)} \rho_1.    (85)

Finally, let $\rho_{yx} = c_0/(\sigma_x \sigma_y)$ be the coefficient of correlation between $x_t$ and $y_t$.

⁸ A proof of the expressions in (77)-(84) is provided in the Appendix.
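The moment formulas above can be verified against a long simulated sample from the system (73)-(75); a sketch under illustrative parameter values (our code, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 500_000
phi_star, phi = 0.3, 0.95
sig_u, sig_v, sig_w = 1.0, 0.3, 1.0
rho_uv, rho_uw, rho_vw = -0.2, -0.4, 0.5
sig_uv, sig_uw, sig_vw = (rho_uv * sig_u * sig_v, rho_uw * sig_u * sig_w,
                          rho_vw * sig_v * sig_w)

cov = np.array([[sig_u**2, sig_uv, sig_uw],
                [sig_uv, sig_v**2, sig_vw],
                [sig_uw, sig_vw, sig_w**2]])
u, v, w = rng.multivariate_normal(np.zeros(3), cov, size=T).T
x_star, x_dem = np.empty(T), np.empty(T)
x_star[0] = v[0] / np.sqrt(1 - phi_star**2)   # stationary starting values
x_dem[0] = w[0] / np.sqrt(1 - phi**2)
for t in range(1, T):
    x_star[t] = phi_star * x_star[t - 1] + v[t]
    x_dem[t] = phi * x_dem[t - 1] + w[t]
y = x_star[:-1] + u[1:]                       # y_t = mu_y + x*_{t-1} + u_t, mu_y = 0

# model-implied quantities
sig_xs2 = sig_v**2 / (1 - phi_star**2)
sig_y2 = sig_xs2 + sig_u**2                     # eq. (77)
c_m1 = sig_vw / (1 - phi * phi_star)            # eq. (78)
c_0 = sig_uw + phi * c_m1                       # eq. (79)
rho_1 = (phi_star * sig_xs2 + sig_uv) / sig_y2  # eq. (84)

# sample counterparts
c_m1_hat = np.cov(y, x_dem[:-1])[0, 1]          # Cov[y_t, x_{t-1}]
c_0_hat = np.cov(y, x_dem[1:])[0, 1]            # Cov[y_t, x_t]
rho_1_hat = np.corrcoef(y[1:], y[:-1])[0, 1]
print(c_m1, c_m1_hat, c_0, c_0_hat, rho_1, rho_1_hat)
```

Each sample moment should land within Monte Carlo error of its model-implied value.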

B. Sampling Properties

Let $x_- = (x_0, \ldots, x_{T-1})'$ and $y = (y_1, \ldots, y_T)'$. Recall that $M = I_T - 1_T 1_T'/T$. We then have

    \hat{\beta} = \frac{x_-'My}{x_-'Mx_-},    (86)

    \hat{R}^2 = \frac{(x_-'My)^2}{(x_-'Mx_-)(y'My)},    (87)

    t(\hat{\beta}) = \frac{(T-2)^{1/2}\, \hat{R}}{(1 - \hat{R}^2)^{1/2}}.    (88)

Since $(x_-', y')'$ is jointly normal, the general computational results in Section I on the probability distributions and moments of these statistics can be readily specialized to the current case of predictive regression. Notwithstanding, an important question is which parameters determine the sampling distributions of $\hat{\beta}$, $t(\hat{\beta})$ and $\hat{R}$. The following proposition gives the answer.

Proposition 1. (i) Let $S_{11}$ be a symmetric Toeplitz matrix with first row $[1, \phi, \ldots, \phi^{T-1}]$, let $S_{22}$ be a symmetric Toeplitz matrix with first row $[1, \rho_1, \rho_1\phi^*, \ldots, \rho_1\phi^{*(T-2)}]$, and let $S_{12}$ be a Toeplitz matrix whose first column is $[R, R\phi^*, \ldots, R\phi^{*(T-1)}]'$ and whose first row is $[R, \rho_{yx}, \rho_{yx}\phi, \ldots, \rho_{yx}\phi^{T-2}]$. Let $(\tilde{x}', \tilde{y}')'$ be distributed as

    \begin{bmatrix} \tilde{x} \\ \tilde{y} \end{bmatrix} \sim N\left( 0_{2(T-1)}, \begin{bmatrix} P'S_{11}P & P'S_{12}P \\ P'S_{21}P & P'S_{22}P \end{bmatrix} \right),    (89)

where $P$ is the $T \times (T-1)$ orthonormal matrix such that $PP' = M$. Then,

    \hat{\beta} \stackrel{d}{=} \left( \frac{\sigma_y}{\sigma_x} \right) \frac{\tilde{x}'\tilde{y}}{\tilde{x}'\tilde{x}},    (90)

    \hat{R} \stackrel{d}{=} \frac{\tilde{x}'\tilde{y}}{[(\tilde{x}'\tilde{x})(\tilde{y}'\tilde{y})]^{1/2}}.    (91)

(ii) $\hat{\beta}$ only depends on $T$, $\phi$, $\phi^*$, $\rho_1$, $R$, $\rho_{yx}$ and $\sigma_y/\sigma_x$.  $\hat{R}$, $\hat{R}^2$ and $t(\hat{\beta})$ only depend on $T$, $\phi$, $\phi^*$, $\rho_1$, $R$ and $\rho_{yx}$.

The bias of $\hat{\beta}$ is often of great interest in making inferences about return predictability. The next proposition provides an analytical expression for computing the mean of $\hat{\beta}$.

19 Proposition. Let H be a T T symmetric matrix with elements h ij = (φ i j )/(1 φ ). Let G be a T T upper triangular Toeplitz matrix with elements g ij = φ j i 1 if j > i, and let F be a T T Toeplitz matrix with elements f ij = φ j i /(1 φ ) if j i and f ij = φ (i j) /(1 φ ) if i > j. Then, where z N ( T, H). [ ] E[ ˆβ] z MF H 1 z = βe z Mz + σ uw E σw [ z MGH 1 z z Mz ], (9) It is then clear from Proposition that when the stock return is regressed on a lagged predictor, the bias of ˆβ only depends on β, φ, φ, σ uw /σ w and T. As examples, two special cases are discussed in the following. Case 1: True predictor. If x t is a true predictor, then it is perfectly correlated with the time-varying conditional mean of the return. That is, x t is a linear transform of x t. Indeed, we have x t = β(x t µ x ). (93) where β is the regression slope as defined in (8). 9 It follows that φ = φ, v t = βw t, σ uv = βσ uw, σ vw = βσw and σx = β σx. With these restrictions, the DGP with a true predictor has 7 free parameters: µ y, σu, σx, φ, σ uw, β, µ x. (94) The number of parameters determining the sampling distributions also decreases. To see this, note that ρ yx = c σ x σ y = Cov[y t, βx t ] βσ x σ y = Cov[y t, y t+1 ] βσ x σ y = ρ 1σ y βσ x σ y = ρ 1 R. (95) The last equality is due to the definition of β in (8). Therefore, we have the following result on the key parameters. Corollary 1. In the true predictor case, ˆβ only depends on T, φ, ρ1, R and σ y /σ x, or alternatively T, φ, ρ 1, R and β. ˆR, ˆR and t( ˆβ) only depend on T, φ, ρ 1 and R. 9 See Appendix for a proof of this result. 17

Proposition 3. In the true predictor case, the distribution of $\hat{\beta} - \beta$ only depends on $T$, $\sigma_u/\sigma_w$, $\rho_{uw}$ and $\phi$.

As regards the bias of $\hat{\beta}$, note that $F = H$, since $\phi^* = \phi$. The next result immediately follows from Proposition 2. This bias formula corresponds to the one studied by Stambaugh (1999).

Corollary 2. In the true predictor case, the bias of $\hat{\beta}$ is

    \mathrm{E}[\hat{\beta}] - \beta = \frac{\sigma_{uw}}{\sigma_w^2}\, h(\phi),    (96)

where

    h(\phi) = \mathrm{E}\left[ \frac{z'MGH^{-1}z}{z'Mz} \right],  z \sim N(0_T, H).    (97)

Case 2: Useless predictor. If $x_t$ is a useless predictor, then it is uncorrelated with the conditional mean of the stock return. That is, $\sigma_{vw} = 0$. It follows that $c_{-1} = 0$, $\beta = 0$, $R = 0$, and $c_0 = \sigma_{uw}$. The DGP with a useless predictor has 9 free parameters:

    \mu_y,\ \sigma_u^2,\ \sigma_{x^*}^2,\ \phi^*,\ \sigma_{uv},\ \sigma_x^2,\ \phi,\ \sigma_{uw},\ \mu_x.    (98)

With these restrictions, the next two results immediately follow from Propositions 1 and 2.

Corollary 3. In the useless predictor case, $\hat{\beta}$ only depends on $T$, $\phi$, $\phi^*$, $\rho_1$, $\rho_{yx}$ and $\sigma_y/\sigma_x$.  $\hat{R}$, $\hat{R}^2$ and $t(\hat{\beta})$ only depend on $T$, $\phi$, $\phi^*$, $\rho_1$ and $\rho_{yx}$.

Corollary 4. In the useless predictor case, the bias of $\hat{\beta}$ is again given by (96).

III. Inferences about Return Predictability

This section addresses a key issue in making inferences about stock return predictability, namely, testing stock return predictability based on a given predictor. The previous section showed that the statistics $\hat{\beta}$, $t(\hat{\beta})$, and $\hat{R}^2$ in the regression model with a stochastic predictor are parameter-dependent. Hence, this section proceeds with a model calibration, followed by a discussion of predictability tests. Finally, some of the popular predictors identified in the existing literature are reviewed.

A. Model Calibration

In practice, the time-varying component of the expected return, $x^*_t$, is unobservable. Therefore, it is difficult to ascertain the exact value of its degree of persistence as measured by its first-order autocorrelation coefficient $\phi^*$. However, the fact that the return $y_t$ follows an ARMA(1,1) process facilitates model calibration using time-series data on the stock return and candidate predictors. The sampling distributions derived in the previous section depend only on a subset of the parameters of the model; therefore, we focus on estimating these key parameters. We use maximum likelihood (ML) estimation to infer the parameters of the return and predictor processes separately. The only remaining key parameter is $\sigma_{uw}$. Noting the relation in (79), we estimate $\sigma_{uw}$ using sample estimates of $c_0$ and $c_{-1}$. The return data that calibrate the model are monthly S&P 500 value-weighted returns, in excess of the 1-month T-bill return. The predictor is the dividend-price ratio (DP) of the S&P 500 stock portfolio.¹⁰ The sample period is from January 1926 to December 2015, a total of 1,080 months. Table I reports the parameters of the calibrated model. The predictor has a first-order autocorrelation of 0.989, which shows a high degree of persistence. The ML estimates of the autocorrelation of the return and that of the conditional expected return are 0.0933 and virtually zero, respectively. The calibration results suggest that, in contrast to the highly persistent predictor, the unobserved conditional expected return can be weakly autocorrelated; indeed, the point estimate of its autocorrelation coefficient is virtually zero. Note that these estimates depend only on the return data, and they do not depend on whether the predictor is useless or not. With these parameter values, we consider tests of predictability that make use of the finite sample results on the predictive regression with a useless predictor.
The null hypothesis is that the excess stock return is unpredictable by the candidate predictor under investigation. We first consider the test in FSS's setting, and then the test in the more general setting.

B. Test in the FSS Setting

Under FSS's assumption that σ_uw = 0, the null distributions of t(β̂), R̂², and β̂σ_x/σ_y (the normalized β̂) depend only on T, φ, φ_x and ρ_1 (Corollary 3). Table II summarizes the results for the t-ratio and R-squared. With φ_x fixed at 0.989, this table

1 DP is from Amit Goyal's website.

reports the 97.5th percentile of the t-ratio and the 95th percentile of the R-squared for different values of ρ_1 and φ. We report results for three sample sizes, T = 66, T = 824 and T = 1080, in Panels A, B and C, respectively. The first two sample sizes are those considered by FSS; the last matches the sample used for model calibration.

When ρ_1 = 0, the distributions of the t-ratio and R-squared follow the standard results of the classical setting. For example, for a 5% test with T = 66, the t-distribution with T − 2 degrees of freedom implies a critical value of 2.00. When ρ_1 = 0.1, the departure from the standard inferences becomes more serious as φ increases. For T = 824, the critical value of the t-ratio reaches 4.44 when φ = 0.98. This pattern is consistent with FSS's findings: the spurious regression bias becomes serious with a highly persistent expected return when the true R² is moderate (10%).

When the autocorrelation of the conditional expected return is zero and ρ_1 is less than 0.1, the departure from the standard cutoff is small. For T = 1080, the critical t-ratio is 2.15, as compared with the standard value of 1.96. In other words, the calibration results suggest that with a weakly autocorrelated conditional expected return, the spurious regression bias raised by FSS is not a serious issue: using the standard cutoffs does not lead to serious over-rejection when testing the null hypothesis.

C. Test in the General Setting

We now consider the predictability test in the general setting with contemporaneously correlated return and predictor processes. Corollary 3 shows that T, φ, φ_x, ρ_1 and ρ_yx jointly determine the distributions of t(β̂) and R̂². With ρ_1 and φ fixed at 0.0933 and 0, respectively, Table III reports the 97.5th percentile of the t-ratio and the 95th percentile of the R-squared for different combinations of ρ_yx and φ_x. The extreme values of ρ_yx and φ_x are set to their respective values in the calibrated model.
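Critical values of this kind can be approximated by direct simulation. The sketch below is our own illustrative code (not the paper's finite-sample formulas, and with smaller T and simulation counts than the tables): it computes the 97.5th percentile of the OLS t-ratio under the null when an independent AR(1) predictor is regressed against a return whose expected-return component has autocorrelation φ.

```python
import numpy as np

rng = np.random.default_rng(2)

def crit_t(T=240, phi_x=0.989, phi_star=0.0, r2=0.10, n_sim=2000):
    """97.5th percentile of the OLS t-ratio under the null of no
    predictability: independent AR(1) predictor (coefficient phi_x),
    AR(1) expected-return component (coefficient phi_star) accounting
    for a fraction r2 of the return variance."""
    q = r2 / (1.0 - r2)                      # Var(x*) relative to Var(u)
    s_w = np.sqrt(q * (1.0 - phi_star**2))   # x* innovation std. dev.
    tstats = np.empty(n_sim)
    for s in range(n_sim):
        xstar = np.zeros(T + 1)
        x = np.zeros(T + 1)
        for t in range(1, T + 1):
            xstar[t] = phi_star * xstar[t - 1] + s_w * rng.standard_normal()
            x[t] = phi_x * x[t - 1] + rng.standard_normal()
        u = rng.standard_normal(T + 1)
        Y = xstar[:-1] + u[1:]               # y_t = x*_{t-1} + u_t, t = 1..T
        X = x[:-1]                           # lagged independent predictor
        Xd, Yd = X - X.mean(), Y - Y.mean()
        b = Xd @ Yd / (Xd @ Xd)
        e = Yd - b * Xd
        se = np.sqrt((e @ e) / (T - 2) / (Xd @ Xd))
        tstats[s] = b / se
    return np.quantile(tstats, 0.975)

print(crit_t(phi_star=0.0))    # close to the standard value of about 2.0
print(crit_t(phi_star=0.98))   # well above 2.0: spurious regression bias
```

With a serially uncorrelated expected return the simulated cutoff stays near the classical one, while a highly persistent expected return combined with a highly persistent (but independent) predictor inflates it, in line with the FSS pattern described above.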
Panels A, B and C give the results for sample sizes T = 66, 824, and 1080, respectively. The first row (ρ_yx = 0) effectively gives the cutoffs in the FSS setting. The parameter pair in the top left cell (ρ_yx = 0, φ_x = 0) implies the usual sampling distributions for the t-ratio and R-squared: a t-distribution and a Beta distribution, respectively. Going across the columns of the first row, there is only a small departure from these standard results. Nonetheless, as discussed for Table II, spurious regression bias does not arise to any serious degree, even when regressors are highly persistent. However, when the magnitude of ρ_yx is

11 Under the null hypothesis, ρ_yx = σ_uw/(σ_x σ_y), since c_0 = σ_uw due to (79).

large, there are substantial departures from the standard inferences when regressors are highly persistent. For example, for T = 66, when ρ_yx = 0.1691 and φ_x = 0.989, the critical t-ratio is well above the standard value. This cutoff decreases somewhat as T increases to 1080, but it remains quite large (> 2.5). A similar pattern is observed for the R-squared.

D. Empirical Results

Table IV revisits 15 popular predictors in the existing literature and examines their predictive abilities for the future market excess return. Among these, 13 predictors are from Welch and Goyal (2008). The short interest index (SII) is studied by Rapach, Ringgenberg, and Zhou (2016), and the variance risk premium is studied by Bollerslev et al. (2008) and Zhou (2009). Specifically, these predictors are defined as:

1. Dividend-price ratio (DP): 12-month moving sum of dividends paid on the S&P 500 index divided by the index level.
2. Dividend yield (DY): 12-month moving sum of dividends paid on the S&P 500 index divided by the lagged index level.
3. Earnings-price ratio (EP): 12-month moving sum of earnings on the S&P 500 index divided by the index level.
4. 10-year earnings-price ratio (E10P): 10-year moving average of real annual earnings on the S&P 500 index divided by the real index level.
5. Dividend-payout ratio (DE): 12-month moving sum of dividends divided by the 12-month moving sum of earnings.
6. Book-to-market ratio (BM): book-to-market value ratio for the Dow Jones Industrial Average.
7. Net equity expansion (NTIS): 12-month moving sum of net issues by NYSE-listed stocks divided by the total end-of-year market capitalization of NYSE stocks.
8. Treasury bill rate (TBL): interest rate on the three-month Treasury bill.
9. Long-term yield (LTY): long-term government bond yield.
10. Long-term return (LTR): return on long-term government bonds.
11. Term spread (TMS): LTY minus TBL.
12. Default yield spread (DFY): difference between Moody's BAA- and AAA-rated corporate bond yields.
12 The data on the first 13 predictors are from Amit Goyal's website. SII is from David Rapach's website. VRP is from Hao Zhou's website.

13. Default return spread (DFR): long-term corporate bond return minus the long-term government bond return.
14. Short interest index (SII): detrended log aggregate short interest.
15. Variance risk premium (VRP): difference between the risk-neutral and objective expectations of realized variance.

The predictive regressions investigated involve monthly excess returns on the S&P 500 value-weighted portfolio and lagged predictors. The return data start in January 1926 and end in December 2015. The first-order autocorrelation of the excess return is fixed at its ML estimate of 0.0933. Three values of φ are considered for testing with each predictor. The remaining three key parameters, namely T, φ_x and ρ_yx, are reported in Columns 4–6.

When φ is zero, as suggested by its ML estimate, the departure from standard inferences is not serious in the FSS setting. As a result, the p-value of a two-sided test is close to the one obtained from the t-distribution in the standard setting. However, in the general setting considered in this paper, the p-values are conspicuously elevated compared with their standard counterparts whenever the predictor is highly persistent and ρ_yx is sizable. These features are common to price-scaled variables such as DP, EP, E10P, and BM. In fact, the level of statistical significance decreases when DP, EP or BM is the predictor. Based on the t-ratio test, DY, E10P, and VRP are significant at the 1% level, and DP, BM, NTIS, and SII are significant at the 5% level.

If the conditional expected return is highly persistent (φ = 0.98), the departure from standard inferences is substantial provided the predictor is also persistent. In such cases, using standard cutoffs leads to over-rejection. However, when the predictor is weakly autocorrelated, spurious regression bias is less of a concern. Three of the fifteen predictors (DFR, LTR, VRP) exhibit this feature, and as a result their p-values stay almost flat across different values of φ.
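The over-rejection mechanism in the general setting can be reproduced in miniature. The sketch below is our own illustrative code (the correlation value is set near the calibrated DP magnitude, with the negative sign typical of price-scaled predictors; it is not the paper's exact distribution): it simulates a useless AR(1) predictor whose innovations are contemporaneously correlated with the return shock and reports the 97.5th percentile of the null t-ratio.

```python
import numpy as np

rng = np.random.default_rng(4)

def crit_t_corr(T=66, phi_x=0.989, rho=-0.17, n_sim=4000):
    """97.5th percentile of the OLS t-ratio for a useless AR(1)
    predictor whose innovation has correlation rho with the return
    shock, with a serially uncorrelated expected return."""
    tstats = np.empty(n_sim)
    for s in range(n_sim):
        v = rng.standard_normal(T + 1)               # predictor innovations
        u = rho * v + np.sqrt(1 - rho**2) * rng.standard_normal(T + 1)
        x = np.zeros(T + 1)
        x[0] = v[0] / np.sqrt(1 - phi_x**2)          # stationary start
        for t in range(1, T + 1):
            x[t] = phi_x * x[t - 1] + v[t]
        Y, X = u[1:], x[:-1]                         # beta = 0 under the null
        Xd, Yd = X - X.mean(), Y - Y.mean()
        b = Xd @ Yd / (Xd @ Xd)
        e = Yd - b * Xd
        se = np.sqrt((e @ e) / (T - 2) / (Xd @ Xd))
        tstats[s] = b / se
    return np.quantile(tstats, 0.975)

print(crit_t_corr(rho=0.0))     # near 2.0: standard inference is fine
print(crit_t_corr(rho=-0.17))   # noticeably above 2.0
```

Even with no persistence in the expected return, a persistent predictor correlated with the return shock shifts and skews the null distribution of the t-ratio, so the standard 1.96 cutoff rejects too often, consistent with the elevated p-values for price-scaled predictors above.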
Table V reports the predictability test results based on β̂. It follows from Proposition 1 that the normalized slope estimator β̂σ_x/σ_y depends only on φ, φ_x, ρ_1 and ρ_yx under the null hypothesis. The p-values associated with a two-sided predictability test are reported in this table. They are much larger than those based on the t-ratio in Table IV. Based on this test, DY and VRP produce statistically significant predictive slope coefficients.

In summary, there are two main findings on drawing inferences about stock return predictability. First, based on the calibrated model parameters, expected returns are not as persistent as one may think; therefore, in the FSS setting, the spurious regression bias is not a serious concern. Second, allowing for contemporaneous correlation between the return

and predictor processes, spurious regression bias can arise even when the expected return is weakly autocorrelated, provided that the predictor is persistent. The empirical analysis suggests that relying on standard inference tools leads to over-rejection when conducting predictability tests with price-scaled regressors such as the dividend-price ratio and the earnings-price ratio.

IV. Conclusion

The finance literature has suggested many variables that possess predictive power for stock market returns, especially when the inferences are conducted in-sample. Given that there are so many potential candidates for predicting stock market returns, it is quite unlikely that any one of the proposed predictive variables is capable of fully explaining the conditional expected return. At the same time, it is also unlikely that these predictive variables are totally independent of returns. In most realistic situations, a predictive variable is imperfectly correlated with the conditional expected return. As a result, we need to understand how to conduct statistical inference on predictive regressions when the predictors are imperfect. Existing analyses of predictive regressions, however, do not consider such a situation. In many cases, researchers conduct statistical inferences by assuming the predictive variable is perfect. While FSS consider a misspecified predictor, they study only the extreme case where the predictor is independent of the return process.

In this paper, we fill this void in the literature by laying out a general predictive model framework for stock returns, allowing for true predictors, imperfect predictors, as well as useless ones. We provide analytically tractable formulas for computing the probability distribution and the moments of the regression slope estimate, its OLS t-ratio, and the coefficient of determination in simple regressions. We confirm the spurious regression bias reported by FSS when the predictor is independent of the return process.
However, under the setup of FSS, this phenomenon exists only when the persistence of the conditional expected return is high. Allowing for contemporaneous correlation between the return and the predictor, we show that even with a weakly autocorrelated expected return, there can still be substantial departures from the standard inferences in finite samples. The severity of the spurious regression bias in this more general case increases with the persistence of the useless predictor and with the contemporaneous correlation between the return and the predictor.

Model misspecification has an important impact on the statistical inferences of predictive regressions. In particular, the distribution of the estimated slope coefficient in a predictive

regression is heavily influenced by the quality of the predictor. Compared with the case of a perfect predictor, an imperfect predictor leads to a larger bias and more volatility in the estimated slope coefficient. As a result, ignoring model misspecification can lead to erroneous inferences.

Our general setting is empirically relevant. Many popular regressors exhibit sizable contemporaneous correlation with the stock return, and at the same time they are highly autocorrelated. Relying on standard inference tools can therefore lead to a serious over-rejection problem. We revisit the popular predictors from Welch and Goyal (2008), Bollerslev et al. (2008), and Rapach et al. (2016). Allowing for an imperfect predictor and conducting the analysis based on finite sample distributions, we find that over the sample period from January 1926 to December 2015, only the dividend yield (DY) produces a statistically significant predictive slope coefficient at the 5% level. However, this statistical significance does not survive in the second half of the period, where only the variance risk premium (VRP) appears to be a strong predictor of the future market excess return.

There are a number of directions for future work. The first is to extend the analysis to predictive regressions on multi-period returns. This extension is relatively straightforward, and could allow us to speak to the optimal choice of return horizon when it comes to detecting return predictability. The second is to extend our finite sample analysis to predictive regressions with multiple predictors. This task is extremely challenging, and we hope to develop approximation formulas for the finite sample distributions of the relevant test statistics in that setting. The third direction is to extend our finite sample analysis to deal with the distribution of the out-of-sample R².
Such an analysis would shed light on whether the in-sample R² or the out-of-sample R² is better suited for model comparison. Eventually, we would like to provide a toolkit for selecting variables for predicting stock returns. Such a task requires a better understanding of predictive regressions that contain potentially imperfect predictors.

Table I
Parameters in the Calibrated Trivariate Model
This table reports the parameters in the calibrated trivariate model with return variable y_t, conditional expected return x*_t, and predictor x_t. The assumed DGP implies that x_t follows an AR(1) process and y_t follows an ARMA(1,1) process. For model calibration, y_t is the S&P 500 value-weighted return in month t, in excess of the one-month T-bill return. The predictor x_t is the dividend-price ratio (DP) on the S&P 500 index portfolio at the end of month t. The data sample covers the period January 1926 to December 2015. The reported parameters are µ_y, σ_y², φ, ρ_1, σ_x², φ_x, σ_uw, and µ_x.

Table II
Critical Values for t-ratio and R-squared in Regressions with an Independent Predictor
This table reports the 97.5th percentile of the distribution of the t-ratio, and the 95th percentile of the distribution of the estimated coefficient of determination R̂², obtained from regressing the excess return y_t on a lagged independent predictor x_{t−1}, t = 1, ..., T. The first-order autocorrelation coefficient φ_x of the predictor x_t is fixed at 0.989. ρ_1 (φ) is the first-order autocorrelation coefficient of the excess return y_t (the conditional expected return x*_t). Panel A tabulates the percentiles for T = 66 observations, Panel B for T = 824, and Panel C for T = 1080.
Panel A: T = 66 (critical t(β̂) and critical R̂² (%) tabulated by ρ_1 and φ)

Table II (Cont'd)
Critical values for t-ratio and R-squared in regressions with an independent predictor.
Panel B: T = 824 (critical t(β̂) and critical R̂² (%) tabulated by ρ_1 and φ)
Panel C: T = 1080 (critical t(β̂) and critical R̂² (%) tabulated by ρ_1 and φ)

Table III
Critical Values for t-ratio and R-squared in Regressions with a Useless Predictor
This table reports the 97.5th percentile of the distribution of the t-ratio, and the 95th percentile of the distribution of the estimated coefficient of determination R̂², obtained from regressing the excess return y_t on a lagged useless predictor x_{t−1}, t = 1, ..., T. Parameters ρ_1 and φ are set to their respective values in the calibrated model (Table I). ρ_yx is the correlation coefficient between the contemporaneous return and the predictor. φ_x is the first-order autocorrelation coefficient of the predictor x_t. Panel A tabulates the percentiles for T = 66 observations, Panel B for T = 824, and Panel C for T = 1080.
Panel A: T = 66 (critical t(β̂) and critical R̂² (%) tabulated by ρ_yx and φ_x)

Table III (Cont'd)
Critical values for t-ratio and R-squared in regressions with a useless predictor.
Panel B: T = 824 (critical t(β̂) and critical R̂² (%) tabulated by ρ_yx and φ_x)
Panel C: T = 1080 (critical t(β̂) and critical R̂² (%) tabulated by ρ_yx and φ_x)

Table IV
Common Predictors: OLS Regression Results and Test of Predictability Based on t-ratio
This table reviews common predictors used in the literature to predict stock returns, and reports the predictability test results based on the OLS t-ratio using the finite sample analysis derived in Section II. The dependent variable in the predictive regression is the monthly S&P 500 value-weighted portfolio return in excess of the 1-month T-bill return. The individual predictors are listed in Column 1. The first 13 predictors are from Welch and Goyal (2008): DP is the dividend-price ratio, DY is the dividend yield, EP is the earnings-price ratio, E10P is the 10-year moving average earnings-to-price ratio, DE is the dividend-payout ratio, BM is the book-to-market ratio for the Dow Jones Industrial Average, NTIS is net equity expansion, TBL is the interest rate on the three-month T-bill, LTY is the long-term government bond yield, LTR is the return on long-term government bonds, TMS is the long-term government bond yield minus the T-bill rate, DFY is the difference between Moody's BAA- and AAA-rated corporate bond yields, and DFR is the long-term corporate bond return minus the long-term government bond return. SII is the short interest index from Rapach et al. (2016). VRP is the variance risk premium from Zhou (2009). The monthly data on all predictors except SII end in December 2015; monthly SII data end in December 2014. The p-values associated with two-sided tests based on standard regression inferences (OLS), on the null distribution in the FSS setting (FSS), and on the null distribution in the general setting (G) are all reported in percentage terms; *, **, and *** indicate significance at the 10%, 5%, and 1% levels, respectively.
Columns: (1) predictor; (2) t(β̂); (3) R̂² (%); (4) T; (5) φ_x; (6) ρ_yx; (7) p-value (OLS); (8)–(9) p-values (FSS, G) for φ = 0; (10)–(11) p-values (FSS, G) for φ = 0.5; (12)–(13) p-values (FSS, G) for φ = 0.98.


More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

For a stochastic process {Y t : t = 0, ±1, ±2, ±3, }, the mean function is defined by (2.2.1) ± 2..., γ t,

For a stochastic process {Y t : t = 0, ±1, ±2, ±3, }, the mean function is defined by (2.2.1) ± 2..., γ t, CHAPTER 2 FUNDAMENTAL CONCEPTS This chapter describes the fundamental concepts in the theory of time series models. In particular, we introduce the concepts of stochastic processes, mean and covariance

More information

Intro VEC and BEKK Example Factor Models Cond Var and Cor Application Ref 4. MGARCH

Intro VEC and BEKK Example Factor Models Cond Var and Cor Application Ref 4. MGARCH ntro VEC and BEKK Example Factor Models Cond Var and Cor Application Ref 4. MGARCH JEM 140: Quantitative Multivariate Finance ES, Charles University, Prague Summer 2018 JEM 140 () 4. MGARCH Summer 2018

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley Time Series Models and Inference James L. Powell Department of Economics University of California, Berkeley Overview In contrast to the classical linear regression model, in which the components of the

More information

B y t = γ 0 + Γ 1 y t + ε t B(L) y t = γ 0 + ε t ε t iid (0, D) D is diagonal

B y t = γ 0 + Γ 1 y t + ε t B(L) y t = γ 0 + ε t ε t iid (0, D) D is diagonal Structural VAR Modeling for I(1) Data that is Not Cointegrated Assume y t =(y 1t,y 2t ) 0 be I(1) and not cointegrated. That is, y 1t and y 2t are both I(1) and there is no linear combination of y 1t and

More information

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED by W. Robert Reed Department of Economics and Finance University of Canterbury, New Zealand Email: bob.reed@canterbury.ac.nz

More information

Stock Return Predictability Using Dynamic Mixture. Model Averaging

Stock Return Predictability Using Dynamic Mixture. Model Averaging Stock Return Predictability Using Dynamic Mixture Model Averaging Joseph P. Byrne Rong Fu * October 5, 2016 Abstract We evaluate stock return predictability by constructing Dynamic Mixture Model Averaging

More information

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY Time Series Analysis James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY PREFACE xiii 1 Difference Equations 1.1. First-Order Difference Equations 1 1.2. pth-order Difference Equations 7

More information

Financial Econometrics

Financial Econometrics Financial Econometrics Nonlinear time series analysis Gerald P. Dwyer Trinity College, Dublin January 2016 Outline 1 Nonlinearity Does nonlinearity matter? Nonlinear models Tests for nonlinearity Forecasting

More information

Chapter 2: simple regression model

Chapter 2: simple regression model Chapter 2: simple regression model Goal: understand how to estimate and more importantly interpret the simple regression Reading: chapter 2 of the textbook Advice: this chapter is foundation of econometrics.

More information

On the Correlations of Trend-Cycle Errors

On the Correlations of Trend-Cycle Errors On the Correlations of Trend-Cycle Errors Tatsuma Wada Wayne State University This version: December 19, 11 Abstract This note provides explanations for an unexpected result, namely, the estimated parameter

More information

Note: The primary reference for these notes is Enders (2004). An alternative and more technical treatment can be found in Hamilton (1994).

Note: The primary reference for these notes is Enders (2004). An alternative and more technical treatment can be found in Hamilton (1994). Chapter 4 Analysis of a Single Time Series Note: The primary reference for these notes is Enders (4). An alternative and more technical treatment can be found in Hamilton (994). Most data used in financial

More information

Econ 424 Time Series Concepts

Econ 424 Time Series Concepts Econ 424 Time Series Concepts Eric Zivot January 20 2015 Time Series Processes Stochastic (Random) Process { 1 2 +1 } = { } = sequence of random variables indexed by time Observed time series of length

More information

Studies in Nonlinear Dynamics & Econometrics

Studies in Nonlinear Dynamics & Econometrics Studies in Nonlinear Dynamics & Econometrics Volume 9, Issue 2 2005 Article 4 A Note on the Hiemstra-Jones Test for Granger Non-causality Cees Diks Valentyn Panchenko University of Amsterdam, C.G.H.Diks@uva.nl

More information

Financial Econometrics Return Predictability

Financial Econometrics Return Predictability Financial Econometrics Return Predictability Eric Zivot March 30, 2011 Lecture Outline Market Efficiency The Forms of the Random Walk Hypothesis Testing the Random Walk Hypothesis Reading FMUND, chapter

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

AR, MA and ARMA models

AR, MA and ARMA models AR, MA and AR by Hedibert Lopes P Based on Tsay s Analysis of Financial Time Series (3rd edition) P 1 Stationarity 2 3 4 5 6 7 P 8 9 10 11 Outline P Linear Time Series Analysis and Its Applications For

More information

Backtesting Marginal Expected Shortfall and Related Systemic Risk Measures

Backtesting Marginal Expected Shortfall and Related Systemic Risk Measures Backtesting Marginal Expected Shortfall and Related Systemic Risk Measures Denisa Banulescu 1 Christophe Hurlin 1 Jérémy Leymarie 1 Olivier Scaillet 2 1 University of Orleans 2 University of Geneva & Swiss

More information

Monitoring Forecasting Performance

Monitoring Forecasting Performance Monitoring Forecasting Performance Identifying when and why return prediction models work Allan Timmermann and Yinchu Zhu University of California, San Diego June 21, 2015 Outline Testing for time-varying

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

Stock Return Prediction with Fully Flexible Models and Coefficients

Stock Return Prediction with Fully Flexible Models and Coefficients MPRA Munich Personal RePEc Archive Stock Return Prediction with Fully Flexible Models and Coefficients Joseph Byrne and Rong Fu Department of Accountancy, Economics and Finance, Heriot-Watt University

More information

BCT Lecture 3. Lukas Vacha.

BCT Lecture 3. Lukas Vacha. BCT Lecture 3 Lukas Vacha vachal@utia.cas.cz Stationarity and Unit Root Testing Why do we need to test for Non-Stationarity? The stationarity or otherwise of a series can strongly influence its behaviour

More information

Probabilities & Statistics Revision

Probabilities & Statistics Revision Probabilities & Statistics Revision Christopher Ting Christopher Ting http://www.mysmu.edu/faculty/christophert/ : christopherting@smu.edu.sg : 6828 0364 : LKCSB 5036 January 6, 2017 Christopher Ting QF

More information

Modified Variance Ratio Test for Autocorrelation in the Presence of Heteroskedasticity

Modified Variance Ratio Test for Autocorrelation in the Presence of Heteroskedasticity The Lahore Journal of Economics 23 : 1 (Summer 2018): pp. 1 19 Modified Variance Ratio Test for Autocorrelation in the Presence of Heteroskedasticity Sohail Chand * and Nuzhat Aftab ** Abstract Given that

More information

Lecture 8a: Spurious Regression

Lecture 8a: Spurious Regression Lecture 8a: Spurious Regression 1 Old Stuff The traditional statistical theory holds when we run regression using (weakly or covariance) stationary variables. For example, when we regress one stationary

More information

The Statistical Property of Ordinary Least Squares

The Statistical Property of Ordinary Least Squares The Statistical Property of Ordinary Least Squares The linear equation, on which we apply the OLS is y t = X t β + u t Then, as we have derived, the OLS estimator is ˆβ = [ X T X] 1 X T y Then, substituting

More information

A Guide to Modern Econometric:

A Guide to Modern Econometric: A Guide to Modern Econometric: 4th edition Marno Verbeek Rotterdam School of Management, Erasmus University, Rotterdam B 379887 )WILEY A John Wiley & Sons, Ltd., Publication Contents Preface xiii 1 Introduction

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

More information

ECON 4160, Lecture 11 and 12

ECON 4160, Lecture 11 and 12 ECON 4160, 2016. Lecture 11 and 12 Co-integration Ragnar Nymoen Department of Economics 9 November 2017 1 / 43 Introduction I So far we have considered: Stationary VAR ( no unit roots ) Standard inference

More information

Econ671 Factor Models: Principal Components

Econ671 Factor Models: Principal Components Econ671 Factor Models: Principal Components Jun YU April 8, 2016 Jun YU () Econ671 Factor Models: Principal Components April 8, 2016 1 / 59 Factor Models: Principal Components Learning Objectives 1. Show

More information

Improving Equity Premium Forecasts by Incorporating Structural. Break Uncertainty

Improving Equity Premium Forecasts by Incorporating Structural. Break Uncertainty Improving Equity Premium Forecasts by Incorporating Structural Break Uncertainty b, c, 1 Jing Tian a, Qing Zhou a University of Tasmania, Hobart, Australia b UQ Business School, The University of Queensland,

More information

Inference in VARs with Conditional Heteroskedasticity of Unknown Form

Inference in VARs with Conditional Heteroskedasticity of Unknown Form Inference in VARs with Conditional Heteroskedasticity of Unknown Form Ralf Brüggemann a Carsten Jentsch b Carsten Trenkler c University of Konstanz University of Mannheim University of Mannheim IAB Nuremberg

More information

A New Solution to Spurious Regressions *

A New Solution to Spurious Regressions * A New Solution to Spurious Regressions * Shin-Huei Wang a Carlo Rosa b Abstract This paper develops a new estimator for cointegrating and spurious regressions by applying a two-stage generalized Cochrane-Orcutt

More information

Forecasting the term structure interest rate of government bond yields

Forecasting the term structure interest rate of government bond yields Forecasting the term structure interest rate of government bond yields Bachelor Thesis Econometrics & Operational Research Joost van Esch (419617) Erasmus School of Economics, Erasmus University Rotterdam

More information

Time Series Methods. Sanjaya Desilva

Time Series Methods. Sanjaya Desilva Time Series Methods Sanjaya Desilva 1 Dynamic Models In estimating time series models, sometimes we need to explicitly model the temporal relationships between variables, i.e. does X affect Y in the same

More information

Stochastic Processes

Stochastic Processes Stochastic Processes Stochastic Process Non Formal Definition: Non formal: A stochastic process (random process) is the opposite of a deterministic process such as one defined by a differential equation.

More information

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

Missing dependent variables in panel data models

Missing dependent variables in panel data models Missing dependent variables in panel data models Jason Abrevaya Abstract This paper considers estimation of a fixed-effects model in which the dependent variable may be missing. For cross-sectional units

More information

Lecture 13. Simple Linear Regression

Lecture 13. Simple Linear Regression 1 / 27 Lecture 13 Simple Linear Regression October 28, 2010 2 / 27 Lesson Plan 1. Ordinary Least Squares 2. Interpretation 3 / 27 Motivation Suppose we want to approximate the value of Y with a linear

More information

Time Series Analysis -- An Introduction -- AMS 586

Time Series Analysis -- An Introduction -- AMS 586 Time Series Analysis -- An Introduction -- AMS 586 1 Objectives of time series analysis Data description Data interpretation Modeling Control Prediction & Forecasting 2 Time-Series Data Numerical data

More information

Ch.10 Autocorrelated Disturbances (June 15, 2016)

Ch.10 Autocorrelated Disturbances (June 15, 2016) Ch10 Autocorrelated Disturbances (June 15, 2016) In a time-series linear regression model setting, Y t = x tβ + u t, t = 1, 2,, T, (10-1) a common problem is autocorrelation, or serial correlation of the

More information

Economic modelling and forecasting

Economic modelling and forecasting Economic modelling and forecasting 2-6 February 2015 Bank of England he generalised method of moments Ole Rummel Adviser, CCBS at the Bank of England ole.rummel@bankofengland.co.uk Outline Classical estimation

More information

Hypothesis Testing in Predictive Regressions

Hypothesis Testing in Predictive Regressions Hypothesis Testing in Predictive Regressions Yakov Amihud 1 Clifford M. Hurvich 2 Yi Wang 3 November 12, 2004 1 Ira Leon Rennert Professor of Finance, Stern School of Business, New York University, New

More information

Volume 30, Issue 1. Measuring the Intertemporal Elasticity of Substitution for Consumption: Some Evidence from Japan

Volume 30, Issue 1. Measuring the Intertemporal Elasticity of Substitution for Consumption: Some Evidence from Japan Volume 30, Issue 1 Measuring the Intertemporal Elasticity of Substitution for Consumption: Some Evidence from Japan Akihiko Noda Graduate School of Business and Commerce, Keio University Shunsuke Sugiyama

More information

CHAPTER 21: TIME SERIES ECONOMETRICS: SOME BASIC CONCEPTS

CHAPTER 21: TIME SERIES ECONOMETRICS: SOME BASIC CONCEPTS CHAPTER 21: TIME SERIES ECONOMETRICS: SOME BASIC CONCEPTS 21.1 A stochastic process is said to be weakly stationary if its mean and variance are constant over time and if the value of the covariance between

More information

Lecture 9: Markov Switching Models

Lecture 9: Markov Switching Models Lecture 9: Markov Switching Models Prof. Massimo Guidolin 20192 Financial Econometrics Winter/Spring 2018 Overview Defining a Markov Switching VAR model Structure and mechanics of Markov Switching: from

More information

Christopher Dougherty London School of Economics and Political Science

Christopher Dougherty London School of Economics and Political Science Introduction to Econometrics FIFTH EDITION Christopher Dougherty London School of Economics and Political Science OXFORD UNIVERSITY PRESS Contents INTRODU CTION 1 Why study econometrics? 1 Aim of this

More information

Predictive Systems: Living with Imperfect Predictors

Predictive Systems: Living with Imperfect Predictors Predictive Systems: Living with Imperfect Predictors by * Ľuboš Pástor and Robert F. Stambaugh First draft: September 15, 6 This version: September 6, 6 Abstract The standard regression approach to investigating

More information

Lecture 3: Autoregressive Moving Average (ARMA) Models and their Practical Applications

Lecture 3: Autoregressive Moving Average (ARMA) Models and their Practical Applications Lecture 3: Autoregressive Moving Average (ARMA) Models and their Practical Applications Prof. Massimo Guidolin 20192 Financial Econometrics Winter/Spring 2018 Overview Moving average processes Autoregressive

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model

Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model Most of this course will be concerned with use of a regression model: a structure in which one or more explanatory

More information

Multivariate Time Series

Multivariate Time Series Multivariate Time Series Notation: I do not use boldface (or anything else) to distinguish vectors from scalars. Tsay (and many other writers) do. I denote a multivariate stochastic process in the form

More information