Bias Reduction and Likelihood Based Almost-Exactly Sized Hypothesis Testing in Predictive Regressions using the Restricted Likelihood

Willa W. Chen    Rohit S. Deo

April 2007

Abstract: The restricted likelihood, which has small curvature, is derived for the bivariate predictive regression problem. The bias of the restricted maximum likelihood (REML) estimates is shown to be approximately 50% less than that of the OLS estimates near the unit root, without loss of efficiency. The error in the chi-square approximation to the distribution of the REML based likelihood ratio test (RLRT) for no predictability is shown to be (3/4 − ρ²)n^{-1}(G_3(x) − G_1(x)) + O(n^{-2}), where |ρ| < 1 is the correlation of the innovation series and G_s(·) is the c.d.f. of a χ²_s random variable. This very small error, free of the AR parameter, implies that the RLRT for predictability has very good size properties even when the regressor is nearly integrated, a fact borne out by our simulation study. The Bartlett corrected RLRT achieves an O(n^{-2}) error. The power of the RLRT under a sequence of local Pitman alternatives is obtained and shown to be identical to that of the Wald test and higher than that of the Rao score test in empirically relevant regions of the parameter space. Some extensions to the case of vector AR(1) regressors and more general univariate regressors are provided. The RLRT is found to work very well in simulations and to be robust to non-normal errors.

Keywords: Bartlett correction, likelihood ratio test, curvature

Department of Statistics, Texas A&M University, College Station, Texas 77843, USA
New York University, 44 W. 4th Street, New York, NY 10012, USA; rdeo@stern.nyu.edu

1 Introduction

A question of interest in financial econometrics is whether future values of one series {Y_t} can be predicted from lagged values of another series {X_t}. The hypothesis of no predictability is commonly tested under the assumption that the two series {Y_t}_{t=1}^n and {X_t}_{t=0}^n obey the following model:

Y_t = c + βX_{t−1} + u_t,  (1)
X_t = µ + αX_{t−1} + v_t,  (2)

where |α| < 1, u_t = φv_t + e_t, the (e_t, v_t)′ ∼ N(0, diag(σ_e², σ_v²)) are an i.i.d. series and X_0 ∼ N(µ(1 − α)^{-1}, σ_v²(1 − α²)^{-1}). Interest generally centers on the case where the regressor series {X_t} possesses a strong degree of autocorrelation, with the autoregressive parameter α lying close to the unit root. It is well known (Stambaugh, 1999) that the standard ordinary least squares (OLS) estimate of β is biased when the errors {u_t, v_t} are contemporaneously correlated, with the amount of bias increasing as α gets closer to unity. This bias results in the corresponding t-statistic being biased, with poor size properties. However, Stambaugh (1999) provided a simple estimable expression for the bias in the OLS estimate of β that allows the researcher to compute a bias-corrected OLS estimate, as well as a t-statistic based on it. See, for example, Amihud and Hurvich (2004). There is currently, however, no known theoretical justification that such a t-statistic based on the bias-corrected OLS estimate will have improved size properties relative to the test based on the uncorrected OLS estimate. Indeed, Sprott and Viveros-Aguilera (1984) and Sprott (1990) point out that if inference is the goal of the researcher, computing pivotal t-statistics from bias corrected point estimates is not guaranteed to improve finite sample performance, and they even provide an example to make this point.
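The size of the Stambaugh bias is easy to see in a small simulation. The sketch below is ours, not the authors'; the function name is hypothetical and the parameter values (taken from the simulation design of Section 5) are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_slope_bias(n=50, alpha=0.99, beta=0.0, phi=-85.0,
                   sigma_v2=0.00024681, sigma_e2=1.0, reps=2000):
    """Monte Carlo bias of the OLS slope in Y_t = c + beta*X_{t-1} + u_t,
    X_t = mu + alpha*X_{t-1} + v_t, with u_t = phi*v_t + e_t (c = mu = 0)."""
    sv, se = np.sqrt(sigma_v2), np.sqrt(sigma_e2)
    est = np.empty(reps)
    for r in range(reps):
        v = rng.normal(0.0, sv, n + 1)
        x = np.empty(n + 1)
        x[0] = rng.normal(0.0, sv / np.sqrt(1.0 - alpha**2))  # stationary start
        for t in range(1, n + 1):
            x[t] = alpha * x[t - 1] + v[t]
        u = phi * v[1:] + rng.normal(0.0, se, n)
        y = beta * x[:-1] + u
        Z = np.column_stack([np.ones(n), x[:-1]])  # intercept plus lagged X
        est[r] = np.linalg.lstsq(Z, y, rcond=None)[0][1]
    return est.mean() - beta

print("simulated OLS bias:", ols_slope_bias())
print("first-order approx:", -(-85.0) * (1 + 3 * 0.99) / 49)  # -phi(1+3a)/(n-1)
```

The two printed numbers should be close, and both grow as α approaches unity, which is exactly the regime of interest here.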

Instead, following Fisher (1973), Sprott argued in a series of papers (1973, 1975, 1980, 1990) that from an inferential point of view, issues such as the bias of point estimates may be irrelevant in small samples, and he stressed the importance of examining the likelihood. In brief, one of Sprott's (1975, 1980) key ideas can be summarised roughly as follows: (i) There may exist some one-to-one transformation of the parameters such that the likelihood ratio for the transformed parameter is approximately quadratic in the pivotal statistic based on that transformed parameter. (ii) The conditions that allow this quadratic approximation to be adequate also improve the normal approximation to the distribution of the pivotal statistic. It follows from these two observations that, since the likelihood is invariant under one-to-one reparametrisation, the likelihood ratio will be well approximated by a chi-square variable in finite samples as long as some parametrisation satisfying (i) exists, even if one does not know what that parametrisation is. Sprott (1973, 1975, 1980) showed that such a parametrisation exists if the curvature of the log-likelihood¹, as measured by a function of its higher order derivatives, is small. The use of such a likelihood would then result in a well behaved likelihood ratio test (LRT) in finite samples. See also Efron (1975) and McCullagh and Cox (1986) for a geometrical approach to curvature and likelihood ratio based hypothesis testing.

A curvature related approach to tackling the predictive regression problem for the model in (1) and (2) was taken by Jansson and Moreira (2006), assuming that the error covariance parameters (φ, σ_e², σ_v²) are known, that the series {X_t} has zero intercept (i.e. µ = 0) and that the initial value X_0 is known to be 0. Under these assumptions, Jansson and Moreira (2006) noted that though the joint likelihood of the observations was curved exponential, they could remove the curvature and obtain a linear exponential family by conditioning on a statistic that is specific ancillary for β. The linear exponential family obtained by conditioning enabled them to obtain tests with certain finite sample optimality properties. All the finite sample properties that Jansson and Moreira (2006) obtain, stated in their Theorems 2 and 5 and Lemmas 1, 3 and 4, are under the assumptions stated above, viz. that (φ, σ_e², σ_v²) are known, that the series {X_t} has zero intercept (i.e. µ = 0) and that the initial value X_0 is known to be 0. Such assumptions would normally not be satisfied in practice. Also, we will see below in Section 2 that the assumption µ = 0 removes a major source of the problem, since the likelihood already has smaller curvature when µ = 0.

¹ There are several formal measures of curvature based on higher order derivatives of the likelihood in the literature. See, for example, Kass and Slate (1994). We will not define any such measure explicitly since that is not the focus of our work here.

In their Theorem 6, Jansson and Moreira (2006) allow for an unknown µ taking potentially non-zero values. However, in that theorem they do not provide any finite sample results on the size or power properties of their test statistic, but only pointwise asymptotic results along a sequence of local-to-unity parameter values. This pointwise asymptotic result in their Theorem 6, which allows for arbitrary unknown µ, is obtained under the assumption that the error covariance parameters (φ, σ_e², σ_v²) are known.

The approach we take in this paper is to work with an unconditional likelihood, called the restricted likelihood, that already has very small curvature, without making any assumptions about knowledge of any nuisance parameters. As a result, we are able to obtain theoretical results demonstrating that the LRT based on the restricted likelihood (RLRT) has good finite sample performance in the predictive regression problem. Interestingly, we will see that the restricted likelihood also has an interpretation as a conditional likelihood, though the conditioning in this case is done with respect to a statistic that is complete sufficient for (c, µ).

In Section 2, we provide motivation for considering the restricted likelihood. We then obtain the restricted likelihood for the bivariate predictive regression model and provide results on the bias of the REML estimates. In Section 3, we state our result on the finite sample behaviour of the RLRT for β and compare its power under a sequence of Pitman alternatives to that of the restricted likelihood based Wald and Rao score tests. Section 4 provides results on extensions of the REML method to higher order AR processes for the regressor series, as well as to multivariate AR(1) regressors. The finite sample performance of the REML estimates and the RLRT is studied through simulations in Section 5. All proofs are relegated to the Appendix at the end of the paper.

2 Restricted Maximum Likelihood Estimation

To understand why the restricted likelihood may yield well behaved LRTs in the predictive regression context, it is instructive to consider LRTs for α in the univariate AR(1) model given in (2).

If LRT_{α,µ} denotes the LRT for testing H₀: α = α₀ versus H₁: α ≠ α₀ in that model, then Theorem 2 of van Giersbergen (2006), in conjunction with the results of Hayakawa (1977, 1987), Cordeiro (1987), Barndorff-Nielsen and Hall (1988) and Chesher and Smith (1995), yields²

P(LRT_{α,µ} ≤ x) − G_1(x) = 0.5(1 + 7α₀²)(1 − α₀²)^{-1} n^{-1}(G_3(x) − G_1(x)) + O(n^{-2}),  (3)

where G_s(x) is the c.d.f. of a χ²_s random variable. On the other hand, if LRT_α denotes the LRT for testing H₀: α = α₀ versus H₁: α ≠ α₀ in the univariate AR(1) model given in (2) with µ known to be 0, then it follows from Theorem 1 of van Giersbergen (2006) that

P(LRT_α ≤ x) − G_1(x) = 0.5 n^{-1}(G_3(x) − G_1(x)) + O(n^{-2}).  (4)

It is obvious from (3) that LRT_{α,µ} is very unstable when the autoregressive parameter α₀ is close to the unit root, with the leading error term in the Edgeworth expansion going to infinity. In stark contrast, we see from (4) that LRT_α is very well behaved, with the leading term in its Edgeworth expansion being both very small and free of α₀. Simulation results in Figure 1 of van Giersbergen (2006) confirm the accuracy of the standard chi-square approximation for LRT_α, even when α is close to unity. Not surprisingly, van Garderen (1999) found that for the univariate AR(1) model in (2) with µ known to be 0, the Efron (1975) curvature (one of the standard measures of curvature of the likelihood) is very small³, being of order O(n^{-1}) and converging to zero as α → 1 for every fixed n. These results indicate that the culprit in the finite sample failure of the LRT in the univariate AR(1) model is the unknown intercept µ. Since the bivariate prediction model in (1) and (2) is a vector AR(1) (with the first column of the coefficient matrix restricted to zero), one is led to suspect that the LRT for β may perhaps be well behaved if the intercept vector (c, µ)′ were known.

² van Giersbergen (2006) provides the expected value of the LRT, while the combined results from the remaining references show that the leading term in the Edgeworth expansion (3) is half that expected value.

³ Interestingly and somewhat surprisingly, van Garderen (1999) found that if the innovation variance σ_v² is known, the Efron curvature of the model is n^{-1} + O(n^{-2}), which, though still small, is larger than when σ_v² is unknown, and he provided a geometrical explanation for this phenomenon. Correspondingly, results in van Giersbergen (2006) imply that the coefficient of the leading term in (4) increases from 0.5n^{-1} to 1.5n^{-1} when σ_v² is known, indicating that the LRT is better approximated by the limiting chi-square distribution when the innovation variance is unknown than when it is known, though, of course, both approximations are very good by themselves.

Since the assumption that (c, µ) is known is extremely unrealistic, we are prompted to seek a likelihood that does not involve the location parameters and yet possesses small curvature properties similar to those of the model with known location parameters. The restricted likelihood turns out to be one that has such properties, and we turn next to defining it and stating some of its properties.

The idea of the restricted likelihood was originally proposed by Kalbfleisch and Sprott (1970) precisely as a means of eliminating the effect of nuisance location (more generally, regression coefficient) parameters when estimating the parameters of the model covariance structure. In general, the restricted likelihood is defined as the likelihood of linearly transformed data, where the linear transformation is chosen to be orthogonal to the regression design matrix (and orthogonal to the vector of ones in the case of a location parameter). Harville (1974) provided a simple formula for computing the restricted likelihood and pointed out that the restricted likelihood changes only by a multiplicative constant for different choices of the linear transformation. Harville (1977) showed that REML estimates of the parameters of the covariance structure do not suffer any loss of efficiency due to the linear transformation of the data. Harville (1974) also provided a Bayesian interpretation of the restricted likelihood, while Smyth and Verbyla (1996) showed that the restricted likelihood is also the exact conditional likelihood of the original data given the complete sufficient statistic for the regression coefficient parameters. Though the restricted likelihood has been studied primarily in the context of variance component models, there has also been some work on it in the context of time series models. See, for example, Tunnicliffe Wilson (1989) and Rahman and King (1997), among others. Francke and de Vos (2006) studied unit root tests for AR(1) models based on the restricted likelihood, while Chen and Deo (2006a, b) showed that confidence intervals for the sum of the autoregressive coefficients of univariate AR(p) processes based on the restricted likelihood have good coverage properties, even when the series is close to a unit root process. The restricted maximum likelihood (REML) estimates are also less biased than regular ML estimates in nearly integrated univariate AR models with intercept (Cheang and Reinsel, 2000) and with trend (Kang, Shin and Lee, 2003).

When X = (X_0, ..., X_n)′ follows the univariate AR(1) model in (2), it follows from Harville (1974) that the restricted log-likelihood is given by

L_R(σ_v², α, X) = −(n/2) log σ_v² + (1/2) log[(1 + α)/((n − 1)(1 − α) + 2)] − (1/(2σ_v²)) Q(α),  (5)

where

Q(α) = X′Σ*^{-1}X − (1′Σ*^{-1}X)²/(1′Σ*^{-1}1)  (6)

and σ_v²Σ* = Var(X), whereas the regular likelihood of X for model (2) with known µ (set, w.l.o.g., to zero) is

L(σ_v², α, X) = −((n + 1)/2) log σ_v² + (1/2) log(1 − α²) − (1/(2σ_v²)) X′Σ*^{-1}X.  (7)

On comparing (5) and (7), and noting that the second term in Q(α) is O_p(1) whereas X′Σ*^{-1}X is O_p(n), it immediately becomes apparent that the restricted likelihood in this case differs on a relative scale by only an order O(n^{-1}) from the likelihood of the AR(1) process with known intercept µ. Due to this, it can be shown that the curvature properties of the restricted likelihood for the AR(1), as well as the Edgeworth expansion of the LRT, are the same as those of the AR(1) model with known intercept, up to order O(n^{-1}). As a matter of fact, the restricted likelihood for the AR(1) also provides REML estimates of α whose bias is −2αn^{-1} + O(n^{-2}) and thus is identical, up to order O(n^{-1}), to the bias of the maximum likelihood estimate when the intercept is known. See Marriott and Pope (1954) and Cheang and Reinsel (2000). Since the bias of the maximum likelihood estimate of α when the intercept is not known is −(1 + 3α)n^{-1} + O(n^{-2}), the REML estimate achieves a bias reduction of approximately 50% when the autoregressive parameter is close to the unit root. All of these results show that the restricted likelihood provides a great advantage for both hypothesis testing and estimation in the univariate AR(1) model through the elimination of the nuisance intercept parameter.
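The restricted likelihood (5)-(6) is straightforward to evaluate. The sketch below is ours (function names hypothetical); it profiles L_R over α using the computational form of Q(α) that is recorded just before Theorem 1 below.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def Q(a, x):
    """Q(alpha) of (6) for x = (X_0,...,X_n), via its computational form."""
    n = len(x) - 1
    quad = (np.sum(x**2) + a**2 * np.sum(x[1:-1]**2)
            - 2.0 * a * np.sum(x[:-1] * x[1:]))
    num = (1.0 - a) * (x[0] + x[-1] + (1.0 - a) * np.sum(x[1:-1]))**2
    return quad - num / ((n - 1) * (1.0 - a) + 2.0)

def reml_ar1(x):
    """REML estimates (alpha, sigma_v^2) maximising the restricted
    log-likelihood (5) of a stationary AR(1) with unknown intercept."""
    n = len(x) - 1
    obj = lambda a: n * np.log(Q(a, x)) - np.log((1 + a) / ((n - 1) * (1 - a) + 2))
    a_hat = minimize_scalar(obj, bounds=(-0.999, 0.9999), method="bounded").x
    return a_hat, Q(a_hat, x) / n
```

Because the objective is one-dimensional and nearly quadratic, a bounded scalar minimisation is all that is needed; the same device reappears in the bivariate problem below.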

We now show that the restricted likelihood continues to provide these advantages in the case of the bivariate predictive model. We first define some quantities that will be useful in stating our results. For the observed data (Y′, X′) = (Y_1, ..., Y_n, X_0, ..., X_n), define X_1 = (X_1, ..., X_n)′ and X_0 = (X_0, X_1, ..., X_{n−1})′. Define the sample means Ȳ = n^{-1}1′Y, X̄ = (n + 1)^{-1}1′X, X̄_1 = n^{-1}1′X_1 and X̄_0 = n^{-1}1′X_0, and the sample mean corrected data Y_c = Y − 1Ȳ, X_{c,1} = X_1 − 1X̄_1 and X_{c,0} = X_0 − 1X̄_0. Define

S(φ, β, α) = (Y_c − φX_{c,1} − (β − φα)X_{c,0})′(Y_c − φX_{c,1} − (β − φα)X_{c,0})  (8)

and note that for computational purposes Q(α) given in (6) can be written as

Q(α) = Σ_{t=0}^n X_t² + α² Σ_{t=1}^{n−1} X_t² − 2α Σ_{t=0}^{n−1} X_t X_{t+1} − (1 − α)[X_0 + X_n + (1 − α) Σ_{t=1}^{n−1} X_t]² / [(n − 1)(1 − α) + 2].

We can now state the following theorem.

Theorem 1. For the model given by (1) and (2), the REML log-likelihood up to an additive constant is given by

L(β, α, φ, σ_v², σ_e²) = −((n − 1)/2) log σ_e² − (1/(2σ_e²)) S(φ, β, α) − (n/2) log σ_v² + (1/2) log[(1 + α)/((n − 1)(1 − α) + 2)] − (1/(2σ_v²)) Q(α).  (9)

The REML estimates ψ̂ = (β̂, α̂, φ̂, σ̂_v², σ̂_e²)′ are given by

α̂ = argmin_α { n log Q(α) − log[(1 + α)/((n − 1)(1 − α) + 2)] },

(φ̂, β̂ − φ̂α̂)′ = (X_c′X_c)^{-1}X_c′Y_c, where X_c = [X_{c,1}, X_{c,0}],

σ̂_e² = (n − 1)^{-1} S(φ̂, β̂, α̂) and σ̂_v² = Q(α̂)/n.

The bias in (α̂, β̂) is given by

E(α̂ − α) = −2α(n − 1)^{-1} + o(n^{-1}),
E(β̂ − β) = φE(α̂ − α) = −2αφ(n − 1)^{-1} + o(n^{-1}),

and E(φ̂ − φ) = 0.
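Theorem 1 translates directly into a short estimation routine: profile the one-dimensional criterion for α̂, then run one OLS regression. The sketch below is ours (hypothetical names), assuming y has length n and x = (X_0, ..., X_n).

```python
import numpy as np
from scipy.optimize import minimize_scalar

def reml_predictive(y, x):
    """REML estimates of Theorem 1 for Y_t = c + beta*X_{t-1} + u_t with an
    AR(1) regressor; returns beta, alpha, phi, sigma_v^2, sigma_e^2."""
    n = len(y)
    def Q(a):  # computational form of Q(alpha) given above
        quad = (np.sum(x**2) + a**2 * np.sum(x[1:-1]**2)
                - 2 * a * np.sum(x[:-1] * x[1:]))
        num = (1 - a) * (x[0] + x[-1] + (1 - a) * np.sum(x[1:-1]))**2
        return quad - num / ((n - 1) * (1 - a) + 2)
    obj = lambda a: n * np.log(Q(a)) - np.log((1 + a) / ((n - 1) * (1 - a) + 2))
    alpha = minimize_scalar(obj, bounds=(-0.999, 0.9999), method="bounded").x
    # OLS of mean-corrected Y on (X_{c,1}, X_{c,0}); slopes are (phi, beta - phi*alpha)
    yc = y - y.mean()
    Xc = np.column_stack([x[1:] - x[1:].mean(), x[:-1] - x[:-1].mean()])
    phi, gamma = np.linalg.lstsq(Xc, yc, rcond=None)[0]
    beta = gamma + phi * alpha
    resid = yc - Xc @ np.array([phi, gamma])
    return dict(beta=beta, alpha=alpha, phi=phi,
                sigma_v2=Q(alpha) / n, sigma_e2=resid @ resid / (n - 1))
```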

Remark 1. It is obvious that obtaining the REML estimates is computationally easy, since all the estimates are obtained in succession after the optimisation of a one-dimensional function which is almost quadratic.

Remark 2. Note that the restricted likelihood in (9) is well defined at the unit root α = 1, without having to assume that the initial value X_0 is fixed when |α| < 1.

It is interesting to compare the bias in β̂ with the bias in the OLS estimate β̂_OLS, given by (see Stambaugh, 1999)

E(β̂_OLS − β) = φE(α̂_OLS − α) = −φ(1 + 3α)(n − 1)^{-1} + o(n^{-1}),

where α̂_OLS is the OLS estimate of α. Thus, the bias in β̂ depends upon the bias in α̂ in a manner identical to the way the bias in β̂_OLS depends on the bias in α̂_OLS. Consequently, the approximately 50% reduction in bias that α̂ achieves compared to α̂_OLS close to the unit root is inherited by β̂, relative to β̂_OLS. The bias expression in Theorem 1 also suggests a bias corrected version of the REML estimate of β that may be computed as

β̂_c = β̂ + φ̂ (2α̂_c/(n − 1)),

where

α̂_c = α̂ (n + 1)/(n − 1)

is the bias corrected REML estimate of α. Since the bias correction term 2φα(n − 1)^{-1} is smaller than φ(1 + 3α)(n − 1)^{-1}, one would expect the bias corrected REML estimate β̂_c to have both less bias and a smaller variance than its bias corrected OLS counterpart,

β̂_OLS,c = β̂_OLS + φ̂ (1 + 3α̂_c)/(n − 1),

particularly since the parameter φ can take any value in (−∞, ∞). (Amihud and Hurvich (2004) find empirical estimates of φ that are as large as −94.) This is indeed the case, as we see in the simulations reported in Section 5 below.
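The bias corrections above amount to two lines of code. A sketch (ours), continuing from the reml_predictive output of the previous sketch:

```python
def bias_corrected(est, n):
    """Bias corrected REML estimates; `est` is the dict returned by
    reml_predictive and n is the sample size of Y."""
    alpha_c = est["alpha"] * (n + 1) / (n - 1)  # undoes the -2*alpha/(n-1) bias
    beta_c = est["beta"] + est["phi"] * 2 * alpha_c / (n - 1)
    return alpha_c, beta_c
```

Because the REML correction term 2φα/(n − 1) is smaller than the OLS correction φ(1 + 3α)/(n − 1), the corrected REML estimate involves less correction noise, which is consistent with its smaller standard deviation in the simulations of Section 5.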

The restricted likelihood given in (9) is derived for the situation where we assume that the initial value X_0 comes from the stationary distribution N(µ(1 − α)^{-1}, σ_v²(1 − α²)^{-1}). A similar argument can be used to obtain the restricted likelihood for the model in (1) and (2) where the regressor series follows an asymptotically stationary process, given by X_t = µ + e*_t, where e*_t = αe*_{t−1} + v_t for t ≥ 1 and e*_0 = v_0. Under this assumption, the restricted likelihood is given by

L̃(β, α, φ, σ_v², σ_e²) = −((n − 1)/2) log σ_e² − (1/(2σ_e²)) S(φ, β, α) − (n/2) log σ_v² − (1/2) log[n(1 − α)² + 1] − (1/(2σ_v²)) Q̃(α),  (10)

where

Q̃(α) = (X_0 − µ̂(α))² + Σ_{t=1}^n (X_t − µ̂(α) − α(X_{t−1} − µ̂(α)))²

and

µ̂(α) = [X_0 + (1 − α) Σ_{t=1}^n (X_t − αX_{t−1})] / [1 + n(1 − α)²].

The bias results of Theorem 1 continue to hold for the REML estimates under this initial value condition.

In the next section, we provide a theorem that shows that the REML based LRT has very good finite sample properties, in that its finite sample distribution approaches the limiting one very quickly and is practically unaffected by nuisance parameters, while also maintaining power against local alternatives when compared to the Wald and score tests.

3 REML Likelihood Ratio Test

One standard method for testing the composite hypothesis H₀: β = 0 vs. H_a: β ≠ 0 is the likelihood ratio test (LRT), which compares the log-likelihood evaluated at the unrestricted estimates of the parameters to the log-likelihood evaluated at the parameter estimates obtained under the restriction that the null hypothesis H₀: β = 0 is true.

Using the quantities defined in (8) and just above it, it can be easily verified that under H₀: β = 0 the restricted estimates ψ̂₀ = (0, α̂₀, φ̂₀, σ̂²_{v,0}, σ̂²_{e,0})′ are obtained as

α̂₀ = argmin_α { n log Q(α) + (n − 1) log R(α) − log[(1 + α)/((n − 1)(1 − α) + 2)] },

where

R(α) = Y_c′ (I − Z_c(α)(Z_c(α)′Z_c(α))^{-1}Z_c(α)′) Y_c,  Z_c(α) = X_{c,1} − αX_{c,0},

φ̂₀ = (Z_c(α̂₀)′Z_c(α̂₀))^{-1}Z_c(α̂₀)′Y_c,  σ̂²_{v,0} = n^{-1}Q(α̂₀) and σ̂²_{e,0} = (n − 1)^{-1}R(α̂₀).

Just as in the unrestricted case, it is obvious that obtaining the restricted estimates requires only the optimisation of a nearly quadratic one-dimensional function. The REML based likelihood ratio test (RLRT) for testing H₀: β = 0 vs. H_a: β ≠ 0 is now given by

R_T = −2L(ψ̂₀) + 2L(ψ̂),  (11)

where L(·) is the REML log-likelihood presented in (9). Under H₀: β = 0, the asymptotic distribution of R_T is χ²₁, the chi-square distribution with one degree of freedom. In finite samples, however, the distribution of a LRT may deviate from the limiting χ² distribution, particularly in the presence of nuisance parameters. The first step towards improving this approximation of the finite sample distribution by the limiting one was taken by Bartlett (1937). Bartlett argued that if a LRT is asymptotically distributed as χ²_p and E(LRT) = p(1 + cn^{-1}) + O(n^{-2}) for some constant c, then (1 + cn^{-1})^{-1}LRT would be better approximated by the limiting χ²_p distribution. The quantity 1 + cn^{-1}, which may depend on the underlying parameter values, has since become known as the Bartlett correction to the likelihood ratio test. A large literature has developed subsequently dealing with such corrections. It includes Lawley (1956), who derived a general formula for the Bartlett correction, Hayakawa (1977, 1987), who provided an asymptotic expansion of the distribution of the LRT, and Barndorff-Nielsen and Hall (1988), who showed that for regular models the Bartlett corrected LRT is approximated by the limiting χ² distribution with an error of O(n^{-2}), compared to an error of O(n^{-1}) for the uncorrected test statistic. For the predictive regression model in this paper, we are able to obtain the following theorem on the finite sample behaviour of the RLRT.

Theorem 2. Under H₀: β = 0 in the model given by (1) and (2), we have

P(R_T ≤ x) = P(χ²₁ ≤ x) + (3/4 − ρ²) n^{-1} [P(χ²₃ ≤ x) − P(χ²₁ ≤ x)] + O(n^{-2}),  (12)

where ρ = Corr(u_t, v_t).
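Both optimisations behind R_T are one-dimensional, so the whole test fits in a few lines. The sketch below is ours (hypothetical names); it computes (11) and the χ²₁ p-value, with σ_e² and σ_v² profiled out of (9).

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2

def rlrt(y, x):
    """RLRT statistic (11) and chi-square(1) p-value for H0: beta = 0;
    y has length n and x = (X_0,...,X_n)."""
    n = len(y)
    yc = y - y.mean()
    x1, x0 = x[1:] - x[1:].mean(), x[:-1] - x[:-1].mean()
    def Q(a):
        quad = (np.sum(x**2) + a**2 * np.sum(x[1:-1]**2)
                - 2 * a * np.sum(x[:-1] * x[1:]))
        num = (1 - a) * (x[0] + x[-1] + (1 - a) * np.sum(x[1:-1]))**2
        return quad - num / ((n - 1) * (1 - a) + 2)
    logc = lambda a: np.log((1 + a) / ((n - 1) * (1 - a) + 2))
    def profiled(a, S):  # (9) with the two variances profiled out
        return (-0.5 * (n - 1) * np.log(S / (n - 1))
                - 0.5 * n * np.log(Q(a) / n) + 0.5 * logc(a))
    # unrestricted residual SS from regressing Y_c on (X_{c,1}, X_{c,0})
    Z = np.column_stack([x1, x0])
    S_unres = yc @ yc - yc @ Z @ np.linalg.lstsq(Z, yc, rcond=None)[0]
    def S_res(a):  # R(alpha): residual SS from regressing Y_c on x1 - a*x0
        z = x1 - a * x0
        r = yc - z * (z @ yc) / (z @ z)
        return r @ r
    a1 = minimize_scalar(lambda a: -profiled(a, S_unres),
                         bounds=(-0.999, 0.9999), method="bounded").x
    a0 = minimize_scalar(lambda a: -profiled(a, S_res(a)),
                         bounds=(-0.999, 0.9999), method="bounded").x
    RT = 2 * (profiled(a1, S_unres) - profiled(a0, S_res(a0)))
    return RT, chi2.sf(RT, df=1)
```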

Theorem 2, in conjunction with the result of Barndorff-Nielsen and Hall (1988), yields the following corollary.

Corollary 1. If the Bartlett corrected RLRT is defined as

R_TB = [1 + (3/4 − ρ̂²) n^{-1}]^{-1} R_T, where ρ̂² = (φ̂²σ̂_v² + σ̂_e²)^{-1} φ̂²σ̂_v²,

then P(R_TB ≤ x) = P(χ²₁ ≤ x) + O(n^{-2}).

Remark 3. The results of Theorem 2 and Corollary 1 continue to hold for the restricted likelihood given in (10) when the initial condition is not from the stationary distribution.

The results of Theorem 2 obviously imply that the χ²₁ approximation to R_T is very good and almost unaffected by the nuisance parameters. Most importantly, the leading term in the error is free of the AR parameter, which most affects the finite sample performance of tests, particularly when it is close to unity. Theorem 2 also suggests two very simple ways in which one could adjust the p-value when carrying out a test of H₀: β = 0. One would be to use the first two terms on the right hand side of (12), with ρ replaced by ρ̂. The other is to use the Bartlett corrected statistic R_TB. However, (12) suggests that the original test R_T, used in conjunction with the standard χ²₁ distribution, should be very well behaved, and any improvements will be minimal at best. This belief is supported by the simulations that we provide in Section 5.

It is worth noting that though the REML likelihood does not provide an unbiased estimate of β (indeed, the bias of β̂ can be arbitrarily large due to the fact that φ is unbounded, as noted below Theorem 1), the REML likelihood yields a very well behaved test for β, irrespective of how large φ is.
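Corollary 1 is equally mechanical to apply. A sketch (ours), taking the statistic from the rlrt sketch above and the variance estimates from the unrestricted REML fit:

```python
from scipy.stats import chi2

def bartlett_corrected_rlrt(RT, phi_hat, sigma_v2_hat, sigma_e2_hat, n):
    """Bartlett corrected RLRT of Corollary 1 and its chi-square(1) p-value."""
    rho2 = phi_hat**2 * sigma_v2_hat / (phi_hat**2 * sigma_v2_hat + sigma_e2_hat)
    RTB = RT / (1.0 + (0.75 - rho2) / n)
    return RTB, chi2.sf(RTB, df=1)
```

Since ρ̂² ∈ [0, 1), the factor |3/4 − ρ̂²| never exceeds 3/4, so the correction is always within O(n^{-1}) of one, which is another way of seeing why the uncorrected R_T is already close to χ²₁.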

The fact that a badly biased point estimate can still yield a well behaved test serves to illustrate the point that it may be more desirable at times to carry out tests of hypothesis using appropriate likelihoods rather than parameter point estimates, and it supports the idea that bias can be irrelevant in inference, as described in the Introduction.

It is interesting to compare the quality of the chi-square approximation in Theorem 2, where all four parameters (α, φ, σ_v², σ_e²) are nuisance parameters, with the approximation in the case where the parameters (φ, σ_v², σ_e²) are known and α is the only nuisance parameter. From equation (31) and the discussion below it in the proof of Theorem 2, it can be seen that if (φ, σ_v², σ_e²) are known, the relevant RLRT for testing H₀: β = 0, denoted by R_{T,1}, would satisfy

P(R_{T,1} ≤ x) = P(χ²₁ ≤ x) − ρ² n^{-1} [P(χ²₃ ≤ x) − P(χ²₁ ≤ x)] + O(n^{-2}).

By comparing this result with that in Theorem 2, one sees that the quantity 3/4 is a measure of the extent to which lack of knowledge of the innovation parameters (φ, τ_v, τ_e) affects the finite sample distribution of R_T. Furthermore, since sup|−ρ²| > sup|3/4 − ρ²| for ρ ∈ (−1, 1), the chi-square approximation to the RLRT is better, in a "minimax" sense, in the case where the innovation covariance parameters are unknown than when they are known. This finding is very much in keeping with the results of van Giersbergen (2006) for the LRT in the univariate AR(1) model that we described in the footnote at the beginning of Section 2.

The result in Theorem 2 guarantees that the RLRT will yield a test that is almost of exact size in finite samples. However, one would also like to ensure that this is not achieved at the expense of a loss of power. The obvious competitors to the RLRT are the Wald and Rao score tests based on the restricted likelihood. It is a well known fact that just as these three tests share the identical limiting distribution under the null hypothesis, they also have the same power properties to first order. Hence, in order to distinguish between them one has to consider a sequence of local Pitman alternatives given by H_a: β = β₀ + ξn^{-1/2}. The next theorem obtains the power functions of the RLRT, Wald and Rao score tests against such local alternatives.

Theorem 3. Let RLRT, W and RS denote the LRT, Wald test and Rao score test, respectively, of H₀: β = β₀ based on the restricted likelihood in (9) of the model in (1) and (2). Assume that the true value of β is given by β = β₀ + ξn^{-1/2}. Define

Δ = σ_v²ξ² [(1 − α²)(φ²σ_v² + σ_e²)]^{-1} + O(n^{-1}),

C₁ = −αφσ_v⁴ξ³ [(1 − α²)²(φ²σ_v² + σ_e²)²]^{-1}

and

C₂ = −3αφσ_v²ξ [(1 − α²)(φ²σ_v² + σ_e²)]^{-1}.

Let Ḡ_{s,Δ}(x) denote the survival function of a non-central χ² random variable with s degrees of freedom and non-centrality parameter Δ. Then

P(RLRT > x) = Ḡ_{1,Δ}(x) + C₁ n^{-1/2} [Ḡ_{3,Δ}(x) − 0.5Ḡ_{1,Δ}(x) − 0.5Ḡ_{5,Δ}(x)] + O(n^{-1}),

P(W > x) = P(RLRT > x) + O(n^{-1}),

and

P(RS > x) = P(RLRT > x) + 0.5C₁ n^{-1/2} [Ḡ_{5,Δ}(x) − Ḡ_{7,Δ}(x)] + 0.5C₂ n^{-1/2} [Ḡ_{1,Δ}(x) − Ḡ_{5,Δ}(x)] + O(n^{-1}).  (13)

From Theorem 3 we see that the RLRT and the Wald test based on the restricted likelihood have identical power up to second order (i.e. up to O(n^{-1})) against local Pitman alternatives. Since Ḡ_{s,Δ}(x) − Ḡ_{l,Δ}(x) < 0 for all x > 0 when l > s, it follows from (13) that the RLRT is guaranteed to be more powerful than the Rao score test against local alternatives if C₁ > 0 and C₂ > 0. This will be the case if φ < 0 and ξ > 0, which is exactly the part of the parameter space that is of relevance in empirical applications in finance and economics. It is also interesting to note that the non-centrality parameter Δ, which will be the main source of power, increases as α gets closer to the unit root. The non-centrality parameter also depends inversely on φ², but does not depend on the correlation between the two innovation series (u_t, v_t).

In the next section we derive the REML likelihood under more general models for the regressor series, as well as for multiple regressors, and discuss some efficiency and computational issues.
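To first order, all three tests share the power function Ḡ_{1,Δ}(x). The sketch below (ours) evaluates it at the χ²₁ critical value, using our reconstruction of Δ from Theorem 3 as stated above.

```python
from scipy.stats import chi2, ncx2

def local_power(xi, alpha, phi, sigma_v2, sigma_e2, level=0.05):
    """Leading-order local power of the RLRT (and, to this order, of the Wald
    and Rao score tests) against beta = beta0 + xi/sqrt(n)."""
    delta = sigma_v2 * xi**2 / ((1 - alpha**2) * (phi**2 * sigma_v2 + sigma_e2))
    crit = chi2.ppf(1 - level, df=1)           # chi-square(1) critical value
    return ncx2.sf(crit, df=1, nc=delta)       # noncentral chi-square survival fn

# power rises as alpha approaches the unit root, since Delta ~ 1/(1 - alpha^2)
for a in (0.5, 0.9, 0.99):
    print(a, local_power(xi=1.0, alpha=a, phi=-85.0,
                         sigma_v2=0.00024681, sigma_e2=1.0))
```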

4 REML for more general regressor models

It is easy to generalise the REML likelihood in two directions that are both of practical interest. One generalisation is to the case where the predictor series is a multivariate AR(1) process. Applications of such models can be found, for example, in Amihud and Hurvich (2004), who considered the dividend yield and the earnings-to-price ratio as bivariate predictors of market returns. The other generalisation is to the case where the univariate predictor follows a higher order AR process. We will state the REML log-likelihood for both these cases, starting with the multivariate AR(1) predictor model. The method of obtaining the log-likelihood is identical to that used in Theorem 1.

4.1 Multivariate regressors

Assume that the data (Y_1, ..., Y_n, X_0′, ..., X_n′) follows

Y_t = c + β′X_{t−1} + u_t,  (14)
X_t = µ + AX_{t−1} + v_t,  (15)

where u_t = φ′v_t + e_t, the (e_t, v_t′)′ ∼ N(0, diag(σ_e², Σ_v)) are an i.i.d. series and A is a k × k matrix with all eigenvalues less than unity in absolute value. Let Σ_v = Var(v_t) and Σ_X = Var(X_t), given by

vec(Σ_X) = (I_{k²} − A ⊗ A)^{-1} vec(Σ_v),

and define

τ̂ = [Σ_X^{-1} + n(I − A)′Σ_v^{-1}(I − A)]^{-1} [Σ_X^{-1}X_0 + (I − A)′Σ_v^{-1} Σ_{t=1}^n (X_t − AX_{t−1})].

Lemma 1. The REML log-likelihood, up to an additive constant, for the model in (14) and (15) is given by

L_M = −((n − 1)/2) log σ_e² − (1/(2σ_e²)) S(φ, β, A) − (1/2) log|Σ_X| − (n/2) log|Σ_v| − (1/2) log|Σ_X^{-1} + n(I − A)′Σ_v^{-1}(I − A)| − (1/2) { (X_0 − τ̂)′Σ_X^{-1}(X_0 − τ̂) + Σ_{t=1}^n (X_t − τ̂ − A(X_{t−1} − τ̂))′Σ_v^{-1}(X_t − τ̂ − A(X_{t−1} − τ̂)) },  (16)

where

S(φ, β, A) = Σ_{t=1}^n (Y_{t,c} − φ′X_{t,c} − (β − A′φ)′X_{t−1,c})²,

X_{t,c} = X_t − n^{-1} Σ_{s=1}^n X_s and X_{t−1,c} = X_{t−1} − n^{-1} Σ_{s=1}^n X_{s−1}.

To ease the computational burden during optimisation, the likelihood can be defined in terms of the re-parametrised set (Σ_v, σ_e², A, φ, γ), where γ = β − A′φ. This re-parametrisation allows us to concentrate (σ_e², φ, γ) out of the likelihood, thus reducing the dimensionality of the optimisation problem. The likelihood can then be sequentially optimised, first over (Σ_v, A), with the REML estimates of (σ_e², φ, γ) then obtained by OLS through the minimisation of S(φ, γ). A further simplification of the above likelihood occurs if the coefficient matrix A is diagonal, given by A = diag(α_1, ..., α_k) with max_i |α_i| < 1, in which case one gets

Var(X_t) = Σ_X = ((σ_{v,ij}/(1 − α_iα_j))),

where Σ_v = ((σ_{v,ij})). Amihud and Hurvich (2004) find evidence supporting this model with a diagonal coefficient matrix in the empirical example that they consider. It should be noted that in the case where A can be assumed to be a diagonal matrix, the predictive regression model is no longer a SUR system and hence OLS will no longer be efficient. However, REML will clearly retain efficiency, no matter what the form of A is, thus giving it an advantage both in terms of asymptotic efficiency and power over any OLS based procedure.

Since the dimension of the parameter space is very large in the vector case, it is not feasible to obtain a result such as Theorem 2 in the most general case. However, in the case where A is a diagonal matrix and where (σ_e², φ, Σ_v) are assumed known with Σ_v diagonal, we are able to obtain the following result on the finite sample behaviour of the RLRT for testing H₀: β = 0. The proof follows along lines similar to those for Theorem 2 and is omitted.
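The vec formula for Σ_X above is one line of linear algebra. A sketch (ours):

```python
import numpy as np

def stationary_cov(A, Sigma_v):
    """Solve vec(Sigma_X) = (I - kron(A, A))^{-1} vec(Sigma_v), the stationary
    covariance of X_t = A X_{t-1} + v_t (eigenvalues of A inside the unit circle)."""
    k = A.shape[0]
    vecS = np.linalg.solve(np.eye(k * k) - np.kron(A, A), Sigma_v.reshape(-1))
    return vecS.reshape(k, k)

# a diagonal A reproduces the closed form sigma_{v,ij} / (1 - alpha_i * alpha_j):
print(stationary_cov(np.diag([0.9, 0.8]), np.eye(2)))  # diag(1/(1-.81), 1/(1-.64))
```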

Theorem 4. In the model given by (14) and (15), assume that A = diag(α_1, ..., α_k), with max_i |α_i| < 1, and that (σ_e², φ, Σ_v) are known, with Σ_v = diag(σ_{v,11}, ..., σ_{v,kk}). Let R_M denote the RLRT based on the restricted likelihood in (16) for testing H₀: β = 0. Then,

P(R_M ≤ x) = P(χ²_k ≤ x) − n^{-1} ( Σ_{i=1}^k φ_i²σ_{v,ii}σ_e^{-2}/(1 + φ_i²σ_{v,ii}σ_e^{-2}) ) [P(χ²_{k+2} ≤ x) − P(χ²_k ≤ x)] + O(n^{-2}).

Since 0 < n^{-1} Σ_{i=1}^k φ_i²σ_{v,ii}σ_e^{-2}(1 + φ_i²σ_{v,ii}σ_e^{-2})^{-1} < n^{-1}k trivially, the result shows that the χ² distribution once again provides a very good approximation to the RLRT in this situation. It is useful to note that Theorem 4 shows that the quality of the χ² approximation to the RLRT is affected only minimally by the dependence between u_t and v_t, over which one has no control. However, once one variable has been chosen, we can control which other variables should be included in the model, and it is preferable to use a group of variables that have low (ideally zero) correlation among themselves to avoid unnecessary multicollinearity. Hence, the assumption in the theorem that Σ_v is diagonal is not very unreasonable. Finally, from the discussion below equation (4) and below Theorem 2, one would expect the χ² approximation to continue to work well when (σ_e², φ, Σ_v) are unknown (with Σ_v diagonal) and, indeed, we find this to be the case in our simulations in Section 5 below. As a matter of fact, the simulations show that the RLRT behaves very well even when the cross-correlation in the variables is as high as 0.9.

The restricted likelihood for the AR(p) regressor case can be derived in a manner analogous to that for the AR(1) case and is given next.

4.2 Higher order autoregressive regressors

Let the observed data (Y_1, ..., Y_n, X_{−p+1}, X_{−p+2}, ..., X_n) follow

Y_t = c + βX_{t−1} + u_t  (17)

and

X_t = µ + α_1X_{t−1} + ... + α_pX_{t−p} + v_t,  (18)

where u_t = φv_t + e_t and the (e_t, v_t)′ ∼ N(0, diag(σ_e², σ_v²)) are an i.i.d. series. Furthermore, assume that all the roots of the polynomial z^p − Σ_{s=1}^p z^{p−s}α_s lie within the unit circle. Define Y_c = Y − 1Ȳ and X_c = [X_1 − 1X̄_1, ..., X_{−p+1} − 1X̄_{−p+1}], where X_i = (X_i, X_{i+1}, ..., X_{n−1+i})′ and X̄_i = n^{-1}1′X_i.

Lemma 2. The REML log-likelihood, up to an additive constant, for the model in (17) and (18) is given by

L(α, β, φ, σ_e², σ_v²) = −((n − 1)/2) log σ_e² − (1/(2σ_e²)) S(φ, β, α_1, ..., α_p) − ((n + p)/2) log σ_v² − (1/2) log|Σ_X| − (1/2) log(1′Σ_X^{-1}1) − (1/(2σ_v²)) (X − 1τ̂)′Σ_X^{-1}(X − 1τ̂),  (19)

where Var(X) = σ_v²Σ_X, τ̂ = (1′Σ_X^{-1}1)^{-1}1′Σ_X^{-1}X and S(φ, β, α_1, ..., α_p) = Z_c′Z_c, where

Z_c = Y_c − φX_{c,1} − (β − φα_1)X_{c,0} + φ Σ_{i=2}^p α_i X_{c,1−i}.

Though at first glance the expression in (19) looks formidable, it is actually very easy to compute. The quantity S(φ, β, α_1, ..., α_p) is, of course, just a quadratic form. It is also well known that both the determinant |Σ_X| and bilinear/quadratic forms of the type y′Σ_X^{-1}x are very easy to compute for AR(p) models, since for such models Σ_X^{-1} can easily be expressed as B′B for some lower triangular matrix B (see the matrix B displayed in the Appendix, and the sketch below). After the re-parametrisation γ = β − φα_1, the parameters (σ_e², σ_v², φ, γ) can be concentrated out of (19) and the concentrated log-likelihood optimised over the remaining parameters (α_1, ..., α_p).

In the next section we report the results of our simulations.
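For the AR(1) case, B takes the banded form displayed in the Appendix, which makes |Σ*| and the quadratic forms O(n) computations. A sketch (ours):

```python
import numpy as np

def B_matrix(alpha, m):
    """Lower triangular B with Sigma*^{-1} = B'B for a stationary AR(1);
    m = n + 1 is the length of (X_0,...,X_n)."""
    B = np.eye(m)
    B[0, 0] = np.sqrt(1.0 - alpha**2)
    B[np.arange(1, m), np.arange(m - 1)] = -alpha  # subdiagonal entries
    return B

# since |B| = sqrt(1 - alpha^2), we get log|Sigma*| = -log(1 - alpha^2), and
# x' Sigma*^{-1} y = (Bx)'(By) needs only the band, not a full matrix inverse
alpha, m = 0.95, 200
B = B_matrix(alpha, m)
x = np.random.default_rng(1).normal(size=m)
print(x @ (B.T @ B) @ x, (B @ x) @ (B @ x))  # identical quadratic forms
```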

19 REML estimate ˆβ c. The bias corrected estimators are defined below Theorem 1. The data was simulated from the model given by equations (1) and () with three sample sizes, n =, 1 and 1. For each sample size, the predictive regression slope coefficient β was set to while the autoregressive coefficient α was varied over three values,.9,.97 and.99. Since the bias in the estimate of β is proportional to φ, we chose φ = 8 (as noted earlier, Amihud and Hurvich, 4, find empirical estimates of φ as high as -9 ) to highlight the bias gains to be had by using the bias corrected REML estimate. The values of σv and σe were set to be.4681and 1 respectively, implying that Corr(u t,v t )=.8, which is representative of the innovation correlation found in empirical examples. Tables I-III report the simulation means and standard deviations of these four estimates based on 1, replications for each of the three values of α. As predicted by the theory, the bias of ˆβ is uniformly less than that of ˆβ OLS, and approximately half its value when α =.99. At the same time, ˆβ does not suffer any loss of efficiency, indeed having a uniformly smaller standard deviation than that of ˆβ OLS. The bias of ˆβ c is also always substantially less than that of ˆβ OLS,c, at times by as much as 7% and its standard deviation is uniformly lower too. The simulation results clearly demonstrate the advantage enjoyed by the REML estimate in terms of both bias and standard deviation over the OLS estimate. To test the robustness of the procedure to non-normal thick tailed errors, we also generated data from the same model and parameter configurations, but letting (e t,v t ) be independent t errors. The results are presented in the second set of columns in Tables I-III and it is seen that once again, ˆβ and ˆβ c are consistently better than ˆβ OLS and ˆβ OLS,c respectively in terms of both bias and standard deviation. We next turn to studying the quality of the χ 1 approximation to the distribution of the RLRT for the above model and parameter configurations with both normal and t innovations. The χ 1 approximation was assessed by three measures (i) QQ plots of the RLRT against the theoretical quantiles of a χ 1 distribution (ii) simulation means and standard deviations of the RLRT (which are theoretically 1 and respectively for a χ 1 distribution) and (iii) simulation sizes at the % and 1% level. The simulation mean, standard deviation and sizes are reported intableivwhiletheqqplotsareshowninfigure1. (WepresenttheQQplotsonlyforthe normal innovations since the plots for t errors are qualitatively similar in nature). Apart from 19

Apart from a small distortion at n = 50 and α = 0.99, the RLRT is seen to be very well approximated by the χ²₁ distribution according to each of the three measures. It is also worth stressing again that the performance of the RLRT is not affected by the bias of β̂ at all.

We finally generate data from a model in which the regressors form a bivariate AR(1) process. More specifically, we use the model Y_t = β′X_{t−1} + u_t and X_t = AX_{t−1} + v_t, where u_t = φ′v_t + e_t. The sample size was set at n = 50, the slope vector β was set to zero, while φ = (−85, −85)′. Two configurations of the autoregressive coefficient matrix A were considered, diag(0.9, 0.8) and diag(0.9, 0.9). The innovation matrix Σ_v was set to one of the following three configurations: (a) the identity matrix, (b) equal variances and correlation ρ_v = 0.5, and (c) variances equal to 1 and correlation ρ_v = 0.9. In all cases, the innovation variance σ_e² was set to unity. This design (except for Σ_v = I) matches that used in Amihud and Hurvich (2004). As before, the quality of the χ²₂ approximation to the distribution of the RLRT was assessed via three measures: (i) QQ plots of the RLRT against the theoretical quantiles of a χ²₂ distribution, (ii) simulation means and variances of the RLRT (which are theoretically 2 and 4 respectively for a χ²₂ distribution) and (iii) simulation sizes at the 5% and 10% levels. The results are provided in Tables V and VI and Figure 2.

As noted in subsection 4.1 above, OLS is no longer asymptotically efficient if A is a diagonal matrix, since the system is no longer a SUR. Hence, not only does REML afford a dramatic reduction in bias over OLS, it also provides a great reduction in variance, as can be seen in Table V. Furthermore, for the inference problem it is once again seen that the RLRT is very well approximated by the χ²₂ distribution in all the cases we consider. The overall conclusion to be had from the simulations is that the REML procedure yields point estimates that are much less biased than their OLS counterparts, and also an RLRT that is very well behaved, even when the regressors are close to being integrated.

6 Appendix

Proof of Theorem 1:

As noted at the start of Section 2 above, the REML likelihood corresponds to the likelihood of TZ, where Z = (Y′, X′)′ and T is any full row rank matrix such that T1 = 0. We will obtain this likelihood by choosing T of the form T = diag(T_1, T_2), where T_1 and T_2 are full row rank matrices of dimension (n − 1) × n and n × (n + 1) respectively, satisfying T_11 = 0, T_21 = 0, T_1T_1′ = I and T_2T_2′ = I, and using the fact that L(TZ) = L(T_1Y | T_2X) L(T_2X).

We first obtain L(T_1Y | T_2X). Since u_t = φv_t + e_t, where φ = σ_uv/σ_v² and e_t ∼ N(0, σ_e²) is a series independent of {v_t}, we get

Y_t = c + βX_{t−1} + φ(X_t − µ − αX_{t−1}) + e_t  (20)
    = c_0 + φX_t + (β − φα)X_{t−1} + e_t,

where c_0 = c − φµ. Let Ỹ = T_1Y, X̃_1 = T_1X_1 and X̃_0 = T_1X_0. From (20) it then follows that

Ỹ = X̃θ + ẽ,  (21)

where X̃ = [X̃_1, X̃_0], ẽ = T_1e, e = (e_1, ..., e_n)′ and θ = (φ, β − φα)′. Since X_t is a function only of {v_t, v_{t−1}, ...}, the series {X_t} is independent of {e_t}. Furthermore, knowledge of T_2X, where T_2 is any full row rank matrix such that T_21 = 0, implies knowledge of X̃. Hence, from (21) the conditional distribution of Ỹ given T_2X is N(X̃θ, σ_e²I), since T_1T_1′ = I. It follows that the conditional log-likelihood of Ỹ given T_2X, up to an additive constant, is given by

l_1(Ỹ | T_2X, θ, σ_e²) = −((n − 1)/2) log σ_e² − (1/(2σ_e²)) (Ỹ − X̃θ)′(Ỹ − X̃θ).

Note, however, that X̃θ = T_1[X_1, X_0]θ. Thus,

(Ỹ − X̃θ)′(Ỹ − X̃θ) = (Y − [X_1, X_0]θ)′T_1′T_1(Y − [X_1, X_0]θ).

Since the matrix T_1, when augmented by the row n^{-1/2}1′, is an orthogonal matrix, it follows that T_1′T_1 = I − n^{-1}11′, and hence

(Ỹ − X̃θ)′(Ỹ − X̃θ) = (Y − [X_1, X_0]θ)′(I − n^{-1}11′)(Y − [X_1, X_0]θ) = S(φ, β, α).

Thus, we get

l_1(Ỹ | T_2X, θ, σ_e²) = −((n − 1)/2) log σ_e² − (1/(2σ_e²)) S(φ, β, α).  (22)

The log-likelihood of T_2X is obtained from Harville (1974) and, up to an additive constant, is given by

l_2(T_2X, α, σ_v²) = −(n/2) log σ_v² − (1/2) log|Σ*| − (1/2) log(1′Σ*^{-1}1) − (1/(2σ_v²)) (X − 1τ̂)′Σ*^{-1}(X − 1τ̂),  (23)

where τ̂ = (1′Σ*^{-1}1)^{-1}1′Σ*^{-1}X and σ_v²Σ* = Var(X). From (22) and (23), we obtain the log-likelihood of (T_1Y, T_2X), up to an additive constant, to be

L(T_1Y, T_2X, α, β, φ, σ_v², σ_e²) = −((n − 1)/2) log σ_e² − (1/(2σ_e²)) S(φ, β, α) − (n/2) log σ_v² − (1/2) log|Σ*| − (1/2) log(1′Σ*^{-1}1) − (1/(2σ_v²)) Q(α).  (24)

The final form, as stated in (9), is obtained by algebraic simplification, using the fact that Σ*^{-1} = B′B, where B is the (n + 1) × (n + 1) lower triangular matrix given by

B =
[ (1 − α²)^{1/2}  0   ...  0 ]
[ −α             1   ...  0 ]
[ ...            ... ...  ... ]
[ 0              ... −α   1 ],

and noting that

τ̂ = [X_0 + X_n + (1 − α) Σ_{i=1}^{n−1} X_i] / [(n − 1)(1 − α) + 2].

To obtain the REML estimates of (α, β, φ, σ_e², σ_v²), it helps to consider the re-parametrised set of parameters (α, γ, φ, σ_e², σ_v²), where γ = β − φα. It is then immediately obvious from inspecting (24) that the REML estimates of (α, σ_v²) are obtained by simply maximising

l_2(T_2X, α, σ_v²) = −(n/2) log σ_v² − (1/2) log|Σ*| − (1/2) log(1′Σ*^{-1}1) − (1/(2σ_v²)) (X − 1τ̂)′Σ*^{-1}(X − 1τ̂).

In other words, the REML estimates of (α, σ_v²) are just those estimates that would have been obtained by maximising the REML likelihood function of (X_0, ..., X_n). It thus follows immediately from Cheang and Reinsel (2000) that the bias of α̂ is

E(α̂ − α) = −2α(n − 1)^{-1} + o(n^{-1}).  (25)

The REML estimates of (φ, γ) are obtained as the least squares estimates

(φ̂, γ̂)′ = (X_c′X_c)^{-1}X_c′Y_c,

which then yield the REML estimate β̂ = γ̂ + φ̂α̂. Since

(φ̂, β̂)′ = [1 0; α̂ 1] (φ̂, γ̂)′,

it follows that the REML estimates of the original parameters (φ, β) can also be obtained from a direct regression of Y_c on (X_{c,1} − α̂X_{c,0}, X_{c,0}). Thus, β̂ is identical to the ARM estimate considered by Amihud and Hurvich (2004), using the REML estimate α̂ as a proxy for α. Hence, the bias of β̂ can be obtained from Theorem 2 of Amihud and Hurvich (2004) and is

E(β̂ − β) = φE(α̂ − α) = −2αφ(n − 1)^{-1} + o(n^{-1}).

Finally, Lemma 1 of Amihud and Hurvich (2004) implies that E(φ̂) = φ.

Proof of Theorem 2: Since the LRT is invariant to re-parametrisation, we choose to work with the re-parametrisation λ = (β, α, φ, τ_v, τ_e)′ ≡ (β, λ_2′)′, where τ_v = σ_v² and τ_e = σ_e², since this greatly reduces the burden of our computations. For the REML log-likelihood given in (9), we will denote expectations of the log-likelihood derivatives as

κ_rs = n^{-1}E(∂²L/∂λ_r∂λ_s),  κ_rst = n^{-1}E(∂³L/∂λ_r∂λ_s∂λ_t),  κ_rs^{(t)} = ∂κ_rs/∂λ_t.

Letting δ = τ_e/τ_v, it is easily seen that the information matrix K = ((−κ_rs)) is given by

K = diag(K_{β,α}, K_{φ,τ_v,τ_e}) + O(n^{-1}),  (26)

where

K_{β,α} = δ^{-1}(1 − α²)^{-1} [1 φ; φ φ² + δ]

and

K_{φ,τ_v,τ_e} = diag{ δ^{-1}, 1/(2τ_v²), 1/(2τ_e²) }.

Let κ^{rs} denote the entries of K^{-1} and κ̄^{rs} denote the entries of K_2^{-1}, where K_2 is the lower right 4 × 4 sub-matrix of K. From the results of Hayakawa (1977, 1987), Cordeiro (1987), Barndorff-Nielsen and Hall (1988) and Chesher and Smith (1995), we have

P(R_T ≤ x) = P(χ²₁ ≤ x) + An^{-1}[P(χ²₃ ≤ x) − P(χ²₁ ≤ x)] + O(n^{-2}),  (27)

where

A = (1/2) Σ_λ (l_rstu − l_rstuvw) − (1/2) Σ_{λ_2} (l̄_rstu − l̄_rstuvw),  (28)

l_rstu = κ^{rs}κ^{tu}((1/4)κ_rstu − κ_rst^{(u)} + κ_rt^{(su)}),

l̄_rstu = κ̄^{rs}κ̄^{tu}((1/4)κ_rstu − κ_rst^{(u)} + κ_rt^{(su)}),

l_rstuvw = κ^{rs}κ^{tu}κ^{vw}((1/6)κ_rtvκ_suw − κ_rtuκ_svw − κ_rtvκ_sw^{(u)} − κ_rtuκ_sw^{(v)} + κ_rt^{(v)}κ_sw^{(u)} + κ_rt^{(u)}κ_sw^{(v)})

and

l̄_rstuvw = κ̄^{rs}κ̄^{tu}κ̄^{vw}((1/6)κ_rtvκ_suw − κ_rtuκ_svw − κ_rtvκ_sw^{(u)} − κ_rtuκ_sw^{(v)} + κ_rt^{(v)}κ_sw^{(u)} + κ_rt^{(u)}κ_sw^{(v)}).

Exploiting the near diagonal (up to O(n^{-1})) structure of K, we can simplify the term Σ_λ(l_rstu − l_rstuvw) in A as

Σ_λ (l_rstu − l_rstuvw) = Σ_{(β,α)} (l_rstu − l_rstuvw) + Σ_{((β,α),(φ,τ_v,τ_e))} (l_rstu − l_rstuvw) + Σ_{(φ,τ_v,τ_e)} (l_rstu − l_rstuvw) + O(n^{-1}),  (29)

where Σ_{((β,α),(φ,τ_v,τ_e))} denotes that at least one index in the summand must come from (β, α) and at least one index from (φ, τ_v, τ_e).

By the same logic, we can simplify the term Σ_{λ_2}(l̄_rstu − l̄_rstuvw) in A as

Σ_{λ_2} (l̄_rstu − l̄_rstuvw) = Σ_α (l̄_rstu − l̄_rstuvw) + Σ_{(α,(φ,τ_v,τ_e))} (l̄_rstu − l̄_rstuvw) + Σ_{(φ,τ_v,τ_e)} (l̄_rstu − l̄_rstuvw) + O(n^{-1})
  = Σ_α (l̄_rstu − l̄_rstuvw) + Σ_{(α,(φ,τ_v,τ_e))} (l̄_rstu − l̄_rstuvw) + Σ_{(φ,τ_v,τ_e)} (l_rstu − l_rstuvw) + O(n^{-1}),  (30)

where the last step in (30) follows from the fact that the entries of K^{-1} for (φ, τ_v, τ_e) are diagonal up to O(n^{-1}). It thus follows from (28), (29) and (30) that

A = (1/2) [ Σ_{(β,α)} (l_rstu − l_rstuvw) − Σ_α (l̄_rstu − l̄_rstuvw) ] + (1/2) [ Σ_{((β,α),(φ,τ_v,τ_e))} (l_rstu − l_rstuvw) − Σ_{(α,(φ,τ_v,τ_e))} (l̄_rstu − l̄_rstuvw) ] + O(n^{-1})
  ≡ A_{(β,α)} + C_λ + O(n^{-1}).  (31)

It is obvious from the structure of A_{(β,α)} in (31) that A_{(β,α)} would be the leading remainder term in the expansion of the distribution of R_T of the form in (27) if (φ, τ_v, τ_e) were known and α were the only nuisance parameter. The term C_λ, thus, is a measure of the extent to which lack of knowledge of the parameters (φ, τ_v, τ_e) affects the finite sample distribution of R_T. We now compute the two terms A_{(β,α)} and C_λ, beginning with A_{(β,α)}. Though (β, α) are not orthogonal, the computation of Σ_{(β,α)}(l_rstu − l_rstuvw) can be simplified by working with the transformed parameters (γ, α), where γ = β − φα, since γ is orthogonal to α. (Note that, as stated above, when computing A_{(β,α)} the remaining parameters (φ, τ_v, τ_e) are fixed.) Since (γ, α) is an affine transformation of (β, α), we can exploit the fact that the first term of A_{(β,α)} is invariant under such transformations (see page 371 of Hayakawa, 1977) and thus get

(1/2) Σ_{(β,α)} (l_rstu − l_rstuvw) = (1/2) Σ_{(γ,α)} (l_rstu,p − l_rstuvw,p),  (32)

where the extra subscript p means that the computation is being carried out for the reparameterised form of (9) with γ = β − φα. The right hand side of (32) is much simpler to compute, due to the fact that κ_{γα,p} = κ_{γγ,p}^{(γ)} = κ_{γγ,p}^{(γγ)} = 0 and all the terms κ_{rst,p} and κ_{rstu,p} are at most O(n^{-1}), for all permutations of the subscripts. Since κ_{γγ,p} = −δ^{-1}(1 − α²)^{-1} and κ_{αα,p} = −(1 − α²)^{-1}, it follows that

(1/2) Σ_{(γ,α)} l_rstu,p = (1/2) κ_p^{αα} κ_p^{αα} κ_{αα,p}^{(αα)} + O(n^{-1}) = −(1 + 3α²)(1 − α²)^{-1} + O(n^{-1})  (33)

and

(1/2) Σ_{(γ,α)} l_rstuvw,p = (1/2) κ_p^{αα} κ_p^{αα} κ_p^{αα} (κ_{αα}^{(α)})² + O(n^{-1}) = −4α²(1 − α²)^{-1} + O(n^{-1}).  (34)

From (32), (33) and (34), we get

(1/2) Σ_{(β,α)} (l_rstu − l_rstuvw) = −1 + O(n^{-1}).  (35)

The second term in A_{(β,α)} is not invariant under affine transformations, and we revert to the original log-likelihood (9) to compute it. Noting that κ_αα = −(φ²δ^{-1} + 1)(1 − α²)^{-1} and κ̄^{αα} = −(1 − α²)(φ²δ^{-1} + 1)^{-1}, we have

(1/2) Σ_α (l̄_rstu − l̄_rstuvw) = −(φ²δ^{-1} + 1)^{-1} + O(n^{-1}).  (36)

From (35) and (36) we conclude that

A_{(β,α)} = −φ²δ^{-1}(φ²δ^{-1} + 1)^{-1} + O(n^{-1}) = −ρ² + O(n^{-1}).  (37)

We now turn our attention to computing the second term, C_λ, in (31). We first note from the near diagonal structure of K that

(1/2) Σ_{((β,α),(φ,τ_v,τ_e))} (l_rstu − l_rstuvw) = (1/2) Σ_{(β,(φ,τ_v,τ_e))} l_rrtt + (1/2) Σ_{(α,(φ,τ_v,τ_e))} l_rrtt − (1/2) Σ_{(β,(φ,τ_v,τ_e))} l_rrttvv − (1/2) Σ_{(α,(φ,τ_v,τ_e))} l_rrttvv − (1/2) Σ_{(β,α,(φ,τ_v,τ_e))} l_rstuvw.  (38)

Each of the terms in (38) is now computed. The details are not provided here, both to save space and because the computation does not afford any special insight into the problem. The detailed calculations, however, are available from the authors. The terms in Σ_{(α,(φ,τ_v,τ_e))}(l̄_rstu − l̄_rstuvw) can be decomposed in a manner similar to that in (38), except that in this case there are no terms in β. When all these terms are put together, one gets

C_λ = 3/4 + O(n^{-1}).  (39)

The theorem now follows from (27), (31), (37) and (39).

Proof of Theorem 3: The expansion of the distribution of the LRT and the Wald test under local Pitman alternatives is given in Hayakawa (1975), while that of the distribution of the Rao score test is given by Harris and Peers (1980). These results are consolidated, using simpler notation, in Cordeiro, Botter and Ferrari (1994), and we follow the notation used in their work. To obtain the results of Theorem 3, we calculate the quantities in equations (15)-(25b) on page 711 of Cordeiro et al. (1994). Letting λ = (β, α, φ, τ_v, τ_e)′, where τ_v = σ_v² and τ_e = σ_e², we note that the quantities κ_ij = n^{-1}E(∂²L/∂λ_i∂λ_j) have already been obtained in (26). To obtain the quantities κ_{i,jk} = n^{-1}E[(∂L/∂λ_i)(∂²L/∂λ_j∂λ_k)] and κ_{i,j,k} = n^{-1}E[(∂L/∂λ_i)(∂L/∂λ_j)(∂L/∂λ_k)], we exploit the Bartlett identities (Bartlett, 1953) to obtain

κ_{i,j,k} = 2κ_ijk − ∂κ_jk/∂λ_i − ∂κ_ik/∂λ_j − ∂κ_ij/∂λ_k  and  κ_{i,jk} = −κ_ijk + ∂κ_jk/∂λ_i.

The quantities ∂κ_jk/∂λ_i are then calculated using (26). Using the same notation as in equations (3a)-(5b) of Cordeiro et al. (1994) (note, however, that we define all our cumulants κ to be O(1), whereas Cordeiro et al. (1994) define them to be O(n)), we get

b_1 = 0.5n^{-1/2}C_1 + O(n^{-1}),  b_11 = n^{-1/2}C_1 + O(n^{-1}),  b_12 = 0.5n^{-1/2}C_1 + O(n^{-1}),  b_13 = 0,

b_2 = 0.5n^{-1/2}C_1 + O(n^{-1}),  b_21 = n^{-1/2}C_1 + O(n^{-1}),  b_22 = 0.5n^{-1/2}C_1 + O(n^{-1}),  b_23 = O(n^{-1})

and

b_3 = 0.5n^{-1/2}C_1 + O(n^{-1}),  b_31 = n^{-1/2}(C_1 + C_2) + O(n^{-1}),  b_32 = n^{-1/2}C_1 + O(n^{-1}),  b_33 = 0.5n^{-1/2}C_1 + O(n^{-1}).

The result now follows from equation (25) of Cordeiro et al. (1994).

References

Amihud, Y. and Hurvich, C. (2004): "Predictive Regressions: A Reduced-Bias Estimation Method," Journal of Financial and Quantitative Analysis, 39, 813-841.

Barndorff-Nielsen, O.E. and Hall, P. (1988): "On the Level-Error after Bartlett Adjustment of the Likelihood Ratio Statistic," Biometrika, 75, 374-378.

Bartlett, M. (1937): "Properties of Sufficiency and Statistical Tests," Proc. Roy. Soc. Ser. A, 160, 268-282.

Bartlett, M. (1953): "Approximate Confidence Intervals II. More than One Unknown Parameter," Biometrika, 40, 306-317.

Cordeiro, G. (1987): "On the Corrections to the Likelihood Ratio Statistics," Biometrika, 74, 265-274.

Cordeiro, G., Botter, D. and Ferrari, S. L. (1994): "Nonnull Asymptotic Distributions of Three Classic Criteria in Generalised Linear Models," Biometrika, 81, 709-720.

Cheang, W.K. and Reinsel, G. (2000): "Bias Reduction of Autoregressive Estimates in Time Series Regression Model through Restricted Maximum Likelihood," Journal of the American Statistical Association, 95, 1173-1184.

Chesher, A. and Smith, R. (1995): "Bartlett Corrections to Likelihood Ratio Tests," Biometrika, 82, 433-436.

Fisher, R. (1973): Statistical Methods and Scientific Inference. Hafner, New York.

Francke, M. and de Vos, A. (2006): "Marginal Likelihood and Unit Roots," forthcoming in Journal of Econometrics.

Efron, B. (1975): "Defining the Curvature of a Statistical Problem (with Applications to Second Order Efficiency)," Annals of Statistics, 3, 1189-1242.

van Garderen, K. (1999): "Exact Geometry of Autoregressive Models," Journal of Time Series Analysis, 20, 1-21.

van Giersbergen, N. (2006): "Bartlett Correction in the Stable AR(1) Model with Intercept and Trend," Working paper, Universiteit van Amsterdam.

Harris, P. and Peers, H. W. (1980): "The Local Power of the Efficient Scores Test Statistic," Biometrika, 67, 525-529.

Harville, D. (1974): "Bayesian Inference of Variance Components Using Only Error Contrasts," Biometrika, 61, 383-385.

Harville, D. (1977): "Maximum Likelihood Approaches to Variance Component Estimation and to Related Problems," Journal of the American Statistical Association, 72, 320-338.

Hayakawa, T. (1975): "The Likelihood Ratio Criterion for a Composite Hypothesis under a Local Alternative," Biometrika, 62, 451-460.

Hayakawa, T. (1977): "The Likelihood Ratio Criterion and the Asymptotic Expansion of its Distribution," Ann. Inst. Statist. Math., 29, 359-378.

Hayakawa, T. (1987): Correction, Ann. Inst. Statist. Math., 39, 681.

Jansson, M. and Moreira, M. (2006): "Optimal Inference in Regression Models with Nearly Integrated Regressors," Econometrica, 74, 681-714.

Kalbfleisch, J. and Sprott, D. (1970): "Application of Likelihood Methods to Models Involving Large Numbers of Parameters," Journal of the Royal Statistical Society B, 32, 175-208.


Forecasting 1 to h steps ahead using partial least squares Forecasting 1 to h steps ahead using partial least squares Philip Hans Franses Econometric Institute, Erasmus University Rotterdam November 10, 2006 Econometric Institute Report 2006-47 I thank Dick van

More information

Chapter 1 Likelihood-Based Inference and Finite-Sample Corrections: A Brief Overview

Chapter 1 Likelihood-Based Inference and Finite-Sample Corrections: A Brief Overview Chapter 1 Likelihood-Based Inference and Finite-Sample Corrections: A Brief Overview Abstract This chapter introduces the likelihood function and estimation by maximum likelihood. Some important properties

More information

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor

More information

Cointegrated VAR s. Eduardo Rossi University of Pavia. November Rossi Cointegrated VAR s Financial Econometrics / 56

Cointegrated VAR s. Eduardo Rossi University of Pavia. November Rossi Cointegrated VAR s Financial Econometrics / 56 Cointegrated VAR s Eduardo Rossi University of Pavia November 2013 Rossi Cointegrated VAR s Financial Econometrics - 2013 1 / 56 VAR y t = (y 1t,..., y nt ) is (n 1) vector. y t VAR(p): Φ(L)y t = ɛ t The

More information

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance CESIS Electronic Working Paper Series Paper No. 223 A Bootstrap Test for Causality with Endogenous Lag Length Choice - theory and application in finance R. Scott Hacker and Abdulnasser Hatemi-J April 200

More information

GLS and FGLS. Econ 671. Purdue University. Justin L. Tobias (Purdue) GLS and FGLS 1 / 22

GLS and FGLS. Econ 671. Purdue University. Justin L. Tobias (Purdue) GLS and FGLS 1 / 22 GLS and FGLS Econ 671 Purdue University Justin L. Tobias (Purdue) GLS and FGLS 1 / 22 In this lecture we continue to discuss properties associated with the GLS estimator. In addition we discuss the practical

More information

Vector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I.

Vector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I. Vector Autoregressive Model Vector Autoregressions II Empirical Macroeconomics - Lect 2 Dr. Ana Beatriz Galvao Queen Mary University of London January 2012 A VAR(p) model of the m 1 vector of time series

More information

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos

More information

Sheffield Economic Research Paper Series. SERP Number:

Sheffield Economic Research Paper Series. SERP Number: Sheffield Economic Research Paper Series SERP Number: 2004013 Jamie Gascoigne Estimating threshold vector error-correction models with multiple cointegrating relationships. November 2004 * Corresponding

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

Cointegration Lecture I: Introduction

Cointegration Lecture I: Introduction 1 Cointegration Lecture I: Introduction Julia Giese Nuffield College julia.giese@economics.ox.ac.uk Hilary Term 2008 2 Outline Introduction Estimation of unrestricted VAR Non-stationarity Deterministic

More information

Lecture 2: Univariate Time Series

Lecture 2: Univariate Time Series Lecture 2: Univariate Time Series Analysis: Conditional and Unconditional Densities, Stationarity, ARMA Processes Prof. Massimo Guidolin 20192 Financial Econometrics Spring/Winter 2017 Overview Motivation:

More information

Asymptotic inference for a nonstationary double ar(1) model

Asymptotic inference for a nonstationary double ar(1) model Asymptotic inference for a nonstationary double ar() model By SHIQING LING and DONG LI Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong maling@ust.hk malidong@ust.hk

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

Econ 423 Lecture Notes: Additional Topics in Time Series 1

Econ 423 Lecture Notes: Additional Topics in Time Series 1 Econ 423 Lecture Notes: Additional Topics in Time Series 1 John C. Chao April 25, 2017 1 These notes are based in large part on Chapter 16 of Stock and Watson (2011). They are for instructional purposes

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

A note on profile likelihood for exponential tilt mixture models

A note on profile likelihood for exponential tilt mixture models Biometrika (2009), 96, 1,pp. 229 236 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asn059 Advance Access publication 22 January 2009 A note on profile likelihood for exponential

More information

Size and Power of the RESET Test as Applied to Systems of Equations: A Bootstrap Approach

Size and Power of the RESET Test as Applied to Systems of Equations: A Bootstrap Approach Size and Power of the RESET Test as Applied to Systems of Equations: A Bootstrap Approach Ghazi Shukur Panagiotis Mantalos International Business School Department of Statistics Jönköping University Lund

More information

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives

More information

HANDBOOK OF APPLICABLE MATHEMATICS

HANDBOOK OF APPLICABLE MATHEMATICS HANDBOOK OF APPLICABLE MATHEMATICS Chief Editor: Walter Ledermann Volume VI: Statistics PART A Edited by Emlyn Lloyd University of Lancaster A Wiley-Interscience Publication JOHN WILEY & SONS Chichester

More information

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Elizabeth C. Mannshardt-Shamseldin Advisor: Richard L. Smith Duke University Department

More information

Lecture 11 Weak IV. Econ 715

Lecture 11 Weak IV. Econ 715 Lecture 11 Weak IV Instrument exogeneity and instrument relevance are two crucial requirements in empirical analysis using GMM. It now appears that in many applications of GMM and IV regressions, instruments

More information

Chapter 12 REML and ML Estimation

Chapter 12 REML and ML Estimation Chapter 12 REML and ML Estimation C. R. Henderson 1984 - Guelph 1 Iterative MIVQUE The restricted maximum likelihood estimator (REML) of Patterson and Thompson (1971) can be obtained by iterating on MIVQUE,

More information

A Non-Parametric Approach of Heteroskedasticity Robust Estimation of Vector-Autoregressive (VAR) Models

A Non-Parametric Approach of Heteroskedasticity Robust Estimation of Vector-Autoregressive (VAR) Models Journal of Finance and Investment Analysis, vol.1, no.1, 2012, 55-67 ISSN: 2241-0988 (print version), 2241-0996 (online) International Scientific Press, 2012 A Non-Parametric Approach of Heteroskedasticity

More information

Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling

Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling Paul Marriott 1, Radka Sabolova 2, Germain Van Bever 2, and Frank Critchley 2 1 University of Waterloo, Waterloo, Ontario,

More information

Flat and multimodal likelihoods and model lack of fit in curved exponential families

Flat and multimodal likelihoods and model lack of fit in curved exponential families Mathematical Statistics Stockholm University Flat and multimodal likelihoods and model lack of fit in curved exponential families Rolf Sundberg Research Report 2009:1 ISSN 1650-0377 Postal address: Mathematical

More information

Introduction to Econometrics

Introduction to Econometrics Introduction to Econometrics T H I R D E D I T I O N Global Edition James H. Stock Harvard University Mark W. Watson Princeton University Boston Columbus Indianapolis New York San Francisco Upper Saddle

More information

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models University of Illinois Fall 2016 Department of Economics Roger Koenker Economics 536 Lecture 7 Introduction to Specification Testing in Dynamic Econometric Models In this lecture I want to briefly describe

More information

Testing Error Correction in Panel data

Testing Error Correction in Panel data University of Vienna, Dept. of Economics Master in Economics Vienna 2010 The Model (1) Westerlund (2007) consider the following DGP: y it = φ 1i + φ 2i t + z it (1) x it = x it 1 + υ it (2) where the stochastic

More information

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED by W. Robert Reed Department of Economics and Finance University of Canterbury, New Zealand Email: bob.reed@canterbury.ac.nz

More information

Testing Statistical Hypotheses

Testing Statistical Hypotheses E.L. Lehmann Joseph P. Romano, 02LEu1 ttd ~Lt~S Testing Statistical Hypotheses Third Edition With 6 Illustrations ~Springer 2 The Probability Background 28 2.1 Probability and Measure 28 2.2 Integration.........

More information

Optimal Jackknife for Unit Root Models

Optimal Jackknife for Unit Root Models Optimal Jackknife for Unit Root Models Ye Chen and Jun Yu Singapore Management University October 19, 2014 Abstract A new jackknife method is introduced to remove the first order bias in the discrete time

More information

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY Time Series Analysis James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY & Contents PREFACE xiii 1 1.1. 1.2. Difference Equations First-Order Difference Equations 1 /?th-order Difference

More information

Binary choice 3.3 Maximum likelihood estimation

Binary choice 3.3 Maximum likelihood estimation Binary choice 3.3 Maximum likelihood estimation Michel Bierlaire Output of the estimation We explain here the various outputs from the maximum likelihood estimation procedure. Solution of the maximum likelihood

More information

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication G. S. Maddala Kajal Lahiri WILEY A John Wiley and Sons, Ltd., Publication TEMT Foreword Preface to the Fourth Edition xvii xix Part I Introduction and the Linear Regression Model 1 CHAPTER 1 What is Econometrics?

More information

Conditional Inference by Estimation of a Marginal Distribution

Conditional Inference by Estimation of a Marginal Distribution Conditional Inference by Estimation of a Marginal Distribution Thomas J. DiCiccio and G. Alastair Young 1 Introduction Conditional inference has been, since the seminal work of Fisher (1934), a fundamental

More information

Multivariate Time Series: VAR(p) Processes and Models

Multivariate Time Series: VAR(p) Processes and Models Multivariate Time Series: VAR(p) Processes and Models A VAR(p) model, for p > 0 is X t = φ 0 + Φ 1 X t 1 + + Φ p X t p + A t, where X t, φ 0, and X t i are k-vectors, Φ 1,..., Φ p are k k matrices, with

More information

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental

More information

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Sílvia Gonçalves and Benoit Perron Département de sciences économiques,

More information

Testing Some Covariance Structures under a Growth Curve Model in High Dimension

Testing Some Covariance Structures under a Growth Curve Model in High Dimension Department of Mathematics Testing Some Covariance Structures under a Growth Curve Model in High Dimension Muni S. Srivastava and Martin Singull LiTH-MAT-R--2015/03--SE Department of Mathematics Linköping

More information

Steven Cook University of Wales Swansea. Abstract

Steven Cook University of Wales Swansea. Abstract On the finite sample power of modified Dickey Fuller tests: The role of the initial condition Steven Cook University of Wales Swansea Abstract The relationship between the initial condition of time series

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 4 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 23 Recommended Reading For the today Serial correlation and heteroskedasticity in

More information

The formal relationship between analytic and bootstrap approaches to parametric inference

The formal relationship between analytic and bootstrap approaches to parametric inference The formal relationship between analytic and bootstrap approaches to parametric inference T.J. DiCiccio Cornell University, Ithaca, NY 14853, U.S.A. T.A. Kuffner Washington University in St. Louis, St.

More information

Problem Selected Scores

Problem Selected Scores Statistics Ph.D. Qualifying Exam: Part II November 20, 2010 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. Problem 1 2 3 4 5 6 7 8 9 10 11 12 Selected

More information

CONJUGATE DUMMY OBSERVATION PRIORS FOR VAR S

CONJUGATE DUMMY OBSERVATION PRIORS FOR VAR S ECO 513 Fall 25 C. Sims CONJUGATE DUMMY OBSERVATION PRIORS FOR VAR S 1. THE GENERAL IDEA As is documented elsewhere Sims (revised 1996, 2), there is a strong tendency for estimated time series models,

More information

Vector autoregressions, VAR

Vector autoregressions, VAR 1 / 45 Vector autoregressions, VAR Chapter 2 Financial Econometrics Michael Hauser WS17/18 2 / 45 Content Cross-correlations VAR model in standard/reduced form Properties of VAR(1), VAR(p) Structural VAR,

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of

More information

Specification Test for Instrumental Variables Regression with Many Instruments

Specification Test for Instrumental Variables Regression with Many Instruments Specification Test for Instrumental Variables Regression with Many Instruments Yoonseok Lee and Ryo Okui April 009 Preliminary; comments are welcome Abstract This paper considers specification testing

More information

Estimating Moving Average Processes with an improved version of Durbin s Method

Estimating Moving Average Processes with an improved version of Durbin s Method Estimating Moving Average Processes with an improved version of Durbin s Method Maximilian Ludwig this version: June 7, 4, initial version: April, 3 arxiv:347956v [statme] 6 Jun 4 Abstract This paper provides

More information

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that Linear Regression For (X, Y ) a pair of random variables with values in R p R we assume that E(Y X) = β 0 + with β R p+1. p X j β j = (1, X T )β j=1 This model of the conditional expectation is linear

More information

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Jeremy S. Conner and Dale E. Seborg Department of Chemical Engineering University of California, Santa Barbara, CA

More information

Introduction to Estimation Methods for Time Series models Lecture 2

Introduction to Estimation Methods for Time Series models Lecture 2 Introduction to Estimation Methods for Time Series models Lecture 2 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 2 SNS Pisa 1 / 21 Estimators:

More information

4.2 Estimation on the boundary of the parameter space

4.2 Estimation on the boundary of the parameter space Chapter 4 Non-standard inference As we mentioned in Chapter the the log-likelihood ratio statistic is useful in the context of statistical testing because typically it is pivotal (does not depend on any

More information

Inference about the Indirect Effect: a Likelihood Approach

Inference about the Indirect Effect: a Likelihood Approach Discussion Paper: 2014/10 Inference about the Indirect Effect: a Likelihood Approach Noud P.A. van Giersbergen www.ase.uva.nl/uva-econometrics Amsterdam School of Economics Department of Economics & Econometrics

More information

Econometrics of financial markets, -solutions to seminar 1. Problem 1

Econometrics of financial markets, -solutions to seminar 1. Problem 1 Econometrics of financial markets, -solutions to seminar 1. Problem 1 a) Estimate with OLS. For any regression y i α + βx i + u i for OLS to be unbiased we need cov (u i,x j )0 i, j. For the autoregressive

More information

Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis

Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis Introduction to Time Series Analysis 1 Contents: I. Basics of Time Series Analysis... 4 I.1 Stationarity... 5 I.2 Autocorrelation Function... 9 I.3 Partial Autocorrelation Function (PACF)... 14 I.4 Transformation

More information

The Generalized Cochrane-Orcutt Transformation Estimation For Spurious and Fractional Spurious Regressions

The Generalized Cochrane-Orcutt Transformation Estimation For Spurious and Fractional Spurious Regressions The Generalized Cochrane-Orcutt Transformation Estimation For Spurious and Fractional Spurious Regressions Shin-Huei Wang and Cheng Hsiao Jan 31, 2010 Abstract This paper proposes a highly consistent estimation,

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

Econ 623 Econometrics II Topic 2: Stationary Time Series

Econ 623 Econometrics II Topic 2: Stationary Time Series 1 Introduction Econ 623 Econometrics II Topic 2: Stationary Time Series In the regression model we can model the error term as an autoregression AR(1) process. That is, we can use the past value of the

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Economic modelling and forecasting

Economic modelling and forecasting Economic modelling and forecasting 2-6 February 2015 Bank of England he generalised method of moments Ole Rummel Adviser, CCBS at the Bank of England ole.rummel@bankofengland.co.uk Outline Classical estimation

More information

Bootstrap Testing in Econometrics

Bootstrap Testing in Econometrics Presented May 29, 1999 at the CEA Annual Meeting Bootstrap Testing in Econometrics James G MacKinnon Queen s University at Kingston Introduction: Economists routinely compute test statistics of which the

More information

BTRY 4090: Spring 2009 Theory of Statistics

BTRY 4090: Spring 2009 Theory of Statistics BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)

More information

10. Time series regression and forecasting

10. Time series regression and forecasting 10. Time series regression and forecasting Key feature of this section: Analysis of data on a single entity observed at multiple points in time (time series data) Typical research questions: What is the

More information

Notes on the Multivariate Normal and Related Topics

Notes on the Multivariate Normal and Related Topics Version: July 10, 2013 Notes on the Multivariate Normal and Related Topics Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions

More information

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science 1 Likelihood Ratio Tests that Certain Variance Components Are Zero Ciprian M. Crainiceanu Department of Statistical Science www.people.cornell.edu/pages/cmc59 Work done jointly with David Ruppert, School

More information

arxiv: v1 [stat.me] 5 Apr 2013

arxiv: v1 [stat.me] 5 Apr 2013 Logistic regression geometry Karim Anaya-Izquierdo arxiv:1304.1720v1 [stat.me] 5 Apr 2013 London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK e-mail: karim.anaya@lshtm.ac.uk and Frank Critchley

More information

Statistical Inference of Covariate-Adjusted Randomized Experiments

Statistical Inference of Covariate-Adjusted Randomized Experiments 1 Statistical Inference of Covariate-Adjusted Randomized Experiments Feifang Hu Department of Statistics George Washington University Joint research with Wei Ma, Yichen Qin and Yang Li Email: feifang@gwu.edu

More information

DEFNITIVE TESTING OF AN INTEREST PARAMETER: USING PARAMETER CONTINUITY

DEFNITIVE TESTING OF AN INTEREST PARAMETER: USING PARAMETER CONTINUITY Journal of Statistical Research 200x, Vol. xx, No. xx, pp. xx-xx ISSN 0256-422 X DEFNITIVE TESTING OF AN INTEREST PARAMETER: USING PARAMETER CONTINUITY D. A. S. FRASER Department of Statistical Sciences,

More information

Massachusetts Institute of Technology Department of Economics Time Series Lecture 6: Additional Results for VAR s

Massachusetts Institute of Technology Department of Economics Time Series Lecture 6: Additional Results for VAR s Massachusetts Institute of Technology Department of Economics Time Series 14.384 Guido Kuersteiner Lecture 6: Additional Results for VAR s 6.1. Confidence Intervals for Impulse Response Functions There

More information

MEI Exam Review. June 7, 2002

MEI Exam Review. June 7, 2002 MEI Exam Review June 7, 2002 1 Final Exam Revision Notes 1.1 Random Rules and Formulas Linear transformations of random variables. f y (Y ) = f x (X) dx. dg Inverse Proof. (AB)(AB) 1 = I. (B 1 A 1 )(AB)(AB)

More information

Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments

Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments Tak Wai Chau February 20, 2014 Abstract This paper investigates the nite sample performance of a minimum distance estimator

More information