Bias Reduction and Likelihood Based Almost-Exactly Sized Hypothesis Testing in Predictive Regressions using the Restricted Likelihood

Willa W. Chen    Rohit S. Deo

April 2007

Abstract: The restricted likelihood, which has small curvature, is derived for the bivariate predictive regression problem. The bias of the restricted maximum likelihood (REML) estimates is shown to be approximately 50% less than that of the OLS estimates near the unit root, without loss of efficiency. The error in the chi-square approximation to the distribution of the REML based likelihood ratio test (RLRT) for no predictability is shown to be (3/4 − ρ²)n^{-1}(G_3(x) − G_1(x)) + O(n^{-2}), where |ρ| < 1 is the correlation of the innovation series and G_s(·) is the c.d.f. of a χ²_s random variable. This very small error, free of the AR parameter, implies that the RLRT for predictability has very good size properties even when the regressor is nearly integrated, a fact borne out by our simulation study. The Bartlett corrected RLRT achieves an O(n^{-2}) error. The power of the RLRT under a sequence of local Pitman alternatives is obtained and shown to be identical to that of the Wald test and higher than that of the Rao score test in empirically relevant regions of the parameter space. Some extensions to the case of vector AR(1) regressors and more general univariate regressors are provided. The RLRT is found to work very well in simulations and to be robust to non-normal errors.

Keywords: Bartlett correction, likelihood ratio test, curvature

Department of Statistics, Texas A&M University, College Station, Texas 77843, USA
New York University, 44 W. 4th Street, New York, NY 10012, USA; rdeo@stern.nyu.edu

1 Introduction

A question of interest in financial econometrics is whether future values of one series {Y_t} can be predicted from lagged values of another series {X_t}. The hypothesis of no predictability is commonly tested under the assumption that the two series {Y_t}_{t=1}^n and {X_t}_{t=0}^n obey the following model:

Y_t = c + βX_{t−1} + u_t,  (1)
X_t = µ + αX_{t−1} + v_t,  (2)

where |α| < 1, u_t = φv_t + e_t, the (e_t, v_t)′ ∼ N(0, diag(σ_e², σ_v²)) are an i.i.d. series and X_0 ∼ N(µ(1 − α)^{-1}, σ_v²(1 − α²)^{-1}). Interest generally centers on the case where the regressor series {X_t} possesses a strong degree of autocorrelation, with the autoregressive parameter α lying close to the unit root. It is well known (Stambaugh, 1999) that the standard ordinary least squares (OLS) estimate of β is biased when the errors {u_t, v_t} are contemporaneously correlated, with the amount of bias increasing as α gets closer to unity. This bias results in the corresponding t-statistic being biased, with poor size properties. However, Stambaugh (1999) provided a simple estimable expression for the bias in the OLS estimate of β that allows the researcher to compute a bias-corrected OLS estimate, as well as a t-statistic based on it. See, for example, Amihud and Hurvich (2004). There is currently, however, no known theoretical justification that such a t-statistic based on the bias-corrected OLS estimate will have improved size properties relative to the test based on the uncorrected OLS estimate. Indeed, Sprott and Viveros-Aguilera (1984) and Sprott (1990) point out that if inference is the goal of the researcher, computing pivotal t-statistics from bias corrected point estimates is not guaranteed to improve finite sample performance, and they even provide an example to make this point.
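The size of the Stambaugh bias is easy to see in a small simulation. The sketch below is ours, not the authors'; the function name is hypothetical and the parameter values (taken from the simulation design of Section 5) are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_slope_bias(n=50, alpha=0.99, beta=0.0, phi=-85.0,
                   sigma_v2=0.00024681, sigma_e2=1.0, reps=2000):
    """Monte Carlo bias of the OLS slope in Y_t = c + beta*X_{t-1} + u_t,
    X_t = mu + alpha*X_{t-1} + v_t, with u_t = phi*v_t + e_t (c = mu = 0)."""
    sv, se = np.sqrt(sigma_v2), np.sqrt(sigma_e2)
    est = np.empty(reps)
    for r in range(reps):
        v = rng.normal(0.0, sv, n + 1)
        x = np.empty(n + 1)
        x[0] = rng.normal(0.0, sv / np.sqrt(1.0 - alpha**2))  # stationary start
        for t in range(1, n + 1):
            x[t] = alpha * x[t - 1] + v[t]
        u = phi * v[1:] + rng.normal(0.0, se, n)
        y = beta * x[:-1] + u
        Z = np.column_stack([np.ones(n), x[:-1]])  # intercept plus lagged X
        est[r] = np.linalg.lstsq(Z, y, rcond=None)[0][1]
    return est.mean() - beta

print("simulated OLS bias:", ols_slope_bias())
print("first-order approx:", -(-85.0) * (1 + 3 * 0.99) / 49)  # -phi(1+3a)/(n-1)
```

The two printed numbers should be close, and both grow as α approaches unity, which is exactly the regime of interest here.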

Instead, following Fisher (1973), Sprott argued in a series of papers (1973, 1975, 1980, 1990) that from an inferential point of view, issues such as the bias of point estimates may be irrelevant in small samples, and he stressed the importance of examining the likelihood. In brief, one of Sprott's (1975, 1980) key ideas can be summarised roughly as follows: (i) There may exist some one-to-one transformation of the parameters such that the likelihood ratio for the transformed parameter is approximately quadratic in the pivotal statistic based on that transformed parameter. (ii) The conditions that allow this quadratic approximation to be adequate also improve the normal approximation to the distribution of the pivotal statistic. It follows from these two observations that, since the likelihood is invariant under one-to-one reparametrisation, the likelihood ratio will be well approximated by a chi-square variable in finite samples as long as some parametrisation satisfying (i) exists, even if one does not know what that parametrisation is. Sprott (1973, 1975, 1980) showed that such a parametrisation exists if the curvature of the log-likelihood¹, as measured by a function of its higher order derivatives, is small. The use of such a likelihood would then result in a well behaved likelihood ratio test (LRT) in finite samples. See also Efron (1975) and McCullagh and Cox (1986) for a geometrical approach to curvature and likelihood ratio based hypothesis testing.

A curvature related approach to tackling the predictive regression problem for the model in (1) and (2) was taken by Jansson and Moreira (2006), assuming that the error covariance parameters (φ, σ_e², σ_v²) are known, that the series {X_t} has zero intercept (i.e. µ = 0) and that the initial value X_0 is known to be 0. Under these assumptions, Jansson and Moreira (2006) noted that though the joint likelihood of the observations was curved exponential, they could remove the curvature and obtain a linear exponential family by conditioning on a statistic that is specific ancillary for β. The linear exponential family obtained by conditioning enabled them to obtain tests with certain finite sample optimality properties. All the finite sample properties that Jansson and Moreira (2006) obtain, stated in their Theorems 2 and 5 and Lemmas 1, 3 and 4, are under the assumptions stated above, viz. that (φ, σ_e², σ_v²) are known, that the series {X_t} has zero intercept (i.e. µ = 0) and that the initial value X_0 is known to be 0. Such assumptions would normally not be satisfied in practice. Also, we will see below in Section 2 that the assumption µ = 0 removes a major source of the problem, since the likelihood already has smaller curvature when µ = 0.

¹ There are several formal measures of curvature based on higher order derivatives of the likelihood in the literature. See, for example, Kass and Slate (1994). We will not define any such measure explicitly since that is not the focus of our work here.

In their Theorem 6, Jansson and Moreira (2006) allow for an unknown µ taking potentially non-zero values. However, in that theorem they do not provide any finite sample results on the size or power properties of their test statistic, but only pointwise asymptotic results along a sequence of local-to-unity parameter values. This pointwise asymptotic result in their Theorem 6, which allows for arbitrary unknown µ, is obtained under the assumption that the error covariance parameters (φ, σ_e², σ_v²) are known.

The approach we take in this paper is to work with an unconditional likelihood, called the restricted likelihood, that already has very small curvature, without making any assumptions about knowledge of any nuisance parameters. As a result, we are able to obtain theoretical results demonstrating that the LRT based on the restricted likelihood (RLRT) has good finite sample performance in the predictive regression problem. Interestingly, we will see that the restricted likelihood also has an interpretation as a conditional likelihood, though the conditioning in this case is done with respect to a statistic that is complete sufficient for (c, µ).

In Section 2, we provide motivation for considering the restricted likelihood. We then obtain the restricted likelihood for the bivariate predictive regression model and provide results on the bias of the REML estimates. In Section 3, we state our result on the finite sample behaviour of the RLRT for β and compare its power under a sequence of Pitman alternatives to that of the restricted likelihood based Wald and Rao score tests. Section 4 provides results on extensions of the REML method to higher order AR processes for the regressor series, as well as to multivariate AR(1) regressors. The finite sample performance of the REML estimates and the RLRT is studied through simulations in Section 5. All proofs are relegated to the Appendix at the end of the paper.

2 Restricted Maximum Likelihood Estimation

To understand why the restricted likelihood may yield well behaved LRTs in the predictive regression context, it is instructive to consider LRTs for α in the univariate AR(1) model given in (2).

If LRT_{α,µ} denotes the LRT for testing H₀: α = α₀ versus H₁: α ≠ α₀ in that model, then Theorem 2 of van Giersbergen (2006), in conjunction with the results of Hayakawa (1977, 1987), Cordeiro (1987), Barndorff-Nielsen and Hall (1988) and Chesher and Smith (1995), yields²

P(LRT_{α,µ} ≤ x) − G_1(x) = 0.5(1 + 7α₀²)(1 − α₀²)^{-1} n^{-1}(G_3(x) − G_1(x)) + O(n^{-2}),  (3)

where G_s(x) is the c.d.f. of a χ²_s random variable. On the other hand, if LRT_α denotes the LRT for testing H₀: α = α₀ versus H₁: α ≠ α₀ in the univariate AR(1) model given in (2) with µ known to be 0, then it follows from Theorem 1 of van Giersbergen (2006) that

P(LRT_α ≤ x) − G_1(x) = 0.5 n^{-1}(G_3(x) − G_1(x)) + O(n^{-2}).  (4)

It is obvious from (3) that LRT_{α,µ} is very unstable when the autoregressive parameter α₀ is close to the unit root, with the leading error term in the Edgeworth expansion going to infinity. In stark contrast, we see from (4) that LRT_α is very well behaved, with the leading term in its Edgeworth expansion being both very small and free of α₀. Simulation results in Figure 1 of van Giersbergen (2006) confirm the accuracy of the standard chi-square approximation for LRT_α, even when α is close to unity. Not surprisingly, van Garderen (1999) found that for the univariate AR(1) model in (2) with µ known to be 0, the Efron (1975) curvature (one of the standard measures of curvature of the likelihood) is very small³, being of order O(n^{-1}) and converging to zero as α → 1 for every fixed n. These results indicate that the culprit in the finite sample failure of the LRT in the univariate AR(1) model is the unknown intercept µ. Since the bivariate prediction model in (1) and (2) is a vector AR(1) (with the first column of the coefficient matrix restricted to zero), one is led to suspect that the LRT for β may perhaps be well behaved if the intercept vector (c, µ)′ were known.

² van Giersbergen (2006) provides the expected value of the LRT, while the combined results from the remaining references show that the leading term in the Edgeworth expansion (3) is half that expected value.

³ Interestingly and somewhat surprisingly, van Garderen (1999) found that if the innovation variance σ_v² is known, the Efron curvature of the model is n^{-1} + O(n^{-2}), which, though still small, is larger than when σ_v² is unknown, and he provided a geometrical explanation for this phenomenon. Correspondingly, results in van Giersbergen (2006) imply that the coefficient of the leading term in (4) increases from 0.5n^{-1} to 1.5n^{-1} when σ_v² is known, indicating that the LRT is better approximated by the limiting chi-square distribution when the innovation variance is unknown than when it is known, though, of course, both approximations are very good by themselves.

Since the assumption that (c, µ) is known is extremely unrealistic, we are prompted to seek a likelihood that does not involve the location parameters and yet possesses small curvature properties similar to those of the model with known location parameters. The restricted likelihood turns out to be one that has such properties, and we turn next to defining it and stating some of its properties.

The idea of the restricted likelihood was originally proposed by Kalbfleisch and Sprott (1970) precisely as a means of eliminating the effect of nuisance location (more generally, regression coefficient) parameters when estimating the parameters of the model covariance structure. In general, the restricted likelihood is defined as the likelihood of linearly transformed data, where the linear transformation is chosen to be orthogonal to the regression design matrix (and orthogonal to the vector of ones in the case of a location parameter). Harville (1974) provided a simple formula for computing the restricted likelihood and pointed out that the restricted likelihood changes only by a multiplicative constant for different choices of the linear transformation. Harville (1977) showed that REML estimates of the parameters of the covariance structure do not suffer any loss of efficiency due to the linear transformation of the data. Harville (1974) also provided a Bayesian interpretation of the restricted likelihood, while Smyth and Verbyla (1996) showed that the restricted likelihood is also the exact conditional likelihood of the original data given the complete sufficient statistic for the regression coefficient parameters. Though the restricted likelihood has been studied primarily in the context of variance component models, there has also been some work on it in the context of time series models. See, for example, Tunnicliffe Wilson (1989) and Rahman and King (1997), among others. Francke and de Vos (2006) studied unit root tests for AR(1) models based on the restricted likelihood, while Chen and Deo (2006a, b) showed that confidence intervals for the sum of the autoregressive coefficients of univariate AR(p) processes based on the restricted likelihood have good coverage properties, even when the series is close to a unit root process. The restricted maximum likelihood (REML) estimates are also less biased than regular ML estimates in nearly integrated univariate AR models with intercept (Cheang and Reinsel, 2000) and with trend (Kang, Shin and Lee, 2003).

When X = (X_0, ..., X_n)′ follows the univariate AR(1) model in (2), it follows from Harville (1974) that the restricted log-likelihood is given by

L_R(σ_v², α, X) = −(n/2) log σ_v² + (1/2) log[(1 + α)/((n − 1)(1 − α) + 2)] − (1/(2σ_v²)) Q(α),  (5)

where

Q(α) = X′Σ*^{-1}X − (1′Σ*^{-1}X)²/(1′Σ*^{-1}1)  (6)

and σ_v²Σ* = Var(X), whereas the regular likelihood of X for model (2) with known µ (set, w.l.o.g., to zero) is

L(σ_v², α, X) = −((n + 1)/2) log σ_v² + (1/2) log(1 − α²) − (1/(2σ_v²)) X′Σ*^{-1}X.  (7)

On comparing (5) and (7), and noting that the second term in Q(α) is O_p(1) whereas X′Σ*^{-1}X is O_p(n), it immediately becomes apparent that the restricted likelihood in this case differs on a relative scale by only an order O(n^{-1}) from the likelihood of the AR(1) process with known intercept µ. Due to this, it can be shown that the curvature properties of the restricted likelihood for the AR(1), as well as the Edgeworth expansion of the LRT, are the same as those of the AR(1) model with known intercept, up to order O(n^{-1}). As a matter of fact, the restricted likelihood for the AR(1) also provides REML estimates of α whose bias is −2αn^{-1} + O(n^{-2}) and thus is identical, up to order O(n^{-1}), to the bias of the maximum likelihood estimate when the intercept is known. See Marriott and Pope (1954) and Cheang and Reinsel (2000). Since the bias of the maximum likelihood estimate of α when the intercept is not known is −(1 + 3α)n^{-1} + O(n^{-2}), the REML estimate achieves a bias reduction of approximately 50% when the autoregressive parameter is close to the unit root. All of these results show that the restricted likelihood provides a great advantage for both hypothesis testing and estimation in the univariate AR(1) model through the elimination of the nuisance intercept parameter.
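The restricted likelihood (5)-(6) is straightforward to evaluate. The sketch below is ours (function names hypothetical); it profiles L_R over α using the computational form of Q(α) that is recorded just before Theorem 1 below.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def Q(a, x):
    """Q(alpha) of (6) for x = (X_0,...,X_n), via its computational form."""
    n = len(x) - 1
    quad = (np.sum(x**2) + a**2 * np.sum(x[1:-1]**2)
            - 2.0 * a * np.sum(x[:-1] * x[1:]))
    num = (1.0 - a) * (x[0] + x[-1] + (1.0 - a) * np.sum(x[1:-1]))**2
    return quad - num / ((n - 1) * (1.0 - a) + 2.0)

def reml_ar1(x):
    """REML estimates (alpha, sigma_v^2) maximising the restricted
    log-likelihood (5) of a stationary AR(1) with unknown intercept."""
    n = len(x) - 1
    obj = lambda a: n * np.log(Q(a, x)) - np.log((1 + a) / ((n - 1) * (1 - a) + 2))
    a_hat = minimize_scalar(obj, bounds=(-0.999, 0.9999), method="bounded").x
    return a_hat, Q(a_hat, x) / n
```

Because the objective is one-dimensional and nearly quadratic, a bounded scalar minimisation is all that is needed; the same device reappears in the bivariate problem below.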

We now show that the restricted likelihood continues to provide these advantages in the case of the bivariate predictive model. We first define some quantities that will be useful in stating our results. For the observed data (Y′, X′) = (Y_1, ..., Y_n, X_0, ..., X_n), define X_1 = (X_1, ..., X_n)′ and X_0 = (X_0, X_1, ..., X_{n−1})′. Define the sample means Ȳ = n^{-1}1′Y, X̄ = (n + 1)^{-1}1′X, X̄_1 = n^{-1}1′X_1 and X̄_0 = n^{-1}1′X_0, and the sample mean corrected data Y_c = Y − 1Ȳ, X_{c,1} = X_1 − 1X̄_1 and X_{c,0} = X_0 − 1X̄_0. Define

S(φ, β, α) = (Y_c − φX_{c,1} − (β − φα)X_{c,0})′(Y_c − φX_{c,1} − (β − φα)X_{c,0})  (8)

and note that for computational purposes Q(α) given in (6) can be written as

Q(α) = Σ_{t=0}^n X_t² + α² Σ_{t=1}^{n−1} X_t² − 2α Σ_{t=0}^{n−1} X_t X_{t+1} − (1 − α)[X_0 + X_n + (1 − α) Σ_{t=1}^{n−1} X_t]² / [(n − 1)(1 − α) + 2].

We can now state the following theorem.

Theorem 1. For the model given by (1) and (2), the REML log-likelihood up to an additive constant is given by

L(β, α, φ, σ_v², σ_e²) = −((n − 1)/2) log σ_e² − (1/(2σ_e²)) S(φ, β, α) − (n/2) log σ_v² + (1/2) log[(1 + α)/((n − 1)(1 − α) + 2)] − (1/(2σ_v²)) Q(α).  (9)

The REML estimates ψ̂ = (β̂, α̂, φ̂, σ̂_v², σ̂_e²)′ are given by

α̂ = argmin_α { n log Q(α) − log[(1 + α)/((n − 1)(1 − α) + 2)] },

(φ̂, β̂ − φ̂α̂)′ = (X_c′X_c)^{-1}X_c′Y_c, where X_c = [X_{c,1}, X_{c,0}],

σ̂_e² = (n − 1)^{-1} S(φ̂, β̂, α̂) and σ̂_v² = Q(α̂)/n.

The bias in (α̂, β̂) is given by

E(α̂ − α) = −2α(n − 1)^{-1} + o(n^{-1}),
E(β̂ − β) = φE(α̂ − α) = −2αφ(n − 1)^{-1} + o(n^{-1}),

and E(φ̂ − φ) = 0.
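Theorem 1 translates directly into a short estimation routine: profile the one-dimensional criterion for α̂, then run one OLS regression. The sketch below is ours (hypothetical names), assuming y has length n and x = (X_0, ..., X_n).

```python
import numpy as np
from scipy.optimize import minimize_scalar

def reml_predictive(y, x):
    """REML estimates of Theorem 1 for Y_t = c + beta*X_{t-1} + u_t with an
    AR(1) regressor; returns beta, alpha, phi, sigma_v^2, sigma_e^2."""
    n = len(y)
    def Q(a):  # computational form of Q(alpha) given above
        quad = (np.sum(x**2) + a**2 * np.sum(x[1:-1]**2)
                - 2 * a * np.sum(x[:-1] * x[1:]))
        num = (1 - a) * (x[0] + x[-1] + (1 - a) * np.sum(x[1:-1]))**2
        return quad - num / ((n - 1) * (1 - a) + 2)
    obj = lambda a: n * np.log(Q(a)) - np.log((1 + a) / ((n - 1) * (1 - a) + 2))
    alpha = minimize_scalar(obj, bounds=(-0.999, 0.9999), method="bounded").x
    # OLS of mean-corrected Y on (X_{c,1}, X_{c,0}); slopes are (phi, beta - phi*alpha)
    yc = y - y.mean()
    Xc = np.column_stack([x[1:] - x[1:].mean(), x[:-1] - x[:-1].mean()])
    phi, gamma = np.linalg.lstsq(Xc, yc, rcond=None)[0]
    beta = gamma + phi * alpha
    resid = yc - Xc @ np.array([phi, gamma])
    return dict(beta=beta, alpha=alpha, phi=phi,
                sigma_v2=Q(alpha) / n, sigma_e2=resid @ resid / (n - 1))
```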

Remark 1. It is obvious that obtaining the REML estimates is computationally easy, since all the estimates are obtained in succession after the optimisation of a one-dimensional function which is almost quadratic.

Remark 2. Note that the restricted likelihood in (9) is well defined at the unit root α = 1, without having to assume that the initial value X_0 is fixed when |α| < 1.

It is interesting to compare the bias in β̂ with the bias in the OLS estimate β̂_OLS, given by (see Stambaugh, 1999)

E(β̂_OLS − β) = φE(α̂_OLS − α) = −φ(1 + 3α)(n − 1)^{-1} + o(n^{-1}),

where α̂_OLS is the OLS estimate of α. Thus, the bias in β̂ depends upon the bias in α̂ in a manner identical to the way the bias in β̂_OLS depends on the bias in α̂_OLS. Consequently, the approximately 50% reduction in bias that α̂ achieves compared to α̂_OLS close to the unit root is inherited by β̂, relative to β̂_OLS. The bias expression in Theorem 1 also suggests a bias corrected version of the REML estimate of β that may be computed as

β̂_c = β̂ + φ̂ (2α̂_c/(n − 1)),

where

α̂_c = α̂ (n + 1)/(n − 1)

is the bias corrected REML estimate of α. Since the bias correction term 2φα(n − 1)^{-1} is smaller than φ(1 + 3α)(n − 1)^{-1}, one would expect the bias corrected REML estimate β̂_c to have both less bias and a smaller variance than its bias corrected OLS counterpart,

β̂_OLS,c = β̂_OLS + φ̂ (1 + 3α̂_c)/(n − 1),

particularly since the parameter φ can take any value in (−∞, ∞). (Amihud and Hurvich (2004) find empirical estimates of φ that are as large as −94.) This is indeed the case, as we see in the simulations reported in Section 5 below.
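The bias corrections above amount to two lines of code. A sketch (ours), continuing from the reml_predictive output of the previous sketch:

```python
def bias_corrected(est, n):
    """Bias corrected REML estimates; `est` is the dict returned by
    reml_predictive and n is the sample size of Y."""
    alpha_c = est["alpha"] * (n + 1) / (n - 1)  # undoes the -2*alpha/(n-1) bias
    beta_c = est["beta"] + est["phi"] * 2 * alpha_c / (n - 1)
    return alpha_c, beta_c
```

Because the REML correction term 2φα/(n − 1) is smaller than the OLS correction φ(1 + 3α)/(n − 1), the corrected REML estimate involves less correction noise, which is consistent with its smaller standard deviation in the simulations of Section 5.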

The restricted likelihood given in (9) is derived for the situation where we assume that the initial value X_0 comes from the stationary distribution N(µ(1 − α)^{-1}, σ_v²(1 − α²)^{-1}). A similar argument can be used to obtain the restricted likelihood for the model in (1) and (2) where the regressor series follows an asymptotically stationary process, given by X_t = µ + e*_t, where e*_t = αe*_{t−1} + v_t for t ≥ 1 and e*_0 = v_0. Under this assumption, the restricted likelihood is given by

L̃(β, α, φ, σ_v², σ_e²) = −((n − 1)/2) log σ_e² − (1/(2σ_e²)) S(φ, β, α) − (n/2) log σ_v² − (1/2) log[n(1 − α)² + 1] − (1/(2σ_v²)) Q̃(α),  (10)

where

Q̃(α) = (X_0 − µ̂(α))² + Σ_{t=1}^n (X_t − µ̂(α) − α(X_{t−1} − µ̂(α)))²

and

µ̂(α) = [X_0 + (1 − α) Σ_{t=1}^n (X_t − αX_{t−1})] / [1 + n(1 − α)²].

The bias results of Theorem 1 continue to hold for the REML estimates under this initial value condition.

In the next section, we provide a theorem that shows that the REML based LRT has very good finite sample properties, in that its finite sample distribution approaches the limiting one very quickly and is practically unaffected by nuisance parameters, while also maintaining power against local alternatives when compared to the Wald and score tests.

3 REML Likelihood Ratio Test

One standard method for testing the composite hypothesis H₀: β = 0 vs. H_a: β ≠ 0 is the likelihood ratio test (LRT), which compares the log-likelihood evaluated at the unrestricted estimates of the parameters to the log-likelihood evaluated at the parameter estimates obtained under the restriction that the null hypothesis H₀: β = 0 is true.

Using the quantities defined in (8) and just above it, it can be easily verified that under H₀: β = 0 the restricted estimates ψ̂₀ = (0, α̂₀, φ̂₀, σ̂²_{v,0}, σ̂²_{e,0})′ are obtained as

α̂₀ = argmin_α { n log Q(α) + (n − 1) log R(α) − log[(1 + α)/((n − 1)(1 − α) + 2)] },

where

R(α) = Y_c′ (I − Z_c(α)(Z_c(α)′Z_c(α))^{-1}Z_c(α)′) Y_c,  Z_c(α) = X_{c,1} − αX_{c,0},

φ̂₀ = (Z_c(α̂₀)′Z_c(α̂₀))^{-1}Z_c(α̂₀)′Y_c,  σ̂²_{v,0} = n^{-1}Q(α̂₀) and σ̂²_{e,0} = (n − 1)^{-1}R(α̂₀).

Just as in the unrestricted case, it is obvious that obtaining the restricted estimates requires only the optimisation of a nearly quadratic one-dimensional function. The REML based likelihood ratio test (RLRT) for testing H₀: β = 0 vs. H_a: β ≠ 0 is now given by

R_T = −2L(ψ̂₀) + 2L(ψ̂),  (11)

where L(·) is the REML log-likelihood presented in (9). Under H₀: β = 0, the asymptotic distribution of R_T is χ²₁, the chi-square distribution with one degree of freedom. In finite samples, however, the distribution of a LRT may deviate from the limiting χ² distribution, particularly in the presence of nuisance parameters. The first step towards improving this approximation of the finite sample distribution by the limiting one was taken by Bartlett (1937). Bartlett argued that if a LRT is asymptotically distributed as χ²_p and E(LRT) = p(1 + cn^{-1}) + O(n^{-2}) for some constant c, then (1 + cn^{-1})^{-1}LRT would be better approximated by the limiting χ²_p distribution. The quantity 1 + cn^{-1}, which may depend on the underlying parameter values, has since become known as the Bartlett correction to the likelihood ratio test. A large literature has developed subsequently dealing with such corrections. It includes Lawley (1956), who derived a general formula for the Bartlett correction, Hayakawa (1977, 1987), who provided an asymptotic expansion of the distribution of the LRT, and Barndorff-Nielsen and Hall (1988), who showed that for regular models the Bartlett corrected LRT is approximated by the limiting χ² distribution with an error of O(n^{-2}), compared to an error of O(n^{-1}) for the uncorrected test statistic. For the predictive regression model in this paper, we are able to obtain the following theorem on the finite sample behaviour of the RLRT.

Theorem 2. Under H₀: β = 0 in the model given by (1) and (2), we have

P(R_T ≤ x) = P(χ²₁ ≤ x) + (3/4 − ρ²) n^{-1} [P(χ²₃ ≤ x) − P(χ²₁ ≤ x)] + O(n^{-2}),  (12)

where ρ = Corr(u_t, v_t).
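Both optimisations behind R_T are one-dimensional, so the whole test fits in a few lines. The sketch below is ours (hypothetical names); it computes (11) and the χ²₁ p-value, with σ_e² and σ_v² profiled out of (9).

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2

def rlrt(y, x):
    """RLRT statistic (11) and chi-square(1) p-value for H0: beta = 0;
    y has length n and x = (X_0,...,X_n)."""
    n = len(y)
    yc = y - y.mean()
    x1, x0 = x[1:] - x[1:].mean(), x[:-1] - x[:-1].mean()
    def Q(a):
        quad = (np.sum(x**2) + a**2 * np.sum(x[1:-1]**2)
                - 2 * a * np.sum(x[:-1] * x[1:]))
        num = (1 - a) * (x[0] + x[-1] + (1 - a) * np.sum(x[1:-1]))**2
        return quad - num / ((n - 1) * (1 - a) + 2)
    logc = lambda a: np.log((1 + a) / ((n - 1) * (1 - a) + 2))
    def profiled(a, S):  # (9) with the two variances profiled out
        return (-0.5 * (n - 1) * np.log(S / (n - 1))
                - 0.5 * n * np.log(Q(a) / n) + 0.5 * logc(a))
    # unrestricted residual SS from regressing Y_c on (X_{c,1}, X_{c,0})
    Z = np.column_stack([x1, x0])
    S_unres = yc @ yc - yc @ Z @ np.linalg.lstsq(Z, yc, rcond=None)[0]
    def S_res(a):  # R(alpha): residual SS from regressing Y_c on x1 - a*x0
        z = x1 - a * x0
        r = yc - z * (z @ yc) / (z @ z)
        return r @ r
    a1 = minimize_scalar(lambda a: -profiled(a, S_unres),
                         bounds=(-0.999, 0.9999), method="bounded").x
    a0 = minimize_scalar(lambda a: -profiled(a, S_res(a)),
                         bounds=(-0.999, 0.9999), method="bounded").x
    RT = 2 * (profiled(a1, S_unres) - profiled(a0, S_res(a0)))
    return RT, chi2.sf(RT, df=1)
```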

Theorem 2, in conjunction with the result of Barndorff-Nielsen and Hall (1988), yields the following corollary.

Corollary 1. If the Bartlett corrected RLRT is defined as

R_TB = [1 + (3/4 − ρ̂²) n^{-1}]^{-1} R_T, where ρ̂² = (φ̂²σ̂_v² + σ̂_e²)^{-1} φ̂²σ̂_v²,

then P(R_TB ≤ x) = P(χ²₁ ≤ x) + O(n^{-2}).

Remark 3. The results of Theorem 2 and Corollary 1 continue to hold for the restricted likelihood given in (10) when the initial condition is not from the stationary distribution.

The results of Theorem 2 obviously imply that the χ²₁ approximation to R_T is very good and almost unaffected by the nuisance parameters. Most importantly, the leading term in the error is free of the AR parameter, which most affects the finite sample performance of tests, particularly when it is close to unity. Theorem 2 also suggests two very simple ways in which one could adjust the p-value when carrying out a test of H₀: β = 0. One would be to use the first two terms on the right hand side of (12), with ρ replaced by ρ̂. The other is to use the Bartlett corrected statistic R_TB. However, (12) suggests that the original test R_T, used in conjunction with the standard χ²₁ distribution, should be very well behaved, and any improvements will be minimal at best. This belief is supported by the simulations that we provide in Section 5.

It is worth noting that though the REML likelihood does not provide an unbiased estimate of β (indeed, the bias of β̂ can be arbitrarily large due to the fact that φ is unbounded, as noted below Theorem 1), the REML likelihood yields a very well behaved test for β, irrespective of how large φ is.
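Corollary 1 is equally mechanical to apply. A sketch (ours), taking the statistic from the rlrt sketch above and the variance estimates from the unrestricted REML fit:

```python
from scipy.stats import chi2

def bartlett_corrected_rlrt(RT, phi_hat, sigma_v2_hat, sigma_e2_hat, n):
    """Bartlett corrected RLRT of Corollary 1 and its chi-square(1) p-value."""
    rho2 = phi_hat**2 * sigma_v2_hat / (phi_hat**2 * sigma_v2_hat + sigma_e2_hat)
    RTB = RT / (1.0 + (0.75 - rho2) / n)
    return RTB, chi2.sf(RTB, df=1)
```

Since ρ̂² ∈ [0, 1), the factor |3/4 − ρ̂²| never exceeds 3/4, so the correction is always within O(n^{-1}) of one, which is another way of seeing why the uncorrected R_T is already close to χ²₁.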

The fact that a badly biased point estimate can still yield a well behaved test serves to illustrate the point that it may be more desirable at times to carry out tests of hypothesis using appropriate likelihoods rather than parameter point estimates, and it supports the idea that bias can be irrelevant in inference, as described in the Introduction.

It is interesting to compare the quality of the chi-square approximation in Theorem 2, where all four parameters (α, φ, σ_v², σ_e²) are nuisance parameters, with the approximation in the case where the parameters (φ, σ_v², σ_e²) are known and α is the only nuisance parameter. From equation (31) and the discussion below it in the proof of Theorem 2, it can be seen that if (φ, σ_v², σ_e²) are known, the relevant RLRT for testing H₀: β = 0, denoted by R_{T,1}, would satisfy

P(R_{T,1} ≤ x) = P(χ²₁ ≤ x) − ρ² n^{-1} [P(χ²₃ ≤ x) − P(χ²₁ ≤ x)] + O(n^{-2}).

By comparing this result with that in Theorem 2, one sees that the quantity 3/4 is a measure of the extent to which lack of knowledge of the innovation parameters (φ, τ_v, τ_e) affects the finite sample distribution of R_T. Furthermore, since sup|−ρ²| > sup|3/4 − ρ²| for ρ ∈ (−1, 1), the chi-square approximation to the RLRT is better, in a "minimax" sense, in the case where the innovation covariance parameters are unknown than when they are known. This finding is very much in keeping with the results of van Giersbergen (2006) for the LRT in the univariate AR(1) model that we described in the footnote at the beginning of Section 2.

The result in Theorem 2 guarantees that the RLRT will yield a test that is almost of exact size in finite samples. However, one would also like to ensure that this is not achieved at the expense of a loss of power. The obvious competitors to the RLRT are the Wald and Rao score tests based on the restricted likelihood. It is a well known fact that just as these three tests share the identical limiting distribution under the null hypothesis, they also have the same power properties to first order. Hence, in order to distinguish between them one has to consider a sequence of local Pitman alternatives given by H_a: β = β₀ + ξn^{-1/2}. The next theorem obtains the power functions of the RLRT, Wald and Rao score tests against such local alternatives.

Theorem 3. Let RLRT, W and RS denote the LRT, Wald test and Rao score test, respectively, of H₀: β = β₀ based on the restricted likelihood in (9) of the model in (1) and (2). Assume that the true value of β is given by β = β₀ + ξn^{-1/2}. Define

Δ = σ_v²ξ² [(1 − α²)(φ²σ_v² + σ_e²)]^{-1} + O(n^{-1}),

C₁ = −αφσ_v⁴ξ³ [(1 − α²)²(φ²σ_v² + σ_e²)²]^{-1}

and

C₂ = −3αφσ_v²ξ [(1 − α²)(φ²σ_v² + σ_e²)]^{-1}.

Let Ḡ_{s,Δ}(x) denote the survival function of a non-central χ² random variable with s degrees of freedom and non-centrality parameter Δ. Then

P(RLRT > x) = Ḡ_{1,Δ}(x) + C₁ n^{-1/2} [Ḡ_{3,Δ}(x) − 0.5Ḡ_{1,Δ}(x) − 0.5Ḡ_{5,Δ}(x)] + O(n^{-1}),

P(W > x) = P(RLRT > x) + O(n^{-1}),

and

P(RS > x) = P(RLRT > x) + 0.5C₁ n^{-1/2} [Ḡ_{5,Δ}(x) − Ḡ_{7,Δ}(x)] + 0.5C₂ n^{-1/2} [Ḡ_{1,Δ}(x) − Ḡ_{5,Δ}(x)] + O(n^{-1}).  (13)

From Theorem 3 we see that the RLRT and the Wald test based on the restricted likelihood have identical power up to second order (i.e. up to O(n^{-1})) against local Pitman alternatives. Since Ḡ_{s,Δ}(x) − Ḡ_{l,Δ}(x) < 0 for all x > 0 when l > s, it follows from (13) that the RLRT is guaranteed to be more powerful than the Rao score test against local alternatives if C₁ > 0 and C₂ > 0. This will be the case if φ < 0 and ξ > 0, which is exactly the part of the parameter space that is of relevance in empirical applications in finance and economics. It is also interesting to note that the non-centrality parameter Δ, which will be the main source of power, increases as α gets closer to the unit root. The non-centrality parameter also depends inversely on φ², but does not depend on the correlation between the two innovation series (u_t, v_t).

In the next section we derive the REML likelihood under more general models for the regressor series, as well as for multiple regressors, and discuss some efficiency and computational issues.
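To first order, all three tests share the power function Ḡ_{1,Δ}(x). The sketch below (ours) evaluates it at the χ²₁ critical value, using our reconstruction of Δ from Theorem 3 as stated above.

```python
from scipy.stats import chi2, ncx2

def local_power(xi, alpha, phi, sigma_v2, sigma_e2, level=0.05):
    """Leading-order local power of the RLRT (and, to this order, of the Wald
    and Rao score tests) against beta = beta0 + xi/sqrt(n)."""
    delta = sigma_v2 * xi**2 / ((1 - alpha**2) * (phi**2 * sigma_v2 + sigma_e2))
    crit = chi2.ppf(1 - level, df=1)           # chi-square(1) critical value
    return ncx2.sf(crit, df=1, nc=delta)       # noncentral chi-square survival fn

# power rises as alpha approaches the unit root, since Delta ~ 1/(1 - alpha^2)
for a in (0.5, 0.9, 0.99):
    print(a, local_power(xi=1.0, alpha=a, phi=-85.0,
                         sigma_v2=0.00024681, sigma_e2=1.0))
```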

4 REML for more general regressor models

It is easy to generalise the REML likelihood in two directions that are both of practical interest. One generalisation is to the case where the predictor series is a multivariate AR(1) process. Applications of such models can be found, for example, in Amihud and Hurvich (2004), who considered the dividend yield and the earnings-to-price ratio as bivariate predictors of market returns. The other generalisation is to the case where the univariate predictor follows a higher order AR process. We will state the REML log-likelihood for both these cases, starting with the multivariate AR(1) predictor model. The method of obtaining the log-likelihood is identical to that used in Theorem 1.

4.1 Multivariate regressors

Assume that the data (Y_1, ..., Y_n, X_0′, ..., X_n′) follows

Y_t = c + β′X_{t−1} + u_t,  (14)
X_t = µ + AX_{t−1} + v_t,  (15)

where u_t = φ′v_t + e_t, the (e_t, v_t′)′ ∼ N(0, diag(σ_e², Σ_v)) are an i.i.d. series and A is a k × k matrix with all eigenvalues less than unity in absolute value. Let Σ_v = Var(v_t) and Σ_X = Var(X_t), given by

vec(Σ_X) = (I_{k²} − A ⊗ A)^{-1} vec(Σ_v),

and define

τ̂ = [Σ_X^{-1} + n(I − A)′Σ_v^{-1}(I − A)]^{-1} [Σ_X^{-1}X_0 + (I − A)′Σ_v^{-1} Σ_{t=1}^n (X_t − AX_{t−1})].

Lemma 1. The REML log-likelihood, up to an additive constant, for the model in (14) and (15) is given by

L_M = −((n − 1)/2) log σ_e² − (1/(2σ_e²)) S(φ, β, A) − (1/2) log|Σ_X| − (n/2) log|Σ_v| − (1/2) log|Σ_X^{-1} + n(I − A)′Σ_v^{-1}(I − A)| − (1/2) { (X_0 − τ̂)′Σ_X^{-1}(X_0 − τ̂) + Σ_{t=1}^n (X_t − τ̂ − A(X_{t−1} − τ̂))′Σ_v^{-1}(X_t − τ̂ − A(X_{t−1} − τ̂)) },  (16)

where

S(φ, β, A) = Σ_{t=1}^n (Y_{t,c} − φ′X_{t,c} − (β − A′φ)′X_{t−1,c})²,

X_{t,c} = X_t − n^{-1} Σ_{s=1}^n X_s and X_{t−1,c} = X_{t−1} − n^{-1} Σ_{s=1}^n X_{s−1}.

To ease the computational burden during optimisation, the likelihood can be defined in terms of the re-parametrised set (Σ_v, σ_e², A, φ, γ), where γ = β − A′φ. This re-parametrisation allows us to concentrate (σ_e², φ, γ) out of the likelihood, thus reducing the dimensionality of the optimisation problem. The likelihood can then be sequentially optimised, first over (Σ_v, A), with the REML estimates of (σ_e², φ, γ) then obtained by OLS through the minimisation of S(φ, γ). A further simplification of the above likelihood occurs if the coefficient matrix A is diagonal, given by A = diag(α_1, ..., α_k) with max_i |α_i| < 1, in which case one gets

Var(X_t) = Σ_X = ((σ_{v,ij}/(1 − α_iα_j))),

where Σ_v = ((σ_{v,ij})). Amihud and Hurvich (2004) find evidence supporting this model with a diagonal coefficient matrix in the empirical example that they consider. It should be noted that in the case where A can be assumed to be a diagonal matrix, the predictive regression model is no longer a SUR system and hence OLS will no longer be efficient. However, REML will clearly retain efficiency, no matter what the form of A is, thus giving it an advantage both in terms of asymptotic efficiency and power over any OLS based procedure.

Since the dimension of the parameter space is very large in the vector case, it is not feasible to obtain a result such as Theorem 2 in the most general case. However, in the case where A is a diagonal matrix and where (σ_e², φ, Σ_v) are assumed known with Σ_v diagonal, we are able to obtain the following result on the finite sample behaviour of the RLRT for testing H₀: β = 0. The proof follows along lines similar to those for Theorem 2 and is omitted.
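The vec formula for Σ_X above is one line of linear algebra. A sketch (ours):

```python
import numpy as np

def stationary_cov(A, Sigma_v):
    """Solve vec(Sigma_X) = (I - kron(A, A))^{-1} vec(Sigma_v), the stationary
    covariance of X_t = A X_{t-1} + v_t (eigenvalues of A inside the unit circle)."""
    k = A.shape[0]
    vecS = np.linalg.solve(np.eye(k * k) - np.kron(A, A), Sigma_v.reshape(-1))
    return vecS.reshape(k, k)

# a diagonal A reproduces the closed form sigma_{v,ij} / (1 - alpha_i * alpha_j):
print(stationary_cov(np.diag([0.9, 0.8]), np.eye(2)))  # diag(1/(1-.81), 1/(1-.64))
```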

Theorem 4. In the model given by (14) and (15), assume that A = diag(α_1, ..., α_k), with max_i |α_i| < 1, and that (σ_e², φ, Σ_v) are known, with Σ_v = diag(σ_{v,11}, ..., σ_{v,kk}). Let R_M denote the RLRT based on the restricted likelihood in (16) for testing H₀: β = 0. Then,

P(R_M ≤ x) = P(χ²_k ≤ x) − n^{-1} ( Σ_{i=1}^k φ_i²σ_{v,ii}σ_e^{-2}/(1 + φ_i²σ_{v,ii}σ_e^{-2}) ) [P(χ²_{k+2} ≤ x) − P(χ²_k ≤ x)] + O(n^{-2}).

Since 0 < n^{-1} Σ_{i=1}^k φ_i²σ_{v,ii}σ_e^{-2}(1 + φ_i²σ_{v,ii}σ_e^{-2})^{-1} < n^{-1}k trivially, the result shows that the χ² distribution once again provides a very good approximation to the RLRT in this situation. It is useful to note that Theorem 4 shows that the quality of the χ² approximation to the RLRT is affected only minimally by the dependence between u_t and v_t, over which one has no control. However, once one variable has been chosen, we can control which other variables should be included in the model, and it is preferable to use a group of variables that have low (ideally zero) correlation among themselves to avoid unnecessary multicollinearity. Hence, the assumption in the theorem that Σ_v is diagonal is not very unreasonable. Finally, from the discussion below equation (4) and below Theorem 2, one would expect the χ² approximation to continue to work well when (σ_e², φ, Σ_v) are unknown (with Σ_v diagonal) and, indeed, we find this to be the case in our simulations in Section 5 below. As a matter of fact, the simulations show that the RLRT behaves very well even when the cross-correlation in the variables is as high as 0.9.

The restricted likelihood for the AR(p) regressor case can be derived in a manner analogous to that for the AR(1) case and is given next.

4.2 Higher order autoregressive regressors

Let the observed data (Y_1, ..., Y_n, X_{−p+1}, X_{−p+2}, ..., X_n) follow

Y_t = c + βX_{t−1} + u_t  (17)

and

X_t = µ + α_1X_{t−1} + ... + α_pX_{t−p} + v_t,  (18)

where u_t = φv_t + e_t and the (e_t, v_t)′ ∼ N(0, diag(σ_e², σ_v²)) are an i.i.d. series. Furthermore, assume that all the roots of the polynomial z^p − Σ_{s=1}^p z^{p−s}α_s lie within the unit circle. Define Y_c = Y − 1Ȳ and X_c = [X_1 − 1X̄_1, ..., X_{−p+1} − 1X̄_{−p+1}], where X_i = (X_i, X_{i+1}, ..., X_{n−1+i})′ and X̄_i = n^{-1}1′X_i.

Lemma 2. The REML log-likelihood, up to an additive constant, for the model in (17) and (18) is given by

L(α, β, φ, σ_e², σ_v²) = −((n − 1)/2) log σ_e² − (1/(2σ_e²)) S(φ, β, α_1, ..., α_p) − ((n + p)/2) log σ_v² − (1/2) log|Σ_X| − (1/2) log(1′Σ_X^{-1}1) − (1/(2σ_v²)) (X − 1τ̂)′Σ_X^{-1}(X − 1τ̂),  (19)

where Var(X) = σ_v²Σ_X, τ̂ = (1′Σ_X^{-1}1)^{-1}1′Σ_X^{-1}X and S(φ, β, α_1, ..., α_p) = Z_c′Z_c, where

Z_c = Y_c − φX_{c,1} − (β − φα_1)X_{c,0} + φ Σ_{i=2}^p α_i X_{c,1−i}.

Though at first glance the expression in (19) looks formidable, it is actually very easy to compute. The quantity S(φ, β, α_1, ..., α_p) is, of course, just a quadratic form. It is also well known that both the determinant |Σ_X| and bilinear/quadratic forms of the type y′Σ_X^{-1}x are very easy to compute for AR(p) models, since for such models Σ_X^{-1} can easily be expressed as B′B for some lower triangular matrix B (see the matrix B displayed in the Appendix, and the sketch below). After the re-parametrisation γ = β − φα_1, the parameters (σ_e², σ_v², φ, γ) can be concentrated out of (19) and the concentrated log-likelihood optimised over the remaining parameters (α_1, ..., α_p).

In the next section we report the results of our simulations.
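For the AR(1) case, B takes the banded form displayed in the Appendix, which makes |Σ*| and the quadratic forms O(n) computations. A sketch (ours):

```python
import numpy as np

def B_matrix(alpha, m):
    """Lower triangular B with Sigma*^{-1} = B'B for a stationary AR(1);
    m = n + 1 is the length of (X_0,...,X_n)."""
    B = np.eye(m)
    B[0, 0] = np.sqrt(1.0 - alpha**2)
    B[np.arange(1, m), np.arange(m - 1)] = -alpha  # subdiagonal entries
    return B

# since |B| = sqrt(1 - alpha^2), we get log|Sigma*| = -log(1 - alpha^2), and
# x' Sigma*^{-1} y = (Bx)'(By) needs only the band, not a full matrix inverse
alpha, m = 0.95, 200
B = B_matrix(alpha, m)
x = np.random.default_rng(1).normal(size=m)
print(x @ (B.T @ B) @ x, (B @ x) @ (B @ x))  # identical quadratic forms
```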

19 REML estimate ˆβ c. The bias corrected estimators are defined below Theorem 1. The data was simulated from the model given by equations (1) and () with three sample sizes, n =, 1 and 1. For each sample size, the predictive regression slope coefficient β was set to while the autoregressive coefficient α was varied over three values,.9,.97 and.99. Since the bias in the estimate of β is proportional to φ, we chose φ = 8 (as noted earlier, Amihud and Hurvich, 4, find empirical estimates of φ as high as -9 ) to highlight the bias gains to be had by using the bias corrected REML estimate. The values of σv and σe were set to be.4681and 1 respectively, implying that Corr(u t,v t )=.8, which is representative of the innovation correlation found in empirical examples. Tables I-III report the simulation means and standard deviations of these four estimates based on 1, replications for each of the three values of α. As predicted by the theory, the bias of ˆβ is uniformly less than that of ˆβ OLS, and approximately half its value when α =.99. At the same time, ˆβ does not suffer any loss of efficiency, indeed having a uniformly smaller standard deviation than that of ˆβ OLS. The bias of ˆβ c is also always substantially less than that of ˆβ OLS,c, at times by as much as 7% and its standard deviation is uniformly lower too. The simulation results clearly demonstrate the advantage enjoyed by the REML estimate in terms of both bias and standard deviation over the OLS estimate. To test the robustness of the procedure to non-normal thick tailed errors, we also generated data from the same model and parameter configurations, but letting (e t,v t ) be independent t errors. The results are presented in the second set of columns in Tables I-III and it is seen that once again, ˆβ and ˆβ c are consistently better than ˆβ OLS and ˆβ OLS,c respectively in terms of both bias and standard deviation. We next turn to studying the quality of the χ 1 approximation to the distribution of the RLRT for the above model and parameter configurations with both normal and t innovations. The χ 1 approximation was assessed by three measures (i) QQ plots of the RLRT against the theoretical quantiles of a χ 1 distribution (ii) simulation means and standard deviations of the RLRT (which are theoretically 1 and respectively for a χ 1 distribution) and (iii) simulation sizes at the % and 1% level. The simulation mean, standard deviation and sizes are reported intableivwhiletheqqplotsareshowninfigure1. (WepresenttheQQplotsonlyforthe normal innovations since the plots for t errors are qualitatively similar in nature). Apart from 19

Apart from a small distortion at n = 50 and α = 0.99, the RLRT is seen to be very well approximated by the χ²₁ distribution according to each of the three measures. It is also worth stressing again that the performance of the RLRT is not affected by the bias of β̂ at all.

We finally generate data from a model in which the regressors form a bivariate AR(1) process. More specifically, we use the model Y_t = β′X_{t−1} + u_t and X_t = AX_{t−1} + v_t, where u_t = φ′v_t + e_t. The sample size was set at n = 50, the slope vector β was set to zero, while φ = (−85, −85)′. Two configurations of the autoregressive coefficient matrix A were considered, diag(0.9, 0.8) and diag(0.9, 0.9). The innovation matrix Σ_v was set to one of the following three configurations: (a) the identity matrix, (b) equal variances and correlation ρ_v = 0.5, and (c) variances equal to 1 and correlation ρ_v = 0.9. In all cases, the innovation variance σ_e² was set to unity. This design (except for Σ_v = I) matches that used in Amihud and Hurvich (2004). As before, the quality of the χ²₂ approximation to the distribution of the RLRT was assessed via three measures: (i) QQ plots of the RLRT against the theoretical quantiles of a χ²₂ distribution, (ii) simulation means and variances of the RLRT (which are theoretically 2 and 4 respectively for a χ²₂ distribution) and (iii) simulation sizes at the 5% and 10% levels. The results are provided in Tables V and VI and Figure 2.

As noted in subsection 4.1 above, OLS is no longer asymptotically efficient if A is a diagonal matrix, since the system is no longer a SUR. Hence, not only does REML afford a dramatic reduction in bias over OLS, it also provides a great reduction in variance, as can be seen in Table V. Furthermore, for the inference problem it is once again seen that the RLRT is very well approximated by the χ²₂ distribution in all the cases we consider. The overall conclusion to be had from the simulations is that the REML procedure yields point estimates that are much less biased than their OLS counterparts, and also an RLRT that is very well behaved, even when the regressors are close to being integrated.

6 Appendix

Proof of Theorem 1:

As noted at the start of Section 2 above, the REML likelihood corresponds to the likelihood of TZ, where Z = (Y′, X′)′ and T is any full row rank matrix such that T1 = 0. We will obtain this likelihood by choosing T of the form T = diag(T_1, T_2), where T_1 and T_2 are full row rank matrices of dimension (n − 1) × n and n × (n + 1) respectively, satisfying T_11 = 0, T_21 = 0, T_1T_1′ = I and T_2T_2′ = I, and using the fact that L(TZ) = L(T_1Y | T_2X) L(T_2X).

We first obtain L(T_1Y | T_2X). Since u_t = φv_t + e_t, where φ = σ_uv/σ_v² and e_t ∼ N(0, σ_e²) is a series independent of {v_t}, we get

Y_t = c + βX_{t−1} + φ(X_t − µ − αX_{t−1}) + e_t  (20)
    = c_0 + φX_t + (β − φα)X_{t−1} + e_t,

where c_0 = c − φµ. Let Ỹ = T_1Y, X̃_1 = T_1X_1 and X̃_0 = T_1X_0. From (20) it then follows that

Ỹ = X̃θ + ẽ,  (21)

where X̃ = [X̃_1, X̃_0], ẽ = T_1e, e = (e_1, ..., e_n)′ and θ = (φ, β − φα)′. Since X_t is a function only of {v_t, v_{t−1}, ...}, the series {X_t} is independent of {e_t}. Furthermore, knowledge of T_2X, where T_2 is any full row rank matrix such that T_21 = 0, implies knowledge of X̃. Hence, from (21) the conditional distribution of Ỹ given T_2X is N(X̃θ, σ_e²I), since T_1T_1′ = I. It follows that the conditional log-likelihood of Ỹ given T_2X, up to an additive constant, is given by

l_1(Ỹ | T_2X, θ, σ_e²) = −((n − 1)/2) log σ_e² − (1/(2σ_e²)) (Ỹ − X̃θ)′(Ỹ − X̃θ).

Note, however, that X̃θ = T_1[X_1, X_0]θ. Thus,

(Ỹ − X̃θ)′(Ỹ − X̃θ) = (Y − [X_1, X_0]θ)′T_1′T_1(Y − [X_1, X_0]θ).

Since the matrix T_1, when augmented by the row n^{-1/2}1′, is an orthogonal matrix, it follows that T_1′T_1 = I − n^{-1}11′, and hence

(Ỹ − X̃θ)′(Ỹ − X̃θ) = (Y − [X_1, X_0]θ)′(I − n^{-1}11′)(Y − [X_1, X_0]θ) = S(φ, β, α).

Thus, we get

l_1(Ỹ | T_2X, θ, σ_e²) = −((n − 1)/2) log σ_e² − (1/(2σ_e²)) S(φ, β, α).  (22)

The log-likelihood of T_2X is obtained from Harville (1974) and, up to an additive constant, is given by

l_2(T_2X, α, σ_v²) = −(n/2) log σ_v² − (1/2) log|Σ*| − (1/2) log(1′Σ*^{-1}1) − (1/(2σ_v²)) (X − 1τ̂)′Σ*^{-1}(X − 1τ̂),  (23)

where τ̂ = (1′Σ*^{-1}1)^{-1}1′Σ*^{-1}X and σ_v²Σ* = Var(X). From (22) and (23), we obtain the log-likelihood of (T_1Y, T_2X), up to an additive constant, to be

L(T_1Y, T_2X, α, β, φ, σ_v², σ_e²) = −((n − 1)/2) log σ_e² − (1/(2σ_e²)) S(φ, β, α) − (n/2) log σ_v² − (1/2) log|Σ*| − (1/2) log(1′Σ*^{-1}1) − (1/(2σ_v²)) Q(α).  (24)

The final form, as stated in (9), is obtained by algebraic simplification, using the fact that Σ*^{-1} = B′B, where B is the (n + 1) × (n + 1) lower triangular matrix given by

B =
[ (1 − α²)^{1/2}  0   ...  0 ]
[ −α             1   ...  0 ]
[ ...            ... ...  ... ]
[ 0              ... −α   1 ],

and noting that

τ̂ = [X_0 + X_n + (1 − α) Σ_{i=1}^{n−1} X_i] / [(n − 1)(1 − α) + 2].

To obtain the REML estimates of (α, β, φ, σ_e², σ_v²), it helps to consider the re-parametrised set of parameters (α, γ, φ, σ_e², σ_v²), where γ = β − φα. It is then immediately obvious from inspecting (24) that the REML estimates of (α, σ_v²) are obtained by simply maximising

l_2(T_2X, α, σ_v²) = −(n/2) log σ_v² − (1/2) log|Σ*| − (1/2) log(1′Σ*^{-1}1) − (1/(2σ_v²)) (X − 1τ̂)′Σ*^{-1}(X − 1τ̂).

In other words, the REML estimates of (α, σ_v²) are just those estimates that would have been obtained by maximising the REML likelihood function of (X_0, ..., X_n). It thus follows immediately from Cheang and Reinsel (2000) that the bias of α̂ is

E(α̂ − α) = −2α(n − 1)^{-1} + o(n^{-1}).  (25)

The REML estimates of (φ, γ) are obtained as the least squares estimates

(φ̂, γ̂)′ = (X_c′X_c)^{-1}X_c′Y_c,

which then yield the REML estimate β̂ = γ̂ + φ̂α̂. Since

(φ̂, β̂)′ = [1 0; α̂ 1] (φ̂, γ̂)′,

it follows that the REML estimates of the original parameters (φ, β) can also be obtained from a direct regression of Y_c on (X_{c,1} − α̂X_{c,0}, X_{c,0}). Thus, β̂ is identical to the ARM estimate considered by Amihud and Hurvich (2004), using the REML estimate α̂ as a proxy for α. Hence, the bias of β̂ can be obtained from Theorem 2 of Amihud and Hurvich (2004) and is

E(β̂ − β) = φE(α̂ − α) = −2αφ(n − 1)^{-1} + o(n^{-1}).

Finally, Lemma 1 of Amihud and Hurvich (2004) implies that E(φ̂) = φ.

Proof of Theorem 2: Since the LRT is invariant to re-parametrisation, we choose to work with the re-parametrisation λ = (β, α, φ, τ_v, τ_e)′ ≡ (β, λ_2′)′, where τ_v = σ_v² and τ_e = σ_e², since this greatly reduces the burden of our computations. For the REML log-likelihood given in (9), we will denote expectations of the log-likelihood derivatives as

κ_rs = n^{-1}E(∂²L/∂λ_r∂λ_s),  κ_rst = n^{-1}E(∂³L/∂λ_r∂λ_s∂λ_t),  κ_rs^{(t)} = ∂κ_rs/∂λ_t.

Letting δ = τ_e/τ_v, it is easily seen that the information matrix K = ((−κ_rs)) is given by

K = diag(K_{β,α}, K_{φ,τ_v,τ_e}) + O(n^{-1}),  (26)

where

K_{β,α} = δ^{-1}(1 − α²)^{-1} [1 φ; φ φ² + δ]

and

K_{φ,τ_v,τ_e} = diag{ δ^{-1}, 1/(2τ_v²), 1/(2τ_e²) }.

Let κ^{rs} denote the entries of K^{-1} and κ̄^{rs} denote the entries of K_2^{-1}, where K_2 is the lower right 4 × 4 sub-matrix of K. From the results of Hayakawa (1977, 1987), Cordeiro (1987), Barndorff-Nielsen and Hall (1988) and Chesher and Smith (1995), we have

P(R_T ≤ x) = P(χ²₁ ≤ x) + An^{-1}[P(χ²₃ ≤ x) − P(χ²₁ ≤ x)] + O(n^{-2}),  (27)

where

A = (1/2) Σ_λ (l_rstu − l_rstuvw) − (1/2) Σ_{λ_2} (l̄_rstu − l̄_rstuvw),  (28)

l_rstu = κ^{rs}κ^{tu}((1/4)κ_rstu − κ_rst^{(u)} + κ_rt^{(su)}),

l̄_rstu = κ̄^{rs}κ̄^{tu}((1/4)κ_rstu − κ_rst^{(u)} + κ_rt^{(su)}),

l_rstuvw = κ^{rs}κ^{tu}κ^{vw}((1/6)κ_rtvκ_suw − κ_rtuκ_svw − κ_rtvκ_sw^{(u)} − κ_rtuκ_sw^{(v)} + κ_rt^{(v)}κ_sw^{(u)} + κ_rt^{(u)}κ_sw^{(v)})

and

l̄_rstuvw = κ̄^{rs}κ̄^{tu}κ̄^{vw}((1/6)κ_rtvκ_suw − κ_rtuκ_svw − κ_rtvκ_sw^{(u)} − κ_rtuκ_sw^{(v)} + κ_rt^{(v)}κ_sw^{(u)} + κ_rt^{(u)}κ_sw^{(v)}).

Exploiting the near diagonal (up to O(n^{-1})) structure of K, we can simplify the term Σ_λ(l_rstu − l_rstuvw) in A as

Σ_λ (l_rstu − l_rstuvw) = Σ_{(β,α)} (l_rstu − l_rstuvw) + Σ_{((β,α),(φ,τ_v,τ_e))} (l_rstu − l_rstuvw) + Σ_{(φ,τ_v,τ_e)} (l_rstu − l_rstuvw) + O(n^{-1}),  (29)

where Σ_{((β,α),(φ,τ_v,τ_e))} denotes that at least one index in the summand must come from (β, α) and at least one index from (φ, τ_v, τ_e).

By the same logic, we can simplify the term Σ_{λ_2}(l̄_rstu − l̄_rstuvw) in A as

Σ_{λ_2} (l̄_rstu − l̄_rstuvw) = Σ_α (l̄_rstu − l̄_rstuvw) + Σ_{(α,(φ,τ_v,τ_e))} (l̄_rstu − l̄_rstuvw) + Σ_{(φ,τ_v,τ_e)} (l̄_rstu − l̄_rstuvw) + O(n^{-1})
  = Σ_α (l̄_rstu − l̄_rstuvw) + Σ_{(α,(φ,τ_v,τ_e))} (l̄_rstu − l̄_rstuvw) + Σ_{(φ,τ_v,τ_e)} (l_rstu − l_rstuvw) + O(n^{-1}),  (30)

where the last step in (30) follows from the fact that the entries of K^{-1} for (φ, τ_v, τ_e) are diagonal up to O(n^{-1}). It thus follows from (28), (29) and (30) that

A = (1/2) [ Σ_{(β,α)} (l_rstu − l_rstuvw) − Σ_α (l̄_rstu − l̄_rstuvw) ] + (1/2) [ Σ_{((β,α),(φ,τ_v,τ_e))} (l_rstu − l_rstuvw) − Σ_{(α,(φ,τ_v,τ_e))} (l̄_rstu − l̄_rstuvw) ] + O(n^{-1})
  ≡ A_{(β,α)} + C_λ + O(n^{-1}).  (31)

It is obvious from the structure of A_{(β,α)} in (31) that A_{(β,α)} would be the leading remainder term in the expansion of the distribution of R_T of the form in (27) if (φ, τ_v, τ_e) were known and α were the only nuisance parameter. The term C_λ, thus, is a measure of the extent to which lack of knowledge of the parameters (φ, τ_v, τ_e) affects the finite sample distribution of R_T. We now compute the two terms A_{(β,α)} and C_λ, beginning with A_{(β,α)}. Though (β, α) are not orthogonal, the computation of Σ_{(β,α)}(l_rstu − l_rstuvw) can be simplified by working with the transformed parameters (γ, α), where γ = β − φα, since γ is orthogonal to α. (Note that, as stated above, when computing A_{(β,α)} the remaining parameters (φ, τ_v, τ_e) are fixed.) Since (γ, α) is an affine transformation of (β, α), we can exploit the fact that the first term of A_{(β,α)} is invariant under such transformations (see page 371 of Hayakawa, 1977) and thus get

(1/2) Σ_{(β,α)} (l_rstu − l_rstuvw) = (1/2) Σ_{(γ,α)} (l_rstu,p − l_rstuvw,p),  (32)

where the extra subscript p means that the computation is being carried out for the reparameterised form of (9) with γ = β − φα. The right hand side of (32) is much simpler to compute, due to the fact that κ_{γα,p} = κ_{γγ,p}^{(γ)} = κ_{γγ,p}^{(γγ)} = 0 and all the terms κ_{rst,p} and κ_{rstu,p} are at most O(n^{-1}), for all permutations of the subscripts. Since κ_{γγ,p} = −δ^{-1}(1 − α²)^{-1} and κ_{αα,p} = −(1 − α²)^{-1}, it follows that

(1/2) Σ_{(γ,α)} l_rstu,p = (1/2) κ_p^{αα} κ_p^{αα} κ_{αα,p}^{(αα)} + O(n^{-1}) = −(1 + 3α²)(1 − α²)^{-1} + O(n^{-1})  (33)

and

(1/2) Σ_{(γ,α)} l_rstuvw,p = (1/2) κ_p^{αα} κ_p^{αα} κ_p^{αα} (κ_{αα}^{(α)})² + O(n^{-1}) = −4α²(1 − α²)^{-1} + O(n^{-1}).  (34)

From (32), (33) and (34), we get

(1/2) Σ_{(β,α)} (l_rstu − l_rstuvw) = −1 + O(n^{-1}).  (35)

The second term in A_{(β,α)} is not invariant under affine transformations, and we revert to the original log-likelihood (9) to compute it. Noting that κ_αα = −(φ²δ^{-1} + 1)(1 − α²)^{-1} and κ̄^{αα} = −(1 − α²)(φ²δ^{-1} + 1)^{-1}, we have

(1/2) Σ_α (l̄_rstu − l̄_rstuvw) = −(φ²δ^{-1} + 1)^{-1} + O(n^{-1}).  (36)

From (35) and (36) we conclude that

A_{(β,α)} = −φ²δ^{-1}(φ²δ^{-1} + 1)^{-1} + O(n^{-1}) = −ρ² + O(n^{-1}).  (37)

We now turn our attention to computing the second term, C_λ, in (31). We first note from the near diagonal structure of K that

(1/2) Σ_{((β,α),(φ,τ_v,τ_e))} (l_rstu − l_rstuvw) = (1/2) Σ_{(β,(φ,τ_v,τ_e))} l_rrtt + (1/2) Σ_{(α,(φ,τ_v,τ_e))} l_rrtt − (1/2) Σ_{(β,(φ,τ_v,τ_e))} l_rrttvv − (1/2) Σ_{(α,(φ,τ_v,τ_e))} l_rrttvv − (1/2) Σ_{(β,α,(φ,τ_v,τ_e))} l_rstuvw.  (38)

Each of the terms in (38) is now computed. The details are not provided here, both to save space and because the computation does not afford any special insight into the problem. The detailed calculations, however, are available from the authors. The terms in Σ_{(α,(φ,τ_v,τ_e))}(l̄_rstu − l̄_rstuvw) can be decomposed in a manner similar to that in (38), except that in this case there are no terms in β. When all these terms are put together, one gets

C_λ = 3/4 + O(n^{-1}).  (39)

The theorem now follows from (27), (31), (37) and (39).

Proof of Theorem 3: The expansion of the distribution of the LRT and the Wald test under local Pitman alternatives is given in Hayakawa (1975), while that of the distribution of the Rao score test is given by Harris and Peers (1980). These results are consolidated, using simpler notation, in Cordeiro, Botter and Ferrari (1994), and we follow the notation used in their work. To obtain the results of Theorem 3, we calculate the quantities in equations (15)-(25b) on page 711 of Cordeiro et al. (1994). Letting λ = (β, α, φ, τ_v, τ_e)′, where τ_v = σ_v² and τ_e = σ_e², we note that the quantities κ_ij = n^{-1}E(∂²L/∂λ_i∂λ_j) have already been obtained in (26). To obtain the quantities κ_{i,jk} = n^{-1}E[(∂L/∂λ_i)(∂²L/∂λ_j∂λ_k)] and κ_{i,j,k} = n^{-1}E[(∂L/∂λ_i)(∂L/∂λ_j)(∂L/∂λ_k)], we exploit the Bartlett identities (Bartlett, 1953) to obtain

κ_{i,j,k} = 2κ_ijk − ∂κ_jk/∂λ_i − ∂κ_ik/∂λ_j − ∂κ_ij/∂λ_k  and  κ_{i,jk} = −κ_ijk + ∂κ_jk/∂λ_i.

The quantities ∂κ_jk/∂λ_i are then calculated using (26). Using the same notation as in equations (3a)-(5b) of Cordeiro et al. (1994) (note, however, that we define all our cumulants κ to be O(1), whereas Cordeiro et al. (1994) define them to be O(n)), we get

b_1 = 0.5n^{-1/2}C_1 + O(n^{-1}),  b_11 = n^{-1/2}C_1 + O(n^{-1}),  b_12 = 0.5n^{-1/2}C_1 + O(n^{-1}),  b_13 = 0,

b_2 = 0.5n^{-1/2}C_1 + O(n^{-1}),  b_21 = n^{-1/2}C_1 + O(n^{-1}),  b_22 = 0.5n^{-1/2}C_1 + O(n^{-1}),  b_23 = O(n^{-1})

and

b_3 = 0.5n^{-1/2}C_1 + O(n^{-1}),  b_31 = n^{-1/2}(C_1 + C_2) + O(n^{-1}),  b_32 = n^{-1/2}C_1 + O(n^{-1}),  b_33 = 0.5n^{-1/2}C_1 + O(n^{-1}).

The result now follows from equation (25) of Cordeiro et al. (1994).

References

Amihud, Y. and Hurvich, C. (2004): "Predictive Regressions: A Reduced-Bias Estimation Method," Journal of Financial and Quantitative Analysis, 39, 813-841.

Barndorff-Nielsen, O.E. and Hall, P. (1988): "On the Level-Error after Bartlett Adjustment of the Likelihood Ratio Statistic," Biometrika, 75, 374-378.

Bartlett, M. (1937): "Properties of Sufficiency and Statistical Tests," Proc. Roy. Soc. Ser. A, 160, 268-282.

Bartlett, M. (1953): "Approximate Confidence Intervals II. More than One Unknown Parameter," Biometrika, 40, 306-317.

Cordeiro, G. (1987): "On the Corrections to the Likelihood Ratio Statistics," Biometrika, 74, 265-274.

Cordeiro, G., Botter, D. and Ferrari, S. L. (1994): "Nonnull Asymptotic Distributions of Three Classic Criteria in Generalised Linear Models," Biometrika, 81, 709-720.

Cheang, W.K. and Reinsel, G. (2000): "Bias Reduction of Autoregressive Estimates in Time Series Regression Model through Restricted Maximum Likelihood," Journal of the American Statistical Association, 95, 1173-1184.

Chesher, A. and Smith, R. (1995): "Bartlett Corrections to Likelihood Ratio Tests," Biometrika, 82, 433-436.

Fisher, R. (1973): Statistical Methods and Scientific Inference. Hafner, New York.

Francke, M. and de Vos, A. (2006): "Marginal Likelihood and Unit Roots," forthcoming in Journal of Econometrics.

Efron, B. (1975): "Defining the Curvature of a Statistical Problem (with Applications to Second Order Efficiency)," Annals of Statistics, 3, 1189-1242.

van Garderen, K. (1999): "Exact Geometry of Autoregressive Models," Journal of Time Series Analysis, 20, 1-21.

van Giersbergen, N. (2006): "Bartlett Correction in the Stable AR(1) Model with Intercept and Trend," Working paper, Universiteit van Amsterdam.

Harris, P. and Peers, H. W. (1980): "The Local Power of the Efficient Scores Test Statistic," Biometrika, 67, 525-529.

Harville, D. (1974): "Bayesian Inference of Variance Components Using Only Error Contrasts," Biometrika, 61, 383-385.

Harville, D. (1977): "Maximum Likelihood Approaches to Variance Component Estimation and to Related Problems," Journal of the American Statistical Association, 72, 320-338.

Hayakawa, T. (1975): "The Likelihood Ratio Criterion for a Composite Hypothesis under a Local Alternative," Biometrika, 62, 451-460.

Hayakawa, T. (1977): "The Likelihood Ratio Criterion and the Asymptotic Expansion of its Distribution," Ann. Inst. Statist. Math., 29, 359-378.

Hayakawa, T. (1987): Correction, Ann. Inst. Statist. Math., 39, 681.

Jansson, M. and Moreira, M. (2006): "Optimal Inference in Regression Models with Nearly Integrated Regressors," Econometrica, 74, 681-714.

Kalbfleisch, J. and Sprott, D. (1970): "Application of Likelihood Methods to Models Involving Large Numbers of Parameters," Journal of the Royal Statistical Society B, 32, 175-208.


Forecasting 1 to h steps ahead using partial least squares Forecasting 1 to h steps ahead using partial least squares Philip Hans Franses Econometric Institute, Erasmus University Rotterdam November 10, 2006 Econometric Institute Report 2006-47 I thank Dick van

More information

Chapter 1 Likelihood-Based Inference and Finite-Sample Corrections: A Brief Overview

Chapter 1 Likelihood-Based Inference and Finite-Sample Corrections: A Brief Overview Chapter 1 Likelihood-Based Inference and Finite-Sample Corrections: A Brief Overview Abstract This chapter introduces the likelihood function and estimation by maximum likelihood. Some important properties

More information

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor

More information

Cointegrated VAR s. Eduardo Rossi University of Pavia. November Rossi Cointegrated VAR s Financial Econometrics / 56

Cointegrated VAR s. Eduardo Rossi University of Pavia. November Rossi Cointegrated VAR s Financial Econometrics / 56 Cointegrated VAR s Eduardo Rossi University of Pavia November 2013 Rossi Cointegrated VAR s Financial Econometrics - 2013 1 / 56 VAR y t = (y 1t,..., y nt ) is (n 1) vector. y t VAR(p): Φ(L)y t = ɛ t The

More information

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance CESIS Electronic Working Paper Series Paper No. 223 A Bootstrap Test for Causality with Endogenous Lag Length Choice - theory and application in finance R. Scott Hacker and Abdulnasser Hatemi-J April 200

More information

GLS and FGLS. Econ 671. Purdue University. Justin L. Tobias (Purdue) GLS and FGLS 1 / 22

GLS and FGLS. Econ 671. Purdue University. Justin L. Tobias (Purdue) GLS and FGLS 1 / 22 GLS and FGLS Econ 671 Purdue University Justin L. Tobias (Purdue) GLS and FGLS 1 / 22 In this lecture we continue to discuss properties associated with the GLS estimator. In addition we discuss the practical

More information

Vector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I.

Vector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I. Vector Autoregressive Model Vector Autoregressions II Empirical Macroeconomics - Lect 2 Dr. Ana Beatriz Galvao Queen Mary University of London January 2012 A VAR(p) model of the m 1 vector of time series

More information

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator by Emmanuel Flachaire Eurequa, University Paris I Panthéon-Sorbonne December 2001 Abstract Recent results of Cribari-Neto and Zarkos

More information

Sheffield Economic Research Paper Series. SERP Number:

Sheffield Economic Research Paper Series. SERP Number: Sheffield Economic Research Paper Series SERP Number: 2004013 Jamie Gascoigne Estimating threshold vector error-correction models with multiple cointegrating relationships. November 2004 * Corresponding

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

Cointegration Lecture I: Introduction

Cointegration Lecture I: Introduction 1 Cointegration Lecture I: Introduction Julia Giese Nuffield College julia.giese@economics.ox.ac.uk Hilary Term 2008 2 Outline Introduction Estimation of unrestricted VAR Non-stationarity Deterministic

More information

Lecture 2: Univariate Time Series

Lecture 2: Univariate Time Series Lecture 2: Univariate Time Series Analysis: Conditional and Unconditional Densities, Stationarity, ARMA Processes Prof. Massimo Guidolin 20192 Financial Econometrics Spring/Winter 2017 Overview Motivation:

More information

Asymptotic inference for a nonstationary double ar(1) model

Asymptotic inference for a nonstationary double ar(1) model Asymptotic inference for a nonstationary double ar() model By SHIQING LING and DONG LI Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong maling@ust.hk malidong@ust.hk

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

Econ 423 Lecture Notes: Additional Topics in Time Series 1

Econ 423 Lecture Notes: Additional Topics in Time Series 1 Econ 423 Lecture Notes: Additional Topics in Time Series 1 John C. Chao April 25, 2017 1 These notes are based in large part on Chapter 16 of Stock and Watson (2011). They are for instructional purposes

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

A note on profile likelihood for exponential tilt mixture models

A note on profile likelihood for exponential tilt mixture models Biometrika (2009), 96, 1,pp. 229 236 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asn059 Advance Access publication 22 January 2009 A note on profile likelihood for exponential

More information

Size and Power of the RESET Test as Applied to Systems of Equations: A Bootstrap Approach

Size and Power of the RESET Test as Applied to Systems of Equations: A Bootstrap Approach Size and Power of the RESET Test as Applied to Systems of Equations: A Bootstrap Approach Ghazi Shukur Panagiotis Mantalos International Business School Department of Statistics Jönköping University Lund

More information

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model Xiuming Zhang zhangxiuming@u.nus.edu A*STAR-NUS Clinical Imaging Research Center October, 015 Summary This report derives

More information

HANDBOOK OF APPLICABLE MATHEMATICS

HANDBOOK OF APPLICABLE MATHEMATICS HANDBOOK OF APPLICABLE MATHEMATICS Chief Editor: Walter Ledermann Volume VI: Statistics PART A Edited by Emlyn Lloyd University of Lancaster A Wiley-Interscience Publication JOHN WILEY & SONS Chichester

More information

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Elizabeth C. Mannshardt-Shamseldin Advisor: Richard L. Smith Duke University Department

More information

Lecture 11 Weak IV. Econ 715

Lecture 11 Weak IV. Econ 715 Lecture 11 Weak IV Instrument exogeneity and instrument relevance are two crucial requirements in empirical analysis using GMM. It now appears that in many applications of GMM and IV regressions, instruments

More information

Chapter 12 REML and ML Estimation

Chapter 12 REML and ML Estimation Chapter 12 REML and ML Estimation C. R. Henderson 1984 - Guelph 1 Iterative MIVQUE The restricted maximum likelihood estimator (REML) of Patterson and Thompson (1971) can be obtained by iterating on MIVQUE,

More information

A Non-Parametric Approach of Heteroskedasticity Robust Estimation of Vector-Autoregressive (VAR) Models

A Non-Parametric Approach of Heteroskedasticity Robust Estimation of Vector-Autoregressive (VAR) Models Journal of Finance and Investment Analysis, vol.1, no.1, 2012, 55-67 ISSN: 2241-0988 (print version), 2241-0996 (online) International Scientific Press, 2012 A Non-Parametric Approach of Heteroskedasticity

More information

Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling

Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling Paul Marriott 1, Radka Sabolova 2, Germain Van Bever 2, and Frank Critchley 2 1 University of Waterloo, Waterloo, Ontario,

More information

Flat and multimodal likelihoods and model lack of fit in curved exponential families

Flat and multimodal likelihoods and model lack of fit in curved exponential families Mathematical Statistics Stockholm University Flat and multimodal likelihoods and model lack of fit in curved exponential families Rolf Sundberg Research Report 2009:1 ISSN 1650-0377 Postal address: Mathematical

More information

Introduction to Econometrics

Introduction to Econometrics Introduction to Econometrics T H I R D E D I T I O N Global Edition James H. Stock Harvard University Mark W. Watson Princeton University Boston Columbus Indianapolis New York San Francisco Upper Saddle

More information

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models

Economics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models University of Illinois Fall 2016 Department of Economics Roger Koenker Economics 536 Lecture 7 Introduction to Specification Testing in Dynamic Econometric Models In this lecture I want to briefly describe

More information

Testing Error Correction in Panel data

Testing Error Correction in Panel data University of Vienna, Dept. of Economics Master in Economics Vienna 2010 The Model (1) Westerlund (2007) consider the following DGP: y it = φ 1i + φ 2i t + z it (1) x it = x it 1 + υ it (2) where the stochastic

More information

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED

A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED A TIME SERIES PARADOX: UNIT ROOT TESTS PERFORM POORLY WHEN DATA ARE COINTEGRATED by W. Robert Reed Department of Economics and Finance University of Canterbury, New Zealand Email: bob.reed@canterbury.ac.nz

More information

Testing Statistical Hypotheses

Testing Statistical Hypotheses E.L. Lehmann Joseph P. Romano, 02LEu1 ttd ~Lt~S Testing Statistical Hypotheses Third Edition With 6 Illustrations ~Springer 2 The Probability Background 28 2.1 Probability and Measure 28 2.2 Integration.........

More information

Optimal Jackknife for Unit Root Models

Optimal Jackknife for Unit Root Models Optimal Jackknife for Unit Root Models Ye Chen and Jun Yu Singapore Management University October 19, 2014 Abstract A new jackknife method is introduced to remove the first order bias in the discrete time

More information

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY Time Series Analysis James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY & Contents PREFACE xiii 1 1.1. 1.2. Difference Equations First-Order Difference Equations 1 /?th-order Difference

More information

Binary choice 3.3 Maximum likelihood estimation

Binary choice 3.3 Maximum likelihood estimation Binary choice 3.3 Maximum likelihood estimation Michel Bierlaire Output of the estimation We explain here the various outputs from the maximum likelihood estimation procedure. Solution of the maximum likelihood

More information

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication G. S. Maddala Kajal Lahiri WILEY A John Wiley and Sons, Ltd., Publication TEMT Foreword Preface to the Fourth Edition xvii xix Part I Introduction and the Linear Regression Model 1 CHAPTER 1 What is Econometrics?

More information

Conditional Inference by Estimation of a Marginal Distribution

Conditional Inference by Estimation of a Marginal Distribution Conditional Inference by Estimation of a Marginal Distribution Thomas J. DiCiccio and G. Alastair Young 1 Introduction Conditional inference has been, since the seminal work of Fisher (1934), a fundamental

More information

Multivariate Time Series: VAR(p) Processes and Models

Multivariate Time Series: VAR(p) Processes and Models Multivariate Time Series: VAR(p) Processes and Models A VAR(p) model, for p > 0 is X t = φ 0 + Φ 1 X t 1 + + Φ p X t p + A t, where X t, φ 0, and X t i are k-vectors, Φ 1,..., Φ p are k k matrices, with

More information

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental

More information

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Sílvia Gonçalves and Benoit Perron Département de sciences économiques,

More information

Testing Some Covariance Structures under a Growth Curve Model in High Dimension

Testing Some Covariance Structures under a Growth Curve Model in High Dimension Department of Mathematics Testing Some Covariance Structures under a Growth Curve Model in High Dimension Muni S. Srivastava and Martin Singull LiTH-MAT-R--2015/03--SE Department of Mathematics Linköping

More information

Steven Cook University of Wales Swansea. Abstract

Steven Cook University of Wales Swansea. Abstract On the finite sample power of modified Dickey Fuller tests: The role of the initial condition Steven Cook University of Wales Swansea Abstract The relationship between the initial condition of time series

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 4 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 23 Recommended Reading For the today Serial correlation and heteroskedasticity in

More information

The formal relationship between analytic and bootstrap approaches to parametric inference

The formal relationship between analytic and bootstrap approaches to parametric inference The formal relationship between analytic and bootstrap approaches to parametric inference T.J. DiCiccio Cornell University, Ithaca, NY 14853, U.S.A. T.A. Kuffner Washington University in St. Louis, St.

More information

Problem Selected Scores

Problem Selected Scores Statistics Ph.D. Qualifying Exam: Part II November 20, 2010 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. Problem 1 2 3 4 5 6 7 8 9 10 11 12 Selected

More information

CONJUGATE DUMMY OBSERVATION PRIORS FOR VAR S

CONJUGATE DUMMY OBSERVATION PRIORS FOR VAR S ECO 513 Fall 25 C. Sims CONJUGATE DUMMY OBSERVATION PRIORS FOR VAR S 1. THE GENERAL IDEA As is documented elsewhere Sims (revised 1996, 2), there is a strong tendency for estimated time series models,

More information

Vector autoregressions, VAR

Vector autoregressions, VAR 1 / 45 Vector autoregressions, VAR Chapter 2 Financial Econometrics Michael Hauser WS17/18 2 / 45 Content Cross-correlations VAR model in standard/reduced form Properties of VAR(1), VAR(p) Structural VAR,

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of

More information

Specification Test for Instrumental Variables Regression with Many Instruments

Specification Test for Instrumental Variables Regression with Many Instruments Specification Test for Instrumental Variables Regression with Many Instruments Yoonseok Lee and Ryo Okui April 009 Preliminary; comments are welcome Abstract This paper considers specification testing

More information

Estimating Moving Average Processes with an improved version of Durbin s Method

Estimating Moving Average Processes with an improved version of Durbin s Method Estimating Moving Average Processes with an improved version of Durbin s Method Maximilian Ludwig this version: June 7, 4, initial version: April, 3 arxiv:347956v [statme] 6 Jun 4 Abstract This paper provides

More information

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that Linear Regression For (X, Y ) a pair of random variables with values in R p R we assume that E(Y X) = β 0 + with β R p+1. p X j β j = (1, X T )β j=1 This model of the conditional expectation is linear

More information

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Jeremy S. Conner and Dale E. Seborg Department of Chemical Engineering University of California, Santa Barbara, CA

More information

Introduction to Estimation Methods for Time Series models Lecture 2

Introduction to Estimation Methods for Time Series models Lecture 2 Introduction to Estimation Methods for Time Series models Lecture 2 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 2 SNS Pisa 1 / 21 Estimators:

More information

4.2 Estimation on the boundary of the parameter space

4.2 Estimation on the boundary of the parameter space Chapter 4 Non-standard inference As we mentioned in Chapter the the log-likelihood ratio statistic is useful in the context of statistical testing because typically it is pivotal (does not depend on any

More information

Inference about the Indirect Effect: a Likelihood Approach

Inference about the Indirect Effect: a Likelihood Approach Discussion Paper: 2014/10 Inference about the Indirect Effect: a Likelihood Approach Noud P.A. van Giersbergen www.ase.uva.nl/uva-econometrics Amsterdam School of Economics Department of Economics & Econometrics

More information

Econometrics of financial markets, -solutions to seminar 1. Problem 1

Econometrics of financial markets, -solutions to seminar 1. Problem 1 Econometrics of financial markets, -solutions to seminar 1. Problem 1 a) Estimate with OLS. For any regression y i α + βx i + u i for OLS to be unbiased we need cov (u i,x j )0 i, j. For the autoregressive

More information

Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis

Prof. Dr. Roland Füss Lecture Series in Applied Econometrics Summer Term Introduction to Time Series Analysis Introduction to Time Series Analysis 1 Contents: I. Basics of Time Series Analysis... 4 I.1 Stationarity... 5 I.2 Autocorrelation Function... 9 I.3 Partial Autocorrelation Function (PACF)... 14 I.4 Transformation

More information

The Generalized Cochrane-Orcutt Transformation Estimation For Spurious and Fractional Spurious Regressions

The Generalized Cochrane-Orcutt Transformation Estimation For Spurious and Fractional Spurious Regressions The Generalized Cochrane-Orcutt Transformation Estimation For Spurious and Fractional Spurious Regressions Shin-Huei Wang and Cheng Hsiao Jan 31, 2010 Abstract This paper proposes a highly consistent estimation,

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8 Contents 1 Linear model 1 2 GLS for multivariate regression 5 3 Covariance estimation for the GLM 8 4 Testing the GLH 11 A reference for some of this material can be found somewhere. 1 Linear model Recall

More information

Econ 623 Econometrics II Topic 2: Stationary Time Series

Econ 623 Econometrics II Topic 2: Stationary Time Series 1 Introduction Econ 623 Econometrics II Topic 2: Stationary Time Series In the regression model we can model the error term as an autoregression AR(1) process. That is, we can use the past value of the

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Economic modelling and forecasting

Economic modelling and forecasting Economic modelling and forecasting 2-6 February 2015 Bank of England he generalised method of moments Ole Rummel Adviser, CCBS at the Bank of England ole.rummel@bankofengland.co.uk Outline Classical estimation

More information

Bootstrap Testing in Econometrics

Bootstrap Testing in Econometrics Presented May 29, 1999 at the CEA Annual Meeting Bootstrap Testing in Econometrics James G MacKinnon Queen s University at Kingston Introduction: Economists routinely compute test statistics of which the

More information

BTRY 4090: Spring 2009 Theory of Statistics

BTRY 4090: Spring 2009 Theory of Statistics BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)

More information

10. Time series regression and forecasting

10. Time series regression and forecasting 10. Time series regression and forecasting Key feature of this section: Analysis of data on a single entity observed at multiple points in time (time series data) Typical research questions: What is the

More information

Notes on the Multivariate Normal and Related Topics

Notes on the Multivariate Normal and Related Topics Version: July 10, 2013 Notes on the Multivariate Normal and Related Topics Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions

More information

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science

Likelihood Ratio Tests. that Certain Variance Components Are Zero. Ciprian M. Crainiceanu. Department of Statistical Science 1 Likelihood Ratio Tests that Certain Variance Components Are Zero Ciprian M. Crainiceanu Department of Statistical Science www.people.cornell.edu/pages/cmc59 Work done jointly with David Ruppert, School

More information

arxiv: v1 [stat.me] 5 Apr 2013

arxiv: v1 [stat.me] 5 Apr 2013 Logistic regression geometry Karim Anaya-Izquierdo arxiv:1304.1720v1 [stat.me] 5 Apr 2013 London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK e-mail: karim.anaya@lshtm.ac.uk and Frank Critchley

More information

Statistical Inference of Covariate-Adjusted Randomized Experiments

Statistical Inference of Covariate-Adjusted Randomized Experiments 1 Statistical Inference of Covariate-Adjusted Randomized Experiments Feifang Hu Department of Statistics George Washington University Joint research with Wei Ma, Yichen Qin and Yang Li Email: feifang@gwu.edu

More information

DEFNITIVE TESTING OF AN INTEREST PARAMETER: USING PARAMETER CONTINUITY

DEFNITIVE TESTING OF AN INTEREST PARAMETER: USING PARAMETER CONTINUITY Journal of Statistical Research 200x, Vol. xx, No. xx, pp. xx-xx ISSN 0256-422 X DEFNITIVE TESTING OF AN INTEREST PARAMETER: USING PARAMETER CONTINUITY D. A. S. FRASER Department of Statistical Sciences,

More information

Massachusetts Institute of Technology Department of Economics Time Series Lecture 6: Additional Results for VAR s

Massachusetts Institute of Technology Department of Economics Time Series Lecture 6: Additional Results for VAR s Massachusetts Institute of Technology Department of Economics Time Series 14.384 Guido Kuersteiner Lecture 6: Additional Results for VAR s 6.1. Confidence Intervals for Impulse Response Functions There

More information

MEI Exam Review. June 7, 2002

MEI Exam Review. June 7, 2002 MEI Exam Review June 7, 2002 1 Final Exam Revision Notes 1.1 Random Rules and Formulas Linear transformations of random variables. f y (Y ) = f x (X) dx. dg Inverse Proof. (AB)(AB) 1 = I. (B 1 A 1 )(AB)(AB)

More information

Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments

Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments Tak Wai Chau February 20, 2014 Abstract This paper investigates the nite sample performance of a minimum distance estimator

More information