Comment

Atsushi Inoue
Department of Economics, Vanderbilt University (atsushi.inoue@vanderbilt.edu)

While it is known that pseudo-out-of-sample methods are not optimal for comparing models, they are nevertheless often used to test predictability in population. In this comment, I elaborate on the often complicated relationship between in-sample and pseudo-out-of-sample inference, and I develop an in-sample likelihood ratio test that has a pseudo-out-of-sample flavor to it.

First, consider the predictive models y_t = ε_t and y_t = µ + ε_t, where ε_t is known to have a standard normal distribution for simplicity. We are interested in testing H_0: µ = 0. As Diebold (2013) points out, the pseudo-out-of-sample method is not optimal for testing µ = 0 (see Inoue and Kilian, 2004). By the Neyman-Pearson lemma, the in-sample likelihood ratio test is most powerful. Even in the presence of a break, to which Diebold (2013) alludes as a possible reason for using the pseudo-out-of-sample method, one can still conduct an in-sample likelihood ratio test. For example, consider

    y_t = δ 1(t > [τT]) + ε_t.    (1)

When the break occurs within the observed sample, t = 1, ..., T, one can define an in-sample likelihood ratio test for testing y_t = ε_t against (1),
which is most powerful by the Neyman-Pearson lemma (see Rossi, 2005, for example).

Below I consider an alternative environment in which an in-sample likelihood ratio test is closely related to pseudo-out-of-sample inference. Consider the simple time-varying-parameter model

    y_t = µ(t/T) + ε_t,    (2)

where µ : [0, 1] → R is a smooth function of time. While {y_t}_{t=1}^T is a triangular array by construction, we omit the dependence of y on T to simplify the notation. Robinson (1989) and Cai (2007) developed nonparametric estimation methods for such time-varying-parameter models. In related work, Giacomini and Rossi (2013) develop a test for nonnested model comparisons using the local Kullback-Leibler information criterion in this environment.

The local log-likelihood function for the parameter µ(t/T) is defined as

    -(1/2) log(2π) Σ_{s=1}^T K_W(s - t) - (1/2) Σ_{s=1}^T K_W(s - t) (y_s - µ(t/T))²,    (3)

where K_W(x) = (1/W) k(x/W), k(·) is a kernel function and W is the bandwidth (Fan, Farmen and Gijbels, 1998). To establish a link between the resulting nonparametric estimator and the rolling regression estimator, I focus on the following asymmetric flat kernel:

    K_W(x) = 1/W if -W ≤ x < 0, and 0 otherwise.    (4)

Then the local maximum likelihood estimator of µ(t/T) that maximizes the local log-likelihood function (3) is given by

    ˆµ(t/T) = (1/W) Σ_{s=t-W}^{t-1} y_s.    (5)

Note that this is precisely the one-step-ahead forecast based on the rolling scheme with rolling window size W. In other words, the rolling regression forecast is a nonparametric estimator of the time-varying parameter with an asymmetric flat kernel and bandwidth given by the rolling window size.

Note that the kernel (4) is not centered at zero. This fact causes a problem known as the boundary problem in the literature on nonparametrics. Under the null hypothesis that µ(·) = 0, however, no bias will arise from the boundary problem. Also note that the window size plays an important role under the alternative hypothesis. The fixed window size used in Giacomini and White (2006) will yield a variance that is not asymptotically negligible. Moreover, a window size proportional to the sample size, which has been considered in both old-school and new-school WCM, will yield biased estimates under the alternative because the bias term is increasing in the window size W. In our context, the window size needs to go to infinity at a
slower rate than the sample size (W/T → 0 as T, W → ∞) to consistently estimate µ(·) under the alternative hypothesis.

To derive a valid test of no predictive ability in population, define the log-likelihood function of µ(·) by summing (3) over t = W + 1, ..., T:

    ln L(µ(·)) = Σ_{t=W+1}^T [ -(1/2) ln(2π) Σ_{s=1}^T K_W(s - t) - (1/2) Σ_{s=1}^T K_W(s - t) (y_s - µ(t/T))² ].    (6)

Evaluating (6) under the null (µ(·) = 0) and under the alternative (µ(·) = ˆµ(·)) and taking the difference, we obtain the following log-likelihood ratio test statistic:

    LR = 2 (ln L(ˆµ(·)) - ln L(0))
       = Σ_{t=W+1}^T (1/W) Σ_{s=t-W}^{t-1} [ y_s² - (y_s - ˆµ(t/T))² ]
       = Σ_{t=W+1}^T (ˆµ(t/T))².    (7)

This statistic can also be obtained by taking the L² norm of the score functions, giving a Lagrange multiplier test interpretation to (7).

The log-likelihood ratio test statistic has an intuitive form. Because µ(t/T) = 0 for all t = 1, 2, ..., T under the null hypothesis, the sum of squares of the estimates ˆµ(t/T) is expected to be small under the null hypothesis. On the other hand, under the alternative hypothesis that µ(t/T) ≠ 0 for many t, the sum of squares should diverge to infinity as the sample size grows, resulting in consistency of the test.
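To make the link between the rolling forecast (5) and the local MLE concrete, and to check the closed form in (7), here is a small numerical sketch. This is my own illustration rather than code from the comment; the variable names are mine, and indexing is zero-based.

```python
import numpy as np

rng = np.random.default_rng(0)
T, W = 100, 10
y = rng.standard_normal(T)  # data under the null: y_t = eps_t, eps_t ~ N(0,1)

# (5): with the asymmetric flat kernel, the local MLE is the rolling mean of
# the previous W observations, i.e. the one-step-ahead rolling forecast.
# With 0-based indexing, the estimate indexed by t uses y[t-W], ..., y[t-1].
mu_hat = np.array([y[t - W:t].mean() for t in range(W, T)])

# The same estimator via explicit kernel weights K_W(s - t) = 1/W on the
# window, confirming that (5) maximizes the local log-likelihood (3).
def local_mle(t):
    w = np.zeros(T)
    w[t - W:t] = 1.0 / W
    return (w * y).sum() / w.sum()  # weighted mean = Gaussian local MLE

assert np.allclose(mu_hat, [local_mle(t) for t in range(W, T)])

# (7): the likelihood ratio evaluated directly from the two log-likelihoods
# (the constant terms cancel in the difference) ...
lr_direct = sum(
    (y[t - W:t] ** 2 - (y[t - W:t] - m) ** 2).sum() / W
    for t, m in zip(range(W, T), mu_hat)
)
# ... equals the sum of squared local estimates.
lr_closed = (mu_hat ** 2).sum()
assert np.isclose(lr_direct, lr_closed)
```

The second equality in (7) follows because, within each window, the cross term 2 ˆµ (1/W) Σ y_s equals 2 ˆµ², so each window contributes exactly ˆµ(t/T)².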
It is interesting to compare (7) to the DM test statistic. The numerator of the DM test statistic based on the rolling scheme can be written as

    Σ_{t=W+1}^T [ y_t² - (y_t - ˆµ(t/T))² ] = 2 Σ_{t=W+1}^T ε_t ˆµ(t/T) - Σ_{t=W+1}^T (ˆµ(t/T))².    (8)

The second term is the negative of the log-likelihood ratio test statistic. It is interesting to note that Clark and West (2006) remove this term by recentering the test statistic in their parametric setup.

Because infinitely many parameters are involved, however, the asymptotic null distribution of the log-likelihood ratio test statistic is not chi-square. To make the testing procedure operational, it is convenient to normalize the test statistic as follows:

    [ Σ_{t=W+1}^T ( W (ˆµ(t/T))² - 1 ) ] / ˆσ,    (9)

where ˆσ² is an estimator of the long-run variance of W (ˆµ(t/T))². Under the null hypothesis that µ(·) = 0, one can show that (9) is asymptotically normally distributed based on the central limit theorem for m-dependent random variables with m diverging to infinity, as in Romano and Wolf (2000).

Below I conduct a small Monte Carlo experiment that compares the size and power of the DM and LR tests when y_t = ε_t is tested against y_t = µ(t/T) + ε_t. I postulate that ε_t ~ iid N(0, 1). Note that this is not what the DM test is designed to do, of course. In the first data generating process,
µ(t/T) = 0. In the second DGP, µ(t/T) = 1 for all t = 1, 2, ..., T. In the third DGP, µ(t/T) = sin(2πt/T). The sample size is set to 100 (T = 100) and I consider W = 5, 10, 15, 20, 25, 50, 75. The numbers reported are the rejection frequencies when the nominal size is set to 5%.

Table 1: Rejection Frequencies of the DM and LR Tests with Size α = 0.05

          size                power               power
          µ(t/T) = 0          µ(t/T) = 1          µ(t/T) = sin(2πt/T)
  W       DM       LR         DM       LR         DM       LR
  5       0.000    0.018      0.997    1.000      0.523    0.976
  10      0.000    0.013      1.000    1.000      0.823    0.841
  15      0.000    0.007      1.000    0.908      0.785    0.202
  20      0.000    0.005      0.999    0.001      0.550    0.000
  25      0.000    0.004      0.999    0.000      0.185    0.000
  50      0.003    0.000      0.980    0.000      0.003    0.000
  75      0.012    0.000      0.775    0.000      0.036    0.000

Under the null hypothesis, both tests are undersized: the DM test because it does not take parameter estimation uncertainty into account, and the LR test because it is a nonparametric test and requires larger samples. Under the no-change alternative, the DM test has good power for all the window sizes considered, whereas the LR test has power only when W is small. The power of the LR test drops significantly when W is greater than 15. This is because W is the bandwidth and is not supposed to be large relative to the sample size. Finally, under the smooth-change alternative, the LR test dominates the DM test when W is small. The optimal bandwidth is expected to be a function of the smoothness of µ(·), i.e., the more rapidly µ(·) is changing,
the smaller the window size should be. The optimal window size should thus be smaller for the third DGP than for the second DGP. It should also be noted that while the DM test may not be optimal, it is less sensitive to the choice of window size than the LR test is.

This discussion shows that an in-sample likelihood ratio test can have a pseudo-out-of-sample interpretation and that the local likelihood ratio test has good power in a simple Monte Carlo experiment. The nonparametric approach often brings new insights to the forecasting literature. For example, in related work, Inoue, Rossi and Jin (2012) show that the pseudo-out-of-sample model selection criterion can be made consistent, which is in contrast to the parametric results in Inoue and Kilian (2006). It is an open question which asymptotic approximation performs better in practice. The choice of window size also has significant impacts on results for DM-type tests, as shown by Hansen and Timmermann (2012) and Rossi and Inoue (2012). In this context, the nonparametric interpretation of the simulated out-of-sample forecasting scheme provides insight for choosing the window size (Giraitis, Kapetanios and Price, 2013; Inoue, Jin and Rossi, 2014). The nonparametric approach also has implications for comparing forecasts. When parameters are changing at the time a forecast is made, there will be a bias term that needs to be taken into account in addition to a variance term.

ACKNOWLEDGMENT

I thank Lutz Kilian and Barbara Rossi for helpful suggestions and comments and the National Science Foundation for financial support.

ADDITIONAL REFERENCES

Cai, Zongwu (2007), "Trending Time-Varying Coefficient Time Series Models with Serially Correlated Errors," Journal of Econometrics, 136, 163-188.

Clark, Todd E., and Kenneth D. West (2006), "Using Out-of-Sample Mean Squared Prediction Errors to Test the Martingale Difference Hypothesis," Journal of Econometrics, 135, 155-186.

Fan, Jianqing, Mark Farmen and Irène Gijbels (1998), "Local Maximum Likelihood Estimation and Inference," Journal of the Royal Statistical Society, Series B, 60, 591-608.

Giraitis, Liudas, George Kapetanios and Simon Price (2013), "Adaptive Forecasting in the Presence of Recent and Ongoing Structural Change," Journal of Econometrics, 177, 153-170.

Hansen, Peter Reinhard, and Allan Timmermann (2012), "Choice of Sample Split in Out-of-Sample Forecast Evaluation," unpublished manuscript, European University Institute and University of California, San Diego.

Inoue, Atsushi, Barbara Rossi and Lu Jin (2012), "Consistent Model Selection Over Rolling Windows," in Xiaohong Chen and Norman R. Swanson, eds., Recent Advances and Future Directions in Causality,
Prediction, and Specification Analysis: Essays in Honor of Halbert L. White Jr., Springer: New York, NY, pp. 299-330.

Inoue, Atsushi, Lu Jin and Barbara Rossi (2014), "Window Selection for Out-of-Sample Forecasting with Time-Varying Parameters," unpublished manuscript, Vanderbilt University, North Carolina State University and Universitat Pompeu Fabra.

Robinson, Peter M. (1989), "Nonparametric Estimation of Time-Varying Parameters," in Peter Hackl, ed., Statistical Analysis and Forecasting of Economic Structural Change, Springer: Berlin, pp. 253-264.

Romano, Joseph P., and Michael Wolf (2000), "A More General Central Limit Theorem for m-Dependent Random Variables with Unbounded m," Statistics & Probability Letters, 47, 115-124.

Rossi, Barbara (2005), "Optimal Tests for Nested Model Selection with Underlying Parameter Instability," Econometric Theory, 21, 962-990.
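As a closing illustration, the normalized LR statistic (9) and the design of the Monte Carlo experiment reported in Table 1 can be sketched as follows. This is my own rough reconstruction, not the author's code: the Bartlett (Newey-West) long-run variance estimator with truncation lag W and the one-sided 5% normal critical value (1.645) are assumptions, so the rejection frequencies need not match Table 1 exactly.

```python
import numpy as np

def lr_stat(y, W):
    """Normalized LR statistic in the spirit of (9)."""
    T = len(y)
    mu_hat = np.array([y[t - W:t].mean() for t in range(W, T)])
    z = W * mu_hat ** 2 - 1.0  # W * mu_hat^2 is chi-square(1) under the null
    zc = z - z.mean()
    # Bartlett long-run variance with truncation lag W (an assumption here)
    lrv = (zc ** 2).mean()
    for lag in range(1, W + 1):
        lrv += 2.0 * (1.0 - lag / (W + 1.0)) * (zc[lag:] * zc[:-lag]).mean()
    sigma_hat = np.sqrt(max(lrv, 1e-12) * len(z))  # scale for the full sum
    return z.sum() / sigma_hat

def rejection_frequency(mu_fn, T=100, W=5, reps=500, crit=1.645, seed=0):
    """Share of replications in which the one-sided test rejects."""
    rng = np.random.default_rng(seed)
    u = np.arange(1, T + 1) / T  # the t/T grid
    return np.mean([lr_stat(mu_fn(u) + rng.standard_normal(T), W) > crit
                    for _ in range(reps)])

size = rejection_frequency(lambda u: np.zeros_like(u))            # first DGP
power_const = rejection_frequency(lambda u: np.ones_like(u))      # second DGP
power_sin = rejection_frequency(lambda u: np.sin(2 * np.pi * u))  # third DGP
```

Consistent with the pattern in Table 1, the sketch should be conservative under the first DGP and should reject with probability near one under the second DGP when W is small relative to T.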