Forecasting under structural change

Liudas Giraitis, George Kapetanios, Mohaimen Mansur and Simon Price

Abstract Forecasting strategies that are robust to structural breaks have earned renewed attention in the literature. They are built on weighted averages that downweight past information. They include forecasting with a rolling window, exponential smoothing or an exponentially weighted moving average, and forecast pooling. These simple strategies are particularly attractive because they are easy to implement, are possibly robust to different types of structural change, and can adjust for breaks in real time. This review introduces the dynamic model to be forecast. It explains in detail how the data-dependent tuning parameter for discounting past data is selected, how basic forecasts are constructed, and how the forecast error is estimated. It comments on the forecast error and on how weak and strong dependence in the noise affects the quality of the prediction. It also describes various forecasting methods and evaluates how well they perform in practice as robust forecasts.

Keywords: Structural change, adaptive forecasts, robust forecasts, weighted averaging.

JEL Codes: C1, C59

Liudas Giraitis, Queen Mary, University of London, UK
George Kapetanios, Queen Mary, University of London, UK
Mohaimen Mansur, Queen Mary, University of London, UK
Simon Price, Bank of England and City University, London, UK

1 Introduction

Dealing with structural change has become one of the most crucial challenges in modelling and forecasting economic and financial time series. In econometrics, structural change usually refers to the evolution of a parameter of interest in a dynamic model, which makes its estimation and/or prediction unstable. The change can be as dramatic as an abrupt shift or a permanent break. Examples of causes are the introduction of a new monetary policy, the breakdown of an exchange rate regime or a sudden rise in the oil price. The change can also be slow, smooth and continuous, induced, for example, by gradual progress in technology or production. Empirical evidence of structural change is widespread and well documented. Stock and Watson (1996) investigate many US macroeconomic time series and find instability in both univariate and bivariate relationships. In finance, structural changes have been detected in interest rates (see, e.g., Garcia and Perron (1996), Ang and Bekaert (2002)) and in stock prices and returns (see, e.g., Timmermann (2001) and Pesaran and Timmermann (2002)). Such structural change, or parameter instability, has been identified as one of the main culprits behind forecast failures (see Hendry (2000)). Not surprisingly, the detection of breaks and forecast strategies in the presence of breaks have attracted much attention from researchers. Nonetheless, real-time forecasting of time series that are subject to structural change remains a critical challenge. It is often further complicated by other features of the series, such as persistence (see Rossi (2012)).

A natural strategy for forecasting in an unstable environment would be to find the last change point and use only the post-break data to estimate a model and forecast. However, standard tests for structural breaks are hardly suitable for real-time forecasting, small breaks are difficult to detect, and there may not be enough post-break data. Moreover, Pesaran and Timmermann (2007) point out that, because of a trade-off between bias and forecast error variance, it is not always optimal to use only post-break data; it is generally beneficial to include some pre-break information. A second line of strategies formally models the break process itself and estimates its characteristics, such as timing, size and duration. A standard model of this kind is the Markov-switching model of Hamilton (1989). Clements and Krolzig (1998), however, show in a Monte Carlo study that in many instances switching models fail to forecast as accurately as a simple linear AR(1) model, even when the true data generating process is a Markov-switching regime. Research on Bayesian methods, which learn about change points from past data and use this information as priors in modelling and forecasting, continues to evolve rapidly (see, e.g., Pesaran et al. (2006), Koop and Potter (2007), Maheu and Gordon (2008)). One must otherwise decide whether to fix the number of in-sample breaks or to treat it as unknown. As an alternative to this dilemma, a class of time-varying parameter (TVP) models has arisen, which
assume that a change occurs at each point in time (see, e.g., Stock and Watson (2007), D'Agostino et al. (2013)). Because it is difficult to find a single best forecasting model, a further idea is to combine the forecasts of different models by averaging (see, e.g., Pesaran and Timmermann (2007), Clark and McCracken (2010)).

Robust forecasting approaches have earned renewed attention in the literature. This class of methods builds on downweighting past information. It includes forecasting with rolling windows, exponential smoothing or exponentially weighted moving averages (EWMA), and forecast pooling with window averaging. These simple strategies are particularly attractive because they are easy to implement, are possibly robust to different types of structural change, and can adjust for breaks without delay. The last property is particularly helpful for real-time forecasting. On the downside, discounting old data at a fixed rate chosen a priori may prove costly when the true model is break-free. A significant contribution in this respect is due to Pesaran and Timmermann (2007). These authors explore two strategies. The first selects a single window by cross-validation based on pseudo-out-of-sample losses. The second pools forecasts from the same model obtained with different window sizes; it should perform well when breaks are mild and hence difficult to detect. Eklund et al. (2010) address, partly but systematically, the issue of structural change occurring in real time and the challenge it poses for time series forecasting. They use break-robust methods that downweight data. One crucial question they do not answer is how much older data should be downweighted. In recent work, Giraitis et al. (2013) address the challenge of forecasting under recent and ongoing structural change in a generic setting. Alongside breaks, these authors consider other types of structural change, including deterministic and stochastic trends and smooth cycles. They use typical break-robust, data-discounting methods, such as rolling windows, EWMA, forecast averaging over different windows, and various extensions of these. Crucially, they make the choice of the tuning parameter that defines the discounting weights data-dependent, selecting it by minimising the forecast mean squared error. They provide detailed theoretical and simulation analyses of their proposal. They also give convincing evidence that methods with data-selected discount rates perform well on a number of US macroeconomic and financial time series.

Giraitis et al. (2013) account for persistence through short memory autoregressive dependence in the noise process. However, they do not explore long memory, which is widely regarded as a common and important property of many economic and financial series. Mansur (2013) extends the work of Giraitis et al. (2013) to a more complex yet realistic forecasting environment, in which structural change in a dynamic model is accompanied by noise with long range dependence. This adds a new dimension to the existing challenge of real-time forecasting under structural change. It also contributes to an interesting, ongoing debate in the econometric literature about a possible `spurious' relationship between long
range dependence and structural change, and about the forecasting difficulties this may create. Many researchers argue that long memory in the data can easily be confused with structural change (see, e.g., Diebold and Inoue (2001), Gourieroux and Jasiak (2001), Granger and Hyung (2004) and Kapetanios (2006)). This makes the already difficult problem of forecasting under structural change harder still. Since the two are often hard to tell apart, it is desirable to develop forecast methods that are robust to structural change and also account properly for long memory persistence.

The rest of the paper is structured as follows. Section 2 introduces the dynamic model to be forecast, which was proposed and developed in Giraitis et al. (2013) and Mansur (2013). We discuss in detail how the tuning parameter that defines the rate of downweighting is selected optimally from the data, and how the forecasts are constructed. Section 3 contains theoretical results. Section 4 reviews the forecast strategies and presents Monte Carlo evidence on how robust forecast strategies perform.

2 Adaptive Forecast Strategy

Our adaptive forecast strategy aims at out-of-sample forecasting under minimal structural assumptions. It seeks to adapt to the unknown model and involves neither model fitting nor parameter estimation. This type of forecasting was introduced in Pesaran and Timmermann (2007) and Eklund et al. (2010), and subsequently developed by Giraitis et al. (2013). It considers a simple but general location model given by

$$y_t = \mu_t + u_t, \qquad t = 1, 2, \ldots, T, \qquad (1)$$

where $y_t$ is the variable to be forecast, $\mu_t$ is a persistent process ("signal") of unknown type, and $u_t$ is a dependent noise. In most previous work the $\mu_t$'s mainly describe structural breaks. This framework is more flexible and general: it imposes no structure on the deterministic or stochastic trend $\mu_t$, and it adapts to changes in that trend, such as structural breaks in the mean. Giraitis et al. (2013) specify the noise $u_t$ to be a stationary short memory process, while Mansur (2013) allows for long range dependence in the noise. Standard definitions in the statistical literature define short memory as absolute summability of the autocovariances $\gamma_u(k) = \mathrm{Cov}(u_{j+k}, u_j)$, that is, $\sum_{k=0}^{\infty}|\gamma_u(k)| < \infty$. Long memory is defined as the slow decay $\gamma_u(k) \sim c_\gamma k^{-1+2d}$ as $k$ increases, for some $0 < d < 1/2$ and $c_\gamma > 0$. Unlike those of short memory processes, the autocorrelations of long memory processes are not summable.
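The following small illustration, which is not part of the original analysis, contrasts summable with non-summable autocovariances. It compares an AR(1) process with ARFIMA(0, d, 0) noise, a standard long memory example. The sketch assumes NumPy and the well-known ARFIMA(0, d, 0) autocorrelation recursion $\rho(k) = \rho(k-1)(k-1+d)/(k-d)$, under which $\rho(k) \sim \{\Gamma(1-d)/\Gamma(d)\}k^{-1+2d}$. The function names are hypothetical.

```python
import math

import numpy as np


def arfima0d0_acf(d, n_lags):
    """Autocorrelations rho(0), ..., rho(n_lags) of ARFIMA(0, d, 0) noise, 0 < d < 1/2.

    Uses the standard recursion rho(k) = rho(k - 1) * (k - 1 + d) / (k - d), rho(0) = 1.
    """
    rho = np.empty(n_lags + 1)
    rho[0] = 1.0
    for k in range(1, n_lags + 1):
        rho[k] = rho[k - 1] * (k - 1 + d) / (k - d)
    return rho


def ar1_acf(phi, n_lags):
    """Autocorrelations phi**k of a stationary AR(1) process (short memory)."""
    return phi ** np.arange(n_lags + 1)


if __name__ == "__main__":
    d, phi = 0.3, 0.7
    # Long memory: rho(k) * k**(1 - 2d) settles at Gamma(1 - d) / Gamma(d).
    print("limit constant:", math.gamma(1 - d) / math.gamma(d))
    for n in (100, 1_000, 10_000):
        lm = arfima0d0_acf(d, n)
        sm = ar1_acf(phi, n)
        # Partial sums of the (here all positive) autocorrelations:
        # bounded for AR(1), growing like n**(2d) for ARFIMA(0, d, 0).
        print(n, round(sm.sum(), 3), round(lm.sum(), 1), lm[n] * n ** (1 - 2 * d))
```

With $d = 0.3$, the partial sums of the ARFIMA autocorrelations keep growing as more lags are added. The AR(1) sums, by contrast, settle near $1/(1-\phi)$.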

One can expect the long memory noise process $u_t$ to generate a substantial amount of persistence by itself. Such persistence is a common feature of the economic and financial time series that our adaptive method is meant to forecast, and it feeds into $y_t$, diluting the underlying model structure. Forecasting such persistent series $y_t$ when they undergo structural change is of great interest in applications.

The downweighting forecasting method relies simply on a weighted combination of historical data. A forecast of $y_t$ is based on (local) averaging of the past values $y_{t-1}, \ldots, y_1$:

$$\hat y_{t|t-1,H} = \sum_{j=1}^{t-1} w_{tj,H}\, y_{t-j} = w_{t1,H}\, y_{t-1} + \cdots + w_{t,t-1,H}\, y_1, \qquad (2)$$

with weights $w_{tj,H}$ such that $w_{t1,H} + \cdots + w_{t,t-1,H} = 1$, parameterised by a single tuning parameter $H$. Two weighting schemes are particularly popular in practice, namely the rolling window and the exponentially weighted moving average. Such forecasting requires choosing a tuning parameter that determines the rate at which past information is discounted. The performance of these methods with an a priori selected tuning parameter is known to be sensitive to that choice; see Pesaran and Pick (2011) and Eklund et al. (2010). Fixing the discounting parameter at a single value is therefore risky, and is unlikely to produce accurate forecasts if a series is subject to structural change.

Adaptive methods. Giraitis et al. (2013) advocate a data-dependent selection of the tuning parameter $H$ and give a theoretical justification of why such a selection can be optimal. The approach does not require any particular modelling or estimation of the structure of $\mu_t$. The data-based tuning parameter $H$ is chosen on the basis of in-sample forecast performance, evaluated over part of the sample. The structure of the kernel-type weights $w_{tj,H}$ is described in what follows. Their definition requires a real-valued kernel function $K(x)$, $x \ge 0$, which is continuous and differentiable on its support, such that $\int_0^\infty K(u)\,du = 1$, $K(0) > 0$, and, for some $C > 0$ and $c > 0$,

$$K(x) \le C \exp(-c|x|), \qquad |(d/dx)K(x)| \le C/(1+x^2), \qquad x > 0.$$

For $t \ge 1$ and $H > 0$, we set

$$w_{tj,H} = \frac{K(j/H)}{\sum_{s=1}^{t-1} K(s/H)}, \qquad j = 1, \ldots, t-1.$$

Examples. The main classes of commonly used weights, such as rolling window weights, exponential weights and triangular window weights, satisfy this assumption.

(i) Rolling window weights $w_{tj,H}$, $j = 1, 2, \ldots, t-1$, correspond to $K(u) = I(0 \le u \le 1)$. They are defined as follows: for $H < t$, $w_{tj,H} = H^{-1} I(1 \le j \le H)$; for $H \ge t$, $w_{tj,H} = (t-1)^{-1} I(1 \le j \le t-1)$, where $I$ is the indicator function.

(ii) Exponentially weighted moving average (EWMA) weights are defined with $K(x) = e^{-x}$, $x \in [0, \infty)$. Then, with $\rho = \exp(-1/H) \in (0,1)$, $K(j/H) = \rho^j$ and

$$w_{tj,H} = \rho^j \Big/ \sum_{k=1}^{t-1} \rho^k, \qquad 1 \le j \le t-1.$$

The rolling window simply averages the $H$ previous observations. The EWMA forecast, by contrast, uses all observations $y_1, \ldots, y_{t-1}$ and smoothly downweights the more distant past. Both classes of weights are parameterised by a single parameter $H$.

Selection of the tuning parameter, H. Suppose we have a sample of $T$ observations $y_1, \ldots, y_T$. The one-step-ahead forecast $\hat y_{T+1|T,H}$ requires selecting the tuning parameter $H$. The data-adaptive selection of $H$ is done by cross-validation. The in-sample forecasts $\hat y_{t|t-1,H}$, $t = T_0, \ldots, T$, over the evaluation sample are used to compute the mean squared forecast error (MSFE)

$$Q_{T,H} := \frac{1}{T_n} \sum_{t=T_0}^{T} (y_t - \hat y_{t|t-1,H})^2,$$

and the tuning parameter that generates the smallest MSFE is chosen:

$$\hat H := \arg\min_{H \in I_T} Q_{T,H}.$$

Here $T_n := T - T_0 + 1$ is the length of the cross-validation period and $T_0$ is its starting point. The minimisation interval $I_T = [a, H_{\max}]$ is selected such that $T^{2/3} < H_{\max} < \delta T$, with $0 < \delta < 1$ and $a > 0$. Let $H_{opt}$ be the unknown fixed value that minimises the MSE

$$\omega_{T,H} := E(y_{T+1} - \hat y_{T+1|T,H})^2.$$

The adaptive forecast $\hat y_{T+1|T,\hat H}$ cannot outperform the best forecast $\hat y_{T+1|T,H_{opt}}$, but it is desirable for the two MSFEs to be asymptotically equivalent. Giraitis et al. (2013) show that the forecast $\hat y_{T+1|T,\hat H}$ of $y_{T+1}$, obtained with the data-tuned $\hat H$, minimises the asymptotic MSE $\omega_{T,H}$ in $H$. This makes the weighted forecast procedure $\hat y_{T+1|T,\hat H}$ operational. The forecast is also asymptotically optimal,

$$\omega_{T,\hat H} = \omega_{T,H_{opt}} + o_p(1),$$

and the quantity $Q_{T,\hat H}$ provides an estimate of the forecast error $\omega_{T,\hat H}$:

$$Q_{T,\hat H} = \omega_{T,\hat H} + o_p(1).$$

Giraitis et al. (2013) show that for a number of models $y_t = \mu_t + u_t$ with deterministic and stochastic trends $\mu_t$ and short memory $u_t$'s,
$$Q_{T,H} = \hat\sigma_{T,u}^2 + \big(E[Q_{T,H}] - \sigma_u^2\big)\big(1 + o_p(1)\big), \qquad T \to \infty,\ H \to \infty, \qquad (3)$$

uniformly in $H$, where $\sigma_u^2 = E u_1^2$ and $\hat\sigma_{T,u}^2 := T_n^{-1}\sum_{j=T_0}^{T} u_j^2$. They verify that the deterministic function $E[Q_{T,H}] - \sigma_u^2$ has a unique minimum. This makes it possible to select an optimal data-tuned parameter $H$ that asymptotically minimises the objective function $Q_{T,H}$.

3 Theoretical results

We illustrate the theoretical properties of the weighted forecast $\hat y_{T+1|T,\hat H}$, with data-selected tuning parameter $\hat H$, using two examples of $y_t = \mu_t + u_t$. In these examples $\mu_t$ is either a constant or a linear trend, and the noise $u_t$ has either short or long memory. The following assumption describes the class of noise processes $u_t$. We suppose that $u_t$ is a stationary linear process:

$$u_t = \sum_{j=0}^{\infty} a_j \varepsilon_{t-j}, \quad t \in \mathbb{Z}, \qquad \varepsilon_j \sim IID(0, \sigma_\varepsilon^2), \quad E\varepsilon_1^4 < \infty. \qquad (4)$$

In addition, we assume that $u_t$ has either short memory (i) or long memory (ii).

(i) $u_t$ has the short memory (SM) property: $\sum_{k=0}^{\infty}|\gamma_u(k)| < \infty$, $s_u^2 := \sum_{k=-\infty}^{\infty}\gamma_u(k) > 0$, and $\sum_{k \ge n}|\gamma_u(k)| = o(\log^{-2} n)$.

(ii) $u_t$ has long memory (LM): for some $c_\gamma > 0$ and $0 < d < 1/2$, $\gamma_u(k) \sim c_\gamma k^{-1+2d}$ as $k \to \infty$.

Cases (i) and (ii) were discussed in Giraitis et al. (2013) and Mansur (2013), respectively. Define the weights

$$w_{j,H} = K(j/H) \Big/ \sum_{s=1}^{\infty} K(s/H), \qquad j \ge 1.$$

In (ii), $a_T \sim b_T$ denotes that $a_T/b_T \to 1$ as $T$ increases. We write $o_{p,H}(1)$ to indicate that $\sup_{H \in I_T}|o_{p,H}(1)| \to_p 0$, while $o_H(1)$ stands for a term with $\sup_{H \in I_T}|o_H(1)| \to 0$, as $T \to \infty$.
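The kernel weights of Section 2, the forecast (2) and the cross-validation rule $\hat H = \arg\min_{H \in I_T} Q_{T,H}$ can be combined in a short sketch. It makes several assumptions: NumPy, 0-based indexing (`T0` is the 0-based start of the evaluation period), an integer grid standing in for $I_T$, and a hypothetical simulated series with a mean shift. The function names are illustrative and do not come from the paper. The brute-force loop favours clarity over speed.

```python
import numpy as np


def rolling_kernel(x):
    """Rolling window kernel K(x) = I(0 <= x <= 1)."""
    return ((x >= 0) & (x <= 1)).astype(float)


def exponential_kernel(x):
    """EWMA kernel K(x) = exp(-x): weights proportional to rho**j with rho = exp(-1/H)."""
    return np.exp(-x)


def kernel_weights(kernel, t, H):
    """w_{tj,H} = K(j/H) / sum_{s=1}^{t-1} K(s/H) for lags j = 1, ..., t-1."""
    k = kernel(np.arange(1, t) / H)
    if k.sum() <= 0:
        raise ValueError("H too small: every kernel weight is zero")
    return k / k.sum()


def weighted_forecast(y_past, kernel, H):
    """Forecast (2): sum_j w_{tj,H} y_{t-j}, given y_past = (y_1, ..., y_{t-1})."""
    y_past = np.asarray(y_past, dtype=float)
    w = kernel_weights(kernel, len(y_past) + 1, H)
    # y_past[::-1] is (y_{t-1}, ..., y_1), i.e. ordered by lag j = 1, 2, ...
    return float(w @ y_past[::-1])


def cv_select(y, kernel, H_grid, T0):
    """Return H_hat = argmin_H Q_{T,H} together with all Q values.

    Q_{T,H} averages the squared in-sample errors y_t - yhat_{t|t-1,H} over the
    evaluation period, whose 0-based start index is T0 (T0 >= 1).
    """
    y = np.asarray(y, dtype=float)
    Q = np.array([
        np.mean([(y[t] - weighted_forecast(y[:t], kernel, H)) ** 2 for t in range(T0, len(y))])
        for H in H_grid
    ])
    return H_grid[int(np.argmin(Q))], Q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = 400
    # Hypothetical series: mean shift from 0 to 5 at 0-based index 220, small noise.
    y = np.where(np.arange(T) < 220, 0.0, 5.0) + 0.1 * rng.standard_normal(T)
    H_hat, Q = cv_select(y, rolling_kernel, np.arange(1, 151), T0=300)
    print("H_hat =", H_hat, " forecast of y_{T+1}:", weighted_forecast(y, rolling_kernel, H_hat))
```

In the example, a window longer than the post-break stretch available at the start of the evaluation period mixes pre-break data into the forecasts. The selected width therefore stays well below the top of the grid.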

3.1 Forecasting a stationary process $y_t$

The case of a stationary process $y_t = \mu + u_t$ gives further evidence of the practical use of weighted averaging forecasts. For i.i.d. random variables $y_t$, the optimal forecast of $y_{T+1}$ is the sample mean $\bar y_T = T^{-1}\sum_{t=1}^T y_t$, that is, a rolling window over the period $t = 1, \ldots, T$. When persistence increases, however, as for a long memory or near non-stationary process $y_t$, the sample mean forecast $\bar y_T$ is outperformed by the average $\hat y_{T+1|T,\hat H} = \hat H^{-1}\sum_{t=T+1-\hat H}^{T} y_t$ over the last few observations $y_{T+1-\hat H}, \ldots, y_T$. Selecting the tuning parameter $H$ from the data yields the optimal rolling window width even when the structure of $y_t$ is unknown. This gives a simple and efficient forecasting strategy for a persistent stationary process $y_t$. (The strategy also extends to unit root processes; see Giraitis et al. (2013).)

We shall use the notation

$$q_{u,H} := E\Big(u_0 - \sum_{j=1}^{\infty} w_{j,H}\, u_{-j}\Big)^2 - \sigma_u^2.$$

For SM $u_t$, set $\kappa_2 = \int_0^\infty K^2(x)\,dx$ and $\kappa_0 = K(0)$, and define

$$\kappa_{SM} = s_u^2(\kappa_2 - \kappa_0) + \sigma_u^2\kappa_0.$$

For LM $u_t$, define

$$\kappa_{LM} = c_\gamma\Big(\int_0^\infty\!\!\int_0^\infty K(x)K(y)|x-y|^{-1+2d}\,dy\,dx - 2\int_0^\infty K(x)\,x^{-1+2d}\,dx\Big).$$

Theorem 1. Suppose that $y_t = \mu + u_t$, $t \ge 1$, where $u_t$ is a stationary linear process (4) satisfying either SM assumption (i) or LM assumption (ii). Then, as $T \to \infty$, for $H \in I_T$,

$$Q_{T,H} = \hat\sigma_{T,u}^2 + q_{u,H}\big(1 + o_{p,H}(1)\big), \qquad \omega_{T,H} = \sigma_u^2 + q_{u,H}\big(1 + o_H(1)\big), \qquad (5)$$

where, as $H \to \infty$,

$$q_{u,H} = \kappa_{SM}\, H^{-1}\big(1 + o(1)\big) \ \text{under (i)}, \qquad q_{u,H} = \kappa_{LM}\, H^{-1+2d}\big(1 + o(1)\big) \ \text{under (ii)}.$$

Theorem 1 implies that $Q_{T,H}$ is a consistent estimate of $\omega_{T,H}$. The following corollary shows that the forecast $\hat y_{T+1|T,\hat H}$, computed with the data-tuned $\hat H$, has asymptotically the same MSE as the forecast $\hat y_{T+1|T,H_{opt}}$ with the tuning parameter $H_{opt}$.

Corollary 1. If $q_{u,H}$ reaches its minimum at some finite $H^*$, then

$$\omega_{T,\hat H} = \omega_{T,H_{opt}} + o_p(1), \qquad Q_{T,\hat H} = \omega_{T,\hat H} + o_p(1) = \sigma_u^2 + q_{u,H^*} + o_p(1). \qquad (6)$$

Remark 1. Corollary 1 implies that a forecast with the tuning parameter $\hat H$ has the same quality as one with the parameter $H_{opt}$ that minimises the forecast error $\omega_{T,H}$. The difference is that $\hat H$ can be evaluated from the data, whereas $H_{opt}$ is unknown. In view of (5), $\kappa_{SM} < 0$ under (i), or $\kappa_{LM} < 0$ under (ii), implies that $\hat H$ remains bounded as $T$ increases. The reason is that $q_{u,H}$ is then negative for large $H$ and tends to zero as $H \to \infty$, so its minimum is attained at a finite $H$. Only a few of the most recent observations then contribute to the forecast. Whether $\kappa_{SM} < 0$ or $\kappa_{LM} < 0$ holds depends, in turn, on the shape of the kernel function $K$ and on the strength of dependence in $u_t$. For example, $\kappa_{LM} < 0$ holds for rolling window weights and LM $u_t$'s. Indeed, in that case $K(x) = I(0 \le x \le 1)$, and

$$\kappa_{LM} = c_\gamma\Big(\int_0^1\!\!\int_0^1 |x-y|^{-1+2d}\,dx\,dy - 2\int_0^1 x^{-1+2d}\,dx\Big) = c_\gamma\Big(2\int_0^1\!\!\int_0^x (x-u)^{-1+2d}\,du\,dx - \frac{1}{d}\Big) = c_\gamma\Big(\frac{1}{d(1+2d)} - \frac{1}{d}\Big) = -\frac{2c_\gamma}{1+2d} < 0.$$

Thus the width $\hat H$ of the rolling window remains finite as $T$ increases, and the error of the rolling window forecast is smaller than $\sigma_u^2$. Under short memory, on the contrary, rolling window weights cannot produce $\kappa_{SM} < 0$: they yield $\kappa_2 = \kappa_0 = 1$, so $\kappa_{SM} = \sigma_u^2$ is always positive. For the exponential kernel $K(x) = e^{-x}$, $x \ge 0$, however, $\kappa_{SM} = \sigma_u^2 - s_u^2/2$. This is negative when the long-run variance of $u_t$ is large enough, $s_u^2 > 2\sigma_u^2$, for example for an AR(1) model $u_t$ with autoregressive parameter greater than $1/3$.

3.2 Forecasting a trend stationary process

When forecasting a process $y_t = at + u_t$ that combines a deterministic trend with a stationary noise $u_t$, it is natural to expect the weighted average forecast to be driven by the last few observations. The theoretical results confirm this. Since the weights sum to one, the forecast bias is $a\sum_{j} w_{j,H}\, j$. Denote

$$q_{\mu,H} := a^2\Big(\sum_{j=1}^{\infty} w_{j,H}\, j\Big)^2, \qquad \kappa_\mu := \Big(\int_0^\infty K(x)\,x\,dx\Big)^2.$$

The notation $q_{u,H}$ is the same as in Theorem 1.
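The short memory calculation in Remark 1 can be checked numerically. The sketch below is an illustration that assumes NumPy, AR(1) noise with unit innovation variance, and a truncation lag `J` standing in for the infinite sums. It evaluates $q_{u,H}$ directly from its definition for EWMA weights and compares $H q_{u,H}$ with $\kappa_{SM} = \sigma_u^2 - s_u^2/2$. All names are hypothetical.

```python
import numpy as np


def ewma_weights(H, J):
    """w_{j,H} = rho^j / sum_{k=1}^{J} rho^k, j = 1..J, with rho = exp(-1/H).

    J is a truncation lag standing in for the infinite sum; choose J >> H.
    """
    rho = np.exp(-1.0 / H)
    w = rho ** np.arange(1, J + 1)
    return w / w.sum()


def ar1_autocov(phi, J, sigma_eps2=1.0):
    """gamma(k) = sigma_eps^2 * phi^k / (1 - phi^2) for k = 0..J (AR(1) noise)."""
    return sigma_eps2 * phi ** np.arange(J + 1) / (1.0 - phi ** 2)


def q_u(w, gamma):
    """q_{u,H} = E(u_0 - sum_j w_j u_{-j})^2 - sigma_u^2
             = sum_{j,k} w_j w_k gamma(|j-k|) - 2 sum_j w_j gamma(j)."""
    J = len(w)
    c = np.correlate(w, w, mode="full")       # sum_j w_j w_{j+m}, m = -(J-1), ..., J-1
    lags = np.abs(np.arange(-(J - 1), J))
    return float(c @ gamma[lags] - 2.0 * (w @ gamma[1:J + 1]))


def kappa_sm_exponential(phi, sigma_eps2=1.0):
    """kappa_SM = sigma_u^2 - s_u^2 / 2 for AR(1) noise and the kernel K(x) = exp(-x)."""
    sigma_u2 = sigma_eps2 / (1.0 - phi ** 2)     # variance of u_t
    s_u2 = sigma_eps2 / (1.0 - phi) ** 2         # long-run variance of u_t
    return sigma_u2 - s_u2 / 2.0


if __name__ == "__main__":
    H, J = 200, 8000
    w = ewma_weights(H, J)
    for phi in (0.2, 0.7):
        print(phi, H * q_u(w, ar1_autocov(phi, J)), kappa_sm_exponential(phi))
```

For $\phi = 0.7$ both quantities are negative, in line with the threshold $\phi > 1/3$ in Remark 1. For $\phi = 0.2$ they are positive.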

Theorem 2. Let $y_t = at + u_t$, $t = 1, \ldots, T$, $a \ne 0$, where $u_t$ is a stationary linear process (4) satisfying either SM assumption (i) or LM assumption (ii). Then, as $T \to \infty$, for $H \in I_T$,

$$Q_{T,H} = \hat\sigma_{T,u}^2 + q_{\mu,H} + q_{u,H} + o_{p,H}(H^2), \qquad \omega_{T,H} = \sigma_u^2 + q_{\mu,H} + q_{u,H} + o_H(H^2), \qquad (7)$$

where $q_{\mu,H} + q_{u,H} = a^2\kappa_\mu H^2 + o(H^2)$ as $H \to \infty$.

Theorem 2 lets us establish the following basic properties of the forecast $\hat y_{T+1|T,\hat H}$ of a trend stationary process $y_t$.

Corollary 2. Under the assumptions of Theorem 2, $\hat H$ stays bounded, and

$$\omega_{T,\hat H} = \omega_{T,H_{opt}} + o_p(1), \qquad Q_{T,\hat H} = \omega_{T,\hat H} + o_p(1) = \sigma_u^2 + q_{\mu,H^*} + q_{u,H^*} + o_p(1), \qquad (8)$$

where $H^*$ is a minimiser of $q_{\mu,H} + q_{u,H}$. In the presence of a deterministic trend, the optimal $\hat H$ takes small values and the averaging forecast is based on the last few observations.

4 Practical performance

4.1 Forecast Methods

We use the range of parametric forecast methods analysed in Giraitis et al. (2013). Their weights are defined as functions of a tuning parameter. These methods discount past data and are known to be robust to historical and ongoing structural change. For comparison, we consider parametric methods with both fixed and data-dependent discounting parameters, and we compare the forecasts against a number of simple benchmark models. The rolling window and EWMA methods were introduced in Section 2.

Rolling window. The weights are flat: all observations in the window receive equal weight, while older data receive zero weight. The one-step-ahead forecast $\hat y_{t|t-1}$ is then simply the average of the $H$ previous observations. In the tables we refer to this method as Rolling $\hat H$. Besides selecting $H$ optimally from the data, we use two fixed window methods, with $H = 2$ and $3$.

Exponential EWMA weights. The closer the parameter $\rho$ is to zero, the faster the rate of discounting, and the main weights are concentrated on
the last few data points. The closer $\rho$ is to 1, the slower the rate, and significant weights are attached to data in the distant past. In the tables this method is denoted as Exponential. We consider several fixed-value downweighting methods, with $\rho = .4, .6, .8, .9$. The data-tuned parameter is denoted as $\hat\rho$.

Polynomial method. This method uses the weights

$$w_{tj} = j^{-\alpha} \Big/ \sum_{k=1}^{t-1} k^{-\alpha}, \qquad 1 \le j \le t-1, \qquad \alpha > 0.$$

The past is downweighted at a slower rate than with exponential weights. This method is referred to as Polynomial $\hat\alpha$. We do not consider any fixed value for $\alpha$; we report only data-dependent downweighting, with the estimated parameter $\hat\alpha$.

Dynamic weighting. Giraitis et al. (2013) proposed a more flexible extension of exponential weighting. In it, the weights attached to the first few lags are not determined by a parametric function; they are chosen freely along with the tuning parameter $H$. Thus, as in an AR process, the first $p$ weights $w_1, w_2, \ldots, w_p$ are estimated as additional parameters, while the remaining weights are functions of $H$. The weight function is defined as

$$\tilde w_{tj,H} = \begin{cases} w_j, & j = 1, \ldots, p, \\ K(j/H), & j = p+1, \ldots, t-1, \end{cases} \qquad H \in I_T,$$

and the final weights are standardised as $w_{tj,H} = \tilde w_{tj,H} \big/ \sum_{j=1}^{t-1}\tilde w_{tj,H}$, so that they sum to one. Note that $Q_{T,H}$ is minimised jointly over $w_1, w_2, \ldots, w_p$ and $H$. We use a parsimonious specification, with $p = 1$ and the exponential kernel $K$. We refer to this method as Dynamic.

Residual methods. Giraitis et al. (2013) argue for a two-step approach when the conditional mean of a time series can be modelled explicitly and the forecaster has a preferred parametric model for it. It may then help to fit the model first and use the robust methods to forecast its residuals.
The original location model (1) is restrictive and unsuitable for conditional modelling. A more general forecasting model is therefore used to illustrate the approach:

$$y_t = f(x_t) + y_t^*, \qquad t = 1, 2, \ldots, \qquad (9)$$

where $y_t$ is the variable of interest, $x_t$ is a vector of predictor variables, which may contain lags of $y_t$, and $y_t^*$ is the residual left unexplained by $f(x_t)$. Under structural change, $y_t^*$ is expected to contain any remaining persistence in $y_t$, such as trends, breaks or other forms of dependence, and the robust methods should perform well in that setting. Forecasts of $f(x_t)$ and $y_t^*$ are then combined to produce improved forecasts of $y_t$. We adopt the widely used AR(1) model for the conditional mean, which gives $f(x_t) = \beta y_{t-1}$. The residuals $y_t^*$ are forecast with the parametric weights discussed above. The forecast of $y_{t+1}$ based on $y_1, y_2, \ldots, y_t$
is computed as $\hat y_{t+1} = \hat\beta y_t + \hat y^*_{t+1|t,\hat H}$. Two versions of the residual method are considered.

Exponential AR method. In this method the tuning parameter $H$ and the autoregressive parameter $\beta$ are estimated jointly by minimising the in-sample mean squared forecast error $Q_{T,H}$, which now also depends on $\beta$. It is computed by defining $y_t^* = y_t - \beta y_{t-1}$ and using exponential weights. We refer to this method as Exp. AR.

Exponential residual method. This is a two-stage method. The autoregressive parameter $\beta$ on $y_{t-1}$ is estimated by OLS, separately from the parameters used to forecast $y_t^*$. The method then forecasts the residuals $y_t^* = y_t - \hat\beta y_{t-1}$ with exponential weights, which produces $\hat H$ and the forecast $\hat y^*_{t+1|t,\hat H}$. We refer to it as Exp. Residual.

The benchmark and other competitors

Full sample mean. This benchmark forecast is the average of all observations in the sample:

$$\hat y_{benchmark,T+1} = \frac{1}{T}\sum_{t=1}^{T} y_t.$$

AR(1) forecast. We include forecasts based on AR(1) dynamics, which are often regarded as a stable and consistent predictor of time series. The one-step-ahead forecast is $\hat y_{T+1|T} = \hat\beta y_T$.

Last observation forecast. For a unit root process, a simple yet competitive forecast is the `no change' forecast $\hat y_{T+1|T} = y_T$.

Averaging method. Pesaran and Timmermann (2007) advocate a simple robust method. The one-step-ahead forecast $\bar y_{T+1|T}$ is the average of the rolling window forecasts $\hat y_{T+1|T,H}$ over all window sizes $H$ that include the last observation:

$$\bar y_{T+1|T} = T^{-1}\sum_{H=1}^{T}\hat y_{T+1|T,H}, \qquad \hat y_{T+1|T,H} = H^{-1}\sum_{t=T-H+1}^{T} y_t.$$

This method requires no discount parameter. The only choice is the minimum window size used for forecasting, which usually matters little. We refer to this method as Averaging.
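The weighting schemes and competitors above can be sketched in a few lines. This is a hedged illustration under several assumptions. It uses NumPy. The tuning parameters ($\alpha$, $w_1$, $H$) are passed in as fixed values, whereas the methods above select them by minimising $Q_{T,H}$. The residual method uses an AR(1) without intercept, matching $f(x_t) = \beta y_{t-1}$. All function names are hypothetical.

```python
import numpy as np


def polynomial_weights(t, alpha):
    """w_{tj} = j^(-alpha) / sum_{k=1}^{t-1} k^(-alpha) for lags j = 1..t-1, alpha > 0."""
    w = np.arange(1, t, dtype=float) ** (-alpha)
    return w / w.sum()


def dynamic_weights(t, w1, H):
    """Dynamic weighting with p = 1 and the exponential kernel:
    free weight w1 at lag 1, K(j/H) = exp(-j/H) at lags j = 2..t-1, then normalised."""
    w = np.exp(-np.arange(1, t, dtype=float) / H)
    w[0] = w1
    return w / w.sum()


def forecast_from_weights(y_past, w):
    """sum_j w_j y_{t-j} with y_past = (y_1, ..., y_{t-1}), reversed so index = lag - 1."""
    return float(w @ np.asarray(y_past, dtype=float)[::-1])


def exp_residual_forecast(y, H):
    """Two-stage 'Exp. Residual' sketch: OLS for beta in y_t = beta * y_{t-1} + y*_t
    (no intercept), then an EWMA forecast of y*_{T+1} added to beta_hat * y_T."""
    y = np.asarray(y, dtype=float)
    beta = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])
    resid = y[1:] - beta * y[:-1]                      # y*_2, ..., y*_T
    w = np.exp(-np.arange(1, len(resid) + 1) / H)
    return float(beta * y[-1] + forecast_from_weights(resid, w / w.sum()))


def averaging_forecast(y):
    """Average of the rolling-window forecasts over all window sizes H = 1..T."""
    y = np.asarray(y, dtype=float)
    tail_means = np.cumsum(y[::-1]) / np.arange(1, len(y) + 1)   # mean of the last H values
    return float(tail_means.mean())


if __name__ == "__main__":
    y = np.array([0.2, 0.5, 0.1, 1.2, 1.0, 1.4, 1.1, 1.3])
    print("polynomial :", forecast_from_weights(y, polynomial_weights(len(y) + 1, 1.0)))
    print("dynamic    :", forecast_from_weights(y, dynamic_weights(len(y) + 1, 0.5, 3.0)))
    print("exp. resid.:", exp_residual_forecast(y, 3.0))
    print("averaging  :", averaging_forecast(y))
    print("last obs.  :", y[-1], " full-sample mean:", y.mean())
```

Each function returns a single one-step-ahead forecast. A cross-validation loop like the one used for $\hat H$ would wrap them to tune $\alpha$, $(w_1, H)$ or $H$.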
4.2 Illustrative examples, Monte Carlo experiments

We now turn to the practical justification of the optimality properties of the selection procedure for $H$ when $y_t = \mu_t + u_t$. Here $\mu_t$ is a persistent process (a deterministic or stochastic trend) of unknown type and $u_t$ is a stationary noise term. Our objective is to verify that the forecast $\hat y_{T+1|T,\hat H}$ of $y_{T+1}$, with the optimal tuning parameter $\hat H$, produces MSEs comparable to those of the
best forecast $\hat y_{T+1|T,H}$ with a fixed $H$. For example, we use $H = 2, 3$ for the rolling window and $\rho = .4, .6, .8$ and $.9$ for the exponential weights. We consider 10 data generating processes, as in Giraitis et al. (2013):

Ex1: $y_t = u_t$.
Ex2: $y_t = .5t + 5u_t$.
Ex3: $y_t = .5t + 3u_t$.
Ex4: $y_t = u_t$ for $t \le .55T$, and $y_t = 1 + u_t$ for $t > .55T$.
Ex5: $y_t = 2\sin(2\pi t/T) + 3u_t$.
Ex6: $y_t = 2\sin(2\pi t/T) + u_t$.
Ex7: $y_t = 2T^{-1/2}\sum_{i=1}^{t} v_i + 3u_t$.
Ex8: $y_t = 2T^{-1/2}\sum_{i=1}^{t} v_i + u_t$.
Ex9: $y_t = .5\sum_{i=1}^{t} v_i + u_t$.
Ex10: $y_t = \sum_{i=1}^{t} u_i$.

In Ex7–Ex9, $v_i \sim IID(0,1)$. Plots of the generated series $y_t$ give a first-hand idea of their dynamic behaviour. Figure 1 shows plots of the trend stationary process $y_t$ of Ex3; for more plots, see Mansur (2013).

General patterns, observations, conclusions

In Ex1, $y_t$ is determined by the noise process alone, and there is no structural change. Not surprisingly, forecasting an i.i.d. process requires using a long stretch of the past, and the benchmark sample mean should perform best; see Table 1. Similarly, a simple AR(1) benchmark is expected to be hard to beat when forecasting persistent autoregressive processes. Long term dependence can create a false impression of structural change and make choosing a forecast model in advance difficult. Additional persistence from autoregressive dependence can bring the series closer to a unit root. An AR(1) benchmark should still do well, but as persistence increases, the `last observation' forecasts should be equally competitive.

Ex2 and Ex3 both introduce linear, monotonically increasing trends in $y_t$. They differ only in the variance of the noise process. Giraitis et al. (2013) argue that such linear trends may be unrealistic. They can nevertheless be reasonable representations of time series that have been detrended by standard techniques such as differencing or filtering.
Moreover, Figure 1 confirms that the effects of such trends are small enough to be dominated and muted by the noise processes. Linear trends are visually detectable under i.i.d. noise, but they become less visible as persistence increases. Panel (e) of Figure 1 confirms that when short and long memory persistence are combined, the trends can vanish completely.

The functional form of $y_t$ in Ex4 contains a structural break in the mean. The break occurs just past the middle of the sample, at $t = .55T$. Giraitis et al. (2013) note that the post-break period is longer than $\sqrt{T}$, as their theory requires. They argue that the robust forecasting methods should therefore account for such `not-too-recent' breaks and produce forecasts significantly better than the benchmark sample mean. Their Monte Carlo study confirms these claims. Although the shift in mean can be well identified in i.i.d. or weak long
memory series, it becomes harder to see as persistence in the noise process increases. It is therefore of interest to see how methods with data-dependent discounting cope with these complex situations.

Ex5 and Ex6 introduce smooth, bounded cyclical trends, such as those observed in standard business cycles. Standard detrending is less likely to remove such trends completely, so they are more realistic than a linear trend. The sample mean benchmark should do poorly, particularly for Ex6, where the trend oscillates widely compared to the variance of the noise process. Realisations of these processes show that higher persistence can substantially distort the shapes of smooth cycles.

Ex7 and Ex8 contain bounded stochastic trends $\mu_t$ and represent the increasingly popular time-varying coefficient dynamic models. Ex9 considers an unbounded random walk (unit root) process observed with noise $u_t$. Ex10 analyses the standard unit root model.

In general, the time series plots of Ex1–Ex10 show that long memory can give a false impression of structural change. Moreover, persistence in the noise, induced by long memory or by a mixture of short and long memory dependence, can obscure the type of structural change in a series. It is therefore worth investigating whether typical robust-to-structural-change methods, such as rolling window and EWMA methods, can forecast well in the presence of long memory. We argue that, as long as the tuning parameter is chosen from the data, such methods can produce forecasts comparable to the best possible fixed-parameter forecasts.

Fig. 1 Plots of generated series $y_t = .5t + 3u_t$ in Ex3 for different noise $u_t$: a) i.i.d.; b), d) AR(1) with $|\phi| = .7$ (one positive and one negative coefficient); c) ARFIMA(0, d, 0) with $d = .3$; e), f) ARFIMA(1, d, 0) with $d = .3$ and $|\phi| = .7$ (one positive and one negative coefficient).
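The designs listed above can be simulated as follows. This sketch assumes NumPy and Gaussian innovations. It approximates ARFIMA(1, d, 0) noise by a truncated MA($\infty$) expansion of $(1-L)^{-d}$ with a burn-in period, which is one common approximation and not necessarily the one used for the tables. Coefficients are taken from the Ex list as printed. The function names are hypothetical.

```python
import numpy as np


def arfima_1d0(T, d, phi, rng, burn=1000):
    """Simulate (1 - phi L)(1 - L)^d u_t = eps_t with eps_t ~ N(0, 1).

    Sketch: AR(1)-filter the innovations, then apply the truncated MA(inf)
    expansion of (1 - L)^(-d): psi_0 = 1, psi_k = psi_{k-1} (k - 1 + d) / k.
    The first `burn` values are discarded to damp start-up effects.
    """
    n = T + burn
    eps = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = eps[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    k = np.arange(1, n)
    psi = np.concatenate(([1.0], np.cumprod((k - 1 + d) / k)))
    u = np.convolve(x, psi)[:n]          # u_t = sum_{k=0}^{t} psi_k x_{t-k}
    return u[burn:]


def simulate_examples(u, v):
    """Some of the designs Ex1-Ex10, with coefficients as printed in the list above."""
    T = len(u)
    t = np.arange(1, T + 1)
    return {
        "Ex1": u,
        "Ex3": .5 * t + 3 * u,
        "Ex4": np.where(t <= .55 * T, u, 1 + u),
        "Ex6": 2 * np.sin(2 * np.pi * t / T) + u,
        "Ex8": 2 / np.sqrt(T) * np.cumsum(v) + u,
        "Ex10": np.cumsum(u),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    T = 300
    u = arfima_1d0(T, d=0.3, phi=0.7, rng=rng)     # ARFIMA(1, d, 0) noise, d = .3, phi = .7
    v = rng.standard_normal(T)                     # v_i ~ IID(0, 1)
    for name, y in simulate_examples(u, v).items():
        print(name, np.round(y[:3], 2), "...", np.round(y[-3:], 2))
```

Setting `d=0` and `phi=0` reduces the noise to the i.i.d. case, which is a convenient sanity check.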

Monte Carlo results

We discuss Monte Carlo results on how the adaptive forecasting techniques perform in small samples when predicting time series $y_t = \mu_t + u_t$ with i.i.d. and long memory noise $u_t$. For the i.i.d. case we use standard normal noise, $u_t \sim IID(0,1)$. For the long memory case we use the ARFIMA(1, d, 0) model for $u_t$, defined as

$$(1 - \phi L)(1 - L)^d u_t = \varepsilon_t,$$

where $|\phi| < 1$ is the AR(1) parameter, $0 < d < 1/2$ is the long memory parameter that induces the long memory property (ii), and $L$ is the lag operator. After choosing a starting point $\tau < T$, we apply the reported methods to construct one-step-ahead forecasts $\hat y_{t|t-1,H}$, $t = \tau, \ldots, T$. The forecast error of method $j$ is

$$MSFE_j = (T - \tau + 1)^{-1}\sum_{t=\tau}^{T}\big(\hat y^{(j)}_{t|t-1,H} - y_t\big)^2.$$

We compare it with that of the benchmark sample-mean forecast $\bar y_{t|t-1} = (t-1)^{-1}\sum_{s=1}^{t-1} y_s$, whose forecast error is $MSFE_{sm} := (T - \tau + 1)^{-1}\sum_{t=\tau}^{T}(\bar y_{t|t-1} - y_t)^2$, by computing the relative MSFE

$$RMSFE = \frac{MSFE_j}{MSFE_{sm}}.$$

Results for different long memory specifications of the noise processes are presented in Tables 2–3. The columns correspond to the data generating models Ex1–Ex10 and the rows to the forecasting methods. Table entries are the MSFEs of the methods relative to the sample average, as defined above.

We begin with the results in Tables 1 and 2, which feature i.i.d. and long memory ARFIMA(0, .3, 0) noise, respectively. Table 1 shows that, as expected, the benchmark dominates all competitors when $y_t = u_t$. It also shows gains over the benchmark when $y_t$ has a persistent component $\mu_t$. In Table 2, RMSFE values below unity suggest that, in general, all the reported forecasting methods, with both fixed and data-driven discounting, are useful for processes with moderately strong long memory. Consider even the simplest case of `no structural change', $y_t = u_t$, reported in the first column (Ex1) of Table 2. There, most of the competing methods, including the rolling-window schemes, outperform the benchmark full-sample average.
The gains are, however, small. Gains over the benchmark are more pronounced when $y_t$ has a persistent component $\mu_t$; then even naive 'last observation' forecasts beat the mean forecast in most of the experiments. Persistence entering $y_t$ through long memory $u_t$ calls for stronger discounting than i.i.d. noise $u_t$ does, that is, for relying more on information from the recent past. The data-dependent exponential weights do not exactly match the best fixed-parameter forecast, but they come reasonably close to it and are never among the worst-performing methods. The data-adjusted rolling-window methods forecast better than those with fixed windows of size H = 20 and H = 30, and they also outperform the method of averaging over rolling windows advocated by Pesaran and Timmermann (2007). This justifies a data-driven choice of the downweighting parameter for rolling windows.

Overall, the comparison of competing forecasting methods in Tables 1 and 2 shows that the full-sample AR(1) forecasts are in general very good compared with the benchmark, but are often outperformed by most of the adaptive data-tuned methods. Forecasts based on the residual methods are impressive. Among the adaptive robust forecasting methods, the dynamic weighting method, in which the weight of the last observation is chosen optimally from the data simultaneously with the exponential weighting parameter, consistently provides forecasts comparable to the best possible forecasts in all the experiments. The exponential AR method is equally competitive.

The advantages of data-based adaptive forecasting methods become clearly evident when we consider ARFIMA(1, 0.3, 0) noise $u_t$ with a negative AR coefficient, $\phi = -0.7$. Table 3 reports the corresponding RMSFEs. Although the full-sample AR(1) forecast consistently beats the benchmark sample mean, it is outperformed by most of the adaptive forecasting techniques, including the rolling-window methods. Compared with the results for ARFIMA(1, 0.3, 0) with positive $\phi = 0.7$, which we do not report, the model with a negative AR coefficient yields smaller gains over the benchmark, and its data-tuned exponential and rolling-window forecasts come closer to the AR forecasts. For $\phi = -0.7$, the data-based selection of downweighting, in particular the dynamic weighting and the exponential AR weighting, produces the dominant predictors. The residual methods also generate very good forecasts in most of the experiments.
The maximum reduction in relative MSFE among the fixed-parameter EWMA methods comes from those with very low discounting rates, which emphasises the need to include information from the more distant past. The optimally chosen exponential weights lead to forecasts comparable to those of the best-performing fixed-parameter methods. The 'no-change' (last observation) forecast is by far the worst, with RMSFEs mostly well above unity. The Monte Carlo experiments with i.i.d. noise and with long memory ARFIMA noise $u_t$ confirm that forecast accuracy varies with the degree of persistence and therefore depends on appropriate downweighting of past observations. Since many of the data-tuned discounting methods consistently match, if not outperform, the best forecast with a fixed downweighting parameter, and since the optimal rate of discounting cannot be known in advance, these results support the superiority of data-tuned adaptive forecasting techniques, particularly when facing structural change.
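To make the contrast between fixed and data-tuned downweighting concrete, the following Python sketch computes rolling-window and exponentially weighted forecasts and picks the window H or the discount parameter by minimising the sum of past squared one-step-ahead forecast errors. It is an illustration under our own assumptions, not the authors' implementation: the selection criterion, the function names `rolling_forecast`, `ewma_forecast` and `select_by_past_errors`, the discount parameter `lam` (weight `lam**k` on the observation k steps back), the parameter grids and the evaluation length `n_eval` are all hypothetical choices.

```python
import numpy as np


def rolling_forecast(past, H):
    """One-step-ahead forecast: the mean of the last H observations."""
    return float(np.mean(past[-H:]))


def ewma_forecast(past, lam):
    """One-step-ahead forecast: exponentially weighted mean of past data.

    The observation k steps back gets weight lam**k (k = 0 is the most recent),
    so lam = 0 reproduces the last-observation forecast and lam = 1 the
    full-sample mean.
    """
    weights = lam ** np.arange(len(past))
    return float(np.dot(weights, past[::-1]) / weights.sum())


def select_by_past_errors(past, forecast, grid, n_eval):
    """Hypothetical data-tuning rule: return the grid value with the smallest
    sum of squared one-step-ahead errors over the last n_eval observations.

    For rolling windows keep max(grid) <= len(past) - n_eval, so that every
    window used in the criterion is full. Ties go to the first grid value.
    """
    def loss(theta):
        return sum((forecast(past[:s], theta) - past[s]) ** 2
                   for s in range(len(past) - n_eval, len(past)))
    return min(grid, key=loss)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # hypothetical series: mean shifts from 0 to 2 at t = 150, N(0, 1) noise
    y = np.r_[np.zeros(150), 2.0 * np.ones(50)] + rng.standard_normal(200)
    H_hat = select_by_past_errors(y, rolling_forecast,
                                  [5, 10, 20, 30, 60, 100], n_eval=40)
    lam_hat = select_by_past_errors(y, ewma_forecast,
                                    [0.0, 0.5, 0.8, 0.9, 0.95, 0.99, 1.0], n_eval=40)
    print("rolling:", H_hat, rolling_forecast(y, H_hat))
    print("exponential:", lam_hat, ewma_forecast(y, lam_hat))
```

Including both `lam = 0` and `lam = 1` in the grid nests the 'last observation' forecast and the sample-mean benchmark as the two extremes of exponential discounting, so the data-tuned choice can never be restricted to fewer options than these two.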

Table 1 Monte Carlo results. T = 200, $u_t \sim \mathrm{IID}(0, 1)$. Relative MSFEs of one-step-ahead forecasts with respect to the full-sample mean benchmark.
Experiments (columns): Ex1, Ex2, Ex3, Ex4, Ex5, Ex6, Ex7, Ex8, Ex9, Ex10.
Methods (rows): Exponential (estimated parameter); Rolling, H = Ĥ; Rolling, H = 20; Rolling, H = 30; Exponential (four fixed parameter values); Averaging; Polynomial (estimated parameter); Dynamic; Exp. AR; Exp. Residual; Last Obs.; AR(1).

Table 2 Monte Carlo results. T = 200, $u_t \sim$ ARFIMA(0, 0.3, 0). Relative MSFEs of one-step-ahead forecasts with respect to the full-sample mean benchmark. Columns and rows as in Table 1.

Table 3 Monte Carlo results. T = 200, $u_t \sim$ ARFIMA(1, 0.3, 0) with $\phi = -0.7$. Relative MSFEs of one-step-ahead forecasts with respect to the full-sample mean benchmark. Columns and rows as in Table 1.
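The simulation design behind the tables can be sketched in Python as follows. The sketch generates ARFIMA(1, d, 0) noise from $(1 - \phi L)(1 - L)^d u_t = \varepsilon_t$ by truncating the MA(∞) expansion of $(1 - L)^{-d}$ and then applying the AR(1) filter, and it computes the relative MSFE of a forecasting method against the sample-mean benchmark over $t = t_0, \dots, T$. The Gaussian innovations, the burn-in and truncation lengths, the convention that the benchmark forecast of $y_t$ is the mean of $y_1, \dots, y_{t-1}$, the illustrative values T = 200 and $t_0 = T - 100$, and the function names `simulate_arfima_1d0` and `rmsfe` are assumptions of this illustration rather than details taken from the study.

```python
import numpy as np


def simulate_arfima_1d0(T, d, phi, rng, burn=500, trunc=1000):
    """Simulate u_1..u_T from (1 - phi L)(1 - L)^d u_t = eps_t, eps_t ~ N(0, 1).

    (1 - L)^(-d) is replaced by its MA expansion truncated after `trunc`
    terms, and the first `burn` simulated values are discarded.
    """
    n = T + burn
    # MA weights of (1 - L)^(-d): psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k
    k = np.arange(1, trunc)
    psi = np.concatenate(([1.0], np.cumprod((k - 1 + d) / k)))
    eps = rng.standard_normal(n + trunc)
    # fractionally integrated noise x_t = sum_k psi_k eps_{t-k}; keep full sums only
    x = np.convolve(eps, psi)[trunc:trunc + n]
    # AR(1) filter (1 - phi L)^(-1): u_t = phi * u_{t-1} + x_t
    u = np.empty(n)
    u[0] = x[0]
    for t in range(1, n):
        u[t] = phi * u[t - 1] + x[t]
    return u[burn:]


def rmsfe(y, forecast, t0):
    """MSFE of `forecast` divided by the MSFE of the sample-mean benchmark.

    Uses 1-based time as in the text: for t = t0, ..., T the forecast of y_t
    is built from y_1, ..., y_{t-1}, i.e. from y[:t - 1] (requires t0 >= 2).
    """
    T = len(y)
    sq_err_j, sq_err_sm = [], []
    for t in range(t0, T + 1):
        past, target = y[:t - 1], y[t - 1]
        sq_err_j.append((forecast(past) - target) ** 2)
        sq_err_sm.append((past.mean() - target) ** 2)
    return np.mean(sq_err_j) / np.mean(sq_err_sm)


def last_observation(past):
    """The 'no-change' forecast."""
    return past[-1]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    T = 200
    # the 'no structural change' case y_t = u_t, with phi = -0.7 as in Table 3
    y = simulate_arfima_1d0(T, d=0.3, phi=-0.7, rng=rng)
    print(rmsfe(y, last_observation, t0=T - 100))
```

Any forecasting rule with the signature `forecast(past) -> float` can be passed to `rmsfe`; a value below one means the rule beats the sample-mean benchmark on that simulated path, and averaging over many replications gives entries of the kind reported in the tables.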

References
1. D'Agostino, A., Gambetti, L. and Giannone, D. (2013). Macroeconomic forecasting and structural change. Journal of Applied Econometrics, 28.
2. Ang, A. and Bekaert, G. (2002). Regime switches in interest rates. Journal of Business & Economic Statistics, 20(2).
3. Clark, T. E. and McCracken, M. W. (2010). Averaging forecasts from VARs with uncertain instabilities. Journal of Applied Econometrics, 25(1).
4. Clements, M. P. and Krolzig, H. M. (1998). A comparison of the forecast performance of Markov-switching and threshold autoregressive models of US GNP. The Econometrics Journal, 1(1).
5. Diebold, F. X. and Inoue, A. (2001). Long memory and regime switching. Journal of Econometrics, 105(1).
6. Eklund, J., Kapetanios, G. and Price, S. (2010). Forecasting in the presence of recent structural change. Bank of England Working Paper.
7. Garcia, R. and Perron, P. (1996). An analysis of the real interest rate under regime shifts. The Review of Economics and Statistics, 79.
8. Giraitis, L., Kapetanios, G. and Price, S. (2013). Adaptive forecasting in the presence of recent and ongoing structural change. Journal of Econometrics, in press.
9. Gourieroux, C. and Jasiak, J. (2001). Memory and infrequent breaks. Economics Letters, 70(1).
10. Granger, C. W. and Hyung, N. (2004). Occasional structural breaks and long memory with an application to the S&P 500 absolute stock returns. Journal of Empirical Finance, 11(3).
11. Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 57(2).
12. Hendry, D. F. (2000). On detectable and non-detectable structural change. Structural Change and Economic Dynamics, 11(1).
13. Kapetanios, G. (2006). Nonlinear autoregressive models and long memory. Economics Letters, 91(3).
14. Mansur, M. (2013). PhD Thesis. Queen Mary, University of London.
15. Koop, G. and Potter, S. M. (2007). Estimation and forecasting in models with multiple breaks. The Review of Economic Studies, 74(3).
16. Maheu, J. M. and Gordon, S. (2008). Learning, forecasting and structural breaks. Journal of Applied Econometrics, 23(5).
17. Stock, J. H. and Watson, M. W. (1996). Evidence on structural instability in macroeconomic time series relations. Journal of Business & Economic Statistics, 14(1).
18. Stock, J. H. and Watson, M. W. (2007). Why has US inflation become harder to forecast? Journal of Money, Credit and Banking, 39(s1).
19. Pesaran, M. H. and Pick, A. (2011). Forecast combination across estimation windows. Journal of Business & Economic Statistics, 29(2).
20. Pesaran, M. H. and Timmermann, A. (2002). Market timing and return prediction under model instability. Journal of Empirical Finance, 9(5).
21. Pesaran, M. H. and Timmermann, A. (2007). Selection of estimation window in the presence of breaks. Journal of Econometrics, 137(1).
22. Pesaran, M. H., Pettenuzzo, D. and Timmermann, A. (2006). Forecasting time series subject to multiple structural breaks. The Review of Economic Studies, 73(4).
23. Rossi, B. (2012). Advances in forecasting under instability. In: Elliott, G. and Timmermann, A. (Eds.), Handbook of Economic Forecasting. Elsevier, North Holland.
24. Timmermann, A. (2001). Structural breaks, incomplete information, and stock prices. Journal of Business & Economic Statistics, 19(3).
