Forecasting Seasonal Time Series
1. Introduction
Philip Hans Franses, Econometric Institute, Erasmus University Rotterdam
SMU and NUS, Singapore, April-May 2004
Outline of tutorial lectures
1 Introduction
2 Basic models
3 Advanced models
4 Further topics
1. Introduction
What do seasonal time series (in macroeconomics, finance and marketing) look like?
What do we want to forecast?
Why is seasonal adjustment often problematic?
The contents of this tutorial are largely based on:
Ghysels and Osborn (2001), The Econometric Analysis of Seasonal Time Series, Cambridge UP
Franses and Paap (2004), Periodic Time Series Models, Oxford UP
These series are the running examples throughout the lectures:
- Private consumption and GDP for Japan, quarterly, 1980.1-2001.2 (source: economagic.com)
- M1 for Australia, quarterly, 1975.2-2004.1 (source: economagic.com)
- Total industrial production for the USA, monthly, 1919.01-2004.02 (source: economagic.com)
- Decile portfolios (ranked according to market capitalization), NYSE, monthly, 1926.08-2002.12 (source: website of Kenneth French)
- Sales of instant decaf coffee, Netherlands, weekly, 1994 week 29 to 1998 week 28
What do seasonal time series (in macroeconomics, finance and marketing) look like? Tools:
- Graphs (over time, or per season)
- Autocorrelations (after somehow removing the trend, usually via Δ_1 y_t = y_t - y_{t-1})
- R^2 of a regression of Δ_1 y_t on S seasonal dummies, or on S (or fewer) sines and cosines
- Regression of squared residuals from an AR model for Δ_1 y_t on a constant and S - 1 seasonal dummies (to check for seasonal variation in the error variance)
- Autocorrelations per season (to see if there is periodicity)
Note: these are all just first-stage tools, to see in what direction one could proceed. They should not be interpreted as final models.
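These first-stage tools are easy to implement. A minimal sketch of the seasonal-dummy R^2 diagnostic (the function name `seasonal_dummy_r2` and the artificial series are my own; it exploits the fact that OLS on a full set of S dummies simply fits the per-season means):

```python
import numpy as np

def seasonal_dummy_r2(dy, S):
    """R^2 of regressing a (trend-free) series on S seasonal dummies.

    OLS on a full set of dummies fits the mean of each season.
    """
    dy = np.asarray(dy, dtype=float)
    fitted = np.empty(len(dy))
    for s in range(S):
        fitted[s::S] = dy[s::S].mean()   # per-season means
    rss = ((dy - fitted) ** 2).sum()
    tss = ((dy - dy.mean()) ** 2).sum()
    return 1.0 - rss / tss
```

A purely deterministic quarterly pattern gives R^2 = 1; values around 0.9, as reported for the Japanese series, indicate that constant seasonality dominates the trend-free variation.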
[Figure: Private consumption and income for Japan, quarterly, 1980-2001 (series CONSUMPTION and INCOME)]
[Figure: LC by Season - split seasonals (Q1-Q4) of log consumption, 1980-2001]
The split seasonals graph for consumption indicates a rather constant pattern. The R^2 of the seasonal dummy regression for Δ_1 y_t is 0.927 for log(consumption); for log(income) it is 0.943. A suitable model for both log(consumption) and log(income) turns out to be

Δ_1 y_t = Σ_{s=1}^{4} δ_s D_{s,t} + ρ_1 Δ_1 y_{t-1} + ε_t + θ_4 ε_{t-4}   (1)

The R^2 of these models is 0.963 and 0.975, respectively. Hence, constant deterministic seasonality accounts for the majority of trend-free variation in the data.
[Figure: M1 for Australia (levels), quarterly, 1975.2-2004.1]
[Figure: LOG(M1)-LOG(M1(-1)) - quarterly growth rates of Australian M1]
The log of Australian M1 obeys a trend, and some seasonality, although at first sight not a very regular one. A regression of the growth rates (differences in logs) on seasonal dummies gives an R^2 of 0.203. An AR(5) fits the data for Δ_1 y_t (although the residuals are not entirely white). The regression of the squares of these residuals on an intercept and 3 seasonal dummies gives an F value of 5.167; hence there is seasonal variation in the variance. If one fits an AR(2) model for each of the seasons, that is, regresses Δ_1 y_t on Δ_1 y_{t-1} and Δ_1 y_{t-2} but allows for different parameters across the seasons, then the estimation results for quarters 1 to 4 are (0.929, 0.235), (0.226, 0.769), (0.070, -0.478), and (0.533, -0.203). This suggests that different models for different seasons might be useful.
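The per-season AR(2) regression can be sketched as follows (the function name `per_season_ar2` is mine, and the example below uses artificial data rather than the M1 series):

```python
import numpy as np

def per_season_ar2(dy, S):
    """Regress dy_t on dy_{t-1} and dy_{t-2} with separate parameters
    per season; returns an (S, 2) array, one row per season."""
    dy = np.asarray(dy, dtype=float)
    coefs = np.empty((S, 2))
    for s in range(S):
        # observations whose own season is s, with two lags available
        idx = np.array([t for t in range(2, len(dy)) if t % S == s])
        X = np.column_stack([dy[idx - 1], dy[idx - 2]])
        coefs[s] = np.linalg.lstsq(X, dy[idx], rcond=None)[0]
    return coefs
```

If the true process is one AR(2) for all seasons, the S rows should be (roughly) equal; clearly different rows, as found for Australian M1, point toward the periodic models of lecture 3.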
[Figure: IPUSA - total industrial production for the USA, monthly, 1919-2004]
[Figure: DY - monthly growth rates of US industrial production, 1919-2004]
The graphs suggest that the data might be nonlinear and that there is a break in the variance, see lecture 4. For now, this is ignored. A regression of Δ_1 of log(production) on S = 12 seasonal dummies gives an R^2 of 0.374. Adding lags at 1, 12 and 13 to this auxiliary regression model improves the fit to 0.524 (for 1008 data points!), with parameters 0.398, 0.280 and -0.290. There is no residual autocorrelation. The sum of these parameters is 0.388. Hence, for multi-step-ahead forecasts, constant seasonality dominates. There is also no seasonal heteroskedasticity.
The returns y_t of the 10 decile portfolios (based on size), NYSE, monthly, 1926.08-2002.12, can be best described by

y_t = Σ_{s=1}^{12} δ_s D_{s,t} + ε_t   (2)
ε_t = ρ_1 ε_{t-1} + u_t   (3)

One might expect a January effect, and more so for the smaller stocks. The graph shows the estimates of δ̂_s for the first two and last two deciles. Also, the R^2 values are higher for lower deciles.
[Figure: Estimated monthly dummy parameters for deciles 1, 2, 9 and 10 (DEC1, DEC2, DEC9, DEC10), months 1-12]
Comparing the estimated parameters for the decile models and their associated standard errors suggests that only a few parameters are significant. This is even more so in case one has, say, weekly data, which are common in marketing and usually coined store-level data. One might then consider

y_t = Σ_{s=1}^{52} δ_s D_{s,t} + ε_t,   (4)

but that involves a large number of parameters. One can reduce this number by deleting certain D_{s,t} variables, but this might complicate the interpretation.
In that case, a more sensible model is

y_t = µ + Σ_{k=1}^{26} [α_k cos(2πkt/52) + β_k sin(2πkt/52)] + ε_t,   (5)

where t = 1, 2, .... A cycle within 52 weeks (an annual cycle) then corresponds with k = 1, and a cycle within 4 weeks corresponds with k = 13. Other reasonable cycles would correspond with 2 weeks, 13 weeks and 26 weeks (k = 26, 4 and 2, respectively). Note that sin(2πkt/52) is equal to 0 for k = 26; hence the µ. The next graph shows the fit of this model, where ε_t is an AR(1) process and only cycles within 2, 4, 13, 26 and 52 weeks are considered, so that there are only 9 variables to characterize seasonality. The fit is 0.139.
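A sketch of the design matrix for this trigonometric model (the helper name `harmonic_design` is mine; the sine column vanishes at k = 26, the Nyquist harmonic, which is why only the cosine remains there):

```python
import numpy as np

def harmonic_design(n, period, harmonics):
    """Intercept plus cos/sin columns for the chosen harmonics k.

    At k = period/2 the sine is identically zero and is dropped."""
    t = np.arange(1, n + 1)
    cols = [np.ones(n)]
    for k in harmonics:
        cols.append(np.cos(2 * np.pi * k * t / period))
        if 2 * k != period:
            cols.append(np.sin(2 * np.pi * k * t / period))
    return np.column_stack(cols)

# Cycles within 52, 26, 13, 4 and 2 weeks: k = 1, 2, 4, 13 and 26
X = harmonic_design(208, 52, [1, 2, 4, 13, 26])
```

X has an intercept plus 9 seasonal columns, matching the 9 variables used for the coffee sales series; the series can then be fitted by OLS (with AR(1) errors, as on the slide).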
[Figure: SEASFIT versus LOG(SALES02) - fit of the trigonometric model for log sales of instant decaf coffee, weekly observations]
What do we want to forecast?
When one constructs forecasting models for disaggregated data, one usually wants to use the models for out-of-sample forecasting of such data. Hence, for seasonal data with seasonal frequency S, one usually considers forecasting 1 step ahead, S/2 steps ahead or S steps ahead. One may also want to forecast 1 to S steps ahead, that is, say, a year. It is unknown whether it pays off to use models for disaggregated data: if one wants to forecast a year ahead, it is uncertain whether a model for the monthly data does better than a model for annual data. Indeed, there are more data, but also more noise and more effort needed to build a model. If one aims to forecast a year ahead, one can also use a model for seasonally adjusted data. (Otherwise: better not!)
Why is seasonal adjustment often problematic?
Seasonal adjustment is usually applied to macroeconomic series in order to make the trend and the cycle more visible. However, often only seasonally adjusted data are officially released.
- Seasonally adjusted data are estimated data, so one should preferably provide standard errors (Koopman and Franses, OBES 2003)
- Estimated innovations are not the true innovations
- Uncertainty about the variable to be forecast: what is it?
- Correlations across variables are disturbed
- Why not just take Δ_S y_t, or Δ_1 y_t after removal of seasonal dummies?
- Season, trend and cycle can be related.
Conclusion for lecture 1
There is substantial seasonal variation in many economic and business time series. It seems wise to make a model for the seasonal data, in order to forecast out of sample, but also to construct multivariate models. As we will see in the next lectures: making models for seasonal data is not that difficult.
Forecasting Seasonal Time Series
2. Basic Models
Philip Hans Franses, Econometric Institute, Erasmus University Rotterdam
SMU and NUS, Singapore, April-May 2004
2. Basic Models
- Constant seasonality (seasonal dummies, sines and cosines)
- Seasonal random walk
- Airline model
- Basic structural model
Constant seasonality
Why could the seasonal pattern be constant?
- Weather: harvests, ice-free lakes and harbors, consumption patterns, but also mood (think of consumer survey data with seasonality due to a less good mood in October and a better mood in January)
- Calendar: festivals, holidays
- Institutions: tax year, end-of-year bonus, 13th-month salary, children's holidays
A general model for constant seasonality is

y_t = µ + Σ_{k=1}^{S/2} [α_k cos(2πkt/S) + β_k sin(2πkt/S)] + u_t,   (1)

where t = 1, 2, ... and u_t is some ARMA-type process. For example, for S = 4, one has

y_t = µ + α_1 cos(πt/2) + β_1 sin(πt/2) + α_2 cos(πt) + u_t,   (2)

where cos(πt/2) is the variable (0, -1, 0, 1, 0, -1, ...), sin(πt/2) is (1, 0, -1, 0, 1, 0, ...) and cos(πt) is (-1, 1, -1, 1, ...). If one considers y_t + y_{t-2}, which can be written as (1 + L^2)y_t, then (1 + L^2)cos(πt/2) = 0 and (1 + L^2)sin(πt/2) = 0. Moreover, (1 + L)cos(πt) = 0, and of course, (1 - L)µ = 0.
This shows that deterministic seasonality can be removed by applying the transformation (1 - L)(1 + L)(1 + L^2) = 1 - L^4. Note that this also implies that the error term becomes (1 - L^4)u_t. Naturally, the relation with the often applied seasonal dummy regression, that is

y_t = Σ_{s=1}^{4} δ_s D_{s,t} + u_t,   (3)

is that µ = (δ_1 + δ_2 + δ_3 + δ_4)/4, α_1 = (δ_4 - δ_2)/2, β_1 = (δ_1 - δ_3)/2 and α_2 = (δ_4 - δ_3 + δ_2 - δ_1)/4.
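The mapping between the two representations can be checked numerically; the quarterly dummy coefficients below are made-up values, not estimates from the lectures:

```python
import numpy as np

# Hypothetical quarterly dummy coefficients delta_1..delta_4
delta = np.array([2.0, -1.0, 0.5, 3.0])

# Implied trigonometric parameters for S = 4
mu = delta.mean()                                     # (d1+d2+d3+d4)/4
a1 = (delta[3] - delta[1]) / 2                        # (d4 - d2)/2
b1 = (delta[0] - delta[2]) / 2                        # (d1 - d3)/2
a2 = (delta[3] - delta[2] + delta[1] - delta[0]) / 4  # (d4 - d3 + d2 - d1)/4

t = np.arange(1, 41)
trig = (mu + a1 * np.cos(np.pi * t / 2) + b1 * np.sin(np.pi * t / 2)
        + a2 * np.cos(np.pi * t))
dummies = delta[(t - 1) % 4]   # seasonal-dummy version of the same pattern
```

Both parameterizations generate exactly the same repeating pattern; the trigonometric form merely reshuffles the four dummy parameters into a mean plus cycle terms.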
The constant seasonality model seems applicable to many marketing and tourism series, a little less so in finance, but it is often not fully descriptive for macroeconomic series. Many macroeconomic series display some form of changing seasonality. Why could the seasonal pattern be changing?
- Changing consumption patterns (pay for holidays in advance, eating ice in winter)
- Changing institutions: tax year, end-of-year bonus, 13th-month salary, children's holidays
- Different responses to exogenous shocks in different seasons
- Shocks may occur more often in some seasons.
Seasonal random walk
A simple model that allows the seasonal pattern to change over time is the seasonal random walk, given by

y_t = y_{t-S} + ε_t.   (4)

To understand why seasonality may change, consider the S annual time series Y_{s,t}. The seasonal random walk implies that for these annual series it holds that

Y_{s,t} = Y_{s,t-1} + ε_{s,t}.   (5)

Hence, each season follows a random walk, and due to the innovations, the annual series may intersect, such that summer becomes winter. From graphs it is not easy to discern whether a series is a seasonal random walk or not. The observable pattern depends on the starting values of the time series, see the next graph.
[Figure: Two simulated seasonal random walks, 1960-1995. Y: starting values all zero; X: starting values 10, -8, 14 and -12]
So, due to widely varying starting values, the observed pattern of a seasonal random walk can be very regular. Of course, the autocorrelation function can be helpful, as the application of the filter 1 - L^S to y_t should result in a white noise (uncorrelated) time series. Even though the realizations of a seasonal random walk can show substantial variation, the out-of-sample forecasts are constant. Indeed, at time n, the forecasts are

ŷ_{n+1} = y_{n+1-S}   (6)
ŷ_{n+2} = y_{n+2-S}   (7)
⋮   (8)
ŷ_{n+S} = y_n   (9)
ŷ_{n+S+1} = ŷ_{n+1}   (10)
ŷ_{n+S+2} = ŷ_{n+2}   (11)
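A small simulation illustrates these forecasts (variable names are mine; the starting values are those of series X in the earlier graph):

```python
import numpy as np

rng = np.random.default_rng(0)
S, n = 4, 80
y = np.empty(n)
y[:S] = [10.0, -8.0, 14.0, -12.0]          # starting values, one per season
for t in range(S, n):
    y[t] = y[t - S] + rng.normal()          # seasonal random walk

def srw_forecast(y, S, h):
    """h-step-ahead forecasts from a seasonal random walk:
    the last S observations are repeated indefinitely."""
    last = y[-S:]
    return np.array([last[i % S] for i in range(h)])

f = srw_forecast(y, S, 10)
```

The forecast pattern is simply the last observed year, repeated: beyond S steps ahead, nothing new is predicted.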
A subtle form of changing seasonality can also be described by a time-varying seasonal dummy parameter model, that is, for S = 4,

y_t = Σ_{s=1}^{4} δ_{s,t} D_{s,t} + u_t,   (12)

where

δ_{s,t} = δ_{s,t-S} + ε_{s,t}.   (13)

When the variance of ε_{s,t} is 0, the constant parameter model appears. The amount of variation depends on the variance of ε_{s,t}. The next slide gives a graphical example (for simulated data). Canova and Hansen (JBES, 1995) take this model to test for constant seasonality.
[Figure: Simulated series, 1960-1995. X: constant seasonal dummy parameters; Y: the parameter for season 1 is a random walk]
Airline model
An often applied model, popularized by Box and Jenkins (1970) and named after its application to monthly airline passenger data, is the airline model, given by

(1 - L)(1 - L^S)y_t = (1 + θ_1 L)(1 + θ_S L^S)ε_t.   (14)

Note that

(1 - L)(1 - L^S)y_t = y_t - y_{t-1} - y_{t-S} + y_{t-S-1}.   (15)

It can be proved (Bell, JBES 1987) that when θ_S = -1, the model reduces to

(1 - L)y_t = Σ_{s=1}^{S} δ_s D_{s,t} + (1 + θ_1 L)ε_t.   (16)
Strictly speaking, the airline model assumes S + 1 unit roots. This is due to the fact that for the characteristic equation

(1 - z)(1 - z^S) = 0   (17)

there are S + 1 solutions on the unit circle. For example, if S = 4 the solutions are (1, 1, -1, i, -i). This implies a substantial amount of random-walk-like behavior, even though it is corrected to some extent by the (1 + θ_1 L)(1 + θ_S L^S)ε_t part of the model. In terms of forecasting, it implies very wide confidence intervals around the point forecasts. Some estimation results for the airline model are given next.
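Before turning to the estimates, a simulation sketch may help to see what the airline model implies (the θ values are hypothetical, chosen near the consumption estimates below; the point is that the double filter (1 - L)(1 - L^S) reduces the series to its MA part):

```python
import numpy as np

rng = np.random.default_rng(1)
theta1, theta4 = -0.58, -0.59      # hypothetical values, close to the LC estimates
n = 400
eps = rng.normal(size=n + 5)

# MA part: w_t = (1 + theta1*L)(1 + theta4*L^4) eps_t
w = (eps[5:] + theta1 * eps[4:-1] + theta4 * eps[1:-4]
     + theta1 * theta4 * eps[:-5])

# Integrate twice: (1 - L^4) x = w, then (1 - L) y = x
x = np.zeros(n)
for t in range(n):
    x[t] = w[t] + (x[t - 4] if t >= 4 else 0.0)
y = np.cumsum(x)

# Applying the double difference (1 - L)(1 - L^4) recovers the MA part
dd = (y[5:] - y[4:-1]) - (y[1:-4] - y[:-5])
```

The simulated y wanders like the levels of a trending seasonal series, yet its double difference is a short-memory MA process: this is exactly the structure the least-squares output below estimates.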
Dependent Variable: LC-LC(-1)-LC(-4)+LC(-5)
Method: Least Squares
Date: 04/07/04  Time: 09:48
Sample (adjusted): 1981:2 2001:2
Included observations: 81 after adjusting endpoints
Convergence achieved after 10 iterations
Backcast: 1980:1 1981:1

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -0.000329     0.000264     -1.246930     0.2162
MA(1)      -0.583998     0.088580     -6.592921     0.0000
SMA(4)     -0.591919     0.090172     -6.564320     0.0000

R-squared 0.444787   Adjusted R-squared 0.430550   S.E. of regression 0.012519
Sum squared resid 0.012225   Log likelihood 241.4132   Durbin-Watson stat 2.073346
Mean dependent var -5.01E-05   S.D. dependent var 0.016590
Akaike info criterion -5.886745   Schwarz criterion -5.798061
F-statistic 31.24325   Prob(F-statistic) 0.000000

Inverted MA Roots: .88, .58, .00+.88i, .00-.88i, -.88
Dependent Variable: LM-LM(-1)-LM(-4)+LM(-5)
Method: Least Squares
Date: 04/07/04  Time: 09:50
Sample (adjusted): 1976:3 2004:1
Included observations: 111 after adjusting endpoints
Convergence achieved after 16 iterations
Backcast: 1975:2 1976:2

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -0.000192     0.000320     -0.599251     0.5503
MA(1)      0.353851      0.088144     4.014489      0.0001
SMA(4)     -0.951464     0.016463     -57.79489     0.0000

R-squared 0.647536   Adjusted R-squared 0.641009   S.E. of regression 0.019240
Sum squared resid 0.039981   Log likelihood 282.5513   Durbin-Watson stat 2.121428
Mean dependent var -3.88E-06   S.D. dependent var 0.032112
Akaike info criterion -5.036961   Schwarz criterion -4.963730
F-statistic 99.20713   Prob(F-statistic) 0.000000

Inverted MA Roots: .99, .00+.99i, .00-.99i, -.35, -.99
Dependent Variable: LI-LI(-1)-LI(-12)+LI(-13)
Method: Least Squares
Date: 04/07/04  Time: 09:52
Sample (adjusted): 1920:02 2004:02
Included observations: 1009 after adjusting endpoints
Convergence achieved after 11 iterations
Backcast: 1919:01 1920:01

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          2.38E-05      0.000196     0.121473      0.9033
MA(1)      0.378388      0.029006     13.04509      0.0000
SMA(12)    -0.805799     0.016884     -47.72483     0.0000

R-squared 0.522387   Adjusted R-squared 0.521437   S.E. of regression 0.022028
Sum squared resid 0.488142   Log likelihood 2419.576   Durbin-Watson stat 1.839223
Mean dependent var -0.000108   S.D. dependent var 0.031842
Akaike info criterion -4.790041   Schwarz criterion -4.775422
F-statistic 550.1535   Prob(F-statistic) 0.000000

Inverted MA Roots: .98, .85+.49i, .85-.49i, .49+.85i, .49-.85i, .00+.98i, .00-.98i, -.38, -.49+.85i, -.49-.85i, -.85+.49i, -.85-.49i, -.98
Basic structural model
A structural time series model for a quarterly time series can look like

y_t = µ_t + s_t + w_t,   w_t ~ NID(0, σ_w^2)   (18)
(1 - L)^2 µ_t = u_t,   u_t ~ NID(0, σ_u^2)   (19)
(1 + L + L^2 + L^3)s_t = v_t,   v_t ~ NID(0, σ_v^2)   (20)

where the error processes w_t, u_t and v_t are also mutually independent, and where NID denotes normally and independently distributed.
These three equations together imply that y_t can be described by

(1 - L)(1 - L^4)y_t = ζ_t   (21)

where ζ_t is a moving average process of order 5 [MA(5)]. The autocovariances γ_k, k = 0, 1, 2, ..., of ζ_t are

γ_0 = 4σ_u^2 + 6σ_v^2 + 4σ_w^2   (22)
γ_1 = 3σ_u^2 - 4σ_v^2 - 2σ_w^2   (23)
γ_2 = 2σ_u^2 + σ_v^2   (24)
γ_3 = σ_u^2 + σ_w^2   (25)
γ_4 = -2σ_w^2   (26)
γ_5 = σ_w^2   (27)
γ_j = 0 for j = 6, 7, ....   (28)
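These autocovariances can be computed directly from the implied MA representation ζ_t = (1 + L + L^2 + L^3)u_t + (1 - L)^2 v_t + (1 - L)(1 - L^4)w_t; the function below is my own sketch of that calculation:

```python
import numpy as np

def bsm_double_diff_autocov(su2, sv2, sw2, kmax=6):
    """Autocovariances gamma_0..gamma_kmax of zeta_t = (1-L)(1-L^4)y_t
    for the basic structural model, with u, v, w mutually independent."""
    cu = np.array([1.0, 1.0, 1.0, 1.0])              # (1 + L + L^2 + L^3) u_t
    cv = np.array([1.0, -2.0, 1.0])                  # (1 - L)^2 v_t
    cw = np.array([1.0, -1.0, 0.0, 0.0, -1.0, 1.0])  # (1 - L)(1 - L^4) w_t

    def acov(c, s2, k):
        # autocovariance at lag k of the MA filter c applied to white noise
        return s2 * float(c[: len(c) - k] @ c[k:]) if k < len(c) else 0.0

    return [acov(cu, su2, k) + acov(cv, sv2, k) + acov(cw, sw2, k)
            for k in range(kmax + 1)]
```

For σ_u^2 = σ_v^2 = σ_w^2 = 1 this returns (14, -3, 3, 2, -2, 1, 0), in line with equations (22)-(28).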
Conclusion for lecture 2
There are various ways one can describe a time series (and use that description for forecasting) with constant or changing seasonal variation. The next lecture will add even more to this list. To make a choice, one needs to test models against each other. A key distinguishing feature of the models is the amount of random walk behavior, or better, the number of unit roots (seasonal and non-seasonal). Part of lecture 3 will be devoted to the choice process. An informal look at graphs and autocorrelation functions is not sufficient. One needs to make a decision based on formal tests on parameter values in auxiliary regressions.
Forecasting Seasonal Time Series
3. Advanced Models
Philip Hans Franses, Econometric Institute, Erasmus University Rotterdam
SMU and NUS, Singapore, April-May 2004
3. Advanced Models
- Seasonal unit roots
- Seasonal cointegration
- Periodic models
- Unit roots in periodic time series
- Periodic cointegration
A time series variable has a unit root if the autoregressive polynomial (of the autoregressive model that best describes this variable) contains the component 1 - L, and the moving-average part does not, where L denotes the familiar lag operator defined by L^k y_t = y_{t-k}, for k = ..., -2, -1, 0, 1, 2, .... For example, the model y_t = y_{t-1} + ε_t has a first-order autoregressive polynomial 1 - L, as it can be written as (1 - L)y_t = ε_t, and hence data that can be described by this model, which is coined the random walk model, are said to have a unit root. The same holds of course for the model y_t = µ + y_{t-1} + ε_t, which is called a random walk with drift.
Solving this last model back to the first observation, that is,

y_t = y_0 + µt + ε_t + ε_{t-1} + ... + ε_1   (1)

shows that such data display a trend. Due to the summation of the error terms, it is possible that the data diverge from the overall trend µt for a long time, and hence at first sight one would conclude from a graph that there are all kinds of temporary trends. Therefore, such data are sometimes said to have a stochastic trend. The unit roots in seasonal data, which can be associated with changing seasonality, are the so-called seasonal unit roots, see Hylleberg et al. (JofE, 1990).
For quarterly data, these roots are -1, i, and -i. For example, data generated from the model y_t = -y_{t-1} + ε_t would display seasonality, but if one were to make graphs with the split seasonals, then one could observe that the quarterly data within a year shift places quite frequently. Similar observations hold for the model y_t = -y_{t-2} + ε_t, which can be written as (1 + L^2)y_t = ε_t, where the autoregressive polynomial 1 + L^2 corresponds to the seasonal unit roots i and -i, as these two values solve the equation 1 + z^2 = 0. Hence, when a model for y_t contains an autoregressive polynomial with roots -1 and/or i, -i, the data are said to have seasonal unit roots.
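A quick simulation confirms that the filter 1 + L^2 turns such a process back into white noise (the seed and sample size below are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = -y[t - 2] + eps[t]      # seasonal unit roots at +i and -i

# Applying (1 + L^2) removes this type of (changing) seasonality
filtered = y[2:] + y[:-2]
```

The filtered series equals the innovations, so the seasonal pattern in y is entirely stochastic: seasons can drift and swap places over time, exactly the behavior seasonal unit root tests are designed to detect.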