FORECASTING ENERGY USAGE IN THE INDUSTRIAL SECTOR IN SWEDEN USING SARIMA AND DYNAMIC REGRESSION


FORECASTING ENERGY USAGE IN THE INDUSTRIAL SECTOR IN SWEDEN USING SARIMA AND DYNAMIC REGRESSION

Submitted by Carl Anners

A thesis submitted to the Department of Statistics in partial fulfillment of the requirements for a two-year Master's degree in Statistics in the Faculty of Social Sciences

Supervisors: Katrin Kraus & Johan Lyhagen

Spring 2017

ABSTRACT

Accurate prediction of future events is of great interest in many contexts. This thesis focuses on forecasting energy usage in the industrial sector, which provides valuable information for government agencies when planning and allocating the available budget. More specifically, the purpose is to evaluate whether using explanatory variables in a dynamic regression with seasonal autoregressive integrated moving average (SARIMA) errors improves the forecasting accuracy of quarterly energy usage in the industrial sector in Sweden compared to a standard SARIMA model. The SARIMA model used for comparison is SARIMA(1,0,0)(0,1,1), while the dynamic regression model uses the explanatory variable value added of the industrial sector and has SARIMA(1,0,0)(0,1,1) errors. The forecast performance of the two models is compared for both quarterly and yearly forecast horizons using the root mean squared error (RMSE) and the mean absolute error (MAE). The results show that the RMSE and MAE of the dynamic regression model are lower for both forecast horizons than those of the SARIMA model. In addition, a significance test (OOS-t) and an encompassing test (ENC-NEW) are employed, which show that the difference in forecasting accuracy is statistically significant and that the SARIMA forecast does not encompass the forecast of the dynamic regression model.

Contents

1 Introduction
2 Theory and Methodology
  2.1 SARIMA
    Box-Jenkins Methodology
    Stationarity
    Order Selection
    Parameter Estimation
    Residual Diagnostics
  2.2 Dynamic Regression
    General Model
    Engle-Granger-Hylleberg-Lee Test
    Estimation and Forecasting
  2.3 Forecast Comparison
    Root Mean Squared Error and Mean Absolute Error
    OOS-t and ENC-NEW Tests
3 Data
4 Results
  4.1 SARIMA
    Stationarity Test
    Model Selection
    Estimation and Residual Diagnostics
  4.2 Dynamic Regression Model
    Explanatory Variables
    Order of Integration
    Seasonal Cointegration and Stationarity
    Model Selection and Estimation
    Residual Diagnostics
  Out-of-sample Forecast Comparison
5 Conclusion

1 Introduction

Every year the Swedish Energy Agency publishes a report on the energy usage in Sweden. Among other things, this report includes a forecast of the total energy usage in the industry, housing and transport sectors. This forecast is made on behalf of the Ministry of Finance and is used as a basis for the budget proposal. The focus of this thesis is on the forecast for the industry sector.

The industry sector is responsible for a large part of Sweden's energy usage. In 2015 the total energy usage in Sweden was 369 TWh, of which the industry sector accounted for 137 TWh, or 37% (Energimyndigheten, 2016). For this reason, forecasting the energy usage in the industry sector is important for the Ministry of Finance when allocating the available budget. The importance of forecasts for this budget was highlighted in March 2017, when a report from the Swedish National Audit Office criticized the Ministry of Finance for having too optimistic forecasts for future public finances, which in turn gives a false basis for higher public expenditure and lower taxes (Riksrevisionen, 2017). Currently the forecast is made by looking at a forecast of the value added of the industry sector. However, this procedure is somewhat ad hoc, so estimating a new model using time series analysis techniques is relevant in order to give more robust predictions over time (Energimyndigheten, 2016).

The purpose of this study is to investigate whether including the explanatory variables "value added of the industry sector" and/or "energy prices" in a dynamic regression model with SARIMA errors improves the forecasting accuracy of energy usage in the industry sector compared to a standard SARIMA model. The reason for evaluating energy prices is that lower energy prices should lead to lower production costs as well as weaker incentives to save energy, and therefore should have an effect on energy usage. Bayar and Kilic (2014) investigated this and found that oil and natural gas prices are statistically significant and have a negative effect on industrial production in the Eurozone. Prices are, however, not included in the Swedish Energy Agency's current forecasting procedure. The motivation for evaluating "value added of the industry sector" is that it is used in the current forecasting procedure. To answer this question, a dynamic regression model with SARIMA errors, which uses either one or both explanatory variables, is fitted. This model will then be compared to a

SARIMA model which, on the contrary, only includes past observations. Furthermore, due to the release timing of the reports, the proposed model needs to be used for both a quarterly forecast and a yearly forecast, hence the two estimated models will be compared for both forecast horizons.

The motivation for using the dynamic regression model with SARIMA errors is that it includes explanatory variables while at the same time incorporating past observations. Unlike, for example, vector autoregressive models, dynamic regression with SARIMA errors is limited in the sense that forecasting also requires values or forecasts of the explanatory variables for the forecast period. The Swedish Energy Agency, however, has forecasts for the explanatory variables being investigated readily available, from the National Institute of Economic Research for value added of the industrial sector and from the World Bank for energy prices. This makes the dynamic regression model with SARIMA errors suitable for the task of forecasting energy usage in the industrial sector.

SARIMA and ARIMA models have been used extensively to forecast different forms of energy data. For example, Erdogdu (2006) uses an ARIMA model to forecast electricity demand in Turkey and compares it with official forecasts, finding that the official projection overestimates the electricity demand in Turkey. Another example from Turkey is Ediger and Akar (2006), who use ARIMA and SARIMA models to forecast energy demand by fuel. They compare forecasting each fuel individually and summing up the forecasts with fitting an ARIMA model on the total energy demand directly; the results show that the latter gives lower forecast errors. Gonzales Chavez, Xiberta Bernat and Llaneza Coalla (1999) utilize ARIMA models to forecast domestic and industrial electric energy consumption, as well as black coal, anthracite and electric energy production, in northern Spain. These models are compared to exponential smoothing and regression models, with the fitted ARIMA models found to outperform the exponential smoothing and regression models in all five cases. Abdel-Aal and Al-Garni (1997) fit a SARIMA model to forecast electric energy consumption in eastern Saudi Arabia. The SARIMA model fitted is shown to outperform the regression and abductive network machine-learning models developed earlier on the same data.

Expanding the ARIMA or SARIMA model to include explanatory variables has been done in plenty of previous studies, although to the best of my knowledge not many forecast energy usage explicitly. However, Citroean, Ouassaid and Maaroufi (2015) create an ARIMA model including the explanatory variables real GDP and demography to forecast electricity demand in Morocco. The model created is shown to be well fitted, though it is not compared to a standard ARIMA model without explanatory variables as is done in this thesis. On the other hand, De Felice, Alessandri and Ruti (2013) compare the forecasts of electricity demand in Italy of a standard ARIMA model and one which also includes weather forecasts as explanatory variables. The results show that the model including the weather forecasts has better forecasting accuracy than

the standard ARIMA model.

In this thesis the explanatory variables GDP, demography and weather forecasts used by Citroean et al. (2015) and De Felice et al. (2013) will not be considered. The motivations for this are straightforward: value added of the industrial sector already captures the GDP of the industrial sector, so adding the GDP of Sweden would add no relevant information to the model. The variable demography is relevant for Citroean et al. (2015) since they forecast electricity demand in general, where an increase or decrease in population directly affects electricity demand. For the industrial sector, an increase or decrease in population may lead to higher or lower demand for goods and thus a change in production; however, this is also captured by the value added of the industrial sector. Lastly, the motivation for not including weather forecasts is simply that usable data for this is lacking.

As mentioned above, studies comparing ARIMA or SARIMA with and without explanatory variables for forecasting in areas other than energy usage are plentiful. An example is Vagropoulos, Chouliaras, Kardakos, Simoglou, and Bakirtzis (2016), who compare SARIMA, SARIMA with explanatory variables and artificial neural networks for one-day-ahead electricity generation forecasting of grid-connected photovoltaic (PV) plants, and SARIMA and SARIMA with explanatory variables for intraday forecasting. They find that SARIMA with explanatory variables has the best forecasting accuracy for day-ahead forecasts, while standard SARIMA has the best forecasting accuracy intraday. Other studies include Tsui, Ozer Balli, Gilbey and Gow (2014), who forecast Hong Kong's airport passenger throughput using a SARIMA model without explanatory variables and then one including a number of explanatory variables. The results show that the model including explanatory variables has more predictive power. The last example is Kongcharoen and Kruangpradit (2013), who compare forecasts of Thailand's exports to 12 different countries using ARIMA and ARIMA with explanatory variables. They find that including explanatory variables leads to statistically significantly better forecasts for roughly half of them.

In summary, the SARIMA framework has been found to be well suited for forecasting energy usage. Also, as one could expect, expanding the SARIMA framework to include explanatory variables has shown mixed results; sometimes the SARIMA model without explanatory variables performs better and other times it doesn't. Thus it seems appropriate to investigate in each case which model should be chosen.

Popular methods to forecast energy usage other than SARIMA/ARIMA with or without explanatory variables include artificial neural networks (ANN), used by Chae, Horesh, Hwang, and Lee (2015), Azadeh, Ghaderi, and Sohrabkhani (2008) and Jebaraj, Iniyan and Goic (2010). Grey forecasting models (GM) have also been used, by Yuan, Liu and Fang (2016), Xiong, Dang, Yao, and Wang (2014) and Kumar and Jain (2009). These models are interesting in their own right, but as the purpose of the thesis is to compare

the forecasting performance of SARIMA and dynamic regression models with SARIMA errors, they are deemed out of scope as they are not relevant for answering the research question.

The outline of the rest of the thesis is as follows. In Section 2 the theory and methodology are presented: first the SARIMA model is described together with the statistical tests used to fit it, followed by a presentation of the dynamic regression model, and lastly the forecast comparison is described together with the statistical tests used for the comparison. In Section 3 the data used in this thesis is plotted and described. The results are presented in Section 4, which follows the same structure as the theory and methodology section: first the results of the SARIMA modelling are shown, followed by the dynamic regression, and finally the two models are compared and discussed. In Section 5 the conclusions of the study are presented.

2 Theory and Methodology

In this section the two methods and the statistical tests used in the modelling are described. First the SARIMA model is presented, followed by dynamic regression with SARIMA errors, and then how the two are compared.

2.1 SARIMA

The SARIMA model is a version of the common ARIMA model which also incorporates a seasonal part. The general SARIMA model can be expressed as (Box, Jenkins, Reinsel, and Ljung, 2015, p. 310):

φ_p(B) Φ_P(B^s) ∇^d ∇_s^D y_t = θ_q(B) Θ_Q(B^s) ε_t.   (2.1)

Using backshift operators, the autoregressive (AR) and moving average (MA) polynomials are defined respectively as:

φ_p(B) = 1 − φ_1 B − φ_2 B^2 − … − φ_p B^p,   (2.2)
θ_q(B) = 1 + θ_1 B + θ_2 B^2 + … + θ_q B^q.   (2.3)

The seasonal AR and MA polynomials are defined respectively as:

Φ_P(B^s) = 1 − Φ_1 B^s − Φ_2 B^{2s} − … − Φ_P B^{Ps},   (2.4)
Θ_Q(B^s) = 1 + Θ_1 B^s + Θ_2 B^{2s} + … + Θ_Q B^{Qs},   (2.5)

where s is the seasonal period (s = 4 for quarterly data), d the order of integration, D the order of seasonal integration, and ε_t a white noise process (Brockwell and Davis, 2016, p. 177). In an autoregressive model, y_t is a linear combination of the p most recent values (Cryer and Chan, 2008, p. 66). Rather than using past values, the MA part of the model includes past errors (Cryer and Chan, 2008, p. 57). The consequence is that any shock

on y_t will gradually fade off in the case of the AR model, but will do so abruptly in the case of an MA model.

2.1.1 Box-Jenkins Methodology

The Box-Jenkins method gives a systematic, step-by-step approach to finding the best SARIMA model for forecasting a time series. The method was proposed in 1970 by George Box and Gwilym Jenkins in their textbook Time Series Analysis: Forecasting and Control. It consists of three steps: identification, estimation and diagnostic checking. The Box-Jenkins approach is shown as a flow diagram in Figure 1 (Box et al., 2015, p. 16).

Figure 1: Box-Jenkins methodology step by step.
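As a minimal illustration of the model class in equation (2.1) and of the estimation step of the Box-Jenkins loop, a SARIMA model can be specified and fitted with the SARIMAX class in Python's statsmodels. The thesis does not state which software was used, so the library choice and the placeholder series below are assumptions, not the author's actual implementation.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Placeholder quarterly series; the thesis uses energy usage from Statistics Sweden.
rng = np.random.default_rng(0)
y = pd.Series(rng.normal(size=120),
              index=pd.period_range("1987Q1", periods=120, freq="Q"))

# SARIMA(p,d,q)(P,D,Q)_s as in equation (2.1), with s = 4 for quarterly data.
model = SARIMAX(y, order=(1, 0, 0), seasonal_order=(0, 1, 1, 4))
fit = model.fit(disp=False)   # maximum likelihood estimation
print(fit.summary())          # parameter estimates, standard errors, AIC and BIC
```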

Identification

The identification step is broken down into two parts. First, one has to assess whether the time series is stationary and determine the number of differences needed to achieve stationarity. Once the series is stationary, overdifferencing should be avoided, since an overdifferenced series introduces extra serial correlation and model complexity. If the series z_t follows a random walk, then taking the first difference w_t = (1 − B)z_t = a_t results in a (stationary) white noise process. However, if a second difference is taken, the resulting model will be (1 − B)w_t = (1 − B)a_t, which is a more complex ARIMA(0,2,1) with parameter θ = 1 (Box et al., 2015, p. 181).

The second part of the identification step is to identify the (p, q)(P, Q) order of the SARIMA model (Box et al., 2015, p. 180). This is done by investigating the autocorrelation function and partial autocorrelation function plots and choosing the order with the lowest Akaike information criterion or Bayesian information criterion.

Estimation

The second step of the methodology refers to the estimation of the chosen model. The estimator can be maximum likelihood, conditional least squares or unconditional least squares. The estimation step assumes that a model has been selected in the previous step (Box et al., 2015). The advantage of maximum likelihood compared to least squares is that it uses all of the information in the data, whereas least squares uses the first and second moments only (Cryer and Chan, 2008, p. 158).

Diagnostic checking

In the last step, the residuals of the estimated model are extracted. These are then checked for heteroskedasticity, autocorrelation and normality. If the diagnostic check fails, one returns to step one to find a better fit; otherwise the model is ready for forecasting (Box et al., 2015).

2.1.2 Stationarity

In the following section the stationarity tests used in the model fitting are presented; they are part of the identification step in the Box-Jenkins methodology. In this thesis, stationarity refers to weak stationarity. A time series is said to be weakly stationary if the following is true:

E(y_t) = μ for all t,   (2.6)
γ_{t,t−k} = γ_{0,k} for all t and lags k,   (2.7)

where μ is the mean and γ_k the autocovariance at lag k (Cryer and Chan, 2008, p. 17).

When testing for stationarity, the alternative (or null hypothesis, depending on the test) is that the series has a unit root. If the series has a unit root it is non-stationary. A unit root process can be described in the following way. Consider an ARMA process

(1 − φ_1 B − φ_2 B^2 − … − φ_p B^p) y_t = (1 + θ_1 B + θ_2 B^2 + … + θ_q B^q) ε_t,   (2.8)

where the moving average polynomial is invertible. The autoregressive polynomial in equation 2.8 is then factored as

(1 − φ_1 B − φ_2 B^2 − … − φ_p B^p) = (1 − λ_1 B)(1 − λ_2 B) … (1 − λ_p B).   (2.9)

The process has a unit root if any of the λ_i lies on the unit circle, and it is non-stationary if any λ_i lies on or outside the unit circle. By testing both the null hypothesis of a unit root and the null hypothesis of stationarity, one can differentiate between series that are stationary, series that have a unit root and series where the data are not informative enough to determine whether the series is stationary or integrated (Kwiatkowski, Phillips, Schmidt, and Shin, 1992).

The Augmented Dickey-Fuller Test

The Augmented Dickey-Fuller test (ADF-test) was originally proposed in 1979 by Dickey and Fuller and later extended by Said and Dickey (1984). The test has the null hypothesis of a unit root, with the alternative being either stationarity or trend stationarity. The regression model used to test the alternative hypothesis of stationarity is the following:

Δy_t = α + γ y_{t−1} + Σ_{i=1}^{p} δ_i Δy_{t−i} + ω_t.   (2.10)

The intercept of the model is α, Δ is the first difference operator, δ_i are the parameters of the lagged differences, p is the number of lags and ω_t is assumed to follow a white noise process. The lagged differences are included to account for autocorrelation, which is present in many time series. The parameter tested is γ, with the null hypothesis γ = 0 against the alternative γ < 0. The test statistic is then:

DF_τ = γ̂ / SE(γ̂).   (2.11)

It is noteworthy that the test statistic is not asymptotically normal. The asymptotic distribution of the test statistic also depends on which deterministic components are included in the regression. Critical values have been tabulated and can be found in, for example, Hamilton (1994). To test for trend stationarity, the following regression is used:

Δy_t = α + βt + γ y_{t−1} + Σ_{i=1}^{p} δ_i Δy_{t−i} + ω_t,   (2.12)

where a trend component βt has been added. Whether to include α or βt should be decided before conducting the test. If the data do not have a trend but are centered around a non-zero mean, the test should be performed with an intercept only. If the data have a trend (either up or down), the time trend component should be included. If the data are centered around a zero mean, neither an intercept nor a time trend should be included. The number of lags to include in the model is determined by choosing the regression with the lowest AIC (Neusser, 2016).

Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test

To test the null of no unit root, the Kwiatkowski-Phillips-Schmidt-Shin test (KPSS-test) uses a model with a random walk and a stationary error in the following way (Kwiatkowski et al., 1992):

y_t = r_t + ε_t,   (2.13)

where the random walk component is expressed as

r_t = r_{t−1} + u_t,   (2.14)

with u_t being IID(0, σ²_u) and r_0 the intercept. The null hypothesis of level stationarity is σ²_u = 0 against the alternative hypothesis of a unit root. The test statistic is given by

η̂_μ = T^{−2} Σ_{t=1}^{T} S_t² / s²(l),   (2.15)

where S_t = Σ_{i=1}^{t} e_i and a consistent estimator s²(l) of σ²_ε is formed from the residuals e_t as

s²(l) = T^{−1} Σ_{t=1}^{T} e_t² + 2 T^{−1} Σ_{s=1}^{l} w(s, l) Σ_{t=s+1}^{T} e_t e_{t−s}.   (2.16)

The weight w(s, l) corresponds to the Bartlett window w(s, l) = 1 − s/(l + 1), where the bandwidth l is chosen according to the automatic procedure suggested by Newey and West (1994). The critical values of the test statistic can be found in, for example, the original paper by Kwiatkowski et al. (1992). It is also possible to have trend stationarity as the null hypothesis. In that case the following regression is used:

y_t = ξt + r_t + ε_t,   (2.17)

where ξ is the parameter of a time trend. The KPSS-test is intended to complement other unit root tests such as the ADF-test.

The Hylleberg-Engle-Granger-Yoo Test

The Hylleberg-Engle-Granger-Yoo test (HEGY-test) was proposed by Hylleberg, Engle, Granger, and Yoo (1990) to test for seasonal unit roots in quarterly data. Factorizing the quarterly seasonal difference operator as

∇_4 = (1 − B^4) = (1 − B)(1 + B)(1 + iB)(1 − iB)   (2.18)

shows that it has four roots on the unit circle: 1, −1 and ±i, where 1 is the nonseasonal root. The HEGY-test uses the following auxiliary regression to test for the unit roots:

ψ(B) y_{4,t} = π_1 y_{1,t−1} + π_2 y_{2,t−1} + π_3 y_{3,t−2} + π_4 y_{3,t−1} + μ_t + ε_t,   (2.19)

where

y_{1,t} = (1 + B + B^2 + B^3) y_t,
y_{2,t} = (1 − B + B^2 − B^3) y_t,
y_{3,t} = (1 − B^2) y_t,
y_{4,t} = (1 − B^4) y_t,

μ_t denotes deterministic components, which can be an intercept, seasonal dummies and/or a trend, and ψ(B) is a polynomial in B. The null hypotheses H_0: π_1 = 0 vs H_1: π_1 < 0 and H_0: π_2 = 0 vs H_1: π_2 < 0 are tested with one-sided t-tests, and the joint H_0: π_3 = π_4 = 0 vs H_1: π_3 ≠ 0 and/or π_4 ≠ 0 with an F-test. The t-statistics follow the Dickey-Fuller distribution and the critical values of the F-test can be found in, for example, the original paper by Hylleberg et al. (1990). Not rejecting the first null hypothesis would indicate a nonseasonal unit root, not rejecting the second null hypothesis would indicate a seasonal unit root at the biannual frequency, and not rejecting the third null hypothesis would indicate seasonal unit roots at the annual

frequency. Like the ADF-test, the HEGY-test tests for a nonseasonal unit root with the null hypothesis of a unit root. Franses (1996, p. 73) notes that the ADF and HEGY tests complement each other, since the HEGY-test may in some cases have lower power than the ADF-test at the zero frequency.

Canova and Hansen Test

The Canova and Hansen test (CH-test) was proposed by Canova and Hansen in 1995 for the purpose of testing the null of no seasonal unit roots. The CH-test has the null hypothesis of stable seasonality with the alternative hypothesis of seasonal unit roots. The CH-test is intended to complement the HEGY-test in the same way as the KPSS-test complements the ADF-test. The regression model used in the CH-test is expressed as:

y_t = μ + x_t′β + d_t′a + e_t,   (2.20)

where μ is an intercept, x_t are explanatory variables, d_t′a is a deterministic seasonal component with d_t being seasonal dummy indicators (four for quarterly data), the parameter a represents the seasonal effects and e_t ~ N(0, σ²). A trigonometric representation of the deterministic seasonal component S_t is formulated as:

S_t = Σ_{j=1}^{q} f_{jt}′ γ_j,   (2.21)

where, for quarterly data, q = s/2 = 2. For j = 1, f_{jt} = (cos(πt/2), sin(πt/2))′ and for j = 2, f_{jt} = cos(πt), which correspond to the annual seasonal unit roots ±i and the biannual seasonal unit root −1. We then have S_t = f_t′γ with

γ = (γ_1′, γ_2)′,  f_t = (f_{1t}′, f_{2t})′.   (2.22)

Inserting this into equation (2.20) gives

y_t = μ + x_t′β + f_t′γ + e_t.   (2.23)

The test statistic is then given by:

L = T^{−2} Σ_{t=1}^{T} F̂_t′ A (A′ Ω̂_f A)^{−1} A′ F̂_t,   (2.24)

where F̂_t = Σ_{i=1}^{t} f_i ê_i, ê_t are the OLS residuals from regression (2.23), Ω̂_f is an

estimate of the long-run covariance matrix of f_t e_t, and A is determined by the seasonal frequencies to be tested. When simultaneously testing for unit roots at all seasonal frequencies, A = I_{s−1} and the test statistic is given by

L_f = T^{−2} Σ_{t=1}^{T} F̂_t′ Ω̂_f^{−1} F̂_t.   (2.25)

The test statistic follows the von Mises distribution and the critical values can be found in, for example, the original paper by Canova and Hansen (1995). The test assumes no unit root at the zero frequency, and it is therefore recommended by the authors to make suitable transformations before conducting the test.

2.1.3 Order Selection

Order selection is part of the first step of the Box-Jenkins methodology and deals with selecting the AR, MA, seasonal AR and seasonal MA orders (p, q)(P, Q).

Autocorrelation Function and Partial Autocorrelation Function

The autocorrelation function (ACF) and partial autocorrelation function (PACF) are useful in determining the (p, q)(P, Q) order (Box et al., 2015, p. 183). The ACF can be estimated using the autocovariance. The sample autocovariance function at lag k is given by

γ̂(k) = (1/T) Σ_{t=1}^{T−k} (y_{t+k} − ȳ)(y_t − ȳ).   (2.26)

The sample autocorrelation function at lag k is then given by

ρ̂(k) = γ̂(k) / γ̂(0)   (2.27)

(Brockwell and Davis, 2016, p. 16).

Table 1: Behaviour of the ACF and PACF for determining AR and MA orders

        AR(p)                   MA(q)                   ARMA(p,q)
ACF     Tails off               Cuts off after lag q    Tails off
PACF    Cuts off after lag p    Tails off               Tails off

Table 2: Behaviour of the ACF and PACF for determining SAR and SMA orders

        AR(P)s                   MA(Q)s                   ARMA(P,Q)s
ACF     Tails off at lags ks     Cuts off after lag Qs    Tails off at lags ks
PACF    Cuts off after lag Ps    Tails off at lags ks     Tails off at lags ks

The sample partial autocorrelations are obtained from the estimated autocorrelations by solving the equations

ρ̂_j = φ̂_{k1} ρ̂_{j−1} + φ̂_{k2} ρ̂_{j−2} + … + φ̂_{k(k−1)} ρ̂_{j−k+1} + φ̂_{kk} ρ̂_{j−k},  for j = 1, 2, …, k,   (2.28)

for k = 1, 2, …, where the parameters φ̂_11, φ̂_22, …, φ̂_kk are the partial autocorrelations at lags 1, 2, …, k respectively (Box et al., 2015, p. 66). To use the ACF and PACF in determining the AR, MA, seasonal AR and seasonal MA orders, the ACF and PACF are plotted. Tables 1 and 2 show the behaviour of the ACF and PACF and the orders they suggest (Shumway and Stoffer, 2006, pp. 109, 156).

Model Selection

Determining the (p, q)(P, Q) part of the model from the plotted ACF and PACF alone can, however, be difficult. Box et al. (2015, p. 190) recommend the Akaike information criterion (AIC) proposed by Akaike (1974) and the Bayesian information criterion (BIC) proposed by Schwarz (1978) as supplementary tools in model selection:

AIC = 2k − 2 ln(L),   (2.29)
BIC = ln(T) k − 2 ln(L),   (2.30)

where L is the likelihood of the estimated model, k the number of parameters in the estimated model (p + q + P + Q + 1), and T the number of observations. The AIC and BIC measure the goodness of fit of a statistical model while penalizing the addition of more parameters, in order to avoid overfitting. The best model is the one with the lowest AIC and BIC values; the BIC penalizes additional parameters more heavily than the AIC. Koehler and Murphree (1988) compared the

model selection of the AIC and BIC on 91 real time series. The AIC and BIC chose different model orders in 27% of the cases, and the forecast performance of the AIC and BIC was compared for those cases. The study concludes that it is preferable to use the BIC over the AIC, which according to the authors is also in line with previous simulations conducted by Sneek (1984).

2.1.4 Parameter Estimation

Parameter estimation is the second step of the Box-Jenkins methodology. The estimation method used for the SARIMA model is maximum likelihood. Simulation studies comparing unconditional least squares, conditional least squares and maximum likelihood for ARMA models have been conducted by Dent and Min (1978) and Ansley and Newbold (1980). These simulations suggest that although unconditional least squares and conditional least squares are adequate approximations to maximum likelihood for large sample sizes, maximum likelihood is preferred for small and moderate sample sizes (Box et al., 2015, p. 217).

2.1.5 Residual Diagnostics

Residual diagnostics is the last step of the Box-Jenkins methodology before the model can be used for forecasting. In this step, the residuals of the model are extracted and checked for heteroskedasticity, serial correlation and normality. The assumption of normality is not strictly necessary, but it is needed when using the AIC and BIC for model selection and useful when calculating prediction intervals.

Ljung-Box Test

The Ljung-Box test proposed by Ljung and Box (1978) is used to test for the absence of autocorrelation in the residuals of the chosen model. The null hypothesis of the test is that the residuals are serially uncorrelated, against the alternative hypothesis of serially correlated residuals up to order m. The test statistic is given by:

Q = T(T + 2) Σ_{k=1}^{m} ρ̂(k)² / (T − k),   (2.31)

where T is the sample size, ρ̂(k) the sample autocorrelation at lag k, and m the number of autocorrelations to test.

The Q statistic has an approximate chi-square distribution with m − p − q degrees of freedom (Box et al., 2015, p. 289). Hyndman (2014) suggests that the number of lags should be m = min(10, T/5) for nonseasonal time series and m = min(2s, T/5) for seasonal time series, with s being the number of seasonal periods.

The Jarque-Bera Test

The purpose of the Jarque-Bera test is to test the residuals of the fitted model for normality. The test was proposed by Jarque and Bera (1980) and assesses whether the kurtosis and skewness of the data match a normal distribution. The null hypothesis of the test is that the data are normally distributed, against the alternative hypothesis that they are not. The test statistic is given by:

JB = T ( g_1²/6 + (g_2 − 3)²/24 ),   (2.32)

where T is the sample size, g_1 the sample skewness and g_2 the sample kurtosis. Under the null hypothesis the test statistic is approximately χ² distributed with 2 degrees of freedom (Cryer and Chan, 2008).

2.2 Dynamic Regression

In the following section the dynamic regression model with SARIMA errors and the seasonal cointegration test used are presented.

2.2.1 General Model

Dynamic regression with SARIMA errors fits a linear regression with the error term corrected for autocorrelation. A general dynamic regression with SARMA errors can be written as

y_t = α + β_1 x_{1,t} + … + β_k x_{k,t} + N_t,   (2.33)
φ_p(B) Φ_P(B^s) N_t = θ_q(B) Θ_Q(B^s) ε_t.   (2.34)

Solving for N_t and substituting back into equation 2.33 gives

y_t = α + β_1 x_{1,t} + … + β_k x_{k,t} + [θ_q(B) Θ_Q(B^s) / (φ_p(B) Φ_P(B^s))] ε_t,   (2.35)

where ε_t is assumed to follow a white noise process (Pankratz, 1991, p. 100). It is important to note that when N_t is differenced, all other variables need to have the same order of integration after differencing (Pankratz, 1991, p. 121). Therefore, a dynamic regression model with SARIMA(0,1,0)(0,1,0)_4 errors would be of the form

∇ ∇_4 N_t = ε_t,   (2.36)
∇ ∇_4 y_t = β_1 ∇ ∇_4 x_{1,t} + … + β_k ∇ ∇_4 x_{k,t} + ε_t.   (2.37)

However, if the variables are cointegrated, N_t will be stationary even if the variables themselves are not.

2.2.2 Engle-Granger-Hylleberg-Lee Test

To test for cointegration of seasonal time series, the Engle-Granger-Hylleberg-Lee test (EGHL-test) is used (Engle, Granger, Hylleberg and Lee, 1993). The test is closely related to the HEGY-test and uses the following three regressions to test for cointegration at the zero, annual and biannual frequencies:

y_{1t} = α_{12} x_{1t} + u_t,   (2.38)
y_{2t} = α_{22} x_{2t} + v_t,   (2.39)
y_{3t} = α_{32} x_{3t} + α_{33} x_{3,t−1} + w_t,   (2.40)

where

y_{1,t} = (1 + B + B^2 + B^3) y_t,
y_{2,t} = (1 − B + B^2 − B^3) y_t,
y_{3,t} = (1 − B^2) y_t,

and the x_{j,t} are defined analogously. The error terms are then used to test for nonseasonal and seasonal unit roots. The error term u_t can be tested directly using the ADF-test, while the error term v_t is tested by regressing (v_t + v_{t−1}) on v_{t−1}. Finally, the error term w_t is tested for unit roots at the annual frequency using the regression w_t + w_{t−2} = π_3 w_{t−2} + π_4 w_{t−1} + error, through the joint F-test of π_3 = π_4 = 0. The critical values for the three tests are not the same as for the ADF-test or the HEGY-test; critical values have, however, been tabulated in Engle and Granger (1987) for the ADF-type tests and in Engle et al. (1993) for the joint F-test.
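A regression with SARIMA errors of the form (2.35) can be sketched in Python's statsmodels by passing the explanatory variables through the exog argument of SARIMAX; the library choice and the placeholder variable names below are assumptions, not the thesis's actual implementation.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Placeholder data; in the thesis y would be log energy usage and X would hold
# log value added of the industrial sector (and possibly log energy prices).
idx = pd.period_range("1987Q1", periods=120, freq="Q")
rng = np.random.default_rng(1)
y = pd.Series(rng.normal(size=120), index=idx, name="log_energy_usage")
X = pd.DataFrame({"log_value_added": rng.normal(size=120)}, index=idx)

# Linear regression on X with SARIMA(1,0,0)(0,1,1)_4 errors, as in (2.33)-(2.35).
model = SARIMAX(y, exog=X, order=(1, 0, 0), seasonal_order=(0, 1, 1, 4))
fit = model.fit(disp=False)
print(fit.params)   # regression coefficient plus the AR and seasonal MA parameters
```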

2.2.3 Estimation and Forecasting

The parameters of the dynamic regression model with SARIMA errors are estimated with maximum likelihood, as suggested by Pankratz (1991, p. 324). A drawback of dynamic regression models with SARIMA errors is that in order to forecast y_{t+h} one needs a value for x_{t+h}. In some cases x_{t+h} can be replaced by known values, for example if x_t represents a known future policy decision, or if x_t consists of known deterministic values such as seasonal dummies, holidays or trading days. If x_{t+h} is unknown, however, one also needs to forecast x_{t+h} in order to forecast y_{t+h}. Pankratz suggests using the ARIMA methodology to forecast x_{t+h}. There are, however, a variety of sources for forecasts of unknown values of x_{t+h}; many are produced by national statistics offices and banks (Pankratz, 1991, p. 330). Expert forecasts in Sweden are produced by the National Institute of Economic Research and Sweden's central bank.

2.3 Forecast Comparison

The SARIMA model and the dynamic regression model are compared on a recursive out-of-sample forecast. The model forecasts h steps ahead and the forecast errors are saved. The model is then re-estimated including the new "known" values and forecasts h steps ahead again. The process continues until the end of the sample is reached, after which the forecast errors are evaluated. Stock and Watson (2007, p. 571) refer to out-of-sample performance as the ultimate test of a forecasting model.
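The recursive scheme just described can be sketched as an expanding-window loop. The sketch below uses Python/statsmodels (assumed tooling); the function name, series and horizon are placeholders rather than the thesis's code.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

def recursive_forecast_errors(y, order, seasonal_order, n_test, h=1, exog=None):
    """Re-estimate on an expanding window and collect h-step-ahead forecast errors."""
    errors = []
    start = len(y) - n_test
    for t in range(start, len(y) - h + 1):
        ex_train = None if exog is None else exog.iloc[:t]
        ex_future = None if exog is None else exog.iloc[t:t + h]
        fit = SARIMAX(y.iloc[:t], exog=ex_train, order=order,
                      seasonal_order=seasonal_order).fit(disp=False)
        forecast = fit.forecast(steps=h, exog=ex_future)
        errors.append(y.iloc[t + h - 1] - forecast.iloc[-1])
    return np.array(errors)

# Example: 1-step-ahead errors of the SARIMA(1,0,0)(0,1,1)_4 model over 20 test quarters.
# e_sarima = recursive_forecast_errors(y, (1, 0, 0), (0, 1, 1, 4), n_test=20, h=1)
```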

21 18 The difference between the two is that the RMSE is more sensitive to large deviations from the real value than the MAE. The model with the lower RMSE and MAE can be seen as the better one for forecasting OOS-t and ENC-NEW Tests However, neither RMSE or MAE show whether there is a statistical difference in the prediction accuracy. To test the difference in forecast accuracy, McCracken (2007) suggests, among others, the following test based on the Diebold-Mariano test OOS t = ˆΩ 0.5 (MSF E 1 ( ˆB 1 ) MSF E 2 ( ˆB 2 )), (2.43) where MSF E 1 ( ˆB 1 ) =mean squared forecast error of the nested model, MSF E 2 ( ˆB 2 ) =mean squared forecast error of the nesting model, ˆΩ =estimated variance of the mean of the squared-error loss differential. The limiting distribution of the OSS-t statistic is non standard and depends on the insample/out-of-sample ratio as well as the number of additional variables in the nesting model. Critical values for the statistic can however be found in for example McCracken (2007). To test if one forecast encompasses the other, Clark and McCracken (2001) suggest the following test where P =number of predictions, ENC NEW = P P 1 t (û2 1,t+1 û 1,t+1 û 2,t+1 ) P, (2.44) 1 t û2 2,t+1 û 1 =forecast error of the nested model, û 2 =forecast error of the nesting model. The null hypothesis of the Clark-McCracken test is that the forecast of the nested model encompasses the forecast of the nesting model. This implies that the additional variables are redundant. As with the OSS t statistic, the limiting distribution of the ENC N EW statistic depends on the in-sample/out-of-sample ratio as well as the number of additional variables in the nesting model. A selected number of critical values for the ENC NEW statistic can be found in the original paper by Clark and McCracken (2001).

3 Data

The data used in this study is the energy usage in the industry sector in Sweden, obtained from Statistics Sweden. The data is quarterly and stretches from 1987 to the third quarter of 2016, giving 119 observations. The data is split into two parts, 1987 to the third quarter of 2011 (99 observations) and the fourth quarter of 2011 to the third quarter of 2016 (20 observations), which are used for model construction and model evaluation, respectively. The choice of splitting ratio is motivated by having a large sample for model fitting while leaving enough observations for evaluation. The exact splitting point is, however, also chosen out of necessity. As can be seen in Figure 2, there are outliers around 2008 and 2009. These are due to the financial crisis, during which Swedish industrial production dropped 30 index points, from 120 to 90 (Statistics Sweden). Evaluating the models over this period is not of interest, since the aim of this thesis is not to forecast energy usage in times of crisis.
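A minimal sketch of this sample split in Python/pandas (assumed tooling; the file and column names are placeholders for the Statistics Sweden series):

```python
import pandas as pd

# Placeholder file and column names for the quarterly energy-usage series.
energy = pd.read_csv("energy_usage_quarterly.csv", index_col=0, parse_dates=True)
energy.index = energy.index.to_period("Q")
y = energy["energy_usage_tj"]

train = y.loc[:"2011Q3"]   # 1987Q1-2011Q3, 99 observations, model construction
test = y.loc["2011Q4":]    # 2011Q4-2016Q3, 20 observations, model evaluation
```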

Figure 2: Quarterly energy usage in the industry sector in Sweden, 1987Q1-2016Q3, reported in terajoules.

To model the energy usage, two explanatory variables will be considered for the dynamic regression with SARIMA errors: energy prices and value added of the industrial sector. The explanatory variables are described more closely in Section 4.2.1.

4 Results

4.1 SARIMA

In the following section the SARIMA model selection is presented.

4.1.1 Stationarity Test

Following the Box-Jenkins methodology outlined in the theory section, one first has to determine whether the time series is stationary. The ADF-test is therefore employed to test for a unit root. Since there does not seem to be a linear trend, the test is performed with an intercept only. The results are shown in Table 3: the null hypothesis of a unit root is not rejected by the ADF-test. The KPSS-test is used to test for level stationarity and, as can be seen in Table 3, its null hypothesis is rejected. Because the alternative hypothesis of the KPSS-test is the presence of a unit root, the results are conclusive: the series has a unit root. Since stationarity is a crucial assumption in SARIMA modelling, this implies that a first difference should be taken to achieve stationarity.

To test for seasonal unit roots, the Canova-Hansen and HEGY-tests are performed. The HEGY-test is performed on the original data, while for the Canova-Hansen test a first difference is taken first, because the Canova-Hansen test assumes no unit root at the zero frequency. The HEGY-test is performed with an intercept and seasonal dummies. The results from the Canova-Hansen test in Table 4 show that the null hypothesis is rejected for both the annual and the biannual frequency. This indicates the presence of seasonal unit roots and that a seasonal difference should be taken. The results from the HEGY-test are shown in Table 5; the null hypothesis is not rejected for any of the frequencies, which indicates the presence of both nonseasonal and seasonal unit roots and again that a seasonal difference should be taken. However, since taking both the seasonal difference and the first difference could result in over-differencing and thus lead to an unnecessarily complex model, as described in Section 2.1.1, I take the seasonal difference only and perform the ADF, KPSS, HEGY and CH-tests again.
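The ADF and KPSS tests used here are available in statsmodels; a minimal sketch (assumed tooling, with y denoting the energy-usage series) is shown below. The HEGY and CH tests are not part of statsmodels and are typically run with specialized packages or via the auxiliary regressions described in Section 2.

```python
from statsmodels.tsa.stattools import adfuller, kpss

# ADF with an intercept only ('c') and the lag length chosen by AIC, as in the text.
adf_stat, adf_pvalue, *_ = adfuller(y, regression="c", autolag="AIC")

# KPSS with the null of level stationarity and automatic bandwidth selection.
kpss_stat, kpss_pvalue, *_ = kpss(y, regression="c", nlags="auto")

print(f"ADF:  {adf_stat:.2f} (p = {adf_pvalue:.3f})")
print(f"KPSS: {kpss_stat:.2f} (p = {kpss_pvalue:.3f})")
```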

The ADF-test is conducted with an intercept, the KPSS-test with the null hypothesis of level stationarity, and the HEGY-test with a constant but no seasonal dummies. From Tables 3, 4 and 5 it can be seen that the ADF-test now rejects the null hypothesis of a unit root, the KPSS-test does not reject the null hypothesis of level stationarity, the HEGY-test rejects the null hypotheses of nonseasonal and seasonal unit roots, and the Canova-Hansen test does not reject the null hypothesis of stable seasonality. It therefore seems correct to take the seasonal difference only before proceeding to model selection.

Table 3: Unit root tests. *** implies rejection at the 1% level.

Type of test    Statistic    ∇4 statistic
ADF-test                     -4.2***
KPSS-test       ***

Table 4: Seasonal unit root test (CH-test). *** implies rejection at the 1% level.

Frequency    Statistic    ∇4 statistic
π/2          1.3***
π            1.04***
joint        1.5***

Table 5: Seasonal unit root test (HEGY-test). *** implies rejection at the 1% level, ** at the 2.5% level.

Frequency    Statistic    ∇4 statistic
t_π1                      **
t_π2                      ***
F_π3,π4                   ***
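Plots like those in Figure 3 can be produced for the seasonally differenced series along the following lines (Python/statsmodels assumed; y again denotes the energy-usage series):

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

y_sdiff = y.diff(4).dropna()   # seasonal difference at lag 4

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(y_sdiff, lags=20, ax=axes[0])
plot_pacf(y_sdiff, lags=20, ax=axes[1])
plt.tight_layout()
plt.show()
```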

4.1.2 Model Selection

Figure 3: Autocorrelation function and partial autocorrelation function of the seasonally differenced series.

Following the Box-Jenkins methodology, and since the data is now stationary, we can proceed to determine the AR and MA polynomials. Looking at the plots of the autocorrelation function and partial autocorrelation function in Figure 3, the ACF has a wave-like pattern and tails off to zero, which suggests an AR(1) process. The moving average (MA) part is harder to determine: the PACF has significant spikes at lags 4, 5, 12 and 13, and a first guess is a seasonal MA(1) component. However, as described in Section 2.1.3, determining the autoregressive and moving average orders from the ACF and PACF alone is difficult. Therefore a number of models are considered and the model with the lowest Akaike information criterion (AIC) and Bayesian information criterion (BIC) is selected. Eleven different SARIMA models with a seasonal difference are checked; they are shown in Table 6. The model with the lowest AIC and BIC is the SARIMA(1,0,0)(0,1,1), which is the same model that was guessed from the ACF and PACF. This model is therefore carried forward to the estimation and residual diagnostics step.

Table 6: Model selection. The chosen model is marked in bold.

Model                    AIC    BIC
SARIMA(0,0,0)(0,1,0)
SARIMA(1,0,0)(0,1,0)
SARIMA(1,0,0)(0,1,1)
SARIMA(1,0,1)(0,1,1)
SARIMA(2,0,0)(0,1,0)
SARIMA(2,0,0)(0,1,1)
SARIMA(0,0,1)(0,1,1)
SARIMA(0,0,1)(0,1,0)
SARIMA(1,0,0)(1,1,0)
SARIMA(0,0,0)(0,1,2)
SARIMA(1,0,0)(0,1,2)

4.1.3 Estimation and Residual Diagnostics

The next step in the Box-Jenkins methodology is to estimate the parameters of the selected model and then to extract the residuals to check whether they sufficiently follow a white noise process. This is done by visual inspection of the residuals to check for heteroscedasticity, followed by application of the Ljung-Box test to verify that the residuals are serially uncorrelated. Finally, the Jarque-Bera test is used to test for normality.

The estimated parameters can be seen in Table 7. Using these estimates, the implied ACF and PACF of a model with the corresponding parameters are calculated and plotted in Figure 4. The theoretical ACF tails off in a wave pattern similar to the ACF in Figure 3. The theoretical PACF has spikes at lags 4 and 5 which then tail off; the sample PACF also has spikes at lags 4 and 5, but they do not tail off as clearly as in the theoretical PACF. All in all, however, the parameter estimates seem reasonable. The ACF and PACF of the extracted residuals are plotted in Figure 5. From these, it can be determined that there are no significant lags left, which implies that the residuals are serially uncorrelated. This is also confirmed by the Ljung-Box test in Table 8, where the null hypothesis of serially uncorrelated residuals is not rejected.

Table 7: Estimated parameters for SARIMA(1,0,0)(0,1,1).

Coefficient    Estimate    Standard error
AR(1)
SMA(1)
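A comparison like the one reported in Table 6 can be generated by fitting each candidate order and collecting the information criteria. The sketch below (Python/statsmodels assumed) uses the estimation sample `train` from the earlier data sketch; the resulting values depend on the actual data and are not reproduced here.

```python
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

candidates = [            # (p,d,q), (P,D,Q,s) pairs corresponding to Table 6
    ((0, 0, 0), (0, 1, 0, 4)), ((1, 0, 0), (0, 1, 0, 4)),
    ((1, 0, 0), (0, 1, 1, 4)), ((1, 0, 1), (0, 1, 1, 4)),
    ((2, 0, 0), (0, 1, 0, 4)), ((2, 0, 0), (0, 1, 1, 4)),
    ((0, 0, 1), (0, 1, 1, 4)), ((0, 0, 1), (0, 1, 0, 4)),
    ((1, 0, 0), (1, 1, 0, 4)), ((0, 0, 0), (0, 1, 2, 4)),
    ((1, 0, 0), (0, 1, 2, 4)),
]

rows = []
for order, seasonal in candidates:
    res = SARIMAX(train, order=order, seasonal_order=seasonal).fit(disp=False)
    rows.append({"order": order, "seasonal": seasonal, "AIC": res.aic, "BIC": res.bic})

print(pd.DataFrame(rows).sort_values("AIC"))
```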

Figure 4: Theoretical ACF and PACF of the estimated SARIMA(1,0,0)(0,0,1) with the estimated values of φ and Θ.

Figure 5: Autocorrelation function and partial autocorrelation function of the residuals extracted from the fitted model.

Table 8: Ljung-Box test on the extracted residuals (8 lags are used).

Ljung-Box test
Statistic    3.87
P-value      0.69

When inspecting the plot of the residuals in Figure 6, it is evident that there are large outliers around 2008 and 2009, probably due to the financial crisis mentioned in Section 3. The model is therefore re-estimated with dummy variables added for the last quarter of 2008 through the third quarter of 2009. The Jarque-Bera test is then used to test for normality; the results in Table 9 show that the null hypothesis of normality is not rejected. Furthermore, the residual plot in Figure 6 shows no sign of heteroscedasticity. The SARIMA model is therefore ready for forecasting.

Figure 6: Plot of the residuals of the estimated SARIMA(1,0,0)(0,1,1).

Table 9: Jarque-Bera test on the extracted residuals.

Jarque-Bera test
Statistic
P-value      0.53
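The diagnostic checks reported in Tables 8 and 9 can be sketched as follows (Python assumed; `fit` denotes a fitted SARIMAX results object, as in the earlier sketches):

```python
from scipy.stats import jarque_bera
from statsmodels.stats.diagnostic import acorr_ljungbox

resid = fit.resid   # residuals of the fitted SARIMA(1,0,0)(0,1,1)_4 model

# Ljung-Box with m = 8 lags (2s for quarterly data); model_df = p + q accounts
# for the m - p - q degrees of freedom mentioned in Section 2.1.5.
print(acorr_ljungbox(resid, lags=[8], model_df=2))

# Jarque-Bera test for normality of the residuals.
jb = jarque_bera(resid)
print(f"Jarque-Bera: {jb.statistic:.2f} (p = {jb.pvalue:.3f})")
```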

4.2 Dynamic Regression Model

In the following section the dynamic regression model with SARIMA errors is fitted. First the explanatory variables of the dynamic regression model are presented. Thereafter the model with the best in-sample fit is chosen and compared with the SARIMA model estimated earlier in terms of out-of-sample forecast accuracy.

4.2.1 Explanatory Variables

The two variables considered for the model are the log value added of the industrial sector and log energy prices.

Value Added of the Industrial Sector

Value added is plotted in Figure 6. As can be seen in this plot, the financial crisis of 2008 is clearly visible, as is the Swedish financial crisis of the early 1990s. This data is taken from Statistics Sweden.

Figure 6: Log transform of the value added of the industrial sector.

Energy Prices

Figure 7: Plot of the energy price index (third quarter of 2010 = 100).

The second variable considered for the dynamic regression is the natural logarithm of an energy price index, which includes coal, oil and natural gas. This time series is plotted in Figure 7. The log energy price index has also been corrected for inflation and the SEK/USD exchange rate. The energy price index is taken from the World Bank.

4.2.2 Order of Integration

Table 10: Unit root tests. ***, * imply rejection at the 1% and 5% levels.

                      ADF-test    KPSS-test
Value added                       ***
∇4 Value added        -3.63*      0.11
Energy prices                     ***
∇4 Energy prices      -3.32***    0.11

Table 11: Seasonal unit root test (CH-test). *** implies rejection at the 1% level.

Frequency    Value added    ∇4 Value added    Energy prices
π/2          ***
π            1.38***
joint        1.93***

Table 12: Seasonal unit root test (HEGY-test). ***, **, * imply rejection at the 1%, 2.5% and 5% levels.

Frequency    Value added    ∇4 Value added    Energy prices    ∇4 Energy prices
t_π1         ***            *
t_π2         **             -7.13***          -6.26**
F_π3,π4      *              64.79***          26.38***         22.74***

To test for nonseasonal and seasonal unit roots, the same tests are employed as in Section 4.1.1. As can be seen in Table 10, the ADF-test does not reject the null hypothesis of a unit root for either energy prices or value added, while the KPSS-test rejects the null hypothesis of stationarity for both. This implies that a first difference should be taken. The CH-test is performed after taking the first difference, as suggested by Canova and Hansen (1995). For value added the null hypothesis of stable seasonality is rejected, but for energy prices it is not, as Table 11 shows (no seasonality also counts as "stable" seasonality, which is the null hypothesis of the CH-test). The HEGY-test shows that the null hypothesis of seasonal unit roots is rejected for energy prices. For value added the picture is less clear, since the null hypothesis of a unit root at the annual frequency is rejected at the 5% level; considering the results of the CH-test in Table 11, however, taking the seasonal difference is the correct procedure.

Next, the variables are tested after taking a seasonal difference. The reason for doing so is that the explanatory and dependent variables need to have the same order of integration after differencing and be stationary once differenced (unless they are cointegrated). From Tables 10, 11 and 12 it can be seen that value added and energy prices are stationary after taking a seasonal difference.

4.2.3 Seasonal Cointegration and Stationarity

Three different dynamic regression models are considered in order to determine which model will be compared with the SARIMA model selected in Section 4.1. The models are "energy prices with SARIMA errors" (Model 1), "value added with SARIMA errors" (Model 2) and "value added and energy prices with SARIMA errors" (Model 3). For the dynamic regression models, the natural logarithms of energy usage, value added and energy prices are used, so that differences can be interpreted approximately as percentage changes.

From the results in Section 4.2.2 it can be seen that value added is seasonally integrated, which is the same order of integration as energy usage. Therefore it should be tested whether they are seasonally cointegrated. If they are seasonally cointegrated but this is ignored, taking a seasonal difference will result in overdifferencing for Model 2, which in turn will harm the forecasts at longer horizons. Table 13 shows that value added and energy usage are not seasonally cointegrated. The three models are then fitted using the seasonal differences of all three variables, and the error terms are tested for stationarity using the ADF and KPSS-tests. In Table 14 it can be seen that all three models pass the stationarity tests.

Table 13: Seasonal cointegration tests.

Error term        u_t    v_t    w_t
Statistic
Critical value

Table 14: Stationarity tests on the error terms of the three models. *** implies rejection at the 1% level.

Type of test              ADF-test    KPSS-test
Model 1 test statistic    -3.6***     0.12
Model 2 test statistic    -3.23***    0.21
Model 3 test statistic    -2.94***    0.25
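The stationarity check behind Table 14 can be sketched, in simplified form, as a regression on the seasonally differenced logs followed by unit root tests on the residuals. The sketch below uses Python (assumed tooling) and plain OLS, whereas the thesis fits the models with SARIMA errors; the series names are placeholders.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, kpss

# Seasonal differences of the logged series (placeholder names), here for Model 2.
dy = np.log(energy_usage).diff(4).dropna()
dx = np.log(value_added).diff(4).dropna()

ols = sm.OLS(dy, sm.add_constant(dx)).fit()
resid = ols.resid

adf_stat, adf_pvalue, *_ = adfuller(resid, regression="c", autolag="AIC")
kpss_stat, kpss_pvalue, *_ = kpss(resid, regression="c", nlags="auto")
print(f"ADF: {adf_stat:.2f} (p = {adf_pvalue:.3f}), KPSS: {kpss_stat:.2f} (p = {kpss_pvalue:.3f})")
```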


More information

7. Integrated Processes

7. Integrated Processes 7. Integrated Processes Up to now: Analysis of stationary processes (stationary ARMA(p, q) processes) Problem: Many economic time series exhibit non-stationary patterns over time 226 Example: We consider

More information

Autoregressive Integrated Moving Average Model to Predict Graduate Unemployment in Indonesia

Autoregressive Integrated Moving Average Model to Predict Graduate Unemployment in Indonesia DOI 10.1515/ptse-2017-0005 PTSE 12 (1): 43-50 Autoregressive Integrated Moving Average Model to Predict Graduate Unemployment in Indonesia Umi MAHMUDAH u_mudah@yahoo.com (State Islamic University of Pekalongan,

More information

Frequency Forecasting using Time Series ARIMA model

Frequency Forecasting using Time Series ARIMA model Frequency Forecasting using Time Series ARIMA model Manish Kumar Tikariha DGM(O) NSPCL Bhilai Abstract In view of stringent regulatory stance and recent tariff guidelines, Deviation Settlement mechanism

More information

Modelling Seasonality of Gross Domestic Product in Belgium

Modelling Seasonality of Gross Domestic Product in Belgium University of Vienna Univ.-Prof. Dipl.-Ing. Dr. Robert M. Kunst Course: Econometric Analysis of Seasonal Time Series (Oekonometrie der Saison) Working Paper Modelling Seasonality of Gross Domestic Product

More information

FORECASTING SUGARCANE PRODUCTION IN INDIA WITH ARIMA MODEL

FORECASTING SUGARCANE PRODUCTION IN INDIA WITH ARIMA MODEL FORECASTING SUGARCANE PRODUCTION IN INDIA WITH ARIMA MODEL B. N. MANDAL Abstract: Yearly sugarcane production data for the period of - to - of India were analyzed by time-series methods. Autocorrelation

More information

Circle a single answer for each multiple choice question. Your choice should be made clearly.

Circle a single answer for each multiple choice question. Your choice should be made clearly. TEST #1 STA 4853 March 4, 215 Name: Please read the following directions. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directions This exam is closed book and closed notes. There are 31 questions. Circle

More information

Oil price volatility in the Philippines using generalized autoregressive conditional heteroscedasticity

Oil price volatility in the Philippines using generalized autoregressive conditional heteroscedasticity Oil price volatility in the Philippines using generalized autoregressive conditional heteroscedasticity Carl Ceasar F. Talungon University of Southern Mindanao, Cotabato Province, Philippines Email: carlceasar04@gmail.com

More information

Applied time-series analysis

Applied time-series analysis Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna October 18, 2011 Outline Introduction and overview Econometric Time-Series Analysis In principle,

More information

Automatic seasonal auto regressive moving average models and unit root test detection

Automatic seasonal auto regressive moving average models and unit root test detection ISSN 1750-9653, England, UK International Journal of Management Science and Engineering Management Vol. 3 (2008) No. 4, pp. 266-274 Automatic seasonal auto regressive moving average models and unit root

More information

Ch 8. MODEL DIAGNOSTICS. Time Series Analysis

Ch 8. MODEL DIAGNOSTICS. Time Series Analysis Model diagnostics is concerned with testing the goodness of fit of a model and, if the fit is poor, suggesting appropriate modifications. We shall present two complementary approaches: analysis of residuals

More information

{ } Stochastic processes. Models for time series. Specification of a process. Specification of a process. , X t3. ,...X tn }

{ } Stochastic processes. Models for time series. Specification of a process. Specification of a process. , X t3. ,...X tn } Stochastic processes Time series are an example of a stochastic or random process Models for time series A stochastic process is 'a statistical phenomenon that evolves in time according to probabilistic

More information

10. Time series regression and forecasting

10. Time series regression and forecasting 10. Time series regression and forecasting Key feature of this section: Analysis of data on a single entity observed at multiple points in time (time series data) Typical research questions: What is the

More information

Econometría 2: Análisis de series de Tiempo

Econometría 2: Análisis de series de Tiempo Econometría 2: Análisis de series de Tiempo Karoll GOMEZ kgomezp@unal.edu.co http://karollgomez.wordpress.com Segundo semestre 2016 IX. Vector Time Series Models VARMA Models A. 1. Motivation: The vector

More information

Time Series I Time Domain Methods

Time Series I Time Domain Methods Astrostatistics Summer School Penn State University University Park, PA 16802 May 21, 2007 Overview Filtering and the Likelihood Function Time series is the study of data consisting of a sequence of DEPENDENT

More information

Modelling using ARMA processes

Modelling using ARMA processes Modelling using ARMA processes Step 1. ARMA model identification; Step 2. ARMA parameter estimation Step 3. ARMA model selection ; Step 4. ARMA model checking; Step 5. forecasting from ARMA models. 33

More information

Introduction to Modern Time Series Analysis

Introduction to Modern Time Series Analysis Introduction to Modern Time Series Analysis Gebhard Kirchgässner, Jürgen Wolters and Uwe Hassler Second Edition Springer 3 Teaching Material The following figures and tables are from the above book. They

More information

Circle the single best answer for each multiple choice question. Your choice should be made clearly.

Circle the single best answer for each multiple choice question. Your choice should be made clearly. TEST #1 STA 4853 March 6, 2017 Name: Please read the following directions. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directions This exam is closed book and closed notes. There are 32 multiple choice

More information

Forecasting Egyptian GDP Using ARIMA Models

Forecasting Egyptian GDP Using ARIMA Models Reports on Economics and Finance, Vol. 5, 2019, no. 1, 35-47 HIKARI Ltd, www.m-hikari.com https://doi.org/10.12988/ref.2019.81023 Forecasting Egyptian GDP Using ARIMA Models Mohamed Reda Abonazel * and

More information

Vector autoregressions, VAR

Vector autoregressions, VAR 1 / 45 Vector autoregressions, VAR Chapter 2 Financial Econometrics Michael Hauser WS17/18 2 / 45 Content Cross-correlations VAR model in standard/reduced form Properties of VAR(1), VAR(p) Structural VAR,

More information

MODELING INFLATION RATES IN NIGERIA: BOX-JENKINS APPROACH. I. U. Moffat and A. E. David Department of Mathematics & Statistics, University of Uyo, Uyo

MODELING INFLATION RATES IN NIGERIA: BOX-JENKINS APPROACH. I. U. Moffat and A. E. David Department of Mathematics & Statistics, University of Uyo, Uyo Vol.4, No.2, pp.2-27, April 216 MODELING INFLATION RATES IN NIGERIA: BOX-JENKINS APPROACH I. U. Moffat and A. E. David Department of Mathematics & Statistics, University of Uyo, Uyo ABSTRACT: This study

More information

Chapter 2: Unit Roots

Chapter 2: Unit Roots Chapter 2: Unit Roots 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and undeconometrics II. Unit Roots... 3 II.1 Integration Level... 3 II.2 Nonstationarity

More information

A SARIMAX coupled modelling applied to individual load curves intraday forecasting

A SARIMAX coupled modelling applied to individual load curves intraday forecasting A SARIMAX coupled modelling applied to individual load curves intraday forecasting Frédéric Proïa Workshop EDF Institut Henri Poincaré - Paris 05 avril 2012 INRIA Bordeaux Sud-Ouest Institut de Mathématiques

More information

Seasonality. Matthieu Stigler January 8, Version 1.1

Seasonality. Matthieu Stigler January 8, Version 1.1 Seasonality Matthieu Stigler Matthieu.Stigler@gmail.com January 8, 2009 Version 1.1 This document is released under the Creative Commons Attribution-Noncommercial 2.5 India license. Matthieu Stigler Matthieu.Stigler@gmail.com

More information

Forecasting Bangladesh's Inflation through Econometric Models

Forecasting Bangladesh's Inflation through Econometric Models American Journal of Economics and Business Administration Original Research Paper Forecasting Bangladesh's Inflation through Econometric Models 1,2 Nazmul Islam 1 Department of Humanities, Bangladesh University

More information

9) Time series econometrics

9) Time series econometrics 30C00200 Econometrics 9) Time series econometrics Timo Kuosmanen Professor Management Science http://nomepre.net/index.php/timokuosmanen 1 Macroeconomic data: GDP Inflation rate Examples of time series

More information

ARIMA Models. Jamie Monogan. January 16, University of Georgia. Jamie Monogan (UGA) ARIMA Models January 16, / 27

ARIMA Models. Jamie Monogan. January 16, University of Georgia. Jamie Monogan (UGA) ARIMA Models January 16, / 27 ARIMA Models Jamie Monogan University of Georgia January 16, 2018 Jamie Monogan (UGA) ARIMA Models January 16, 2018 1 / 27 Objectives By the end of this meeting, participants should be able to: Argue why

More information

LATVIAN GDP: TIME SERIES FORECASTING USING VECTOR AUTO REGRESSION

LATVIAN GDP: TIME SERIES FORECASTING USING VECTOR AUTO REGRESSION LATVIAN GDP: TIME SERIES FORECASTING USING VECTOR AUTO REGRESSION BEZRUCKO Aleksandrs, (LV) Abstract: The target goal of this work is to develop a methodology of forecasting Latvian GDP using ARMA (AutoRegressive-Moving-Average)

More information

Trends and Unit Roots in Greek Real Money Supply, Real GDP and Nominal Interest Rate

Trends and Unit Roots in Greek Real Money Supply, Real GDP and Nominal Interest Rate European Research Studies Volume V, Issue (3-4), 00, pp. 5-43 Trends and Unit Roots in Greek Real Money Supply, Real GDP and Nominal Interest Rate Karpetis Christos & Varelas Erotokritos * Abstract This

More information

Lecture 6a: Unit Root and ARIMA Models

Lecture 6a: Unit Root and ARIMA Models Lecture 6a: Unit Root and ARIMA Models 1 2 Big Picture A time series is non-stationary if it contains a unit root unit root nonstationary The reverse is not true. For example, y t = cos(t) + u t has no

More information

10) Time series econometrics

10) Time series econometrics 30C00200 Econometrics 10) Time series econometrics Timo Kuosmanen Professor, Ph.D. 1 Topics today Static vs. dynamic time series model Suprious regression Stationary and nonstationary time series Unit

More information

Forecasting Area, Production and Yield of Cotton in India using ARIMA Model

Forecasting Area, Production and Yield of Cotton in India using ARIMA Model Forecasting Area, Production and Yield of Cotton in India using ARIMA Model M. K. Debnath 1, Kartic Bera 2 *, P. Mishra 1 1 Department of Agricultural Statistics, Bidhan Chanda Krishi Vishwavidyalaya,

More information

FinQuiz Notes

FinQuiz Notes Reading 9 A time series is any series of data that varies over time e.g. the quarterly sales for a company during the past five years or daily returns of a security. When assumptions of the regression

More information

CHAPTER 8 MODEL DIAGNOSTICS. 8.1 Residual Analysis

CHAPTER 8 MODEL DIAGNOSTICS. 8.1 Residual Analysis CHAPTER 8 MODEL DIAGNOSTICS We have now discussed methods for specifying models and for efficiently estimating the parameters in those models. Model diagnostics, or model criticism, is concerned with testing

More information

Stat 5100 Handout #12.e Notes: ARIMA Models (Unit 7) Key here: after stationary, identify dependence structure (and use for forecasting)

Stat 5100 Handout #12.e Notes: ARIMA Models (Unit 7) Key here: after stationary, identify dependence structure (and use for forecasting) Stat 5100 Handout #12.e Notes: ARIMA Models (Unit 7) Key here: after stationary, identify dependence structure (and use for forecasting) (overshort example) White noise H 0 : Let Z t be the stationary

More information

ECON 4160, Spring term Lecture 12

ECON 4160, Spring term Lecture 12 ECON 4160, Spring term 2013. Lecture 12 Non-stationarity and co-integration 2/2 Ragnar Nymoen Department of Economics 13 Nov 2013 1 / 53 Introduction I So far we have considered: Stationary VAR, with deterministic

More information

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY

Time Series Analysis. James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY Time Series Analysis James D. Hamilton PRINCETON UNIVERSITY PRESS PRINCETON, NEW JERSEY & Contents PREFACE xiii 1 1.1. 1.2. Difference Equations First-Order Difference Equations 1 /?th-order Difference

More information

Author: Yesuf M. Awel 1c. Affiliation: 1 PhD, Economist-Consultant; P.O Box , Addis Ababa, Ethiopia. c.

Author: Yesuf M. Awel 1c. Affiliation: 1 PhD, Economist-Consultant; P.O Box , Addis Ababa, Ethiopia. c. ISSN: 2415-0304 (Print) ISSN: 2522-2465 (Online) Indexing/Abstracting Forecasting GDP Growth: Application of Autoregressive Integrated Moving Average Model Author: Yesuf M. Awel 1c Affiliation: 1 PhD,

More information

Arma-Arch Modeling Of The Returns Of First Bank Of Nigeria

Arma-Arch Modeling Of The Returns Of First Bank Of Nigeria Arma-Arch Modeling Of The Returns Of First Bank Of Nigeria Emmanuel Alphonsus Akpan Imoh Udo Moffat Department of Mathematics and Statistics University of Uyo, Nigeria Ntiedo Bassey Ekpo Department of

More information

Lecture 2: Univariate Time Series

Lecture 2: Univariate Time Series Lecture 2: Univariate Time Series Analysis: Conditional and Unconditional Densities, Stationarity, ARMA Processes Prof. Massimo Guidolin 20192 Financial Econometrics Spring/Winter 2017 Overview Motivation:

More information

Econometrics I: Univariate Time Series Econometrics (1)

Econometrics I: Univariate Time Series Econometrics (1) Econometrics I: Dipartimento di Economia Politica e Metodi Quantitativi University of Pavia Overview of the Lecture 1 st EViews Session VI: Some Theoretical Premises 2 Overview of the Lecture 1 st EViews

More information

E 4160 Autumn term Lecture 9: Deterministic trends vs integrated series; Spurious regression; Dickey-Fuller distribution and test

E 4160 Autumn term Lecture 9: Deterministic trends vs integrated series; Spurious regression; Dickey-Fuller distribution and test E 4160 Autumn term 2016. Lecture 9: Deterministic trends vs integrated series; Spurious regression; Dickey-Fuller distribution and test Ragnar Nymoen Department of Economics, University of Oslo 24 October

More information

Stationarity and cointegration tests: Comparison of Engle - Granger and Johansen methodologies

Stationarity and cointegration tests: Comparison of Engle - Granger and Johansen methodologies MPRA Munich Personal RePEc Archive Stationarity and cointegration tests: Comparison of Engle - Granger and Johansen methodologies Faik Bilgili Erciyes University, Faculty of Economics and Administrative

More information

Forecasting the Prices of Indian Natural Rubber using ARIMA Model

Forecasting the Prices of Indian Natural Rubber using ARIMA Model Available online at www.ijpab.com Rani and Krishnan Int. J. Pure App. Biosci. 6 (2): 217-221 (2018) ISSN: 2320 7051 DOI: http://dx.doi.org/10.18782/2320-7051.5464 ISSN: 2320 7051 Int. J. Pure App. Biosci.

More information

Univariate linear models

Univariate linear models Univariate linear models The specification process of an univariate ARIMA model is based on the theoretical properties of the different processes and it is also important the observation and interpretation

More information

E 4101/5101 Lecture 9: Non-stationarity

E 4101/5101 Lecture 9: Non-stationarity E 4101/5101 Lecture 9: Non-stationarity Ragnar Nymoen 30 March 2011 Introduction I Main references: Hamilton Ch 15,16 and 17. Davidson and MacKinnon Ch 14.3 and 14.4 Also read Ch 2.4 and Ch 2.5 in Davidson

More information

FORECASTING OF COTTON PRODUCTION IN INDIA USING ARIMA MODEL

FORECASTING OF COTTON PRODUCTION IN INDIA USING ARIMA MODEL FORECASTING OF COTTON PRODUCTION IN INDIA USING ARIMA MODEL S.Poyyamozhi 1, Dr. A. Kachi Mohideen 2. 1 Assistant Professor and Head, Department of Statistics, Government Arts College (Autonomous), Kumbakonam

More information

APPLIED ECONOMETRIC TIME SERIES 4TH EDITION

APPLIED ECONOMETRIC TIME SERIES 4TH EDITION APPLIED ECONOMETRIC TIME SERIES 4TH EDITION Chapter 2: STATIONARY TIME-SERIES MODELS WALTER ENDERS, UNIVERSITY OF ALABAMA Copyright 2015 John Wiley & Sons, Inc. Section 1 STOCHASTIC DIFFERENCE EQUATION

More information

Quantitative Finance I

Quantitative Finance I Quantitative Finance I Linear AR and MA Models (Lecture 4) Winter Semester 01/013 by Lukas Vacha * If viewed in.pdf format - for full functionality use Mathematica 7 (or higher) notebook (.nb) version

More information

Part 1. Multiple Choice (50 questions, 1 point each) Part 2. Problems/Short Answer (10 questions, 5 points each)

Part 1. Multiple Choice (50 questions, 1 point each) Part 2. Problems/Short Answer (10 questions, 5 points each) GROUND RULES: This exam contains two parts: Part 1. Multiple Choice (50 questions, 1 point each) Part 2. Problems/Short Answer (10 questions, 5 points each) The maximum number of points on this exam is

More information

Financial Time Series Analysis: Part II

Financial Time Series Analysis: Part II Department of Mathematics and Statistics, University of Vaasa, Finland Spring 2017 1 Unit root Deterministic trend Stochastic trend Testing for unit root ADF-test (Augmented Dickey-Fuller test) Testing

More information

Oil price and macroeconomy in Russia. Abstract

Oil price and macroeconomy in Russia. Abstract Oil price and macroeconomy in Russia Katsuya Ito Fukuoka University Abstract In this note, using the VEC model we attempt to empirically investigate the effects of oil price and monetary shocks on the

More information

Non-Stationary Time Series and Unit Root Testing

Non-Stationary Time Series and Unit Root Testing Econometrics II Non-Stationary Time Series and Unit Root Testing Morten Nyboe Tabor Course Outline: Non-Stationary Time Series and Unit Root Testing 1 Stationarity and Deviation from Stationarity Trend-Stationarity

More information

Romanian Economic and Business Review Vol. 3, No. 3 THE EVOLUTION OF SNP PETROM STOCK LIST - STUDY THROUGH AUTOREGRESSIVE MODELS

Romanian Economic and Business Review Vol. 3, No. 3 THE EVOLUTION OF SNP PETROM STOCK LIST - STUDY THROUGH AUTOREGRESSIVE MODELS THE EVOLUTION OF SNP PETROM STOCK LIST - STUDY THROUGH AUTOREGRESSIVE MODELS Marian Zaharia, Ioana Zaheu, and Elena Roxana Stan Abstract Stock exchange market is one of the most dynamic and unpredictable

More information

ECON/FIN 250: Forecasting in Finance and Economics: Section 8: Forecast Examples: Part 1

ECON/FIN 250: Forecasting in Finance and Economics: Section 8: Forecast Examples: Part 1 ECON/FIN 250: Forecasting in Finance and Economics: Section 8: Forecast Examples: Part 1 Patrick Herb Brandeis University Spring 2016 Patrick Herb (Brandeis University) Forecast Examples: Part 1 ECON/FIN

More information

A stochastic modeling for paddy production in Tamilnadu

A stochastic modeling for paddy production in Tamilnadu 2017; 2(5): 14-21 ISSN: 2456-1452 Maths 2017; 2(5): 14-21 2017 Stats & Maths www.mathsjournal.com Received: 04-07-2017 Accepted: 05-08-2017 M Saranyadevi Assistant Professor (GUEST), Department of Statistics,

More information

Estimation and application of best ARIMA model for forecasting the uranium price.

Estimation and application of best ARIMA model for forecasting the uranium price. Estimation and application of best ARIMA model for forecasting the uranium price. Medeu Amangeldi May 13, 2018 Capstone Project Superviser: Dongming Wei Second reader: Zhenisbek Assylbekov Abstract This

More information

Are Forecast Updates Progressive?

Are Forecast Updates Progressive? MPRA Munich Personal RePEc Archive Are Forecast Updates Progressive? Chia-Lin Chang and Philip Hans Franses and Michael McAleer National Chung Hsing University, Erasmus University Rotterdam, Erasmus University

More information

THE SEASONAL UNIT ROOTS IN DEMOGRAPHIC TIME SERIES ANDTHE POSSIBILITIES OF ITS EXACT TESTING

THE SEASONAL UNIT ROOTS IN DEMOGRAPHIC TIME SERIES ANDTHE POSSIBILITIES OF ITS EXACT TESTING THE SEASONAL UNIT ROOTS IN DEMOGRAPHIC TIME SERIES ANDTHE POSSIBILITIES OF ITS EXACT TESTING Ondřej Šimpach, Petra Dotlačilová University of Economics in Prague ondrej.simpach@vse.cz, xdotp00@vse.cz Key

More information

Nonstationary Time Series:

Nonstationary Time Series: Nonstationary Time Series: Unit Roots Egon Zakrajšek Division of Monetary Affairs Federal Reserve Board Summer School in Financial Mathematics Faculty of Mathematics & Physics University of Ljubljana September

More information

Lab: Box-Jenkins Methodology - US Wholesale Price Indicator

Lab: Box-Jenkins Methodology - US Wholesale Price Indicator Lab: Box-Jenkins Methodology - US Wholesale Price Indicator In this lab we explore the Box-Jenkins methodology by applying it to a time-series data set comprising quarterly observations of the US Wholesale

More information

ARIMA modeling to forecast area and production of rice in West Bengal

ARIMA modeling to forecast area and production of rice in West Bengal Journal of Crop and Weed, 9(2):26-31(2013) ARIMA modeling to forecast area and production of rice in West Bengal R. BISWAS AND B. BHATTACHARYYA Department of Agricultural Statistics Bidhan Chandra Krishi

More information

Decision 411: Class 9. HW#3 issues

Decision 411: Class 9. HW#3 issues Decision 411: Class 9 Presentation/discussion of HW#3 Introduction to ARIMA models Rules for fitting nonseasonal models Differencing and stationarity Reading the tea leaves : : ACF and PACF plots Unit

More information

of seasonal data demonstrating the usefulness of the devised tests. We conclude in "Conclusion" section with a discussion.

of seasonal data demonstrating the usefulness of the devised tests. We conclude in Conclusion section with a discussion. DOI 10.1186/s40064-016-3167-4 RESEARCH Open Access Portmanteau test statistics for seasonal serial correlation in time series models Esam Mahdi * *Correspondence: emahdi@iugaza.edu.ps Department of Mathematics,

More information

Lecture 4a: ARMA Model

Lecture 4a: ARMA Model Lecture 4a: ARMA Model 1 2 Big Picture Most often our goal is to find a statistical model to describe real time series (estimation), and then predict the future (forecasting) One particularly popular model

More information

Review Session: Econometrics - CLEFIN (20192)

Review Session: Econometrics - CLEFIN (20192) Review Session: Econometrics - CLEFIN (20192) Part II: Univariate time series analysis Daniele Bianchi March 20, 2013 Fundamentals Stationarity A time series is a sequence of random variables x t, t =

More information

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication G. S. Maddala Kajal Lahiri WILEY A John Wiley and Sons, Ltd., Publication TEMT Foreword Preface to the Fourth Edition xvii xix Part I Introduction and the Linear Regression Model 1 CHAPTER 1 What is Econometrics?

More information

Time Series Forecasting: A Tool for Out - Sample Model Selection and Evaluation

Time Series Forecasting: A Tool for Out - Sample Model Selection and Evaluation AMERICAN JOURNAL OF SCIENTIFIC AND INDUSTRIAL RESEARCH 214, Science Huβ, http://www.scihub.org/ajsir ISSN: 2153-649X, doi:1.5251/ajsir.214.5.6.185.194 Time Series Forecasting: A Tool for Out - Sample Model

More information

Seasonal Unit Root Tests in a Time Series with A-priori Unknown Deterministic Components

Seasonal Unit Root Tests in a Time Series with A-priori Unknown Deterministic Components Seasonal Unit Root Tests in a Time Series with A-priori Unknown Deterministic Components Subhash C. Sharma 1 Southern Illinois University Carbondale, Illinois, U.S.A. Petr Zemčík CERGE-EI, 2 Prague, Czech

More information

Econometric Forecasting

Econometric Forecasting Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna October 1, 2014 Outline Introduction Model-free extrapolation Univariate time-series models Trend

More information