Revisiting linear and non-linear methodologies for time series prediction - application to ESTSP 08 competition data

Madalina Olteanu
Universite Paris 1 - SAMOS CES, 90 Rue de Tolbiac, 75013 Paris, France

Abstract. The goal of this paper is to recall some common-sense principles that should be considered when modelling and forecasting time series. The availability of ever more powerful computers has led to the development of more and more methods. Most of them are black-box algorithms that take a time series as input and output a predicted value, without making any fundamental assumption such as stationarity. We shall use the data sets proposed for the competition to recall principles such as parsimony, preprocessing, significance of the estimates and uncorrelated residuals.

1 Introduction

Time series forecasting is undoubtedly a challenging topic in many fields, whether finance, energy consumption, weather or internet traffic. Most of these phenomena carry a time-dependence structure which makes an autoregressive approach appealing: one uses the past to explain the present and predict the future. Modelling and forecasting a time series supposes, in most cases, making some hypothesis on its behaviour. Since a time series contains noise, the main hypothesis is that there is randomness, but not too much. Mathematically, this translates into stationarity, which imposes some regularity on the process and provides the framework for establishing asymptotic properties.

The pioneering autoregressive (AR hereafter) model of Yule [1] supposes that the dependence on the past is finite and linear and that the noise is Gaussian. Although widely used for quite a long time, this model cannot provide a good fit in all cases. Most data exhibit nonlinearities such as volatility clustering, periodicity, ruptures or asymmetries, which AR models do not handle. A wide variety of more complex models designed to overcome these drawbacks have been proposed since the 1980s. Let us recall some of the most popular: conditionally heteroscedastic volatility models (ARCH [2], GARCH [3]), threshold or piecewise linear models [4], autoregressive regime-switching models [5], and neural networks such as the multilayer perceptron [6].

Nowadays, more and more complex models are available in the neural-networks community. In most cases, the statistical properties of these models are not established (usually because of their complexity), while modelling and prediction are performed by a black-box algorithm, with no stationarity assumption and no preprocessing.

However, stationarity is an important issue and should be taken care of. On the one hand, stationary processes are relatively easy to predict (their statistical properties will be the same in the future as they have been in the past) and, on the other hand, they provide meaningful sample statistics.

The main goal of this paper is to use the opportunity of a forecasting competition to revisit the classics (AR and seasonal ARIMA models, multilayer perceptrons, hidden Markov models) and to emphasize points such as preprocessing, stationarity, parsimony, significant estimates and residual independence. This paper does not seek the best forecast on the three samples. It simply compares simple and complex models and shows that good use of simple models may provide better results than blind application of complex ones.

2 Multivariate versus univariate models for Data1

The first data set contains three variables and 354 observations. The goal is to predict the next 18 values of the third component. Since the series is short, a parsimonious model is needed. First, let us consider the autocorrelation function of the three-dimensional process, plotted in the left part of Fig. 1. The first and the second components behave quite similarly. The autocorrelation functions decay slowly for all components and the process is subject to seasonality. After removing the seasonality in all components, the autocorrelation function of the new process shows that the correlations between the third variable and the first two are not significant. This remark will be very useful in the sequel, since it means that a model based on the third variable alone may be considered without losing much information. If the selected model contained the first or the second variable, forecasting would become a more difficult task with an important error propagation (besides forecasting the third variable, one would also need to predict the first two).

The seasonally differenced series Y_t = X_t - X_{t-12} was split into a training sample (318 inputs) and a test sample (24 inputs). Before estimation, the training sample was normalized. On the training sample, several classes of models were tested: an AR model for the third variable only, an ARX model considering all variables, an MLP model for the third variable and an MLP for the three variables. For every class of models, the best model was chosen with the BIC criterion, as sketched in the example below.
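For concreteness, here is a minimal Python sketch of this pipeline using statsmodels; the series x is a synthetic, hypothetical stand-in for the third component of Data1, since the competition data are not reproduced here.

    import numpy as np
    from statsmodels.tsa.ar_model import AutoReg

    # Hypothetical stand-in for the third component of Data1 (354 observations).
    rng = np.random.default_rng(0)
    n = 354
    x = np.cumsum(rng.normal(size=n)) + 5 * np.sin(2 * np.pi * np.arange(n) / 12)

    # Seasonal differencing at lag 12: Y_t = X_t - X_{t-12}.
    y = x[12:] - x[:-12]

    # Train/test split; normalization statistics come from the training sample only.
    train, test = y[:-24], y[-24:]
    train_norm = (train - train.mean()) / train.std()

    # Choose the AR order by BIC and refit the selected model.
    best_p = min(range(1, 25), key=lambda p: AutoReg(train_norm, lags=p).fit().bic)
    model = AutoReg(train_norm, lags=best_p).fit()
    print("BIC-selected AR order:", best_p)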

Fig. 1: Autocorrelation functions of Data1 before and after preprocessing

Let us now make a comment concerning the AR and MLP models that will be important hereafter for model selection. MLP models are recognized as universal approximators and give very good short-term predictions. However, things change for long-term prediction. If the time series to be modelled is stationary, a long-term prediction with an AR model will converge to the mean value of the process, but this property no longer holds for MLPs. The nonlinear function in the perceptron has attraction and/or repulsion points, and the iterated predictions will either converge to an attraction point or oscillate between repulsion points (see the sketch below). This explains why, in most of the present examples, the AR models outperform the MLPs.

For the first data set, the best two models in terms of normalized mean squared error (NMSE) measured on the test set are an AR(12) model for the third variable (NMSE = 0.26301) and an MLP on all variables (NMSE = 0.25624). The MLP has one hidden unit and takes as inputs all three variables lagged up to 12. The predictions on the test set are represented in Figure 2.

Fig. 2: Predictions on the test set - univariate AR model and multivariate MLP
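The attraction-point behaviour is easy to demonstrate in isolation. The toy iteration below (not the competition models, just an illustration with arbitrary coefficients) compares multi-step forecasting with a stable linear AR(1) map against the same map passed through a saturating tanh unit, as in a one-hidden-unit perceptron.

    import numpy as np

    # Iterated one-step forecasts. The stable linear map converges to the
    # process mean b / (1 - a) = 1.5; the saturating map converges to a fixed
    # point of tanh(a*y + b), roughly 0.69, unrelated to the process mean.
    a, b = 0.8, 0.3
    y_ar, y_mlp = 2.0, 2.0
    for _ in range(100):
        y_ar = a * y_ar + b              # linear one-step predictor, iterated
        y_mlp = np.tanh(a * y_mlp + b)   # saturating one-step predictor, iterated
    print(y_ar, y_mlp)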

Although the perceptron performs slightly better as a predictor, it has two important drawbacks: the number of parameters (43 against 12 for the AR model) and its residuals, which fail the independence test (Figure 3). Thus, although the forecasting results on the test set are quite similar, we prefer the AR model for producing the 18 demanded predictions (Figure 4).

Fig. 3: Residuals (ACF and Box-Pierce p-values) - univariate AR model and multivariate MLP

Fig. 4: Predictions - univariate AR model
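Residual independence can be checked mechanically. A minimal sketch, assuming model is the fitted AR result from the earlier snippet, using the Ljung-Box and Box-Pierce tests from statsmodels (small p-values reject the white-noise hypothesis):

    from statsmodels.stats.diagnostic import acorr_ljungbox

    # Portmanteau tests on the model residuals for lags 1..10.
    diag = acorr_ljungbox(model.resid, lags=range(1, 11), boxpierce=True)
    print(diag[["lb_pvalue", "bp_pvalue"]])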

3 Seasonal ARIMA models versus multilayer perceptrons (MLP) for Data2

The challenge for the second data set is to predict the next 100 values of a series of length 1300 (Figure 5). First of all, let us remark that the data are highly nonstationary and nonnegative, with values of order 10^8. The autocorrelation function decays slowly and shows a periodicity of order 7.

Fig. 5: ACF and PACF of series 2 before and after preprocessing (the transformed series being ln(data2(t+1)/data2(t)))

Modelling these data without preprocessing may lead to computational problems (saturation of the sigmoid functions in multilayer perceptrons, for example) or to models that are not robust (nonstationarity). A transformation of the data therefore seemed necessary: the natural logarithm and a first-order difference were considered. If X_t, t in {1, ..., 1300}, denote the initial data, we decided to study

Y_t = ln(X_t / X_{t-1}).

The transformed series was split into a training sample (1199 inputs) and a test sample (100 inputs). The data in the training sample were normalized before estimation. Two classes of models were competing in this example, seasonal ARIMA models and MLPs. For every class, the best model was selected with the BIC criterion. The final SARIMA model is the following:

φ(B) (1 - B^7) Y_t = θ(B) Θ(B^7) ε_t,   (1)

where

φ(z) = 1 + 1.128 z + 0.655 z^2 + 0.027 z^3 - 0.501 z^4 - 0.504 z^5,
θ(z) = 1 + 0.833 z + 0.182 z^2 - 0.43 z^3 - 0.741 z^4 - 0.483 z^5 + 0.219 z^6 + 0.03 z^7,
Θ(z) = 1 - 0.885 z - 0.102 z^2,

and B is the backshift operator, B Y_t = Y_{t-1}. The residuals of the SARIMA model are plotted in Figure 6. The autocorrelation function and the Ljung-Box statistics validate the white-noise hypothesis. The forecasts on the test set are also represented in Figure 6. The top right graph contains the predicted values Ŷ_{t+h} of the transformed series, while the bottom right graph gives the predictions for the initial data, computed as

X̂_{t+h} = X_t exp( sum_{i=1}^{h} Ŷ_{t+i} ).   (2)

The predictions are quite accurate for the first 20 values, while the remaining ones are overestimated. On the log-returns series the error is NMSE = 0.02131, while on the initial series NMSE = 0.30534.

Fig. 6: Residuals and predictions on the test set - SARIMA model
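A model of the form (1) can be fitted, and its forecasts mapped back to the original scale via (2), with statsmodels. In the sketch below, x is a synthetic, hypothetical stand-in for Data2, and the SARIMA(5,0,7)(0,1,2)_7 orders simply mirror the polynomial degrees of (1) rather than redoing the BIC search.

    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # Hypothetical stand-in for Data2: positive, trending, weekly-periodic.
    rng = np.random.default_rng(1)
    t = np.arange(1300)
    x = 1e9 * np.exp(0.0002 * t + 0.05 * np.sin(2 * np.pi * t / 7)
                     + 0.01 * np.cumsum(rng.normal(size=1300)))

    # Log-returns Y_t = ln(X_t / X_{t-1}), then a 1199/100 train/test split.
    y = np.diff(np.log(x))
    train = y[:1199]

    # Seasonal ARIMA with weekly period, orders matching model (1).
    fit = SARIMAX(train, order=(5, 0, 7), seasonal_order=(0, 1, 2, 7)).fit(disp=False)
    y_hat = fit.forecast(steps=100)

    # Back-transform, eq. (2): forecasts of X start from the last observed level.
    x_hat = x[1199] * np.exp(np.cumsum(y_hat))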

Now, let us compare the SARIMA results with those of the MLP. As shown in Figure 7, the residuals of the MLP model fail the independence test, while the predicted values largely overestimate the true ones. For illustration, the top right graph contains the predictions made by the SARIMA model, while the bottom right graph represents those of the MLP. In terms of mean squared error, the MLP obtains NMSE = 2.92917. Since the SARIMA model outperforms the MLP in terms of both prediction quality and robustness, we used it for predicting the 100 awaited values of the series (Figure 8).

Fig. 7: Residuals and predictions - MLP model

Fig. 8: The 100 predicted values (log-returns and initial data)

4 Autoregressive Markov-switching models versus AR models for Data3

The third data set is highly nonstationary and contains over 31000 inputs. The goal is to forecast the next 200 values. The series plotted in Figure 9 suggests the possible existence of two regimes with quite opposite behaviours (increasing and decreasing trends). The very slow decay of the autocorrelation function also confirms the nonlinearity and the nonstationarity of the series. A closer look at the autocorrelation function shows a double periodicity, of orders 24 and 168. This suggests that the data are hourly recordings of some phenomenon with daily and weekly seasonality. Under the hypothesis of hourly recorded data, the plot of the series also suggests a yearly periodicity. We did not consider it in our study, but it would be interesting to see whether differencing the series at lag 8760 improves the results.

Fig. 9: ACF and PACF of series 3 before and after preprocessing

The periodicity of order 24 was removed from the initial series and the resulting data were split into a training sample (31000 inputs) and a test sample (200 inputs). Before estimation, the training sample was normalized. Since we suspected the existence of two regimes, two classes of models were tested: autoregressive Markov-switching models (hereafter HMM) and AR models. The BIC criterion selected 168 lagged variables for both models. On the test set, the NMSE is 0.01242 for the HMM model and 0.01249 for the AR model (Figure 10).
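As a sketch of how such a regime-switching autoregression can be estimated, the example below fits statsmodels' MarkovAutoregression to simulated two-regime data. The AR order is kept deliberately small; the paper's BIC-selected order of 168 would be estimated the same way, at a much higher computational cost.

    import numpy as np
    from statsmodels.tsa.regime_switching.markov_autoregression import MarkovAutoregression

    # Simulate a two-regime AR(1) with opposite dynamics and rare switches.
    rng = np.random.default_rng(2)
    n, state = 600, 0
    z = np.zeros(n)
    for t in range(1, n):
        if rng.random() < 0.02:      # occasional regime switch
            state = 1 - state
        phi = 0.9 if state == 0 else -0.5
        z[t] = phi * z[t - 1] + rng.normal(scale=0.5)

    # Two-regime Markov-switching AR with regime-dependent AR coefficients.
    mod = MarkovAutoregression(z, k_regimes=2, order=1, switching_ar=True)
    res = mod.fit()
    print(res.summary())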

Fig. 10: Predictions on the test set - HMM and AR model

Fig. 11: Residuals (ACF and Box-Pierce p-values) - HMM and AR model

Once again, as in the previous example, although the more complex model is slightly better in forecasting, its residuals fail the white-noise tests and the AR model seems preferable (Figure 11). The demanded 200 forecasts are plotted in Figure 12.

5 Conclusion

Three classes of models (AR and seasonal ARIMA, MLP, HMM) were compared on the three data sets of the competition. The comparison was made in terms of parsimony, quality of predictions and residuals.

Fig. 12: Predictions - AR model

All proposed time series are nonstationary, with important seasonal components. Seasonal ARIMA models generally outperformed multilayer perceptrons and hidden Markov models in terms of long-term prediction. This is mainly explained by the property of AR models of converging to the mean value of the series, while the other models get stuck at some attraction point, which spreads the error. For this reason, long-term prediction remains a difficult problem that complex models do not always solve, but its difficulty may be lessened by preprocessing.

References

[1] G. U. Yule, On a method of investigating periodicities in disturbed series, with special reference to Wolfer's sunspot numbers, Philos. Trans. Royal Soc. London, Series A, 226:267-298, 1927.
[2] R. F. Engle, Autoregressive conditional heteroscedasticity with estimates of the variance of U.K. inflation, Econometrica, 50:987-1008, 1982.
[3] T. Bollerslev, Generalized autoregressive conditional heteroscedasticity, J. Econometrics, 31:307-327, 1986.
[4] H. Tong, Threshold models in nonlinear time series analysis, Lecture Notes in Statistics 21, Springer, Heidelberg, 1983.
[5] J. D. Hamilton, A new approach to the economic analysis of nonstationary time series and the business cycle, Econometrica, 57:357-384, 1989.
[6] K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, Neural Networks, 2:359-366, 1989.