Using reservoir computing in a decomposition approach for time series prediction.

Francis wyffels, Benjamin Schrauwen and Dirk Stroobandt
Ghent University - Electronics and Information Systems Department
Sint-Pietersnieuwstraat 41, 9000 Gent - Belgium

Abstract. In this paper we combine wavelet decomposition and recurrent neural networks to provide fast and accurate time series predictions. The original time series is decomposed by means of wavelet decomposition into a hierarchy of time series which are easier to predict. The prediction core of our solution is reservoir computing, a recently developed technique for the very fast training of recurrent neural networks. The three time series of the ESTSP 2008 competition are used as an illustration of our method.

1 Introduction

Forecasting is a domain with a broad range of useful applications. Researchers working on time series prediction therefore come from a wide variety of fields and use many methods, such as the theta method [1], support vector machines, neural networks, local modeling [2], wavelet decomposition [3] and many more. This year, for the second time, the European Symposium on Time Series Prediction is held. This symposium always presents a challenging competition in the domain of time series prediction. This year the competition concerns the prediction of three different time series, which can be found on the competition website. Each time series has different properties, which is interesting because this will reveal the strengths and weaknesses of the many methods applied to the time series during the contest. Although we have no profound experience in the domain of time series prediction, we wanted to join this competition in order to compare our method with many others in the forecasting domain. In this paper we describe how a combined approach of wavelet decomposition and reservoir computing can be used for forecasting. Before explaining our method in full, we give a short overview of reservoir computing, which will be the baseline of our method.

This research is partially funded by FWO Flanders project G and the Photonics@be Interuniversity Attraction Poles program (IAP 6/1), initiated by the Belgian State, Prime Minister's Services, Science Policy Office.

2 Reservoir computing: a short overview

Reservoir computing is a novel technique for the fast training of large recurrent neural networks, which has been successfully applied in a broad range of temporal tasks such as robotics [4], speech recognition [5, 6] and time series generation [7]. Last year, reservoir computing outperformed all other methods in the NN3 competition for financial time series prediction [8]. The reservoir computing technique is based on the use of a large untrained dynamical system, the reservoir, where the desired function is implemented by a linear memory-less mapping from the full instantaneous state of the dynamical system to the desired output. Only this linear mapping is learned. When the dynamical system is a recurrent neural network of analog neurons, the method is referred to as echo state networks [7]. When spiking neurons are used, one often speaks of liquid state machines [9]. Both are now commonly referred to as reservoir computing [10].

Training is done in a supervised way: first, the reservoir is driven with teacher-forced inputs and/or teacher-forced output feedback; second, the output layer is trained using linear regression. This is summarized by the following equations:

x[k+1] = f( W^{res}_{res} x[k] + W^{res}_{inp} u[k] + W^{res}_{out} y[k] + W^{res}_{bias} )
\hat{y}[k+1] = W^{out}_{res} x[k+1] + W^{out}_{inp} u[k] + W^{out}_{bias},    (1)

where x[k] is the reservoir's state, u[k] is the input, y[k] is the desired output and ŷ[k] is the actual output. When analog neurons are used, the nonlinearity f is typically the sigmoid function. All the weight matrices denoted by W^{out} are trained, while those denoted by W^{res} are fixed and randomly created. During testing, the teacher-forced output feedback y[k] is replaced by the actual output ŷ[k], which we call free-run mode. Because only the output weights are changed, training can be realized very quickly, which can be an additional benefit in comparison with other methods. Additionally, reservoir computing does not suffer from local optima like other methods based on neural networks do. We conclude that reservoir computing gives us a powerful tool that can be easily used in a broad range of applications.

3 Time series prediction

Now that we have introduced the baseline of our prediction mechanism, we can formalize our overall prediction scheme, which is illustrated in Fig. 1. We now present each of the modules in greater detail.
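To make Eq. (1) concrete, the following numpy sketch implements a minimal echo state network with output feedback only (no external input, the configuration used later for the competition series) and trains the readout with ridge regression. This is an illustration under assumptions, not the authors' code: the function names, default sizes and scalings are chosen here for brevity, and the input and bias terms of the readout are omitted.

```python
import numpy as np

def train_esn_generator(y, n_res=100, spectral_radius=0.99, fb_scale=0.1,
                        ridge=1e-6, seed=0):
    """Teacher-forced ESN with output feedback only (no external input u[k]);
    the linear readout of Eq. (1) is trained with ridge regression."""
    rng = np.random.default_rng(seed)
    W_res = rng.uniform(-1.0, 1.0, (n_res, n_res))
    W_res *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W_res)))  # largest eigenvalue close to 1
    W_fb = fb_scale * rng.uniform(-1.0, 1.0, n_res)                      # output feedback weights
    b = rng.uniform(-1.0, 1.0, n_res)                                    # bias term

    x = np.zeros(n_res)
    states = []
    for k in range(len(y) - 1):                 # drive the reservoir with the desired output y[k]
        x = np.tanh(W_res @ x + W_fb * y[k] + b)
        states.append(x)
    X = np.array(states)                        # states paired with targets y[1:]
    W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ np.asarray(y)[1:])
    return (W_res, W_fb, b, W_out), x

def free_run(params, x, y_last, horizon):
    """Free-run mode: the teacher signal is replaced by the network's own output."""
    W_res, W_fb, b, W_out = params
    preds, y_prev = [], y_last
    for _ in range(horizon):
        x = np.tanh(W_res @ x + W_fb * y_prev + b)
        y_prev = float(W_out @ x)
        preds.append(y_prev)
    return np.array(preds)
```

During training the reservoir is driven by the desired signal (teacher forcing); `free_run` then generates future samples by feeding the trained readout back into the reservoir.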

Fig. 1: Schematic overview of our prediction methodology. We start by normalizing the given time series. Next, the normalized time series is decomposed by means of wavelet decomposition, which results in a trend and a set of detail coefficients. The level 1 detail coefficient is then discarded. The obtained components are predicted separately using reservoir computing (RC). Finally, the predicted components are recombined and rescaled in order to get a prediction of the given time series.

3.1 Normalizing the time series

Because we want to work in both the linear and nonlinear part of our recurrent neural network, we need to normalize the time series to the interval [-1, 1]. Otherwise the neurons would saturate and information would be lost. Normalization is done by removing the mean and dividing the outcome by the maximal absolute value:

\bar{x} = x - \langle x \rangle, \qquad x_{norm} = \bar{x} / \max(|\bar{x}|),    (2)

3.2 Decomposition

Time series can very often be decomposed into components with different dynamics: a trend, periodical effects (sometimes denoted as seasonal effects) and irregular residual components. In [3] wavelet decomposition was motivated by the easy analysis of the obtained components. A second motivation to use decomposition of the original time series is inherent to the use of reservoir computing, which tends to be sensitive to a limited temporal range [11]. This can be a problem with time series which contain information on different timescales. When there is no additional information available about the time series, decomposition can be done by using a set of successive filters. This is also known as multiscale decomposition and is described in more detail in [12]. The filters are obtained by rescaling the so-called mother wavelet. When the filters are applied iteratively, one obtains a slowly varying trend series and a hierarchy of detail components which contain the system's dynamics at different timescales [3].
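As a hedged illustration of Eq. (2) and of the multiscale decomposition just described, the sketch below normalizes a series and splits it into a trend plus detail components that sum back to the original. The authors used the MATLAB Wavelet toolbox; here PyWavelets with the 'dmey' (discrete Meyer) wavelet is used as a stand-in, and the function names are ours.

```python
import numpy as np
import pywt

def normalize(x):
    """Eq. (2): remove the mean, then divide by the maximal absolute value -> [-1, 1].
    The mean and scale are returned so a prediction can be rescaled later (Sec. 3.4)."""
    mean = np.mean(x)
    centred = np.asarray(x, float) - mean
    scale = np.max(np.abs(centred))
    return centred / scale, mean, scale

def decompose(x, wavelet="dmey", level=8):
    """Multiscale decomposition: reconstruct each wavelet band separately (all other
    bands zeroed) so that trend + details sums back to the input series."""
    coeffs = pywt.wavedec(x, wavelet, level=level)            # [c_L, d_L, ..., d_1]
    bands = []
    for i in range(len(coeffs)):
        keep = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        bands.append(pywt.waverec(keep, wavelet)[:len(x)])    # trim boundary padding
    trend, details = bands[0], bands[1:]                      # details[-1] is the noisy d_1
    return trend, details
```

Each returned component can then be predicted by its own reservoir, with the noisy level 1 detail (`details[-1]`) simply discarded, as in the scheme of Fig. 1.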

Because the system's dynamics are now split up into different timescales, processing with reservoir computing becomes a lot easier. The number of iterations L depends on the length N of the time series and is limited by L_{max} = \log_2 N. But fewer iterations can be considered by inspecting the derived detail components. After L iterations, a time series y[k], k = 1,...,N, of length N can be written as the sum of the trend c_L[k] and L detail coefficients d_m[k], m = 1,...,L:

y[k] = c_L[k] + \sum_{m=1}^{L} d_m[k],    (3)

For our experiments we used the MATLAB Wavelet toolbox for decomposition of the time series. The filters we used were obtained from the discrete Meyer filter because this gave components with few discontinuities. But we suspect that a Daubechies filter of a sufficiently high order is also feasible. In Fig. 3 the first time series of the ESTSP 2008 competition is shown with its trend and detail coefficients using a level eight decomposition. The noisiest coefficient d_1 is not illustrated. By inspecting the derived trend and detail components we decided that a level eight decomposition produced sufficiently smooth components for the first and second time series of the ESTSP 2008 competition. For the third time series we used a level 12 decomposition because this gave smoother and more predictable coefficients.

3.3 Prediction

The trend and detail components we obtain from decomposition are used to predict the original time series. We neglect the level 1 component d_1 because it is too noisy to predict. The remaining components are predicted separately by means of reservoir computing. An important implication is that correlations between different components are neglected. We plan to investigate the use of many timescales within a reservoir so that all components can be predicted at once instead of training them separately. This would reduce calculation time and has the benefit of exploiting possible correlations between the components.

For each component a fully connected reservoir with 5 sigmoid neurons, one output and only output feedback as an input is constructed. No other external inputs are used. An illustration of this topology can be seen in Fig. 2. The connection weights are scaled so that the largest eigenvalue is nearly 1, which makes the reservoir nearly unstable. The output feedback weights were scaled to 0.1. Depending on the desired output, classical neurons, leaky neurons [7] or band-pass neurons [11] are used. A rule of thumb here is that leaky neurons are used for the slowest components, band-pass neurons for the faster components and classical neurons for the fastest varying components. We determined the leak rates and band-pass settings manually based on the frequency spectrum of the components. Because we want to generate the future of a time series, the output is fed back to each of the neurons in the reservoir. During training the desired signal is used as feedback, using teacher forcing as explained previously.
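The text does not spell out the neuron equations; as a sketch, one common leaky-integrator update in the spirit of [7] looks as follows, with the leak rate playing the role of the timescale knob tuned from each component's frequency spectrum. Band-pass neurons [11], which add a high-pass stage, are omitted here.

```python
import numpy as np

def leaky_update(x, y_fb, W, W_fb, b, leak_rate=1.0):
    """One reservoir state update with output feedback y_fb.
    leak_rate = 1.0 recovers the classical sigmoid neuron used for the fastest
    components; leak_rate < 1.0 makes the state integrate slower timescales."""
    return (1.0 - leak_rate) * x + leak_rate * np.tanh(W @ x + W_fb * y_fb + b)
```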

Fig. 2: Schematic overview of the described reservoir topology. The reservoir has only output feedback as an input. Only the output weights, represented by dashed lines, are trained. All other connection weights (random but fixed interconnections) are initialized randomly and scaled so that the largest eigenvalue is nearly 1.

In order to avoid overfitting we use ridge regression to train the output weights, which proved to have good regularization properties [13], even for generation tasks. For training and testing we divide each component into three parts: one for training (the largest part), and the two last parts (which have lengths equal to the desired prediction horizon) for validation and testing. The final results for both testing and the competition were obtained by first training the reservoir using teacher forcing on the largest part. The optimal regularization parameter is determined using the performance of the reservoir in predicting the validation part. Next, the reservoir is retrained by teacher forcing it with the first and the second part, using the obtained optimal regularization parameter, in order to predict the third (known) testing part. This part is used for evaluation of our approach and these results are presented in the next section. Finally, we train the reservoir again using the complete component in order to predict the unknown samples which are needed for the competition. We repeat this process ten times for each of the components, each time using another reservoir. The unknown samples were generated by the reservoir which had the best performance on the testing part.
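The train/validate/retrain loop described above can be sketched as follows. Here `train_fn(series, lam)` stands in for the reservoir training of Section 2 (teacher forcing plus a ridge readout with regularization parameter `lam`) and returns a free-run predictor; the candidate grid of regularization parameters and the use of mean squared error as the validation score are assumptions, since the text does not specify them.

```python
import numpy as np

def select_and_evaluate(component, horizon, train_fn, lambdas=(1e-8, 1e-6, 1e-4, 1e-2)):
    """Sketch of the procedure in the text: the last two parts of each component
    (each as long as the prediction horizon) serve as validation and test sets.
    train_fn(series, lam) is an assumed helper that teacher-forces a reservoir on
    `series` with ridge parameter `lam` and returns a predictor that, given a
    horizon, generates that many samples in free-run mode."""
    val = component[-2 * horizon:-horizon]
    test = component[-horizon:]

    def mse(y, y_hat):                      # validation metric (assumed, not stated in the text)
        return float(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))

    # 1. Train on the first (largest) part, pick the regularization on the validation part.
    train = component[:-2 * horizon]
    best_lam = min(lambdas, key=lambda lam: mse(val, train_fn(train, lam)(horizon)))

    # 2. Retrain on train + validation with the chosen parameter, score on the known test part.
    predictor = train_fn(component[:-horizon], best_lam)
    test_error = mse(test, predictor(horizon))

    # 3. For the competition the reservoir is finally retrained on the full component.
    final_predictor = train_fn(component, best_lam)
    return best_lam, test_error, final_predictor
```

In the paper this whole procedure is repeated ten times with different random reservoirs, and the reservoir with the best performance on the test part generates the competition samples.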

3.4 Composition and rescaling

Recombination of the components can be done by using Eq. (3). Afterwards, the composed time series is rescaled to undo the normalization.

4 Experimental results

The goal of the ESTSP 2008 competition is to predict three different time series, each for a different prediction horizon. The evaluation of a time series y with length N and its prediction ŷ is done by calculating the NMSE:

NMSE = \frac{\sum_{t=1}^{N} (y_t - \hat{y}_t)^2}{N \sigma_y^2},    (4)

For the first time series we have to predict the next 18 samples based on a history of 354 samples. Two additional time series were given which could be helpful for the prediction of the first time series. After first trying a few prediction setups where these additional time series were used as inputs to our reservoir, we decided to neglect these external variables because they gave no significant improvement. The final result with decomposition of the time series is presented in Fig. 3. An NMSE of 0.25 was obtained on the last known 18 samples. The complete training and prediction procedure takes nearly 2 minutes on an average desktop computer with two gigabytes of memory and a 2.4 GHz Intel-based CPU.

The second time series of the ESTSP 2008 competition consists of 1300 samples, of which we have to predict the next 100 samples. This time series has a period of seven samples, which makes it plausible that it was sampled from a daily updated variable. This impression becomes more pronounced when we cut the time series into sets of 365 samples and look at the correlations between the different sets. Although we initially wanted to use this analysis as additional information in a different approach, we chose to reject it because its results were comparable to the results we have now, and we wanted one consistent methodology for the three time series. The predictions are shown in Fig. 4. An NMSE of 0.14 was obtained, which is the best result of the three given time series. A total time of nearly 5 minutes was needed to complete the training and testing procedure.

For the third time series a much longer history of samples was given, of which we had to predict the next 200 samples. Completion of the prediction procedure took three hours, which is due to the long sample history and the use of more decomposition levels (and thus more reservoirs for predicting the components). The results are shown in Fig. 5. An NMSE of 0.42 was obtained, which is our worst performance, possibly due to the discrete jumps in the trend and the many noisy detail coefficients derived from the decomposition.
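For completeness, a one-function numpy version of the NMSE of Eq. (4), assuming N is the length of the evaluated segment and \sigma_y^2 its variance:

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized mean square error, Eq. (4)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sum((y_true - y_pred) ** 2) / (len(y_true) * np.var(y_true)))
```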

Fig. 3: In solid black lines the original time series 1 of the ESTSP 2008 competition and its decomposition (using level eight wavelet decomposition) into its trend and detail coefficients are shown. The level 1 detail coefficient is not shown because it was too noisy to predict. The last 18 samples of the original time series and of its components were predicted in order to evaluate our prediction methodology; these predictions are shown as dashed gray lines. This resulted in an NMSE of 0.25. The 18 unknown samples predicted for the competition are shown as solid gray lines.

Fig. 4: At the top, the complete time series 2 of the ESTSP 2008 competition is shown as a solid black line. Our method was evaluated on the last 100 samples, which gave an NMSE of 0.14. At the bottom, these predictions are marked with a dashed gray line. The next 100 unknown samples for the competition are marked with a solid gray line.

Fig. 5: At the top, an impression of the complete time series 3 of the ESTSP 2008 competition is given. The last 200 samples were used for evaluating our technique, which resulted in an NMSE of 0.42. These predicted samples are shown with a dashed gray line. Our prediction for the unknown future of 200 samples is illustrated with a solid gray line.

5 Conclusions

In this work a prediction scheme for fast and accurate time series prediction based on wavelet decomposition and reservoir computing was presented. By using time series decomposition, components on different timescales were obtained which are easier to predict. The obtained trend series and detail coefficients were predicted using reservoir computing. We evaluated our method on a known part of the three time series of the ESTSP 2008 competition. In the end, the unknown samples of the three time series were generated. For future work we plan a setup that uses a single reservoir for both decomposition and prediction. This will give us a prediction mechanism which is able to use the interdependence between the obtained components. To this end, we will first need to investigate how many timescales can be used within one reservoir.

References

[1] V. Assimakopoulos and K. Nikolopoulos. The theta method: a decomposition approach for forecasting. International Journal of Forecasting, 16:521-530, 2000.
[2] J. McNames. Innovations in local modeling for time series prediction. PhD thesis, Stanford University, 1999.
[3] S. Soltani. On the use of the wavelet decomposition for time series prediction. Neurocomputing, 48, 2002.
[4] Eric A. Antonelo, Benjamin Schrauwen, and Jan Van Campenhout. Generative modeling of autonomous robots and their environments using reservoir computing. Neural Processing Letters, 26(3), 2007.
[5] Mark D. Skowronski and John G. Harris. Automatic speech recognition using a predictive echo state network classifier. Neural Networks, 20(3), 2007.
[6] D. Verstraeten, B. Schrauwen, D. Stroobandt, and J. Van Campenhout. Isolated word recognition with the liquid state machine: a case study. Information Processing Letters, 95(6):521-528, 2005.
[7] Herbert Jaeger. The echo state approach to analysing and training recurrent neural networks. Technical Report GMD Report 148, German National Research Center for Information Technology, 2001.
[8] H. Jaeger. Background information: Jacobs University smart systems seminar wins international financial time series competition, 2007.
[9] W. Maass, T. Natschläger, and H. Markram. Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11):2531-2560, 2002.
[10] D. Verstraeten, B. Schrauwen, M. D'Haene, and D. Stroobandt. An experimental unification of reservoir computing methods. Neural Networks, 20:391-403, 2007.
[11] F. wyffels, B. Schrauwen, D. Verstraeten, and D. Stroobandt. Band-pass reservoir computing. In Proceedings of the International Joint Conference on Neural Networks, 2008.
[12] Ingrid Daubechies. Ten Lectures on Wavelets (CBMS-NSF Regional Conference Series in Applied Mathematics). Society for Industrial and Applied Mathematics, 1992.
[13] F. wyffels, B. Schrauwen, and D. Stroobandt. Regularization methods for reservoir computing. In Proceedings of the International Conference on Artificial Neural Networks (ICANN), 2008. (accepted).
