Short-term Streamflow Forecasting: ARIMA vs Neural Networks

JUAN FRAUSTO-SOLIS 1, ESMERALDA PITA 2, JAVIER LAGUNAS 2
1 Tecnológico de Monterrey, Campus Cuernavaca, Autopista del Sol km 104, Colonia Real del Puente, 62790, Xochitepec, Morelos, MEXICO
juan.frausto@itesm.mx, http://www.itesm.mx
2 Instituto de Investigaciones Eléctricas, Reforma 113, Col. Palmira, 62490, Cuernavaca, Morelos, MEXICO
{epj, jlagunas}@iie.org.mx, http://www.iie.org.mx

Abstract: - Streamflow forecasting is very important for water resources management and flood defence. In this paper two forecasting methods are compared: ARIMA versus a multilayer perceptron neural network. The comparison is done by forecasting the streamflow of a Mexican river. Surprisingly, the results show that on a monthly basis ARIMA has lower prediction errors than this neural network.

Key-Words: - Auto-Regressive Integrated Moving Average, Artificial Neural Networks, Streamflow, Forecasting.

1 Introduction

Many activities regarding water resource systems require a streamflow forecast. An accurate prediction helps to optimize other issues such as electric generation, future expansions, and so forth. In addition, in order to minimize the cost of generation, a plan that includes the optimal coordination of hydropower and thermal power generation is needed; this optimal plan can be achieved when forecast models are correctly designed to predict the water filling of dams, especially in rainy seasons. A more accurate prediction allows the replacement of thermal power generation when its cost is higher; the dams can then be emptied enough so that, by the next rainy season, they can be filled as much as possible. As a result, these systems attain the lowest energy cost in every cycle (usually a year). This process is usually named hydro-thermal coordination; its economic impact is very high when the rainy-season filling of the dams is predicted correctly, and a key point of that prediction is streamflow forecasting.
Forecasting of streamflow is defined as the prediction of the amount of water discharged in a specific waterway or river during a certain period of time. A classical methodology for carrying out this prediction is presented by Bowerman [2], which uses time series (TS) of the data. A TS is a sequence of observations of a variable over one or more periods to be predicted. TS analysis plays an important role in hydrological research. TS are handled with mathematical models to predict new records and to identify trends and changes in hydrological records. TS models can be classified in two categories depending on the number of time series involved in the model: a) single-variable models and b) models using exogenous variables [1]. One of the main TS models is ARMA (Auto-Regressive Moving Average), and one of its variations is ARIMA (Auto-Regressive Integrated Moving Average); ARIMA is considered the most effective ARMA method. The name Box & Jenkins methods is commonly used when one of the ARMA methods is used. Another alternative for streamflow prediction is Artificial Neural Networks (ANNs), which also use TS data. An ANN is a data-driven method with a flexible mathematical structure which is able to identify complex non-linear relationships between input and output data sets. An important feature of ANNs is that they do not need an explicit model of the system they are forecasting. As is well known, ANNs are an analogy with the natural neural networks of the human brain (and probably of some other animals), where the learning system is not located in every particular neuron but in the function which describes the connections among neurons.

Tang [6] compared several streamflow forecasting models. Tang found that among TS methods with long memory, Box & Jenkins methods had the best performance. However, it was also reported in [6] that among TS methods with short-term memory, ANNs had the best performance.
Kisi [3] reported in 2005 a better performance of ANNs than ARIMA models for forecasting streamflows. Wen Wang [1] also obtained a better performance with ANNs than with ARIMA methods.

ISSN: 1790-5117    ISBN: 978-960-6766-47-3

On the other hand, Wang and Salas [4] obtained good forecast results with Box & Jenkins models when they applied them to streamflows of the Colorado River system. As can be noticed, ARIMA models had a better performance than ANNs only when they were applied to TS with long-term memory. However, the deeper reason for this result could lie in the features of the TS. In this paper a new real case of streamflow forecasting is presented, in which ARIMA and ANN models are compared. The variable used for prediction is the flow rate (or river discharge), measured in m^3/s. Forecasting is done for a short-term period on a monthly basis (i.e. the unit of time is a month).

2 Streamflow data and measurement errors

A data set from the San Juan Tetelcingo River is used in order to test the models. This river is the principal stream of a Mexican river basin. TS data were compiled for 10 years (1996 to 2006) and used for training the models. Figure 1 shows an exploratory analysis of the TS data. As can be observed, these TS data can be assumed stationary with a seasonal component [1], and this will be taken into account for both the ARIMA and the ANN models.

Fig. 1: San Juan Tetelcingo time series (discharge in m^3/s, January 1996 - July 2006).

A prediction error is calculated in order to assess the adequacy of each model in terms of how well it forecasts. Therefore, two types of measurement errors are used:

Mean Absolute Error:

MAE = (1/m) Σ_{i=1}^{m} |y_{t+i} − ŷ_{t+i}|    (1)

where y_{t+i} are the observed data in the series belonging to the prediction set, and ŷ_{t+i} are the values predicted by the ARIMA model.

Mean Absolute Percentage Error:

MAPE = (1/m) Σ_{i=1}^{m} |y_{t+i} − ŷ_{t+i}| / y_{t+i}    (2)

3 ARMA and ARIMA Models

One of the most important and most popularized classes of TS models is named ARMA, whose basic models are the AutoRegressive (AR) and Moving Average (MA) models.
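As a small aside, the two error measures of eqs. (1) and (2) can be sketched in code. This is a generic illustration with made-up numbers, not the paper's implementation:

```python
def mae(observed, predicted):
    """Mean Absolute Error, eq. (1): average of |y - yhat| over the prediction set."""
    m = len(observed)
    return sum(abs(y - yhat) for y, yhat in zip(observed, predicted)) / m

def mape(observed, predicted):
    """Mean Absolute Percentage Error, eq. (2); observed values must be non-zero."""
    m = len(observed)
    return sum(abs(y - yhat) / y for y, yhat in zip(observed, predicted)) / m

# Hypothetical observed and predicted discharges:
obs = [100.0, 200.0, 400.0]
pred = [110.0, 190.0, 380.0]
print(mae(obs, pred))   # (10 + 10 + 20) / 3 ≈ 13.33
print(mape(obs, pred))  # (0.1 + 0.05 + 0.05) / 3 ≈ 0.0667
```

MAPE is scale-free, which is why the paper reports both: MAE is in m^3/s while MAPE allows comparison across series of different magnitudes.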
ARMA is simply a combination of AR and MA, while the ARIMA model adds the integration (differencing) component to the ARMA model [2].

3.1 Autoregressive and Moving Average processes (ARMA)

The AR forecasting model is:

Y_t = φ_1 Y_{t−1} + φ_2 Y_{t−2} + ... + φ_p Y_{t−p} + ε_t    (3)

where Y_t is the variable estimated in period t in terms of the previous p data of the TS; ε_t is the error of the model versus the real data in period t, and the φ coefficients are determined by a simple regression model. This model is denoted AR(p) because p data are taken into account.

In an MA model, Y_t is estimated around the average µ of the TS data; this is done by a weighting of the errors ε in the q periods previous to period t:

Y_t = µ_t − θ_1 ε_{t−1} − θ_2 ε_{t−2} − ... − θ_q ε_{t−q}    (4)

Because the number of errors in equation (4) is q, this model is represented as MA(q). The ARMA model combines (3) and (4) as follows:

Y_t = φ_1 Y_{t−1} + φ_2 Y_{t−2} + ... + φ_p Y_{t−p} + ε_t − θ_1 ε_{t−1} − θ_2 ε_{t−2} − ... − θ_q ε_{t−q}    (5)

Equation (5) can be written as an infinite autoregressive process:

Y_t = µ_t − Σ_{i=1}^{∞} θ_1^i Y_{t−i}    (6)

or in a reduced form:

φ_p(B) Y_t = θ_q(B) ε_t    (7)

Equation (7) is referred to as an ARMA(p, q) model.
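The one-step AR(p) forecast of eq. (3) can be sketched as follows; the coefficient values here are illustrative assumptions, not the paper's estimates:

```python
def ar_forecast(history, phi):
    """One-step-ahead AR(p) forecast, eq. (3):
    Y_t = phi_1 * Y_{t-1} + phi_2 * Y_{t-2} + ... + phi_p * Y_{t-p}."""
    p = len(phi)
    lags = history[-p:][::-1]  # most recent observation first
    return sum(c * y for c, y in zip(phi, lags))

# Hypothetical series and AR(2) coefficients:
series = [5.0, 6.0, 7.0, 8.0]
print(ar_forecast(series, [0.5, 0.3]))  # ≈ 0.5*8 + 0.3*7 = 6.1
```

In practice the φ coefficients are fitted by regression on the training data, as the text states; here they are simply fixed for the example.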
3.2 Autoregressive Integrated Moving Average Process (ARIMA)

Two principal issues should be taken into account when selecting ARMA models for hydrological TS data:

- ARMA models suppose a stationary TS;
- ARMA models do not integrate a seasonal component.

When a TS is non-stationary, this issue can be handled by defining a new variable as follows:

z_t = y_t − y_{t−1},  for t = 2, 3, ..., n.

The second differences are then determined by:

z_t = (y_t − y_{t−1}) − (y_{t−1} − y_{t−2}),  for t = 3, 4, ..., n.

The resulting model, integrating non-stationarity, is known as ARIMA (ARMA with the non-stationary issue integrated). In order to integrate the seasonal component, ARMA is adapted again and the resulting model is named SARMA. Therefore two models are derived from ARMA:

- ARIMA (Auto-Regressive Integrated Moving Average);
- SARMA (Seasonal Auto-Regressive Moving Average);

which in turn are combined into a more complete model: SARIMA (Seasonal Auto-Regressive Integrated Moving Average). In practice it is common to use the general term ARIMA to group all of them. These models are explained as follows.

Firstly, delays in processes and random perturbations can be represented in a periodical form in every seasonal pattern. If the data are on a monthly basis, then the seasonal delay (s) can be set to one year (i.e. s = 12). Seasonal delays occur because of the mutual dependence of similar periods in successive years. For instance, TS data of the streamflow for March 1994, March 1993, and March 1992 may have variations in the date of the maximum streamflow discharge. SARMA and ARMA models are very similar, but the former is able to represent seasonal delays. However, SARMA (like ARMA) supposes stationarity. In order to integrate a non-stationary component, a SARIMA model is used. Another interesting model is obtained by combining seasonal with non-seasonal models; the resulting model is able to correctly represent the trend, seasonality and non-stationary components of a TS.
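The differencing transforms above can be sketched in a few lines; the same helper covers first differences (lag 1), second differences (applied twice), and seasonal differences (lag s = 12 for monthly data). The data below are made up for illustration:

```python
def difference(series, lag=1):
    """Lag-differenced series: z_t = y_t - y_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

y = [1.0, 3.0, 6.0, 10.0]
d1 = difference(y)        # first differences:  [2.0, 3.0, 4.0]
d2 = difference(d1)       # second differences: [1.0, 1.0]
print(d1, d2)

# Seasonal differencing for monthly data would use lag=12, e.g.
# difference(monthly_series, lag=12).
```

Each application of `difference` shortens the series by `lag` points, which is why the text starts the index at t = 2 for first differences and t = 3 for second differences.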
This is the combination of the ARIMA and SARIMA models, represented as follows:

ARIMA(p, d, q) x SARIMA(P, D, Q)

where:
p: order of the autoregressive model AR;
d: differentiation order in the regular (non-seasonal) part of the series;
q: order of the Moving Average model MA;
P: order of the Seasonal Autoregressive model SAR;
D: differentiation order in the seasonal part of the series;
Q: order of the Seasonal Moving Average model SMA.

An ARIMA(p,d,q)x(P,D,Q)_s model can also be written as:

φ_P(B^s)(1 − B^s)^D φ_p(B)(1 − B)^d Y_t = θ_Q(B^s) θ_q(B) ε_t    (8)

but Y_t = y_t − µ; therefore (8) can be written as:

φ_P(B^s)(1 − B^s)^D φ_p(B)(1 − B)^d y_t = δ + θ_Q(B^s) θ_q(B) ε_t    (9)

where δ is a constant value.

3.3 ARIMA model identification and estimation

Initial model identification is done using the autocorrelation function (ACF) and the partial autocorrelation function (PACF) [2]. Several experiments were performed with the TS data of the Mexican river San Juan Tetelcingo. The ACF and PACF allow estimating the p and q ARIMA parameters. Tests were made with the following parameter sets:

A: ARIMA(0,1,1)x(1,1,0)_12
B: ARIMA(1,0,0)x(0,0,1)_12
C: ARIMA(1,1,0)x(0,1,1)_12

Then the best model among these three was chosen using Akaike's Information Criterion (AIC), obtaining model B. As a consequence, the ARIMA model chosen is specified as ARIMA(1,0,0)x(0,0,1)_12. Table 1 shows the obtained parameters of this model, where θ_Q and θ_q are the parameters of equation (9):

Table 1: First set of ARIMA parameters
Parameter    Value
AR(1)        0.61
SMA(1)       0.719

Replacing these parameters in equation (9) we obtain:

(1 − B^12)(1 − B) y_t = (1 − 0.61 B^12)(1 − 0.719 B) ε_t    (10)

Equation (10) represents the forecast equation, estimating the new values y_t in period t as a function of the residuals ε_t in the same period.
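The AIC-based selection step can be sketched as follows. Under Gaussian errors, AIC can be computed from the residual sum of squares as AIC = n·ln(RSS/n) + 2k, where k is the number of estimated parameters; this formula and the residual values below are illustrative assumptions, not the paper's actual computation:

```python
import math

def aic(residuals, k):
    """Akaike's Information Criterion from residuals, assuming Gaussian errors:
    AIC = n * ln(RSS / n) + 2k, with k estimated parameters."""
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    return n * math.log(rss / n) + 2 * k

# Hypothetical one-step residuals of two candidate models (k = 2 each):
res_a = [1.0, -2.0, 1.5, -0.5]
res_b = [0.5, -0.5, 0.8, -0.2]
best = min([("A", aic(res_a, 2)), ("B", aic(res_b, 2))], key=lambda t: t[1])
print(best[0])  # the candidate with the lower AIC is preferred
```

The 2k penalty term is what keeps AIC from always preferring the model with the most parameters, which is why it is a common tie-breaker among candidate ARIMA orders.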
Finally, the last step is to test the resulting ARIMA model by examination of the one-step prediction residuals {ε_t}. This was done by a classical statistical test based on the autocorrelation function of the residuals. Figure 2 shows how the model fits the original time series. Figure 3 shows the forecasting results with the ARIMA model, while its prediction errors are shown in Table 2.

Table 2: Prediction error with the ARIMA model
ARIMA model    MAE      MAPE
               29.35    0.159

Fig. 2: Observed data and model fit to the data series. Period: January 1996 - December 2006.

Fig. 3: Observed data and predicted series. Period: January 2007 - January 2008.

4 Artificial Neural Network

Even though the variety of neural networks is very high, the multilayered perceptron is the most widespread neural network structure. Therefore, the ANN architecture used in this paper for streamflow forecasting is the perceptron shown in Figure 4; besides, this structure is very efficient for TS forecasting [7]. Figure 4 is an example of an ANN with four totally interconnected layers; this ANN is fed in a forward direction (i.e. the information flows only from the input to the output).

Fig. 4: Example of a feed-forward ANN with input, hidden and output layers.

The forecasting methodology includes the steps described in sections 4.1 to 4.4.

4.1 Obtaining Patterns for Training

The TS data are divided into two data sets:

- Training data set: 80% of the TS data, used for training the neural network.
- Testing data set: the remaining 20% of the TS data, once the training data were selected. This data set is used to evaluate the performance of the network.
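The 80/20 split just described can be sketched as follows. For time series the split must be chronological (no shuffling), so the first 80% of the observations trains the network and the last 20% tests it:

```python
def split_series(series, train_frac=0.8):
    """Chronological train/test split: first train_frac of the points
    for training, the remainder for testing."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

# e.g. 120 monthly observations (10 years of data):
data = list(range(120))
train, test = split_series(data)
print(len(train), len(test))  # 96 24
```

With the paper's 10 years of monthly data this yields roughly 8 years for training and 2 years for testing.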
Then the following prediction equation is used:

X̂_t = φ_o( w_co + Σ_h w_ho · φ_h( w_ch + Σ_i w_ih x_{t−i} ) )    (11)

where:
w_ch: bias weights of the hidden neurons;
w_co: bias weight of the output neuron;
w_ih and w_ho: weights of the connections between the input and the hidden neurons, and between the hidden neurons and the output, respectively;
φ_h and φ_o: activation functions of the hidden layer and the output, respectively.

The ANN weights are estimated by minimizing the sum of squared errors over the TS data used in the training phase:

S = Σ_t (x_t − x̂_t)²    (12)

S is minimized using the classical back-propagation algorithm.
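The forward pass of eq. (11) and the error of eq. (12) can be sketched as follows; all weight values here are illustrative assumptions, and the training loop (back propagation) is omitted:

```python
import math

def sigmoid(v):
    """Logistic activation, the function used for both layers in this sketch."""
    return 1.0 / (1.0 + math.exp(-v))

def mlp_forecast(x, w_ih, b_h, w_ho, b_o):
    """Forward pass of eq. (11). x: lagged inputs; w_ih[h][i]: input->hidden
    weights; b_h[h]: hidden biases; w_ho[h]: hidden->output weights;
    b_o: output bias."""
    hidden = [sigmoid(b + sum(w * xi for w, xi in zip(row, x)))
              for row, b in zip(w_ih, b_h)]
    return sigmoid(b_o + sum(w * h for w, h in zip(w_ho, hidden)))

def sse(targets, outputs):
    """Sum of squared errors, eq. (12): the quantity back propagation minimizes."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))

# Hypothetical 2-input, 2-hidden-neuron network:
out = mlp_forecast([0.2, 0.7],
                   w_ih=[[0.5, -0.3], [0.1, 0.4]], b_h=[0.0, 0.1],
                   w_ho=[0.6, -0.2], b_o=0.05)
print(0.0 < out < 1.0)  # sigmoid output always lies in (0, 1)
```

The bounded (0, 1) output of the sigmoid is the reason the series is rescaled into that range before training, as described in the next section of the paper.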
4.2 Re-scaling data

This step consists of transforming the TS data values into the range between 0 and 1. This is done using the following formula:

z_t = (y_t − Min) / (Max − Min)    (13)

where:
y_t: observed values of the time series;
Min: minimum value of the time series;
Max: maximum value of the time series;
z_t: new values, 0 < z_t < 1, obtained using (13).

4.3 ANN Topology and parameters

In order to obtain the best result, the ANN parameters are tuned. In this paper the following topology was considered:

ANN type: feed-forward
Interconnection type: totally interconnected
Activation function: sigmoid
Training algorithm: back propagation

Designing the network architecture is the main task in building an ANN structure. This task is very hard because it requires a lot of practical experience and sensitivity from the designer; therefore, this job is more a kind of art than an expert's routine. The principal activities to be carried out when the network architecture is designed are: determination of the input nodes required, determination of the output nodes, selection of the number of hidden layers, selection of the number of hidden neurons, and selection of the activation function of the neurons.

Determination of the required number of input nodes is a relatively easy task, because it depends predominantly on the number of independent variables present in the data set. As a rule, each independent variable should be represented by its own input node. In the case of input data prepared for forecasting, the number of input nodes is directly determined by the number of lagged values to be used for forecasting the next value; for instance x(t+1) = f[x(t), x(t−1), x(t−2), ..., x(t−n)].

It should be noticed that, again, the determination of the number of output nodes is a problem-oriented task. For one-step-ahead forecasting, only one output node is required. Correspondingly, in the case of multistep-ahead forecasting, the number of output nodes should correspond to the forecasting horizon, i.e. to the number of forecasts to be simultaneously calculated at the network output. Alternatively, a single output node can be used and all the required future forecasts are determined in iterative steps. Finally, it is also known that only one hidden layer is required for many ANN applications. In practice, the single-step-ahead forecaster is most frequently selected because it is relatively simple and guarantees the most accurate forecasting results. Therefore, the following parameters were fixed in the present application:

Inputs: 12
Hidden layers: 1
Hidden layer nodes: 7
Outputs: 1

4.4 Network training strategy

A learning rate is fixed for all the weights during the training iterations. In order to prevent oscillations and to achieve convergence to the global minimum (or close to it), the learning rate must be kept as small as possible; but in order to reduce the training time, the adaptive learning rate should be tuned bigger than one. Therefore, after experimentation the following parameters were fixed:

Setting parameters:
Maximum number of iterations:
Error convergence allowed: 0.1
Learning rate: 0.1
Adaptive learning rate: 1.5

4.5 Experimental Results

Figure 5 shows the results during the training phase of the ANN used. It can be observed how the model fits the TS data before the testing phase is done. The results obtained during the training phase fit the TS data well; a statistical test confirmed this observation. Figure 6 shows the ANN results during the testing phase. The prediction errors obtained with the ANN are shown in Table 3. Comparisons of the forecasting errors of the ANN versus ARIMA showed that ARIMA has the better forecasting quality; for instance, from the results shown in Tables 2 and 3 it can be observed that ARIMA is, surprisingly, much better than the ANN.

Table 3: Prediction error with the ANN model
ANN model    MAE      MAPE
             11.73    0.19
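Looking back at the preprocessing of Sections 4.1-4.3, the two data-preparation steps (rescaling into (0, 1) via eq. (13) and forming 12-lag input patterns) can be sketched together; the data below are synthetic, for illustration only:

```python
def rescale(series):
    """Min-max rescaling, eq. (13): z_t = (y_t - Min) / (Max - Min)."""
    lo, hi = min(series), max(series)
    return [(y - lo) / (hi - lo) for y in series]

def make_patterns(series, n_lags=12):
    """(12 lagged inputs, next value) pairs for one-step-ahead training,
    matching the 12-input / 1-output network of Section 4.3."""
    return [(series[t - n_lags:t], series[t])
            for t in range(n_lags, len(series))]

# Two synthetic "years" of monthly flows with a seasonal cycle:
flow = [float(60 + 5 * (t % 12)) for t in range(24)]
z = rescale(flow)
patterns = make_patterns(z)
print(len(patterns))  # 24 - 12 = 12 training patterns
```

Each pattern pairs one year of rescaled monthly values with the month that follows them, which is exactly the input/output shape the fixed topology (12 inputs, 1 output) expects.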
Fig. 5: Observed data and ANN results obtained during the training phase. Period: January 1996 - December 2006.

Fig. 6: Observed data and predicted series. Period: January 2007 - January 2008.

5 CONCLUSIONS

Two methodologies for streamflow forecasting were analyzed in this paper: ARIMA and Neural Networks. In the second case, the perceptron model and back propagation were used. Ten years of data were collected from a real case and used for training these models. The ANN parameters were carefully tuned experimentally. The experimental results showed that ARIMA obtained much better prediction results than the ANN perceptron model.

References:

[1] Wen Wang, Stochasticity, Nonlinearity and Forecasting of Streamflow Processes, IOS Press, August 2006.
[2] Bowerman B.L., O'Connell R.T., Forecasting and Time Series: An Applied Approach, Third Ed., Wadsworth, Inc., 1993.
[3] Ozgur Kisi, Daily River Flow Forecasting Using Artificial Neural Networks and Auto-Regressive Models, Turkish J. Eng. Env. Sci., 29, 2005, pp. 9-20.
[4] Wang D.C., Salas J.D., Forecasting Streamflow for Colorado River Systems, Colorado State University, December 1991.
[5] Makridakis, Spyros G., Forecasting: Methods and Applications, John Wiley & Sons, 1983.
[6] Zaiyong Tang, Chrys de Almeida, Paul A. Fishwick, Time Series Forecasting Using Neural Networks vs. Box-Jenkins Methodology, SIMULATION, 1991, pp. 303-310.
[7] Palit, A. K., Popovic, D., Computational Intelligence in Time Series Forecasting: Theory and Engineering Applications (Advances in Industrial Control), Springer-Verlag New York, Inc., 2005.