Improved the Forecasting of ANN-ARIMA Model Performance: A Case Study of Water Quality at the Offshore Kuala Terengganu, Terengganu, Malaysia

Muhamad Safiih Lola1, Malaysia - safiihmd@umt.edu.my
Mohd Noor Afiq Ramlee1, Malaysia - afiqramlee91@gmail.com
Vigneswary a/p Ponniah1, Malaysia - vignes2538@yahoo.com
Nurul Hila Zainuddin1, Malaysia - Hila.zainuddin@gmail
Razak Zakariya2, School of Science Marin, University Malaysia Terengganu, 21030 Kuala Terengganu, Terengganu, Malaysia - ajak@umt.edu.my
Md Suffian Idris2, School of Science Marin, University Malaysia Terengganu, 21030 Kuala Terengganu, Terengganu, Malaysia - suffian@umt.edu.my
Idham Khalil2, School of Science Marin, University Malaysia Terengganu, 21030 Kuala Terengganu, Terengganu, Malaysia - idham@umt.edu.my

Abstract
The prospect of precise time series predictions motivates researchers to develop innovative models for managing water resources. Water quality time series data are complex in nature, which makes them difficult to predict. As a result, neither ARIMA nor ANN alone handles both the linear and the nonlinear relationships well. In this study, a hybrid method is proposed that utilizes the advantages of both time series approaches and artificial neural networks. The findings reveal that the hybrid models for water temperature, dissolved oxygen, pH and salinity provide much better predictions than the traditional models. Hence, a hybrid approach can be more effective for predicting water quality time series than using the available models separately.
Keywords: Artificial Neural Network, Water Quality, ARIMA, Hybrid Models

1 School of Informatics and Applied Mathematics, University Malaysia Terengganu, Kuala Terengganu
2 School of Science Marin, University Malaysia Terengganu, 21030 Kuala Terengganu, Terengganu

Introduction
Numerous studies have used time series models such as the Seasonal Autoregressive Integrated Moving Average (SARIMA), the Artificial Neural Network (ANN) and the Autoregressive Integrated Moving Average (ARIMA). However, a major weakness of ARIMA is that it assumes the time series is generated by a linear process, so it has difficulty capturing nonlinear components. Artificial Neural Networks, on the other hand, are nonlinear in nature, being modelled on the behavior of neurons, and can approximate a function to a satisfactory level of accuracy. Hence, a hybrid model combining ARIMA and a back-propagation neural network is proposed (Zhang, 2003). Using hybrid models on water quality time series data could help capture the patterns in the data sets and improve the prediction accuracy. The motivation behind the hybrid approach is that water quality data are real, complex data sets, and no single modelling approach is sufficient to determine their patterns well. In this study, we predict the water quality time series with the developed hybrid model, NNARIMA, and evaluate its performance.

Materials and Methods
Study area and water quality data
This study was carried out along the coast of the South China Sea in the areas of Kuala Terengganu, Kampung Marang, Kampung Setiu and Kuala Besut. The data are in-situ measurements collected in 2015 (30th April to 3rd May). Figure 1 shows the research area in Terengganu in which the study was done. A total of 126 observations were collected from 26 sampling stations at different depths.

Figure 1: Research area at offshore Kuala Terengganu, Terengganu, Malaysia.

ARIMA modelling approach
An ARIMA model is used when the series is not stationary; differencing of order d is applied to make it stationary. The model is represented by the general term ARIMA (p, d, q) as follows

$$\phi_p(B)\,(1 - B)^d\, y_t = \theta_q(B)\,\varepsilon_t \qquad (1)$$

where p and q are the number of autoregressive terms and the number of lagged forecast errors in the prediction equation, respectively, d is the order of differencing, B is the backshift operator, $\phi_p(B)$ and $\theta_q(B)$ are the autoregressive and moving average polynomials, and $\varepsilon_t$ is the error term. The values of p, d and q are identified from the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots. The ARIMA modelling approach consists of three steps: model identification, parameter estimation and diagnostic checking. Model identification comprises two steps: determining whether the series is stationary, and examining the ACF and PACF. The model with the minimum Akaike's criterion (AIC) is chosen as the best fit model.

Artificial Neural Network modeling
One advantage of neural network models over other nonlinear models is that they are universal approximators that can approximate a large class of functions with a high degree of accuracy (Zhang et al., 1998). Their strength comes from the parallel processing of the information in the data. No prior assumption about the form of the model is required during model building; instead, the network model is largely determined by the characteristics of the data. The single hidden layer feed-forward network is among the most widely used models for time series modelling and forecasting. This model is characterized by a network of three layers of simple processing units connected by acyclic links.
The relationship between the output ($y_t$) and the inputs ($x_{t,1}, \ldots, x_{t,p}$) has the following mathematical representation (Khashei and Bijari, 2011):

$$y_t = w_0 + \sum_{j=1}^{q} w_j \, g\!\left(w_{0j} + \sum_{i=1}^{p} w_{ij}\, x_{t,i}\right) + \varepsilon_t \qquad (2)$$

where $w_{ij}$ ($i = 0, 1, 2, \ldots, p$) and $w_j$ ($j = 0, 1, 2, \ldots, q$) are the parameters of the model, also called connection weights; p is the number of input nodes, q is the number of hidden nodes, $\varepsilon_t$ is the error term, $w_0$ and $w_{0j}$ are the weights of the arcs leaving from the bias terms, and g is the sigmoid function. The activation function can take several forms, depending on the situation of the neuron within the network. The logistic and hyperbolic tangent functions are commonly used (Khashei and Bijari, 2011):

$$\mathrm{Sig}(x) = \frac{1}{1 + \exp(-x)} \qquad (3)$$

$$\mathrm{Tanh}(x) = \frac{1 - \exp(-2x)}{1 + \exp(-2x)} \qquad (4)$$

Normally, the artificial neural network model in Eq. (2) performs a nonlinear functional mapping from the past observations $(y_{t-1}, y_{t-2}, \ldots, y_{t-p})$, which serve as the inputs, to the future value $y_t$, that is,

$$y_t = f(y_{t-1}, y_{t-2}, \ldots, y_{t-p}, w) + \varepsilon_t$$

where w is the vector of all parameters, $f(\cdot)$ is a function determined by the network structure and connection weights, and $\varepsilon_t$ is the error term. The general structure of the neural network is shown in Figure 2.

Figure 2: General structure of neural network
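To make the structure of Eq. (2) concrete, the following minimal Python sketch evaluates a single-hidden-layer feed-forward network for one input vector: an output bias plus a weighted sum of logistic hidden activations, without the error term. It is an illustration only, not the authors' implementation. The function names, argument names and weight values are all assumptions introduced here, and training (back propagation) is not shown.

```python
import math


def logistic(z):
    """Logistic sigmoid g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_bias, hidden_weights, out_bias, out_weights, g=logistic):
    """Output of a one-hidden-layer network for a single input vector.

    inputs         : list of p input values x_1 .. x_p
    hidden_bias    : list of q bias weights w_0j, one per hidden node
    hidden_weights : q rows of p weights w_ij (row j feeds hidden node j)
    out_bias       : output bias w_0
    out_weights    : list of q weights w_j from hidden nodes to the output
    """
    hidden = []
    for bias, row in zip(hidden_bias, hidden_weights):
        # Weighted sum entering hidden node j, then the activation g
        z = bias + sum(w * x for w, x in zip(row, inputs))
        hidden.append(g(z))
    # Linear output node: bias plus weighted hidden activations
    return out_bias + sum(w * h for w, h in zip(out_weights, hidden))


# Hypothetical weights for p = 2 inputs and q = 2 hidden nodes
example_output = forward(
    inputs=[0.3, -0.1],
    hidden_bias=[0.0, 0.1],
    hidden_weights=[[1.0, -1.0], [0.5, 0.5]],
    out_bias=0.1,
    out_weights=[0.2, 0.4],
)
```

Passing a different callable as `g` (for example `math.tanh`) swaps the hidden-layer activation without changing the rest of the computation.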
Hybrid Model of NNARIMA
To obtain a more general and more accurate model, a hybrid of linear and nonlinear models is built, named the neural network autoregressive integrated moving average (NNARIMA) hybrid model. In this model, the time series is considered as a function of a linear and a nonlinear component:

$$y_t = L_t + N_t \qquad (5)$$

where $L_t$ and $N_t$ are the linear and nonlinear components, respectively. In the first stage, the main objective is to obtain a linear model, so an ARIMA model is used to model the linear component. The residuals of the first stage contain the nonlinear relationships, and possibly linear relationships, that the linear model could not capture (Khashei and Bijari, 2011). We represent the error at time t as

$$e_t = y_t - \hat{L}_t \qquad (6)$$

where $\hat{L}_t$ is the ARIMA forecast at time t. The forecast values and the residuals of the linear model are the results of the first stage and are used in the next stage. In addition, the linear patterns are magnified by the ARIMA model for use in the second stage.

In the second stage, the main focus is nonlinear modelling. A multi-layer perceptron is used to model the nonlinear relationships, and any linear relationships, that remain in the residuals of the linear model and in the original data. The errors can therefore be modelled with a neural network to identify the nonlinear relationships. With n input nodes, the neural network model for the error is

$$e_t = f(e_{t-1}, e_{t-2}, \ldots, e_{t-n}) + \varepsilon_t \qquad (7)$$

where f is a nonlinear function determined by the neural network and $\varepsilon_t$ is a random error. Denoting the forecast from Eq. (7) by $\hat{N}_t$, the combined forecasting model is

$$\hat{y}_t = \hat{L}_t + \hat{N}_t \qquad (8)$$

Comparison of ARIMA, ANN and hybrid model of NNARIMA
Both linear and nonlinear models were fitted to the data set, although more or less linearity was found in the series. Only one-step-ahead predictions are considered. Two key performance indicators, MAE (mean absolute error) and RMSE (root mean square error), calculated from the following equations, are used to measure the performance of the prediction models.
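The two-stage combination and the two error measures named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `linear_model` and `residual_model` are hypothetical callables standing in for a fitted ARIMA model and a trained neural network, and the stand-ins used in the example (a last-value forecast and a mean of lagged residuals) are deliberately trivial. Only the additive structure, residual = observation minus linear forecast and final forecast = linear forecast plus residual forecast, follows the description in the text.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def hybrid_forecasts(series, linear_model, residual_model, n_lags):
    """One-step-ahead hybrid forecasts for series[1:], in order.

    linear_model(history)     -> forecast of the next value (ARIMA stand-in)
    residual_model(residuals) -> forecast of the next residual from the
                                 last n_lags residuals (ANN stand-in)
    """
    residuals, combined = [], []
    for t in range(1, len(series)):
        linear_hat = linear_model(series[:t])          # stage 1: linear forecast
        if len(residuals) >= n_lags:
            nonlinear_hat = residual_model(residuals[-n_lags:])  # stage 2
        else:
            nonlinear_hat = 0.0                        # not enough residuals yet
        combined.append(linear_hat + nonlinear_hat)    # additive combination
        residuals.append(series[t] - linear_hat)       # residual of the linear stage
    return combined


def naive_linear(history):
    """Hypothetical stand-in: the last observed value (a random-walk forecast)."""
    return history[-1]


def mean_residual(lagged_residuals):
    """Hypothetical stand-in for the network: mean of the lagged residuals."""
    return sum(lagged_residuals) / len(lagged_residuals)


# Hypothetical series, for illustration only
series = [27.9, 28.1, 28.4, 28.6, 28.9, 29.1]
forecasts = hybrid_forecasts(series, naive_linear, mean_residual, n_lags=2)
errors = (mae(series[1:], forecasts), rmse(series[1:], forecasts))
```

In practice the two callables would wrap models fitted on a training portion of the data; the sketch keeps them as plain functions so that the combination logic can be read on its own.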
RMSE penalizes large errors more heavily and so reflects the overall performance of the model, while MAE measures the average magnitude of the errors.

$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} \left| y_t - \hat{y}_t \right| \qquad (9)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( y_t - \hat{y}_t \right)^2} \qquad (10)$$

where $y_t$ is the observed value, $\hat{y}_t$ is the predicted value and N is the number of predictions.

Results and discussion
ARIMA modelling
In this study, several steps were taken to choose the ARIMA model parameters, and only models that satisfy the residual diagnostic checks were retained. In the identification stage, the autocorrelation function (ACF) and partial autocorrelation function (PACF) were used to study the stationarity of the data and to determine the candidate best fit models. The best fit model was then determined using Akaike's criterion (AIC) for all the parameters, that is, water temperature, pH, salinity and dissolved oxygen. The models were then checked for adequacy by analyzing the independence of the residuals.

Table 1: Best fit model for all parameters
Parameter                Type of Model    MSE        AIC
Water Temperature        ARIMA(1,1,1)     1.364      0.3422
pH                       ARIMA(2,1,2)     0.03571    -3.2688
Salinity                 ARIMA(0,1,2)     189.2      5.2749
Dissolved Oxygen (DO)    ARIMA(1,1,1)     19.20      2.9871
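The text does not state which form of the AIC was used. As a hedged observation, the AIC values in Table 1 are reproduced to about three decimal places by ln(MSE) + 2k/n, with k = p + q estimated coefficients and n = 125 (126 observations less one lost to first differencing). Both the formula and the value of n are assumptions inferred from the table, not statements from the source. The sketch below checks this against Table 1 and shows minimum-AIC selection. The MSE values for the two alternative orders are hypothetical and serve only to illustrate how the penalty term works.

```python
import math


def aic_from_mse(mse, n_params, n_obs):
    """Normalized AIC, assumed here to be ln(MSE) + 2k/n."""
    return math.log(mse) + 2.0 * n_params / n_obs


def best_order(candidates, n_obs=125):
    """Return the (p, d, q) order with the smallest AIC.

    candidates maps (p, d, q) -> MSE of the fitted model.
    """
    return min(
        candidates,
        key=lambda order: aic_from_mse(candidates[order], order[0] + order[2], n_obs),
    )


# Table 1 rows: (parameter, order, MSE, reported AIC)
table_1 = [
    ("Water Temperature", (1, 1, 1), 1.364, 0.3422),
    ("pH", (2, 1, 2), 0.03571, -3.2688),
    ("Salinity", (0, 1, 2), 189.2, 5.2749),
    ("Dissolved Oxygen", (1, 1, 1), 19.20, 2.9871),
]

# Difference between the assumed formula and the reported AIC, per row
differences = {
    name: aic_from_mse(mse, order[0] + order[2], 125) - reported
    for name, order, mse, reported in table_1
}

# Water temperature: the (1,1,1) MSE is from Table 1; the other two
# MSE values are hypothetical, added only to illustrate the selection.
candidates = {
    (1, 1, 1): 1.364,
    (0, 1, 1): 1.400,  # hypothetical
    (2, 1, 2): 1.360,  # hypothetical
}
selected = best_order(candidates)
```

With these illustrative numbers the (2,1,2) candidate has the smallest MSE but is not selected, because its two extra coefficients cost more in the penalty term than the small gain in fit.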
Figure 3: Observed versus ARIMA model predicted data for each water quality parameter; (a) temperature, (b) pH, (c) salinity, (d) DO

ARIMA models were built to predict the water quality time series. As shown in Figure 3, although the ARIMA predictions vary within the range of the observed data, the model predictions are not adequate. This is due to the limitation of the linear modelling algorithm in the ARIMA model, which results in unsatisfactory identification and prediction of the nonlinear water quality time series.

ANN modelling approach
A neural network was developed to predict the water quality time series, with water temperature, pH, salinity and dissolved oxygen as the data. The target changes according to the parameter being predicted. For example, to predict water temperature, water temperature is the target and the other parameters are set as the input data. The data are partitioned into three sets: training, testing and validation. During training, the input data are presented to the network, which adjusts itself according to its error. Testing is independent of training. Lastly, validation is used to measure the network's ability to generalize and serves as the stopping criterion for training. Of the total data, 70% are used for training, 20% for testing and 10% for validation.

Figure 3: Observed versus ANN model predicted data for each water quality parameter; (a) temperature, (b) pH, (c) salinity, (d) DO
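The data arrangement described for the ANN (one parameter as target, the other three as inputs, and a 70/20/10 partition) can be sketched as follows. This is a minimal illustration under assumptions: the text does not say whether the partition was random or sequential, so a seeded random shuffle is assumed here, and the function names, dictionary keys and sample values are hypothetical.

```python
import random


def partition(n_obs, train_frac=0.7, test_frac=0.2, seed=0):
    """Split observation indices into training, testing and validation sets.

    The remaining fraction (10% by default) goes to validation.
    A seeded shuffle is an assumption; the source does not specify the method.
    """
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)
    n_train = int(n_obs * train_frac)
    n_test = int(n_obs * test_frac)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]
    return train, test, validation


def split_target(record, target):
    """Separate one observation into (inputs, target value).

    record : dict mapping parameter name -> measured value
    target : name of the parameter to predict; the rest become inputs
    """
    inputs = [value for name, value in sorted(record.items()) if name != target]
    return inputs, record[target]


# 126 observations, as in the study
train_idx, test_idx, validation_idx = partition(126)

# Hypothetical single observation, for illustration only
sample = {"temperature": 29.4, "pH": 8.1, "salinity": 32.5, "DO": 6.2}
inputs, target_value = split_target(sample, target="temperature")
```

With 126 observations this gives 88 training, 25 testing and 13 validation indices, because the 70% and 20% counts are rounded down and the remainder is assigned to validation.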
These results indicate that the neural network developed is able to detect the patterns in the water quality parameters and to predict the daily variations in the data, since the predicted graph is almost identical to the observed graph.

The Hybrid Modelling Approach
The testing and validation periods for all parameters based on the hybrid models are shown in Figure 4 (a), (b), (c) and (d). The figures show that the predicted data are close to the observed data for all water quality parameters. The hybrid models identified the patterns in the input data and provided better predictions than the ARIMA and neural network models. Hence, for these data the hybrid models give the most reliable predictions when compared with the single models.

Figure 4: Observed versus hybrid model predicted data for each water quality parameter; (a) temperature, (b) pH, (c) salinity, (d) DO

Comparison of the Models Performance
To evaluate the performance of the models developed, this study used two statistical performance evaluation criteria, mean absolute error (MAE) and root mean square error (RMSE), as in Eqs. (9) and (10). The comparative performance of ARIMA, ANN and NNARIMA for all parameters is tabulated in Table 2.

Table 2: Comparison of model performance using MAE and RMSE

Table 3: MAE and RMSE Reduced Error Percentage for all parameters (%)
                     MAE Reduced Error (%)          RMSE Reduced Error (%)
Parameter            ARIMA-ANN    ARIMA-HYBRID      ARIMA-ANN    ARIMA-HYBRID
Temperature (°C)     22.74        64.01             10.52        74.67
pH                   44.68        66.92             38.02        58.34
Salinity (ppt)       64.05        68.00             61.81        74.21
DO (ppm)             30.18        44.11             32.53        87.87

From Table 2, we can see that the hybrid NNARIMA model has the lowest MAE and RMSE for all the parameters, which indicates that it has the highest accuracy compared with the single ANN and ARIMA models. The RMSEs of water temperature, pH, salinity and DO are 1.185 °C, 0.2134, 18.9102 ppt and 4.2905 ppm for the ARIMA modelling approach. From Table 3, we can see that the MAE
is reduced by 22.73%, 44.67%, 64.06% and 30.18% for water temperature, pH, salinity and DO when ANN models are used instead of ARIMA. Applying the hybrid models reduces the MAE by a further 53.42%, 40.18%, 10.89% and 19.94% for water temperature, pH, salinity and DO, respectively (these figures are measured relative to the ANN errors, whereas Table 3 reports reductions relative to ARIMA). Similarly, the RMSE is reduced by 10.52%, 38.03%, 61.81% and 32.55% for water temperature, pH, salinity and DO when ANN models are used. Applying the hybrid models reduces the RMSE by a further 71.68%, 32.73%, 19.76% and 82.02% for the same parameters.

Conclusion
This study used ARIMA, neural network and hybrid NNARIMA models to predict water quality time series. The hybrid model utilizes the benefits of both the traditional methods and ANN. The results show that the ANN model is more reliable than ARIMA and is suitable for hybridization with the ARIMA model in predicting water quality time series. The hybrid model developed in this study can be useful in water quality management efforts to ensure that water resources remain sustainable in the coming years. Two accuracy measures, RMSE and MAE, were used to demonstrate the performance of the developed models in predicting water quality time series. The performance of the hybrid models was compared with that of the single ANN and ARIMA models. The hybrid models gave the lowest MAE and RMSE values, indicating improved performance in predicting water quality time series.

References
Khashei, M. & Bijari, M. (2011). An artificial neural network (p,d,q) model for time series forecasting. Expert Systems with Applications, 37, 479-489.
Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing, 50, 159-175.
Zhang, G., Patuwo, B. E., & Hu, M. Y. (1998). Forecasting with artificial neural networks: The state of the art.
International Journal of Forecasting, 14, 35-62.