Short-term water demand forecast based on deep neural network


Guancheng Guo 1, Shuming Liu 2
1,2 School of Environment, Tsinghua University, Beijing, China
2 shumingliu@tsinghua.edu.cn

ABSTRACT

Short-term water demand forecasting is essential for optimal control in water distribution systems (WDS). Current methods (e.g., the conventional artificial neural network) have limited power in practice because changes in water demand are nonlinear. In particular, forecasts at a 15-min time step may not be accurate when conventional models are used. To tackle this problem, this paper investigates the potential of deep learning in short-term water demand forecasting, developing a gated recurrent unit network (GRUN) model to forecast water demand 15 minutes into the future and 24 hours into the future with a 15-min time step. The performance of GRUN was compared with that of a conventional artificial neural network (ANN) model. The results show that a deep neural network model such as GRUN outperforms the ANN model for both 15 minute and 24 hour forecasts. These findings can provide more flexible and effective solutions for water demand forecasting.

Keywords: demand forecast; artificial neural network; gated recurrent unit network

1. Introduction

Urban water demand forecasting, whether for the design, planning, operation or management of water distribution systems (WDS), is essential for water utilities all over the world. As the basis of optimal scheduling, water demand forecasting plays an important role in the optimal operation of WDS. For instance, it helps water companies to make decisions about water allocation, water production, pricing policies, water use restrictions, pump station operation, and pipe network capacity [1]. However, forecasting water demand is a challenging task.
Water demand at a given time in the future is usually related to past water demand, current operating conditions, and socioeconomic and meteorological factors such as relative humidity, air temperature, rainfall and pressure [2].

A wide variety of water demand forecasting methods have been proposed, which can be broadly classified into traditional methods and learning algorithms. Early works used traditional statistical models such as linear regression models and time series models [3, 4] to address this problem. However, change in water demand is nonlinear and may not be accurately predicted by linear methods. Learning algorithms belong to the group of nonlinear methods. The use of advanced data analysis, such as machine learning, enables learning algorithm models to achieve high accuracy. Support Vector Machine (SVM) is a popular machine learning method that has been commonly used in water demand forecasting [5]. Artificial Neural Network (ANN) and Extreme Learning Machine (ELM) [6] are other machine learning techniques that have been used to forecast water demand. In recent years, one remarkable and promising example of a learning algorithm is deep learning. It has produced state-of-the-art results in many fields such as sentiment classification and face recognition [7]. So far, deep learning methods have seldom been used to forecast water demand.

This paper aims to 1) investigate the potential of deep learning methods in water demand forecasting; 2) compare their prediction performance with that of a conventional ANN model. To achieve these objectives, we developed a novel deep neural network, i.e., the gated recurrent unit network (GRUN) model, and a conventional ANN model. The methodology is described in Section 2. In Section 3, practical data from a district metering area (DMA) are used to evaluate the performance of these models.

2. Methodology

2.1 Research flowchart

Figure 1 presents the research outline. Firstly, historical water demand data from a DMA in Changzhou city were collected and the features were extracted as the model inputs. In order to simulate a real-life situation, we use two predictive approaches: the first is a 15 minute prediction and the second is a 24 hour prediction with 15-min time steps. For 24 hour forecasts, the output of the previous moment will be used as one of the model inputs for the next moment until 96 values are forecasted. Adopting the same model to predict each value increases the efficiency of the prediction. Model performance was evaluated on the basis of prediction accuracy and model stability.

Figure 1. Research outline.

2.2 GRUN model

Recurrent neural networks (RNN) have shown promising results in some machine learning tasks, especially the recently introduced gated recurrent unit (GRU), which was proposed by Cho et al. [8]. This has been successfully applied to deal with long sequences and been shown to have high processing efficiency [9]. GRU has a strong ability to deal with nonlinear data, especially for sequence processing. The use of memory modules rather than ordinary hidden units ensures that the gradient does not vanish or explode after a large number of iterations, which overcomes the difficulties encountered in traditional RNN training. This paper uses GRU as the core to build the GRUN model. Figure 2a illustrates the structure of the GRU. The process of calculating GRU can be briefly described by Eqs (1)-(4) [9]:

$$r_t = \mathrm{ReLU}(W_{rx} x_t + W_{rh} h_{t-1} + b_r) \tag{1}$$

$$z_t = \mathrm{ReLU}(W_{xz} x_t + W_{hz} h_{t-1} + b_z) \tag{2}$$

$$H_t = \tanh(W_{xh} x_t + W_{H} (r_t \odot h_{t-1}) + b_H) \tag{3}$$

$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot H_t \tag{4}$$

where $r_t$ and $z_t$ are the reset and update gates, and $\odot$ is an element-wise multiplication. The tanh activation function ensures the output values are between -1 and 1. The ReLU activation function is $f(x) = x$ for $x > 0$, $f(x) = 0$ otherwise. $x_t$ is the input, $h_t$ is the output, and $H_t$ is the candidate output. $W_{rx}$, $W_{rh}$, $W_{xz}$, $W_{hz}$, $W_{xh}$ and $W_{H}$ are the related weight matrices. $b_r$, $b_z$ and $b_H$ are the related biases.
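To make Eqs (1)-(4) concrete, the following is a minimal NumPy sketch of a single GRU update and of running it over a short demand sequence. It follows the equations as written in this paper, including the ReLU gate activations. The function names (`gru_step`, `init_params`, `run_gru`), the dictionary layout of the parameters, and the random initialisation are assumptions made for illustration; they are not the authors' implementation.

```python
import numpy as np


def relu(x):
    """ReLU activation: f(x) = x for x > 0, 0 otherwise."""
    return np.maximum(x, 0.0)


def gru_step(x_t, h_prev, p):
    """One GRU update following Eqs. (1)-(4).

    x_t    : input vector at time t, shape (n_in,)
    h_prev : previous output h_{t-1}, shape (n_hidden,)
    p      : dict of weight matrices and biases (hypothetical layout)
    """
    r_t = relu(p["W_rx"] @ x_t + p["W_rh"] @ h_prev + p["b_r"])            # Eq. (1), reset gate
    z_t = relu(p["W_xz"] @ x_t + p["W_hz"] @ h_prev + p["b_z"])            # Eq. (2), update gate
    H_t = np.tanh(p["W_xh"] @ x_t + p["W_H"] @ (r_t * h_prev) + p["b_H"])  # Eq. (3), candidate output
    return z_t * h_prev + (1.0 - z_t) * H_t                                # Eq. (4), new output


def init_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases (illustrative initialisation only)."""
    rng = np.random.default_rng(seed)

    def mat(rows, cols):
        return rng.normal(scale=0.1, size=(rows, cols))

    return {
        "W_rx": mat(n_hidden, n_in), "W_rh": mat(n_hidden, n_hidden), "b_r": np.zeros(n_hidden),
        "W_xz": mat(n_hidden, n_in), "W_hz": mat(n_hidden, n_hidden), "b_z": np.zeros(n_hidden),
        "W_xh": mat(n_hidden, n_in), "W_H": mat(n_hidden, n_hidden), "b_H": np.zeros(n_hidden),
    }


def run_gru(sequence, params, n_hidden):
    """Feed a sequence of scalar demands through the cell, starting from h = 0."""
    h = np.zeros(n_hidden)
    for x_t in sequence:
        h = gru_step(np.atleast_1d(float(x_t)), h, params)
    return h
```

With all weights and biases set to zero, the update returns a zero vector. Setting only `b_z` to one makes the cell copy `h_prev` unchanged, which is the "memory" behaviour described by Eq. (4).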

Figure 2b presents the structure of the GRUN model. It is a deep neural network framework which has multiple processing layers to learn representations of data with multiple levels of abstraction. The first part of the network consists of three GRU layers that represent water demand at different periods of time. The time axis is divided into three fragments, denoting recent time, near time and distant time. The three GRU layers correspond to these three fragments; they produce a memory state for the past water demand and establish dependencies between water demands of different periods of time. The second part of the network is a merge layer (i.e., a layer that concatenates a list of inputs), to which the output tensors of the three GRU layers are connected. The merge layer integrates water demand information, assigns different weights according to the importance of water demand for different periods of time, and further deepens these relationships. The third part of the network consists of several dense layers (i.e., regular fully-connected layers). These dense layers further enhance the ability of the model to deal with nonlinear data. The fourth part is the output layer, which directly outputs the prediction values.

Figure 2. (a) Illustration of GRU; (b) Structure of the GRUN model.

2.3 ANN model

Over the past 20 years, artificial neural networks have been increasingly used in water demand prediction. Many studies use ANN models to predict hourly or daily water demand [10]. The most common ANN is the feed-forward neural network with the back-propagation learning algorithm. The ANN model used in this paper has three dense layers (i.e., input layer, hidden layer, output layer), which makes it a conventional neural network. We feed the water demand for different periods of time into the ANN model and apply simple nonlinear transformations until the results are satisfactory.

3. Case study

3.1 Data description

The data analyzed in this study were collected from Changzhou city. The DMA is in the northeast of Changzhou and has a population of about [figure missing]. It is mainly a residential area and also contains

some commercial areas. The data ranges from February 1, 2016, to January 31, 2017, with a sampling interval of 15 minutes. The maximum value is 157 m³ and the mean value is 82 m³. The total dataset contains [number missing] observations.

3.2 Feature extraction

In order to better explore the characteristics of water demand time series, the timeline is divided into three fragments: recent time, near time and distant time.

The first fragment is recent time. To model recent temporal dependence, we select i 15-min time steps of water demand data that are close to time t of today ($Q_t$). Let $[Q_{t-1}, Q_{t-2}, \ldots, Q_{t-i}]$ be this recent dependent sequence, where t is the predicted time, and i can be selected from between 1 and 12.

The second fragment is near time. To model near temporal dependence, we select j 15-min time steps of water demand data that are close to time t of the previous day ($Q_{t-96}$). Let $[Q_{t-96+j}, \ldots, Q_{t-96}, \ldots, Q_{t-96-j}]$ be this near dependent sequence, where t is the predicted time, and j can be selected from between 0 and 6.

The third fragment is distant time. To model distant temporal dependence, we select k 15-min time steps of water demand data that are close to time t of the day before yesterday ($Q_{t-192}$). Let $[Q_{t-192+k}, \ldots, Q_{t-192}, \ldots, Q_{t-192-k}]$ be this distant dependent sequence, where t is the predicted time, and k can be selected from between 0 and 6.

Parameter grid search is used to obtain the optimal values of i, j and k according to the minimum mean square error on validation data. In this case, the value of i is 5, and the value of both j and k is 2. Both the ANN and GRUN models adopt the same features as model inputs, as shown in Table 1.

Table 1. Features of model inputs.
| Model | Feature extraction |
|---|---|
| ANN model | $Q_{t-1}$, $Q_{t-2}$, $Q_{t-3}$, $Q_{t-4}$, $Q_{t-5}$, $Q_{t-94}$, $Q_{t-95}$, $Q_{t-96}$, $Q_{t-97}$, $Q_{t-98}$, $Q_{t-190}$, $Q_{t-191}$, $Q_{t-192}$, $Q_{t-193}$, $Q_{t-194}$ |
| GRUN model, recent sequence (GRU layer) | $Q_{t-1}$, $Q_{t-2}$, $Q_{t-3}$, $Q_{t-4}$, $Q_{t-5}$ |
| GRUN model, near sequence (GRU layer) | $Q_{t-94}$, $Q_{t-95}$, $Q_{t-96}$, $Q_{t-97}$, $Q_{t-98}$ |
| GRUN model, distant sequence (GRU layer) | $Q_{t-190}$, $Q_{t-191}$, $Q_{t-192}$, $Q_{t-193}$, $Q_{t-194}$ |

3.3 Development of prediction models

The dataset is divided into training data (22500 samples), validation data (2500 samples) and testing data (2500 samples). Note that the training data and validation data are randomly selected. This paper used the mini-batch gradient descent method to train the model. The method divides the training data into several batches and updates the parameters batch by batch, so that the randomness of gradient descent in the training process is reduced. The validation data is used to select model parameters and to stop training early for each model at the minimum validation loss, which helps to avoid both over-fitting and under-fitting. At the beginning of every training epoch, the training/validation data are shuffled. At the end of each training epoch, the state of the model is evaluated through the loss curves of the training/validation data. The model parameters of each algorithm are optimized through parameter grid search; the results are shown in Table 2.
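The lag selection of Section 3.2 and the recursive 24 hour forecast described in Section 2.1 can be sketched as follows. The sketch builds the recent, near and distant sequences for i = 5 and j = k = 2, then feeds each prediction back into the series until 96 values are produced. The helper names (`lag_offsets`, `features_at`, `forecast_24h`) and the `seasonal_naive` predictor are hypothetical: `seasonal_naive` simply returns the demand at the same time on the previous day and stands in for a trained ANN or GRUN model, which this example does not include.

```python
import numpy as np

STEPS_PER_DAY = 96  # 24 hours at a 15-min time step


def lag_offsets(i=5, j=2, k=2):
    """Lags (in 15-min steps before t) for the recent, near and distant sequences."""
    recent = list(range(1, i + 1))                                          # t-1 ... t-i
    near = list(range(STEPS_PER_DAY - j, STEPS_PER_DAY + j + 1))            # t-96+j ... t-96-j
    distant = list(range(2 * STEPS_PER_DAY - k, 2 * STEPS_PER_DAY + k + 1))  # t-192+k ... t-192-k
    return recent, near, distant


def features_at(series, t, offsets):
    """Return the three input sequences for predicting the value at index t."""
    return [np.array([series[t - lag] for lag in group]) for group in offsets]


def forecast_24h(history, predict_one, offsets, horizon=STEPS_PER_DAY):
    """Recursive forecast: each predicted value becomes an input for later steps."""
    max_lag = max(max(group) for group in offsets)
    if len(history) < max_lag:
        raise ValueError(f"need at least {max_lag} past observations")
    series = [float(v) for v in history]
    for _ in range(horizon):
        t = len(series)  # index of the next value to predict
        series.append(float(predict_one(features_at(series, t, offsets))))
    return np.array(series[len(history):])


def seasonal_naive(features):
    """Placeholder predictor (assumption): demand at the same time yesterday."""
    near = features[1]
    return near[len(near) // 2]  # middle element of the near sequence is Q_{t-96}


if __name__ == "__main__":
    # Synthetic daily-periodic demand, three days long, for demonstration only.
    demo = 80 + 10 * np.sin(2 * np.pi * np.arange(3 * STEPS_PER_DAY) / STEPS_PER_DAY)
    print(forecast_24h(demo, seasonal_naive, lag_offsets())[:4])
```

For a 15 minute prediction only the first step of the loop is needed. The recursive form reuses the same one-step model for all 96 values, so forecast errors can accumulate over the horizon.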

Table 2. Parameter optimization results.

| Parameter | ANN model (dense layer) | GRUN model (GRU layer) | GRUN model (dense layer) |
|---|---|---|---|
| Number of layers | [missing] | [missing] | [missing] |
| Number of nodes | 32, 8, 1 | 48, 32, 32 | 64, 32, 16, 8, 4, 2, 1 |
| Learning rate | [missing] | [missing] | [missing] |
| Activation | tanh, ReLU, Linear | tanh, ReLU | ReLU, Linear |
| Optimizer | Adam | Adam | Adam |
| Number of epochs | [missing] | [missing] | [missing] |

3.4 Results

Table 3 summarizes the forecasting performance obtained by applying the models to testing data. The results lead to the following observations. Firstly, for the 15 minute prediction, the best forecasting performance was obtained with the GRUN model, which has a high prediction accuracy (e.g., its MAPE value is 2.02%). Secondly, for the 24 hour prediction, the results suggest that more accurate forecasts can be obtained with the GRUN model (e.g., its MAPE value is 4.79%). The GRUN model therefore performs better than the ANN model for both 15 minute and 24 hour predictions.

Table 3. Performance indicators of prediction models.

| Model | Mean Absolute Error | Root Mean Square Error | Nash-Sutcliffe Model Efficiency | Mean Absolute Percentage Error (%) |
|---|---|---|---|---|
| ANN (15 minute) | [missing] | [missing] | [missing] | [missing] |
| GRUN (15 minute) | [missing] | [missing] | [missing] | 2.02 |
| ANN (24 hour) | [missing] | [missing] | [missing] | [missing] |
| GRUN (24 hour) | [missing] | [missing] | [missing] | 4.79 |

A more detailed evaluation of model performance is given in Figures 3 and 4. For the 15 minute prediction, Figure 3(a) shows the histogram of the relative error for the ANN model; 95% of the relative errors fall within the range of ±6.65%. Figure 3(b) shows the histogram of the relative error for the GRUN model; 95% of the relative errors fall within the range of ±5.48%, a narrower range than for the ANN model. For the 24 hour prediction, Figure 4(a) shows the histogram of the relative error for the ANN model; 95% of the relative errors fall within the range of ±20.17%. Figure 4(b) shows the histogram of the relative error for the GRUN model; 95% of the relative errors fall within the range of ±12.65%.
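The four indicators named in Table 3 can be computed as in the sketch below. The paper does not print their formulas, so the code uses the standard textbook definitions and assumes these match the authors' usage. The function name `performance_indicators` and the example numbers are illustrative only and are not taken from the study.

```python
import numpy as np


def performance_indicators(observed, predicted):
    """Standard definitions of MAE, RMSE, Nash-Sutcliffe efficiency and MAPE (%).

    Assumes observed values are non-zero (needed for MAPE) and not all equal
    (needed for the Nash-Sutcliffe denominator).
    """
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    # Nash-Sutcliffe: 1 minus squared error relative to the variance around the observed mean.
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    mape = 100.0 * np.mean(np.abs(err / obs))
    return {"MAE": mae, "RMSE": rmse, "NSE": nse, "MAPE": mape}


if __name__ == "__main__":
    # Made-up demand values in m³ per 15 min, for demonstration only.
    print(performance_indicators([100, 80, 120, 100], [110, 80, 114, 100]))
```

Lower MAE, RMSE and MAPE indicate a more accurate model, while a Nash-Sutcliffe efficiency closer to 1 indicates a better fit.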
Taken together, the relative-error histograms show that the bias of relative errors for the GRUN model is lower than for the ANN model, which implies that the GRUN model is more stable for both 15 minute and 24 hour predictions. The performance differences can be explained by the methods themselves. The GRUN model has eleven network layers that apply complex nonlinear transformations to the water demand data, which allows it to achieve a high prediction accuracy. The GRU is the key to the GRUN model: it has a memory function that retains information on past water demand and establishes inter-dependencies in water demand for different periods of time. By contrast, the ANN model has only three dense layers, which apply simple transformations and result in less reliable forecasts.

Figure 3. Relative errors for 15 minute prediction.

Figure 4. Relative errors for 24 hour prediction.

In addition, the computation load is evaluated by two indicators: the Akaike information criterion (AIC) and the computation time spent in model development. As summarized in Table 4, the GRUN model has a much larger AIC than the ANN model. This indicates that the GRUN model has a more complex structure and therefore a higher computational load. The ANN model also requires less computation time than the GRUN model. From the perspective of model complexity, the ANN model does have some advantages, but its prediction accuracy and model stability are not as good as those of the GRUN model.

Table 4. Computation load on training data.

| Model | AIC | Time (s) |
|---|---|---|
| ANN | [missing] | [missing] |
| GRUN | [missing] | [missing] |

4. Conclusions and future work

This study investigates the potential of deep learning in short-term water demand prediction. We developed the GRUN model to forecast water demand for 15 minutes and 24 hours into the future and compared it with the conventional ANN model. The conclusions of this work and suggestions for future work are listed below:

1. The deep learning-based method proposed in this study can achieve accurate and reliable water demand prediction for 15 minutes and 24 hours. The GRUN model predicts more accurately and has a lower bias of relative errors than the conventional ANN model.
2. The ANN model has a lower computation load, but does not predict as accurately and is less stable than the GRUN model.
3. Future work should involve further testing of the proposed models on large amounts of real-time monitoring data in different DMAs. Investigating other factors that can affect short-term water demand forecasts is also necessary.

5. References

[1] M. Herrera, L. Torgo, J. Izquierdo, and R. Pérez-García, "Predictive models for forecasting hourly urban water demand," Journal of Hydrology, vol. 387, no. 1-2.
[2] E. A. Donkor, T. A. Mazzuchi, R. Soyer, and J. A. Roberson, "Urban Water Demand Forecasting: Review of Methods and Models," Journal of Water Resources Planning and Management, vol. 140, no. 2, Feb.
[3] J. Bougadis, K. Adamowski, and R. Diduch, "Short-term municipal water demand forecasting," Hydrological Processes, vol. 19, no. 1.
[4] J. S. Wong, Q. Zhang, and Y. D. Chen, "Statistical modeling of daily urban water consumption in Hong Kong: Trend, changing patterns, and forecast," Water Resources Research, vol. 46, no. 3.
[5] B. M. Brentan, E. Luvizotto Jr, M. Herrera, J. Izquierdo, and R. Pérez-García, "Hybrid regression model for near real-time urban water demand forecasting," Journal of Computational and Applied Mathematics, vol. 309.
[6] S. Mouatadid and J. Adamowski, "Using extreme learning machines for short-term urban water demand forecasting," Urban Water Journal, vol. 14, no. 6.
[7] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, May.
[8] K. Cho, B. Van Merrienboer, D. Bahdanau, and Y. Bengio, "On the Properties of Neural Machine Translation: Encoder-Decoder Approaches," Computer Science.
[9] J. Chung, C. Gulcehre, K. H. Cho, and Y. Bengio, "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling," arXiv preprint.
[10] M. Romano and Z. Kapelan, "Adaptive water demand forecasting for near real-time management of smart water distribution systems," Environmental Modelling & Software, vol. 60, 2014.
