A data mining approach for medium-term demand forecasting

G. S. Terra(1,2), M. C. S. Lopes(1) & N. F. F. Ebecken(1)
1 COPPE, Universidade Federal do Rio de Janeiro
2 CEFET-Campos, UNED-Macaé

Abstract

The Brazilian government, in the past, monopolized the generation, transmission and distribution of electric energy. Following worldwide trends, the government nowadays assumes a regulatory role in a competitive, horizontal market. The load variable, fundamental to the planning of the electrical and energy operation and to studies on expanding and reinforcing the basic network, assumes strategic importance in the commercial area, improving the data storage and knowledge extraction process through computational techniques. In the present work, data mining techniques are used to produce monthly load forecasts for the high-, medium- and low-consumption intervals, according to the needs of electrical energy distribution companies. The results of the neural network models, when compared to statistical results, show improved performance, with an average relative error about 0.5% lower.

1 Introduction

Load forecasting models are usually classified as long-term (years and decades ahead), short-term (days, hours or minutes ahead) and medium-term (weeks and months ahead). The Brazilian companies responsible for the generation and distribution of electric energy negotiate contracts for electric energy that will be consumed years and decades ahead. When these contracts expire, the difference between the contracted and consumed energy is settled monthly in the Wholesale Market of Energy (MAE), at the MAE price, for each market (North, South, Southeast and Northeast) and for each interval (low, medium and high electrical consumption), in what is called the spot market.
438 Data Mining IV

The MAE price is calculated through computational simulations; its objective is the valuation of the purchase and sale of energy in the spot market, whose credits and debits are settled between the agents in a centralized manner by the MAE. This Brazilian reality emphasizes the economic importance of medium-term forecasting models. Considering a medium-sized company that supplies a market of 10 million MWh/year, with a MAE price around $10/MWh, a forecasting model that reduces the average relative error by 1% can help the company take more adequate decisions, worth around $1,000,000.00 (one million dollars) per year. This fact, associated with the needs of the expansion plan for the generation and distribution system, justifies the great interest this area receives in the specialized literature and improves the data storage and knowledge extraction process using computational techniques [1].

In recent years, many articles have been published describing systems to forecast load demand [2, 3, 4]. Short-, medium- and long-term loads have been predicted by traditional statistical models and especially by data mining techniques such as neural networks [6], genetic algorithms, neuro-fuzzy models [7], and others. In particular, the neural model presents some characteristics that make it attractive in the forecasting area [5], such as: (a) it is a self-adaptive method driven by the data itself, where knowledge is captured by the model through examples, in other words, learning by experience; (b) after learning, it presents generalization capacity; (c) it can approximate any continuous function to the desired precision; (d) it is a non-linear model, and thus much more generic.
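The savings estimate quoted above can be checked with a quick back-of-the-envelope calculation. The figures (10 million MWh/year, $10/MWh, a 1% reduction in the average relative error) come straight from the text; the sketch below only makes the arithmetic explicit.

```python
# Back-of-the-envelope check of the savings estimate in the text.
annual_energy_mwh = 10_000_000      # MWh/year supplied by a medium-sized company
mae_price_usd = 10.0                # MAE spot price, $/MWh
error_reduction = 0.01              # 1% lower average relative error

traded_value = annual_energy_mwh * mae_price_usd       # total value at stake per year
potential_savings = traded_value * error_reduction     # roughly one million dollars

print(f"Traded value:      ${traded_value:,.0f}/year")
print(f"Potential savings: ${potential_savings:,.0f}/year")
```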
Considering the existing relationships between the different data related to the phenomenon, together with the recent changes in consumption habits resulting from the rationalization imposed by the government, a model based on this technique offers great potential for demand prediction, as will be shown in this article.

In the present work, a medium-term (one month ahead) forecasting model has been developed from the historical series of electric energy consumption and temperature, based on computational intelligence techniques - neural networks and genetic algorithms - and statistical methods. The development of this work was carried out in several stages. The first step was the selection of the data. Then, the selected data were verified to identify errors and inconsistencies. The next phase corresponds to the analysis of the data, in which some studies were carried out for a better understanding of its behavior, seasonal effects, trend, etc. In the following stage, the models were generated and tested. Finally, the best models, according to the chosen evaluation metric, were selected and made available in a procedure to forecast the future consumption of electric energy.

2 Data selection and preparation

The historical series of the hourly consumption of electric energy (HCEE) are formed by registers corresponding to the concession region of the Rio de Janeiro
Electricity Company (CERJ). They are organized in columns, representing the 24 hours, and lines, corresponding to the days, as illustrated in figure 1. The first column of this figure shows how the electric sector classifies the days of the week, motivated by the differences in the daily load profile. The last column shows the distinction, defined by the MAE, caused by the presence or absence of summer time.

Figure 1: Hourly consumption of electric energy (HCEE).

All data were reviewed for inconsistencies, which could be explained by: 1. incorrect data storage; 2. failure of electric energy supply caused by technical problems; 3. changes in the daily consumption profile caused by infrequent events (e.g. Brazilian games in the World Cup). The inconsistencies were corrected using information from a similar and close day, which showed a similar HCEE curve.

From the series of the HCEE and the numbers of Sundays/holidays, Saturdays and working days it is possible to construct the monthly historical series for each level. The concept of monthly equivalent load for each level (C^level_month,year) is similar to the monthly mean of the HCEE for each level [8]. The procedures used in this calculation account for the calendar effect - the existence of different numbers of Sundays/holidays, Saturdays and working days in each month. The monthly variations of C^level_month,year, denoted ΔC^level_month,year, are defined by eqn (1).

The ΔC^level_month,year series presents two important aspects. The first is the transformation of a growth series into an oscillating series, which makes the creation of neural network forecasting models easier. The second one is related to the
development of a statistical model, which is used as a benchmark. This model forecasts ΔC^level_month,year by eqn (2), which averages all the values observed in previous years for a given month.

The daily maximum and minimum temperature series were measured by a meteorological station located at Galeão International Airport, which is also in Rio de Janeiro. The monthly averages of the daily maximum and minimum temperatures are calculated by eqn (3) and the respective monthly variations by eqn (4):

T^Max/Min_month,year = (1/n) * sum_{day=1..n} T^Max/Min_month,day,year    (3)

ΔT^Max/Min_month,year = (T^Max/Min_month,year - T^Max/Min_month,year-1) / T^Max/Min_month,year-1    (4)

The influence of climatic factors on the consumption of electric energy becomes clear in figure 2. The C^level_month,year and T^Max/Min_month,year series are normalized to equal 1.0 at the maximum C^level_month,year and 0.5 at the maximum T^Max/Min_month,year to avoid overlapping of the curves.

Figure 2: C^level_month,year and T^Max/Min_month,year (jan/1995 - jun/2003).

The correlation between these two series (ΔC^level_month,year, ΔT^Max/Min_month,year) is higher than 0.7 in the majority of months, allowing the construction of a linear regression model. This causal model is not a forecasting model, because it is necessary to know all the daily temperatures of a given month to calculate ΔC^level_month,year through the regression line. In section 5 the results of the
neural and statistical models will be presented together with the results of this causal model as another benchmark. A more detailed study of this relation can be found in [8]. The influence of the energy rationing imposed by the federal government on the sector in the period may-2001 to feb-2002 is also covered in that reference.

3 Data mining techniques (ANN and GA)

Artificial Neural Networks (ANN), as compared to real ones, are mathematical systems comprised of a number of "processing units" linked via weighted interconnections. A processing unit is essentially an equation, often referred to as a "transfer function": it takes weighted signals from other neurons, possibly combines them, transforms them and outputs a numeric result. Processing units are often considered crudely analogous to real neurons, and since they are linked together in a mesh or network, the name neural networks was coined. Many neural networks have their neurons structured in "layers" that share similar characteristics and execute their transfer functions in synchronization (virtually at the same time). Nearly all neural networks have neurons that accept data and neurons that produce outputs.

The behavior of a neural network - how it maps input data to output data - is influenced primarily by the transfer functions of its neurons, how they are interconnected, and the weights of those interconnections. Typically, an architecture or structure of a neural network is established, and one of a variety of mathematical algorithms is used to determine what the weights of the interconnections should be in order to maximize the accuracy of the outputs produced. Neural networks are "trained", meaning they use previous examples to establish (learn) the relationships between the input variables and the predicted variables by setting these weights.
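The processing-unit description above can be sketched in a few lines of code. This is a generic illustration of a layered feed-forward network with a sigmoid transfer function, not the specific architecture used in this work; the layer sizes and weights below are arbitrary.

```python
import numpy as np

def sigmoid(x):
    """Transfer function: squashes a weighted sum into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, weights, biases):
    """Propagate an input vector through successive layers: each unit
    applies the transfer function to its weighted inputs."""
    for w, b in zip(weights, biases):
        x = sigmoid(w @ x + b)
    return x

# Illustrative structure: 3 inputs -> 4 hidden units -> 1 output,
# with random (untrained) interconnection weights.
rng = np.random.default_rng(42)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]
biases = [np.zeros(4), np.zeros(1)]

y = forward(np.array([0.2, -0.1, 0.5]), weights, biases)
print(y)  # one output value in (0, 1); training would tune the weights
```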
Once these relationships are established (the neural network is trained), the network can be presented with new input variables and it will generate predictions. The applications use Genetic Algorithms (GA) to evolve neural network structures while simultaneously searching for significant input variables, in order to maximize the predictive accuracy of the resulting neural network models. The effectiveness of the genetic algorithm cannot be overstated. For example, finding the best combination (subset) of 20 inputs and up to 15 hidden nodes in a back-propagation neural network is a combinatorial problem with over 16 million permutations. Training a network for every permutation in a full search would be a good project for a supercomputer. With genetic algorithms, however, an excellent solution often appears in fewer than 1500 evaluations, which is 0.009% of the total possible configurations. With some statistical data analysis to assist, highly fit networks are often found within the first 30 to 50 neural networks evaluated. This is clearly an efficient means of discovering effective network structure/input combinations. Several tests were performed and some results are discussed in section 5.
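A toy sketch of this GA search is shown below. Each chromosome encodes a subset of the 20 candidate inputs plus a hidden-layer size in 1..15, matching the example in the text. The fitness function here is a stand-in (it pretends three inputs are useful and penalizes model size); in the real application it would be the validation error of a trained back-propagation network, and the selection/mutation scheme of the actual tool may differ.

```python
import random

random.seed(1)
N_INPUTS, MAX_HIDDEN = 20, 15
USEFUL = {0, 3, 7}  # pretend only these candidate inputs carry signal

def fitness(chrom):
    """Stand-in fitness: reward covering the useful inputs, penalize
    extra inputs and hidden neurons (a crude parsimony pressure)."""
    mask, hidden = chrom
    selected = {i for i in range(N_INPUTS) if mask[i]}
    return len(USEFUL & selected) - 0.05 * len(selected) - 0.01 * hidden

def random_chrom():
    return ([random.random() < 0.5 for _ in range(N_INPUTS)],
            random.randint(1, MAX_HIDDEN))

def mutate(chrom):
    """Flip one input bit; occasionally resize the hidden layer."""
    mask, hidden = list(chrom[0]), chrom[1]
    mask[random.randrange(N_INPUTS)] = not mask[random.randrange(N_INPUTS) // N_INPUTS or 0] if False else not mask[random.randrange(N_INPUTS)]
    if random.random() < 0.2:
        hidden = random.randint(1, MAX_HIDDEN)
    return (mask, hidden)

pop = [random_chrom() for _ in range(30)]
for _ in range(50):                          # generations
    pop.sort(key=fitness, reverse=True)
    elite = pop[:10]                         # keep the fittest structures
    pop = elite + [mutate(random.choice(elite)) for _ in range(20)]

best = max(pop, key=fitness)
print("best fitness:", round(fitness(best), 3),
      "- evaluated far fewer structures than a full search")
```

Even this crude version evaluates on the order of a thousand candidates instead of millions, which is the point the text makes about GA efficiency.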
4 Model characteristics

During the development phase of the neural forecasting models, combinations of different neural network architectures and input data were tested with different error metrics. Due to the imposition of the rationing in 2001, the dataset was segmented, as shown in table 1.

Table 1: Segmentation of dataset.

The input data used by the forecasting neural models are classified in 3 groups, as shown in table 2.

Table 2: Classification of input data.

Group | Type              | Variables
1     | Effective values  | Temperature (T), Equivalent Consumption (C)
2     | Monthly variation | ΔT, ΔC
3     | Time              | month [1..12], sine, cosine

The first two categories of data were described in section 2. The third category represents time in two cyclical forms: the month number and the sine/cosine pair. This last representation presents an interesting characteristic: the Euclidean distances between nearby months are identical. Figure 3 highlights this equality, considering as examples the month pairs January/February and December/January. The time representation varying from 1 to 12 does not present this characteristic.

Figure 3: Time representation (months arranged on a circle).

Five different neural models were selected for each level, totaling 15 models. This set allows, for each level, the combination of the forecast values.
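The sine/cosine time representation in table 2 can be sketched as follows: each month is mapped to a point on the unit circle, so the Euclidean distance between any two adjacent months is the same, while the plain month number makes December and January look eleven steps apart.

```python
import math

def encode(month):
    """Map month 1..12 to a (sin, cos) point on the unit circle."""
    angle = 2.0 * math.pi * (month - 1) / 12.0
    return (math.sin(angle), math.cos(angle))

def dist(a, b):
    """Euclidean distance between two encoded months."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

jan, feb, dec = encode(1), encode(2), encode(12)

# Adjacent months are equidistant on the circle...
print(round(dist(jan, feb), 6), round(dist(dec, jan), 6))
# ...whereas the raw month number puts Dec and Jan 11 apart: |12 - 1| = 11
print(abs(2 - 1), abs(12 - 1))
```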
5 Results

Table 3: Neural, statistical and cause-effect model performance.

Table 3 compares the performance of the neural, statistical and cause-effect models for each level, considering a global and a monthly analysis. In the first, the average relative error covers all 42 months (or 39, when the months influenced by the rationing are excluded); in the second, the calculation is made month by month. The results of the cause-effect model, which is based on the linear regression between the series ΔT^Max/Min_month,year and ΔC^level_month,year when corr(ΔT^Max/Min_month,year, ΔC^level_month,year) is higher than 0.7, are shown in the last column of the table, for each level.

As table 3 shows, some points can be explored for each level:

low: the global neural model results (2.5%) are 0.3% better than the statistical results (2.8%). Considering the analysis in the validation interval, this difference (0.3%) grows to 0.6%. The ΔC^low_month,year series is more dispersive in the validation interval. January, March, April, July, September, November and December are months in which the neural models made better predictions than the statistical models.
medium: the neural models made better predictions than the statistical models when the analysis is restricted to the validation interval (0.4%). May and August are months in which the statistical models made better predictions than the neural ones.

high: the ΔC^high_month,year series is less dispersive at this level than at the others (low and medium), and this is the main reason why the expected error is smaller. The neural models made better predictions than the statistical models when the analysis is restricted to the validation interval (0.3%). February, March, July, September, October, November and December are months in which the neural models made better predictions than the statistical models, but in August this does not occur. The results in the other months are equivalent.

6 Conclusions

The neural network models used provided satisfactory results and, as commented in Roitman [9], showed superior performance to the statistical models, which provided accurate forecasts mainly in cases where there were not many input combinations of historical series. During the development of the neural models some details deserved attention: the Time Delay Neural Network presented better results at the end of the optimization process directed by the genetic algorithm.
After some tests with networks with more than one hidden layer, it was found that a single hidden layer was sufficient, with fewer than 32 neurons; the optimization processes that presented the best generalization capacity used the latest 12 months (January to December of 1999) as the test interval; the fitness function of the genetic algorithm that presented the best performance was the one combining the accuracy of the forecasts, measured by the average absolute error, with a measure of the influence of the number of neurons in the input and hidden layers; the validation interval used to validate the results of the neural models presented a higher dispersion of the monthly equivalent load variations for each level than the training interval.

Each neural model can be understood as an expert in load forecasting one month ahead. The knowledge in each model, captured during the network training phase, is different and is represented in their respective architectures. Since the models for each level presented a similar average absolute error, it was
possible to combine their forecasts, generating a better one. That procedure has been successfully used by CERJ. Another point is that the dependences between the variables selected by the forecasting models can change through time, which demands continuous tuning. Knowledge extraction, as proposed by Hruschka and Ebecken [10, 11], can be used to make explicit the hidden knowledge that exists in a neural network model.

Considering the spot market needs, weekly forecasting models become relevant. In this way, the combination of both techniques, neural networks and genetic algorithms, could be applied with other neural networks, such as RBF networks, as proposed in Barreto [12]. Future work focusing on the improvement of the forecasting models could be done using fuzzy logic and neuro-fuzzy techniques.

References

[1] Chen, G. J., Li, K. K., Chung, T. S., Sun, H. B. & Tang, G. Q., Application of an innovative combined forecasting method in power system load forecasting, Electric Power Systems Research, 59, pp. 131-137, 2001.
[2] Hippert, H. S., Pedreira, C. E. & Souza, R. C., Neural networks for short-term load forecasting: A review and evaluation, IEEE Transactions on Power Systems, 16(1), pp. 44-55, 2001.
[3] Gavrilas, M., Ciutea, I. & Tanasa, C., Medium-term load forecasting with artificial neural network models, Proc. of the 16th International Conference and Exhibition on Electricity Distribution (CIRED), Part 1: Contributions (IEE Conf. Publ. No. 482), 6, pp. 167-171, 2001.
[4] Chen, G. L., Li, K. K., Chung, T. S., Sun, H. B. & Tang, G. Q., Application of an innovative combined forecasting method in power system load forecasting, Electric Power Systems Research, 59, pp. 131-137, 2001.
[5] Zhang, G., Patuwo, B. E. & Hu, M. Y., Forecasting with artificial neural networks: The state of the art, International Journal of Forecasting, 14, pp. 35-62, 1998.
[6] Al-Saba, T. & El-Amin, I., Artificial neural networks as applied to long-term demand forecasting, Artificial Intelligence in Engineering, 13, pp. 189-197, 1999.
[7] Padmakumari, K., Mohandas, K. P. & Thiruvengadam, S., Long term distribution demand forecasting using neuro fuzzy computations, Electrical Power and Energy Systems, 21, pp. 315-322, 1999.
[8] Terra, G. S., Medium-term demand forecasting by a data mining approach, D.Sc. Thesis, COPPE/UFRJ, Rio de Janeiro, RJ, Brazil, 2003.
[9] Roitman, V. L., A computational model of neural networks for prediction of open unemployment rate, D.Sc. Thesis, COPPE/UFRJ, Rio de Janeiro, RJ, Brazil, 2001.
[10] Hruschka, E. R. & Ebecken, N. F. F., Rule extraction from neural networks: modified RX algorithm, Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN'99), Washington DC, USA, July 1999.
[11] Ebecken, N. F. F. & Hruschka, E. R., Rules from supervised neural networks in data mining, Frontiers in Artificial Intelligence and Applications, v. 71, In: Logic, Artificial Intelligence and Robotics, J. M. Abe and J. I. Silva Filho (eds.), pp. 84-100, IOS Press, 2001.
[12] Barreto, A. M., Genetic orthogonal least squares algorithm for RBF networks training, M.Sc. Thesis, COPPE/UFRJ, Rio de Janeiro, RJ, Brazil, 2003.