A data mining approach for medium-term demand forecasting

G. S. Terra¹,², M. C. S. Lopes¹ & N. F. F. Ebecken¹
¹ NTT, COPPE, Universidade Federal do Rio de Janeiro
² CEFET-Campos, UNED-Macaé

Abstract

In the past, the Brazilian government held a monopoly over the generation, transmission and distribution of electric energy. Following worldwide trends, the government now acts as the regulator of a competitive, horizontally structured market. Load is a fundamental variable in planning the electrical and energy operation and in studies of expansion and reinforcement of the basic network. It has also taken on strategic importance in the commercial area, which has driven improvements in data storage and in computer-based knowledge extraction. In the present work, data mining techniques are used to produce monthly load forecasts for the high, medium and low consumption levels, according to the needs of electrical energy distribution companies. Compared with statistical models, the neural network models show improved performance, with an Average Relative Error about 0.5% lower.

1 Introduction

Load forecasting models are usually classified by horizon:
- long-term: years and decades ahead;
- medium-term: weeks and months ahead;
- short-term: days, hours or minutes ahead.

Brazilian companies responsible for the generation and distribution of electric energy negotiate contracts for energy that will be consumed years or decades ahead. When these contracts are settled, the difference between contracted and consumed energy is traded monthly in the Wholesale Energy Market (MAE) at the MAE Price. This trading takes place for each submarket (North, South, Southeast and Northeast) and for each load level (low, medium and high electrical consumption), in what is called the spot market.

438 Data Mining IV

The MAE Price is calculated through computer simulations. Its purpose is to value the purchase and sale of energy in the spot market, whose credits and debits are settled between the Agents in a centralized manner by the MAE. This Brazilian context underlines the economic importance of medium-term forecasting models. Consider a medium-sized company that supplies a market of 10 million MWh/year, with the MAE price around $10/MWh. A forecasting model that reduces the average relative error by 1% can help the company make better decisions worth around $1,000,000 (one million dollars) per year. This fact, together with the needs of the expansion plan for the generation and distribution system, explains the great interest this area receives in the specialized literature. It also explains the drive to improve data storage and computer-based knowledge extraction [1].

In recent years, many articles have been published describing systems to forecast load demand [2, 3, 4]. Short-, medium- and long-term forecasts have been produced by traditional statistical models and, especially, by data mining techniques such as neural networks [6], genetic algorithms, neuro-fuzzy models [7] and others. In particular, the neural model has several characteristics that make it attractive for forecasting [5]:
(a) it is a self-adaptive, data-driven method in which the model captures knowledge from examples, that is, it learns from experience;
(b) after learning, it has generalization capacity;
(c) it can approximate any continuous function to the desired precision;
(d) it is a non-linear model and therefore much more general.
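The one-million-dollar estimate above follows from simple arithmetic. The check below is a minimal sketch. It assumes a reading of ours that the text does not state: a reduction of 1 percentage point in the average relative error means that 1% of the annual energy is no longer mismatched, and each avoided MWh is worth the spot price.

```python
# Back-of-envelope check of the value of a 1-point error reduction.
# Assumption: the reduction applies to the whole annual energy and every
# avoided MWh of mismatch is valued at the MAE spot price.
annual_market_mwh = 10_000_000   # market supplied by a medium-sized company (MWh/year)
mae_price_usd_per_mwh = 10.0     # approximate MAE price (US$/MWh)
error_reduction = 0.01           # average relative error lowered by 1 percentage point

avoided_mismatch_mwh = annual_market_mwh * error_reduction
annual_value_usd = avoided_mismatch_mwh * mae_price_usd_per_mwh

print(f"{avoided_mismatch_mwh:,.0f} MWh/year -> US$ {annual_value_usd:,.0f}/year")
# 100,000 MWh/year -> US$ 1,000,000/year
```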
Consider the relationships among the different data related to the phenomenon, together with recent changes in consumption habits caused by the rationing the government imposed. Given these, models based on this technique offer great potential for demand prediction, as this article shows. In the present work, a medium-term (one-month-ahead) forecasting model has been developed from the historical series of electric energy consumption and temperature. It is based on computational intelligence techniques (neural networks and genetic algorithms) and on statistical methods.

The work was carried out in several stages:
1. Data selection.
2. Checking the selected data to identify errors and inconsistencies.
3. Data analysis, in which several studies were carried out to better understand the data: behavior, seasonal effects, trend, etc.
4. Generating and testing the models.
5. Selecting the best models, according to the chosen evaluation metric, and making them available in a procedure to forecast future electric energy consumption.

2 Data selection and preparation

The historical series of the hourly consumption of electric energy (HCEE) consist of records for the concession region of the Rio de Janeiro
Electricity Company (CERJ). They are organized in columns representing the 24 hours and rows corresponding to the days, as illustrated in figure 1. The first column of the figure shows how the electric sector classifies the days of the week, motivated by the differences in daily load profile. The last column shows the distinction, defined by the MAE, between periods with and without summer time (daylight saving time).

Figure 1: Hourly consumption of electric energy (HCEE).

All data were reviewed for inconsistencies, which could be explained by:
1. incorrect data storage;
2. failures in electric energy supply caused by technical problems;
3. changes in the daily consumption profile caused by infrequent events (e.g. matches played by Brazil in the World Cup).

The inconsistencies were corrected using information from a similar, nearby day with a similar HCEE curve.

The monthly historical series for each load level can be constructed from the HCEE series and the numbers of Sundays/holidays, Saturdays and working days. The monthly equivalent load for each level, $C^{level}_{month,year}$, is similar to the monthly mean of the HCEE for that level [8]. The calculation accounts for the calendar effect: each month contains a different number of Sundays/holidays, Saturdays and working days. The monthly variations of $C^{level}_{month,year}$ are defined by eqn (1).

$\Delta C^{level}_{month,year}$ presents two important aspects. The first is that it transforms a growth series into an oscillation series, which makes it easier to build neural network forecasting models. The second is related to the
development of a statistical model, which is used as a benchmark. For each level, this model forecasts $\Delta C^{level}_{month,year}$ by eqn (2), which averages all values observed in previous years for that month.

The daily maximum and minimum temperature series were measured by a meteorological station at Galeão International Airport, also in Rio de Janeiro. The monthly averages of the daily maximum and minimum temperatures are calculated by eqn (3), and the respective monthly variations by eqn (4):

$$T^{Max/Min}_{month,year} = \frac{1}{n}\sum_{day=1}^{n} T^{Max/Min}_{month,day,year} \qquad (3)$$

$$\Delta T^{Max/Min}_{month,year} = \frac{T^{Max/Min}_{month,year} - T^{Max/Min}_{month-1,year}}{T^{Max/Min}_{month-1,year}} \qquad (4)$$

where n is the number of days in the month.

Figure 2 shows clearly how climatic factors influence electric energy consumption. To avoid overlapping curves, the series are normalized: $C^{level}_{month,year}$ to 1.0 at its maximum and $T^{Max/Min}_{month,year}$ to 0.5 at its maximum.

Figure 2: $C^{level}_{month,year}$ and $T^{Max/Min}_{month,year}$ (Jan/1995 to Jun/2003).

The correlation between the two series of variations, $\operatorname{corr}(\Delta C^{level}_{month,year}, \Delta T^{Max/Min}_{month,year})$, is higher than 0.7 in most months. This allows a linear regression model to be built. This causal model is not a forecasting model: calculating $\Delta C^{level}_{month,year}$ through the regression line requires all the daily temperatures of the given month to be known. In section 5 the results of the
neural and statistical models will be presented together with the results of this causal model, which serves as an additional benchmark. A more detailed study of this relationship can be found in [8]. That reference also covers the influence of the energy rationing imposed by the federal government on the sector from May 2001 to February 2002.

3 Data mining techniques (ANN and GA)

Artificial Neural Networks (ANN), named by analogy with biological ones, are mathematical systems made up of a number of "processing units" linked by weighted interconnections. A processing unit is essentially an equation, often referred to as a "transfer function". It takes weighted signals from other neurons, possibly combines them, transforms them and outputs a numeric result. Processing units are often considered crudely analogous to real neurons. Because they are linked together in a mesh or network, the name neural networks was coined. Many neural networks have their neurons structured in "layers". The neurons in a layer have similar characteristics and execute their transfer functions in synchronization (virtually at the same time). Nearly all neural networks have neurons that accept data and neurons that produce outputs. The behavior of a neural network, that is, how it maps input data to output data, is determined primarily by three things: the transfer functions of its neurons, how they are interconnected, and the weights of those interconnections. Typically, an architecture for the neural network is established first. One of a variety of mathematical algorithms is then used to determine the interconnection weights that maximize the accuracy of the outputs. Neural networks are "trained": they use previous examples to establish (learn) the relationships between the input variables and the predicted variables by setting these weights. Once these relationships are established (the neural network is trained), the network can be presented with new input values and will generate predictions.

The applications use Genetic Algorithms (GA) to evolve neural network structures while simultaneously searching for significant input variables. The aim is to maximize the predictive accuracy of the resulting models. The following example illustrates how effective genetic algorithms are for this kind of search. Finding the best combination (subset) of 20 inputs and up to 15 hidden nodes in a back-propagation neural network is a combinatorial problem with over 16 million configurations. Training a network for every configuration in a full search would be a task for a supercomputer. With genetic algorithms, however, an excellent solution often appears in fewer than 1500 evaluations, which is about 0.009% of all possible configurations. With the help of some statistical data analysis, highly fit networks are often found among the first 30 to 50 networks evaluated. This makes the genetic algorithm an efficient means of discovering effective combinations of network structure and inputs. Several tests were performed, and some results are discussed in section 5.
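The search-space figures above, and the kind of search the genetic algorithm performs, can be sketched in Python. This is a minimal illustration, not the authors' tool. Several choices are assumptions:
- the chromosome layout: a 20-bit input mask plus a hidden-layer size from 0 to 15, giving the 16 options behind "over 16 million";
- truncation selection, one-point crossover and the mutation rate;
- the `fitness` function, a hypothetical surrogate that replaces training a network.

The size penalty in `fitness` loosely mirrors the fitness described in section 6, which combines accuracy with the number of input and hidden neurons. The weight 0.01 is invented.

```python
import random

N_INPUTS = 20     # candidate input variables, as in the example above
MAX_HIDDEN = 15   # hidden-layer sizes 0..15 are searched (16 options; 0 is an assumption)

# Full search space: every input subset times every hidden-layer size.
search_space = 2 ** N_INPUTS * (MAX_HIDDEN + 1)
fraction_1500 = 1500 / search_space
print(search_space, f"{fraction_1500:.4%}")   # 16777216 0.0089%

# Hypothetical surrogate for "train the network and measure its error".
# A real run would train a back-propagation network on the load series.
USEFUL = {0, 3, 7, 12}   # invented "truly useful" inputs
IDEAL_HIDDEN = 6         # invented best hidden-layer size

def fitness(chrom, penalty=0.01):
    """Lower is better: surrogate error plus a penalty on input and hidden neurons."""
    mask, hidden = chrom
    chosen = {i for i, bit in enumerate(mask) if bit}
    error = len(chosen ^ USEFUL) + abs(hidden - IDEAL_HIDDEN) / MAX_HIDDEN
    return error + penalty * (len(chosen) + hidden)

def random_chrom(rng):
    """Chromosome = (input mask of N_INPUTS bits, number of hidden neurons)."""
    return [rng.randint(0, 1) for _ in range(N_INPUTS)], rng.randint(0, MAX_HIDDEN)

def crossover(a, b, rng):
    """One-point crossover on the mask; the hidden size is taken from either parent."""
    cut = rng.randrange(1, N_INPUTS)
    return a[0][:cut] + b[0][cut:], rng.choice([a[1], b[1]])

def mutate(chrom, rng, rate=0.05):
    """Flip each mask bit, and redraw the hidden size, with probability `rate`."""
    mask = [1 - bit if rng.random() < rate else bit for bit in chrom[0]]
    hidden = rng.randint(0, MAX_HIDDEN) if rng.random() < rate else chrom[1]
    return mask, hidden

def evolve(pop_size=30, generations=50, seed=0):
    """Truncation selection with elitism: the best half always survives."""
    rng = random.Random(seed)
    pop = [random_chrom(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=fitness)

best = evolve()
print(sorted(i for i, bit in enumerate(best[0]) if bit), best[1], round(fitness(best), 3))
```

With these settings the run evaluates about 780 candidates: 30 initial plus 15 new per generation. The best half always survives, so the best fitness never gets worse from one generation to the next.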

4 Model characteristics

During the development of the neural forecasting models, combinations of different neural network architectures and input data were tested with different error metrics. Because of the rationing imposed in 2001, the dataset was segmented as shown in table 1.

Table 1: Segmentation of dataset.

The input data used by the neural forecasting models are classified into 3 groups, as shown in table 2.

Table 2: Classification of input data.

Group  Category           Variables
1      Effective values   Temperature (T), Equivalent Consumption (C)
2      Monthly variation  $\Delta T$, $\Delta C$
3      Time               month [1..12], sine, cosine

The first two categories were described in section 2. The third represents time in two cyclical forms: the month number and the sine/cosine pair. The latter has an interesting property: the Euclidean distances between consecutive months are identical. Figure 3 highlights this equality, taking the pairs January/February and December/January as examples. The representation of time from 1 to 12 does not have this property: December and January are 11 units apart, while January and February are 1 unit apart.

Figure 3: Time representation.

Five different neural models were selected for each level, for a total of 15 models. For each level, this set allows the forecasts to be combined.
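The property illustrated in figure 3 can be checked numerically. The sketch below assumes the pair is computed with 12 equal angular steps starting at January. The text does not give the exact formula, and the starting phase does not affect the distances.

```python
import math

def month_to_point(month):
    """Sine/cosine representation of a month (1..12) on the unit circle.
    Assumption: 12 equal steps of 2*pi/12, with January at angle 0."""
    angle = 2 * math.pi * (month - 1) / 12
    return math.sin(angle), math.cos(angle)

def circle_distance(a, b):
    """Euclidean distance between two months in the sine/cosine representation."""
    return math.dist(month_to_point(a), month_to_point(b))

# Consecutive months, including the December/January wrap-around, are equally far apart.
print(round(circle_distance(1, 2), 4), round(circle_distance(12, 1), 4))  # 0.5176 0.5176
# With the plain 1..12 code, December/January looks 11 times farther than January/February.
print(abs(2 - 1), abs(1 - 12))  # 1 11
```

Every consecutive pair lies at the chord length 2·sin(π/12) ≈ 0.5176. The network therefore sees December and January as neighbors, just like any other two adjacent months.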

5 Results

Table 3: Performance results of the neural, statistical and cause-effect models.

Table 3 compares the performance of the neural, statistical and cause-effect models for each level, in a global and a monthly analysis. In the global analysis, the average relative error covers all 42 months (or 39, when the months influenced by rationing are excluded). In the monthly analysis, it is calculated month by month. The last column of the table shows, for each level, the results of the cause-effect model. This model is a linear regression between the series $\Delta T^{Max/Min}_{month,year}$ and $\Delta C^{level}_{month,year}$, used when $\operatorname{corr}(\Delta T^{Max/Min}_{month,year}, \Delta C^{level}_{month,year})$ is higher than 0.7. As table 3 shows, several points can be made for each level:

low: globally, the neural models (2.5%) are 0.3 percentage points better than the statistical models (2.8%). In the validation interval, this difference grows from 0.3 to 0.6 percentage points. $\Delta C^{low}_{month,year}$ is more dispersed in the validation interval. The neural models made better predictions than the statistical models in January, March, April, July, September, November and December.

medium: in the validation interval, the neural models made better predictions than the statistical models (by 0.4 percentage points). The statistical models made better predictions than the neural ones in May and August.

high: $\Delta C^{high}_{month,year}$ is less dispersed at this level than at the others (low and medium), which is the main reason why the expected error is smaller. In the validation interval, the neural models made better predictions than the statistical models (by 0.3 percentage points). The neural models were better in February, March, July, September, October, November and December, whereas the statistical models were better in August. The results in the other months are equivalent.

6 Conclusions

The neural network models provided satisfactory results. As noted by Roitman [9], they performed better than the statistical models and provided accurate forecasts, mainly in cases where few combinations of historical input series were available. Some details deserved attention during the development of the neural models. The Time Delay Neural Network presented the best results at the end of the optimization process directed by the genetic algorithm.
Tests with networks having more than one hidden layer showed that a single hidden layer was sufficient, with fewer than 32 neurons. The optimization processes with the best generalization capacity used the latest 12 months (January to December 1999) as the test interval. The genetic algorithm's best-performing fitness function combined two terms: the accuracy of the forecasts, measured by the Average Absolute Error, and a measure of the number of neurons in the input and hidden layers. The validation interval showed a higher dispersion of the monthly equivalent load variations for each level than the training interval.

Each neural model can be understood as an expert in one-month-ahead load forecasting. The knowledge captured by each model during training is different and is represented in its respective architecture. Since the models for each level presented similar average absolute errors, it was
possible to combine their forecasts, generating a better one. CERJ has used this procedure successfully. Another point is that the dependences between the variables selected by the forecasting models can change over time, which demands continuous tuning. Knowledge extraction, as proposed by Hruschka and Ebecken [10, 11], can make explicit the knowledge hidden in a neural network model. Considering the needs of the spot market, weekly forecasting models become relevant. To that end, the combination of neural networks and genetic algorithms could be applied to other kinds of neural network, such as RBF networks as proposed by Barreto [12]. Future work aimed at improving the forecasting models could use fuzzy logic and neuro-fuzzy techniques.

References

[1] Chen, G. J., Li, K. K., Chung, T. S., Sun, H. B. & Tang, G. Q., Application of an innovative combined forecasting method in power system load forecasting, Electric Power Systems Research, 59, pp. 131-137, 2001.
[2] Hippert, H. S., Pedreira, C. E. & Souza, R. C., Neural networks for short-term load forecasting: A review and evaluation, IEEE Transactions on Power Systems, 16(1), pp. 44-55, 2001.
[3] Gavrilas, M., Ciutea, I. & Tanasa, C., Medium-term load forecasting with artificial neural network models, Proc. of the 16th International Conference and Exhibition on Electricity Distribution (CIRED), Part 1: Contributions, IEE Conf. Publ. No. 482, 6, pp. 167-171, 2001.
[4] Chen, G. L., Li, K. K., Chung, T. S., Sun, H. B. & Tang, G. Q., Application of an innovative combined forecasting method in power system load forecasting, Electric Power Systems Research, 59, pp. 131-137, 2001.
[5] Zhang, G., Patuwo, B. E. & Hu, M. Y., Forecasting with artificial neural networks: The state of the art, International Journal of Forecasting, 14, pp. 35-62, 1998.
[6] Al-Saba, T. & El-Amin, I., Artificial neural networks as applied to long-term demand forecasting, Artificial Intelligence in Engineering, 13, pp. 189-197, 1999.
[7] Padmakumari, K., Mohandas, K. P. & Thiruvengadam, S., Long term distribution demand forecasting using neuro fuzzy computations, Electrical Power and Energy Systems, 21, pp. 315-322, 1999.
[8] Terra, G. S., Medium-term demand forecasting by a data mining approach, D.Sc. Thesis, COPPE/UFRJ, Rio de Janeiro, RJ, Brazil, 2003.
[9] Roitman, V. L., A computational model of neural networks for prediction of open unemployment rate, D.Sc. Thesis, COPPE/UFRJ, Rio de Janeiro, RJ, Brazil, 2001.

[10] Hruschka, E. R. & Ebecken, N. F. F., Rule extraction from neural networks: Modified RX algorithm, Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN'99), Washington DC, USA, July 1999.
[11] Ebecken, N. F. F. & Hruschka, E. R., Rules from supervised neural networks in data mining, Frontiers in Artificial Intelligence and Applications, v. 71, In: Logic, Artificial Intelligence and Robotics, J. M. Abe and J. I. Silva Filho, pp. 84-100, IOS Press, 2001.
[12] Barreto, A. M., Genetic orthogonal least squares algorithm for RBF networks training, M.Sc. Thesis, COPPE/UFRJ, Rio de Janeiro, RJ, Brazil, 2003.