Electricity Demand Forecasting using Multi-Task Learning

Similar documents
arxiv: v1 [cs.lg] 27 Dec 2015

Electric Load Forecasting Using Wavelet Transform and Extreme Learning Machine

How Accurate is My Forecast?

A Bayesian Perspective on Residential Demand Response Using Smart Meter Data

A Fuzzy Logic Based Short Term Load Forecast for the Holidays

Forecasting the electricity consumption by aggregating specialized experts

Robust Building Energy Load Forecasting Using Physically-Based Kernel Models

Day Ahead Hourly Load and Price Forecast in ISO New England Market using ANN

Weighted Fuzzy Time Series Model for Load Forecasting

Forecasting day-ahead electricity load using a multiple equation time series approach

A FUZZY TIME SERIES-MARKOV CHAIN MODEL WITH AN APPLICATION TO FORECAST THE EXCHANGE RATE BETWEEN THE TAIWAN AND US DOLLAR.

Kernel Learning via Random Fourier Representations

Load Forecasting Using Artificial Neural Networks and Support Vector Regression

CIS 520: Machine Learning Oct 09, Kernel Methods

Predicting intraday-load curve using High-D methods

Package TSPred. April 5, 2017

Lazy learning for control design

Gated Ensemble Learning Method for Demand-Side Electricity Load Forecasting

arxiv: v1 [stat.ap] 4 Mar 2016

Computational Intelligence Load Forecasting: A Methodological Overview

CHAPTER 6 CONCLUSION AND FUTURE SCOPE

Large-Scale Behavioral Targeting

Learning gradients: prescriptive models

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

10-701/ Recitation : Kernels

Power Prediction in Smart Grids with Evolutionary Local Kernel Regression

Short Term Load Forecasting Using Multi Layer Perceptron

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Conditional Autoregressif Hilbertian process

Probabilistic Energy Forecasting

A Sparse Linear Model and Significance Test. for Individual Consumption Prediction

ANN and Statistical Theory Based Forecasting and Analysis of Power System Variables

Short Term Load Forecasting Based Artificial Neural Network

Pattern Matching and Neural Networks based Hybrid Forecasting System

Short-Term Load Forecasting Using ARIMA Model For Karnataka State Electrical Load

Virtual Sensors and Large-Scale Gaussian Processes

Forward and Reverse Gradient-based Hyperparameter Optimization

One-Hour-Ahead Load Forecasting Using Neural Network

Modelling and Forecasting Australian Domestic Tourism

Less is More: Computational Regularization by Subsampling

The Learning Problem and Regularization Class 03, 11 February 2004 Tomaso Poggio and Sayan Mukherjee

Negatively Correlated Echo State Networks

Machine learning for crystal structure prediction

Predicting the Electricity Demand Response via Data-driven Inverse Optimization

WEATHER DEPENENT ELECTRICITY MARKET FORECASTING WITH NEURAL NETWORKS, WAVELET AND DATA MINING TECHNIQUES. Z.Y. Dong X. Li Z. Xu K. L.

Kaggle.

Less is More: Computational Regularization by Subsampling

A Randomized Approach for Crowdsourcing in the Presence of Multiple Views

Forecasting Hourly Electricity Load Profile Using Neural Networks

Mathematical Methods for Data Analysis

5.6 Nonparametric Logistic Regression

Short-term water demand forecast based on deep neural network ABSTRACT

Introduction to Course

Forecasting of Electric Consumption in a Semiconductor Plant using Time Series Methods

Stochastic optimization in Hilbert spaces

Approximate Kernel PCA with Random Features

FACTORIZATION MACHINES AS A TOOL FOR HEALTHCARE CASE STUDY ON TYPE 2 DIABETES DETECTION

Nonparametric forecasting of the French load curve

An Evaluation of Methods for Very Short-Term Load Forecasting Using Minute-by-Minute British Data

Gaussian Process Regression with K-means Clustering for Very Short-Term Load Forecasting of Individual Buildings at Stanford

Dynamic Time Warping and FFT: A Data Preprocessing Method for Electrical Load Forecasting

Divide and Conquer Kernel Ridge Regression. A Distributed Algorithm with Minimax Optimal Rates

Understanding Travel Time to Airports in New York City Sierra Gentry Dominik Schunack

Machine Learning. Classification, Discriminative learning. Marc Toussaint University of Stuttgart Summer 2015

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

In-Database Factorised Learning fdbresearch.github.io

Markovian Models for Electrical Load Prediction in Smart Buildings

Large-scale Probabilistic Forecasting in Energy Systems using Sparse Gaussian Conditional Random Fields

Efficient Complex Output Prediction

ClassCast: A Tool for Class-Based Forecasting

Prashant Pant 1, Achal Garg 2 1,2 Engineer, Keppel Offshore and Marine Engineering India Pvt. Ltd, Mumbai. IJRASET 2013: All Rights are Reserved 356

Spatially-Explicit Prediction of Wholesale Electricity Prices

Forecasting demand in the National Electricity Market. October 2017

Approximate Kernel Methods

ANN based techniques for prediction of wind speed of 67 sites of India

Univariate versus Multivariate Models for Short-term Electricity Load Forecasting

Regularization in Reproducing Kernel Banach Spaces

Holiday Demand Forecasting in the Electric Utility Industry

Research Article Weather Forecasting Using Sliding Window Algorithm

Multi-Plant Photovoltaic Energy Forecasting Challenge with Regression Tree Ensembles and Hourly Average Forecasts

An Improved Conjugate Gradient Scheme to the Solution of Least Squares SVM

On the energy demands of small appliances in homes

Data dependent operators for the spatial-spectral fusion problem

Outliers Treatment in Support Vector Regression for Financial Time Series Prediction

Caesar s Taxi Prediction Services

Back to the future: Radial Basis Function networks revisited

Kernel-Based Retrieval of Atmospheric Profiles from IASI Data

A Comparison of the Forecast Performance of. Double Seasonal ARIMA and Double Seasonal. ARFIMA Models of Electricity Load Demand

15-388/688 - Practical Data Science: Nonlinear modeling, cross-validation, regularization, and evaluation

FORECASTING THE RESIDUAL DEMAND FUNCTION IN

ABC random forest for parameter estimation. Jean-Michel Marin

A Support Vector Regression Model for Forecasting Rainfall

Hierarchical Boosting and Filter Generation

TEMPERATUTE PREDICTION USING HEURISTIC DATA MINING ON TWO-FACTOR FUZZY TIME-SERIES

Basis Expansion and Nonlinear SVM. Kai Yu

Sparse and Robust Optimization and Applications

Demand Forecasting. for. Microsoft Dynamics 365 for Operations. User Guide. Release 7.1. April 2018

arxiv: v1 [stat.ap] 17 Oct 2016

A Simple Algorithm for Multilabel Ranking

BUSI 460 Suggested Answers to Selected Review and Discussion Questions Lesson 7

Transcription:

Electricity Demand Forecasting using Multi-Task Learning Jean-Baptiste Fiot, Francesco Dinuzzo Dublin Machine Learning Meetup - July 2017 1 / 32

Outline 1 Introduction 2 Problem Formulation 3 Kernels 4 Experiments 5 Conclusion 2 / 32

Outline 1 Introduction 2 Problem Formulation 3 Kernels 4 Experiments 5 Conclusion 3 / 32

Electricity Demand Forecasting Electricity is a special commodity It cannot be stored efficiently (in large quantities) It looses value when being moved (line losses) Demand forecasting is critical Operations, bidding, demand response, maintenance, planning, etc. The game is changing Distributed renewable generation Higher volatility on markets Increased number of participants 4 / 32

Demand Forecasting Methods (Non-)linear variants of least-squares, ARMAX, fuzzy logic, etc. Black-box models based on neural networks [Hippert et al., 2001] Generalized Additive Models (GAM) Great performance [Fan and Hyndman, 2012, Ba et al., 2012] Efficient and scalable training algorithms Interpretability of the model Hippert, HS, et al. Neural networks for short-term load forecasting: A review and evaluation. Power Systems, IEEE Transactions on, 16(1):44 55, 2001. Fan, S and Hyndman, R. Short-term load forecasting based on a semi-parametric additive model. Power Systems, IEEE Transactions on, 27(1):134 141, 2012. Ba, A, et al. Adaptive learning of smoothing functions: application to electricity load forecasting. In Advances in Neural Information Processing Systems 25 (NIPS 2012), pages 2519 2527. 2012. 5 / 32

Demand Forecasting using Kernel Methods In 2001, kernel-based support vector regression won EUNITE (European Network on Intelligent Technologies for Smart Adaptive Systems) demand forecasting competition [Chen et al., 2004] Later, kernel-based regularizations and support vector techniques were successfully used [Espinoza et al., 2007, Hong, 2009, Elattar et al., 2010] Chen, B, et al. Load forecasting using support vector machines: A study on EUNITE competition 2001. Power Systems, IEEE Transactions on, 19(4):1821 1830, 2004. Espinoza, M, et al. Electric load forecasting. Control Systems, IEEE, 27(5):43 57, 2007. Hong, WC. Electric load forecasting by support vector model. Applied Mathematical Modelling, 33(5):2444 2454, 2009. Elattar, E, et al. Electric load forecasting based on locally weighted support vector regression. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, 40(4):438 447, 2010. 6 / 32

Outline 1 Introduction 2 Problem Formulation 3 Kernels 4 Experiments 5 Conclusion 7 / 32

Electric Demand Forecasting ŷ = f (t, d, c, y l, u l, j, s j ), Time/Calendar features t [0, 24) is the time of day expressed in hours, d {1, 2,..., 365, 366} is the day of the year, c is the type of day, e.g. Monday to Sunday, Dynamic features y l is a real vector containing lagged values of the electric demand, u l is a real vector containing measurements of lagged values of exogenous variables other than the demand (such as temperature), Meter features j is the meter ID in the electricity network, s j is a vector of features describing the demande measured at j. 8 / 32

Electric Demand Forecasting ŷ = f (t, d, c, y l, u l, j, s j ), Time/Calendar features t [0, 24) is the time of day expressed in hours, d {1, 2,..., 365, 366} is the day of the year, c is the type of day, e.g. Monday to Sunday, Dynamic features y l is a real vector containing lagged values of the electric demand, u l is a real vector containing measurements of lagged values of exogenous variables other than the demand (such as temperature), Meter features j is the meter ID in the electricity network, s j is a vector of features describing the demande measured at j. 8 / 32

Electric Demand Forecasting ŷ = f (t, d, c, y l, u l, j, s j ), Time/Calendar features t [0, 24) is the time of day expressed in hours, d {1, 2,..., 365, 366} is the day of the year, c is the type of day, e.g. Monday to Sunday, Dynamic features y l is a real vector containing lagged values of the electric demand, u l is a real vector containing measurements of lagged values of exogenous variables other than the demand (such as temperature), Meter features j is the meter ID in the electricity network, s j is a vector of features describing the demande measured at j. 8 / 32

Solving Multiple Demand Forecasting Problems Consider m smart meters, indexed by j Goal: learn {f j : X R} 1 j m from datasets (x ij, y ij ) X R. 9 / 32

Optimisation Problem Letting f : X R m the function with components f j, we minimize R(f, L) = l m j (y ij f j (x ij ))) 2 + λ f 2 H L, (1) j=1 i=1 where λ > 0 is a regularization parameter, and H L is a Reproducing Kernel Hilbert Space (RKHS) of vector-valued functions with (matrix-valued) kernel H(x i, x j ) = K(x i, x j ) L, (2) K : X X R is the input kernel, and L R m m is the output kernel. Representer theorem: there exist functions ˆf j minimizing R(f, L) in the form: ˆf j (x) = m l k L jk k=1 i=1 c ik K(x ik, x). (3) 10 / 32

Fixing L = I: Independent Kernel Ridge Regression 11 / 32

Learning L = I: Output Kernel Learning Remark: B = (b ij ) is a Cholesky factor of L 12 / 32

Output Kernel Learning Joint optimization problem min L S m,p + min R(f, L) + λtr(l), f H L where S m,p + is the cone of p.s.d. matrices with rank p. Re-indexing the observations {x i } i=1,...,l, the solution becomes where ˆf j (x) = p b jk g k (x), g k (x) = k=1 l a ik K(x i, x), { b jk coefficients form a low-rank factor of L, g k functions can be seen as modes or typical profiles. It is sufficient to store (l + m)p parameters, which can be much smaller than m j=1 l j. i=1 13 / 32

Outline 1 Introduction 2 Problem Formulation 3 Kernels 4 Experiments 5 Conclusion 14 / 32

Multiple Seasonalities in Electricity Demand Figure: French National Demand (Réseau de Transport d Électricité data) 15 / 32

Capturing Demand Seasonalities with Kernels Time-of-day kernel K t (t 1, t 2 ) = exp ( h T ( t 1 t 2 )/σ t ), (4) Day-of-year kernel K d (d 1, d 2 ) = exp ( h D ( d 1 d 2 )/σ d ), (5) where h P (x) = min{x, P x} is a change of variable that yields P-periodic kernels over the square [0, P] 2. In our experiment, σ t and σ d were respectively set to 4 hours and 120 days. Day-type kernel K c (c 1, c 2 ) = { 1 if c 1 = c 2 0 if c 1 c 2.. (6) 16 / 32

Kernels for Electric Demand Forecasting To define K((t 1, d 1, c 1 ), (t 2, d 2, c 2 )), we combine the basis kernels Additive Models K t (t 1, t 2 ) + K d (d 1, d 2 ), (7) K t (t 1, t 2 ) + K d (d 1, d 2 ) + K c (c 1, c 2 ), (8) Semi-Additive Models K d (d 1, d 2 ) + K t (t 1, t 2 ) K c (c 1, c 2 ), (9) ( K t (t 1, t 2 ) + K d (d 1, d 2 ) ) K c (c 1, c 2 ), (10) Multiplicative Models K t (t 1, t 2 ) K d (d 1, d 2 ), (11) K t (t 1, t 2 ) K d (d 1, d 2 ) K c (c 1, c 2 ). (12) 17 / 32

Outline 1 Introduction 2 Problem Formulation 3 Kernels 4 Experiments 5 Conclusion 18 / 32

Commission for Energy Regulation (CER) Data 6435 smart meters 536 days (Jul 14, 2009 - Dec 31, 2010) Half-hour sampling 3 groups: residential, SME, others 19 / 32

Commission for Energy Regulation (CER) Data 6435 smart meters 536 days (Jul 14, 2009 - Dec 31, 2010) Half-hour sampling 3 groups: residential, SME, others 19 / 32

Commission for Energy Regulation (CER) Data 6435 smart meters 536 days (Jul 14, 2009 - Dec 31, 2010) Half-hour sampling 3 groups: residential, SME, others 19 / 32

Commission for Energy Regulation (CER) Data 6435 smart meters 536 days (Jul 14, 2009 - Dec 31, 2010) Half-hour sampling 3 groups: residential, SME, others 19 / 32

Removed two corrupted meters Corrected DST measurements Pre-processing Downsampled to 3-hour resolution Final dataset: m = 6433 smart meters l = 4288 time slots Customer group Meters Sparsity Residential 4225 0.028% Industrial (SME) 485 0.035% Others 1723 17% 20 / 32

Learning the Models Data split 1 year (2920 obs.) used for training (80%) and validation (20%) 0.5 year (1368 obs.) used for testing Independent Kernel Ridge Regression using the 6 kernels Output Kernel Learning using MM2 1 model for {residential} {others}, p = 200 to fit in memory 1 model for {SME}, full rank (p = 485) 21 / 32

Qualitative Analysis 8000 7000 6000 5000 4000 3000 2000 2010 11 28 2010 12 05 2010 12 12 2010 12 19 2010 12 26 12 10 8 6 4 2010 11 28 2010 12 05 2010 12 12 2010 12 19 2010 12 26 3 2.5 2 1.5 1 0.5 2010 11 28 2010 12 05 2010 12 12 2010 12 19 2010 12 26 Figure: Measured load (blue), indep. KRR (red) and multi-task OKL (black) forecasts for the aggregated demand (top), a single SME meter (middle), and a single residential meter (bottom). 22 / 32

Performance Metrics (1/2) Given a group of meters G and observation i, we define Absolute percentage error (APE) j G APE(i, G) = 100 i y ij j G i f j (t i, d i, c i ) j G i y ij, (13) where G i is the subset of meters with available observations at i. Normalized absolute error (NAE) j G NAE(i, G) = i y ij f j (t i, d i, c i ), (14) j G i y ij 23 / 32

Performance Metrics (2/2) Mean absolute percentage error (MAPE) MAPE(G) = 1 # T i T APE(i, G), (15) Mean normalized absolute error (MNAE) MNAE(G) = 1 # T i T NAE(i, G). (16) 24 / 32

Prediction Accuracy (1/2) MNAE and standard error 0.55 0.50 0.45 0.40 0.35 0.30 MAPE and standard error 18 16 14 Additive Model 1 12 Additive Model 2 Semi-Additive 10 Model 1 Semi-Additive Model 2 Multiplicative 8 Model 1 Multiplicative Model 2 6 Multi-Task OKL 4 2 Additive Model Additive Model 2 Semi-Additive Model 1 Semi-Additive Model 2 Multiplicative Model 1 Multiplicative Model 2 Multi-Task OKL 0.25 Overall Residential SME Others 0 Overall Residential SME Others 1 Multiplicative kernels outperform (semi-)additive models. Multiplicative kernels lead to a stricter selection of training obs. EUNITE winners discarded 90% of the dataset. 25 / 32

Prediction Accuracy (1/2) MNAE and standard error 0.55 0.50 0.45 0.40 0.35 0.30 MAPE and standard error 18 16 14 Additive Model 1 12 Additive Model 2 Semi-Additive 10 Model 1 Semi-Additive Model 2 Multiplicative 8 Model 1 Multiplicative Model 2 6 Multi-Task OKL 4 2 Additive Model Additive Model 2 Semi-Additive Model 1 Semi-Additive Model 2 Multiplicative Model 1 Multiplicative Model 2 Multi-Task OKL 0.25 Overall Residential SME Others 0 Overall Residential SME Others 1 Multiplicative kernels outperform (semi-)additive models. Multiplicative kernels lead to a stricter selection of training obs. EUNITE winners discarded 90% of the dataset. 2 Multi-task OKL outperforms independent kernel ridge regression The multi-task approach efficiently exploits the similarities 44% improvement of σ APE for SME against 2nd best method 25 / 32

Prediction Accuracy (2/2) 10 0 10 0 OKL MM 2 MM 1 SAM 2 SAM 1 AM 2 AM 1 10-1 10-2 10-3 OKL MM 2 MM 1 SAM 2 SAM 1 AM 2 AM 1 10-1 10-2 10-3 AM 1 AM 2 SAM 1 SAM 2 MM 1 MM 2 OKL (a) NAE 10-4 AM 1 AM 2 SAM 1 SAM 2 MM 1 MM 2 OKL (b) APE Figure: p-values of Welch t-test between the overall accuracies of all methods on the CER dataset 10-4 26 / 32

Basis Load Profiles g k Jul 14, 09 Jul 18, 09 Jul 22, 09 Jul 26, 09 Jul 30, 09 Aug 03, 09 Aug 07, 09 Figure: CER Data: Typical load profiles displayed over the horizon of one month, obtained from a low-rank OKL model with p = 10. 27 / 32

Number of Parameters In this experiment, the OKL model is 4.24 times more compact. Single-task: # params = # obs. = m j=1 l j 1.3 10 7 Multi-task OKL: # params = (l + m)p 3 10 6 28 / 32

Relationships between Smart Meters 1 0.8 0.6 0.4 0.2 0-0.2-0.4-0.6-0.8-1 Figure: CER data: entries of the normalized output kernel L n R m m for a subset containing 50 residential and 50 SME (small or medium L enterprise) customers. (L n) ij = ij Lii, i, j = 1,..., m. L jj 29 / 32

Outline 1 Introduction 2 Problem Formulation 3 Kernels 4 Experiments 5 Conclusion 30 / 32

Contributions 1 We formulated the problem of forecasting the demand measured on multiple lines of the network as a multi-task problem. 2 We designed kernels able to capture the seasonal effects present in electricity demand data. 3 We exposed the performance limits of the very popular additive models, showing that they are often outperformed by multiplicative kernel models. 4 We showed how MTL can be used to gain insights and interpretability on real demand data 31 / 32

32 / 32 Thank You Any question? Contact details Jean-Baptiste Fiot jean-baptiste.fiot@centraliens.net