Predicting intraday-load curve using High-D methods

1 Predicting intraday-load curve using High-D methods. LPMA, Université Paris-Diderot-Paris 7. Mathilde Mougeot (UPD), Vincent Lefieux (RTE), Laurence Maillard (RTE). Horizon Maths 2013.

2 Intraday load curve during a week. Figure: load curve from Monday January 25th to Sunday January 31st.

3 Intraday load curve forecasting (here over 48 h). Figure: observed load Y, approximation Yapx and prediction time tpred.

4 Forecasting procedure:
1. Build a smart encyclopedia of past scenarios from a database, using learning algorithms.
2. Build a set of prediction experts that consult the encyclopedia.
3. Aggregate the prediction experts.

5 Database. The database of past data contains the past electrical consumption, meteorological inputs, and endogenous variables (calendar data, functional bases).

6 Past electrical consumption. The consumption was recorded every half hour from January 1st, 2003 to August 31st, …. Over this period, the global consumption signal is split into $N = 2800$ sub-signals $(Y_1,\dots,Y_t,\dots,Y_N)$, where $Y_t \in \mathbb{R}^n$ is the intraday load curve of the $t$-th day, of size $n = 48$.
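A minimal sketch of the splitting step, assuming the half-hourly series is stored as a flat Python list that starts at midnight on the first day and has no missing values; `split_into_days` is a hypothetical helper. With $N = 2800$ days and $n = 48$, the flat series holds $2800 \times 48 = 134\,400$ values.

```python
def split_into_days(series, n=48):
    """Cut a flat half-hourly series into daily curves Y_1, ..., Y_N of length n.

    Assumes the series starts at 00:00 on the first day and has no gaps.
    """
    if len(series) % n != 0:
        raise ValueError("series length must be a multiple of n")
    return [series[k:k + n] for k in range(0, len(series), n)]


# Toy example with 3 synthetic days instead of the 2800 real ones.
series = [float(i) for i in range(3 * 48)]
days = split_into_days(series)
print(len(days), len(days[0]))  # 3 48
```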

7 Intraday load curve for seven days. Figure: load curve from Monday January 25th to Sunday January 31st.

8 Example of calendar (seasonal) variations. Figure: autumn, winter, spring and summer.

9 Meteorological inputs. A total of 371 (= 2 × 39 + 293) meteorological variables, recorded half-hourly each day over the same 2800 days:
Temperature: $T^k$, $k = 1,\dots,39$, measured at 39 weather stations spread across the French territory.
Cloud cover: $N^k$, $k = 1,\dots,39$, measured at the same 39 weather stations.
Wind: $W^k$, $k = 1,\dots,293$, available at 293 network points spread across the territory.

10 Weather stations. Figure: temperature and cloud-cover measurement stations; wind stations.

11 Brest, Lille, Marseille. Figure: (a) temperature T, (b) cloud cover CC, (c) wind W, for Brest (blue line), Lille (red line) and Marseille (green line).

12 Building the smart encyclopedia: several issues.
1. Large dimension.
2. All the variables (load curve, weather) are highly correlated.
3. Typical patterns need to be introduced.

13 Load curves: functional regression and clustering.
Compression of the intraday load curves: the signals $Y_t$ are treated as functions of time and sparsely represented on a dictionary of functions (a combination of Fourier and Haar bases).
Clustering: the sparse representations of the signals are grouped into homogeneous clusters.
Pattern: a consumption pattern is defined inside each group.
Calendar attribution: the clustering is translated into calendar (predictable) variables.
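The compression step can be sketched as follows. The slide names the dictionary (Fourier and Haar functions) but not the sparse coding algorithm, so this sketch assumes plain matching pursuit as a stand-in; `build_dictionary`, `matching_pursuit`, the number of frequencies and the Haar depth are all hypothetical choices. The coefficient vectors it returns are what a clustering step (for instance k-means, not shown) would then group.

```python
import math


def build_dictionary(n=48, n_freq=4, max_level=3):
    """Constant, cos/sin atoms up to n_freq, Haar atoms up to max_level.

    max_level must keep n // 2**max_level even, so each Haar atom splits in two halves.
    """
    t = [i / n for i in range(n)]
    atoms = [[1.0] * n]  # constant atom
    for k in range(1, n_freq + 1):  # Fourier atoms
        atoms.append([math.cos(2 * math.pi * k * s) for s in t])
        atoms.append([math.sin(2 * math.pi * k * s) for s in t])
    for j in range(max_level + 1):  # Haar atoms: +1 then -1 on each dyadic block
        width = n // 2 ** j
        half = width // 2
        for start in range(0, n, width):
            atom = [0.0] * n
            for i in range(start, start + width):
                atom[i] = 1.0 if i < start + half else -1.0
            atoms.append(atom)
    return atoms


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(y, atoms, n_steps=5):
    """Greedy sparse coding: repeatedly pick the atom most correlated with the residual."""
    coefs = [0.0] * len(atoms)
    sq_norms = [dot(a, a) for a in atoms]
    residual = list(y)
    for _ in range(n_steps):
        scores = [abs(dot(residual, a)) / math.sqrt(s) for a, s in zip(atoms, sq_norms)]
        j = max(range(len(atoms)), key=scores.__getitem__)
        c = dot(residual, atoms[j]) / sq_norms[j]
        coefs[j] += c
        residual = [r - c * a for r, a in zip(residual, atoms[j])]
    return coefs, residual


atoms = build_dictionary()
y = [3.0 * a + 2.0 * b for a, b in zip(atoms[0], atoms[1])]  # constant + first cosine
coefs, residual = matching_pursuit(y, atoms, n_steps=2)
```

On this toy curve, two greedy steps recover the constant and the first cosine with coefficients 3 and 2, leaving a residual near zero.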

14 Reduced set of explanatory variables. For each index $t$ of the day of interest, we record the daily electrical consumption signal $Y_t$ and $Z_t = [P_t\ M_t] = [C_t\ B_t\ [T]_t\ [N]_t\ [W]_t]$, the concatenation of the calendar and functional variables $P_t$ and of the climate variables $M_t$, also in reduced dimension.

15 Sparse approximation on the learning set. Each consumption day of a learning set of days is sparsely approximated using the reduced set of explanatory variables: for each day $t$ of the learning set, we build an approximation $\hat{Y}_t$ of the observed signal $Y_t$ from the new explanatory variables $Z_t$: $\hat{Y}_t = G_t(Z_t)$.

16 Sparse approximation on the learning set. For each day $t$ of the learning set, $Z_t = [P_t\ M_t] = [C_t\ B_t\ [T]_t\ [N]_t\ [W]_t]$ is the concatenation of the calendar variables and the climate variables. We build an approximation $\hat{Y}_t$ of the observed signal $Y_t$ from $Z_t$: $\hat{Y}_t = G_t(Z_t)$, with $G_t(Z_t) = Z_t \hat{\beta}_t$. Reference: Sparse Approximation and Knowledge Extraction for Electrical Consumption Signals, M. Mougeot, D. P., K. Tribouley, V. Lefieux and L. Teyssier-Maillard, 2012.

17 High-dimensional linear models. $Y = X\beta + \epsilon$, where $\beta \in \mathbb{R}^p$ is the unknown parameter (to be estimated), $\epsilon = (\epsilon_1,\dots,\epsilon_n)$ is an unobserved vector of random errors, assumed i.i.d. $N(0,\sigma^2)$, and $X$ is a known $n \times p$ matrix. High dimension: $p \gg n$.
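To make the $p \gg n$ setting concrete, here is a small simulation of the model with a sparse $\beta$. The sizes, the support and the noise level are arbitrary illustrative values, and `simulate_sparse_model` is a hypothetical helper; with $n = 20$ observations and $p = 200$ unknowns, least squares alone cannot identify $\beta$, which is why sparsity is needed.

```python
import random


def simulate_sparse_model(n=20, p=200, support=(0, 5, 17), value=2.0, sigma=0.1, seed=0):
    """Draw Y = X beta + eps with p >> n and a beta that is zero outside `support`."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]  # known n x p design
    beta = [value if j in support else 0.0 for j in range(p)]  # unknown parameter
    eps = [rng.gauss(0.0, sigma) for _ in range(n)]  # i.i.d. N(0, sigma^2) errors
    Y = [sum(x * b for x, b in zip(X[i], beta)) + eps[i] for i in range(n)]
    return X, beta, Y


X, beta, Y = simulate_sparse_model()
```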

18 Smart encyclopedia contents. For each day $t$, $1 \le t \le 2800$:
the daily electrical consumption $Y_t$;
a qualitative description of $t$, given by calendar statements and cluster allocation;
the meteorological indicators over the French territory, $M_t = [T_t\ N_t\ W_t]$;
the estimated coefficients $\hat{\beta}_t$;
the approximation of the daily consumption, $\hat{Y}_t = Z_t \hat{\beta}_t$.
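One possible in-memory layout for an encyclopedia entry is sketched below. The field names and the choice to store $Z_t$ as a list of rows are assumptions made for illustration; the slide only lists what each entry contains.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EncyclopediaEntry:
    """One day t of the smart encyclopedia (hypothetical field names)."""
    day: int                        # index t, 1 <= t <= 2800
    load: List[float]               # observed daily consumption Y_t
    description: Dict[str, str]     # calendar statements, cluster allocation
    meteo: Dict[str, List[float]]   # indicators M_t = [T_t N_t W_t]
    Z: List[List[float]]            # explanatory variables Z_t (n rows, p columns)
    beta: List[float]               # estimated coefficients beta_hat_t

    def approximation(self) -> List[float]:
        """Return Y_hat_t = Z_t beta_hat_t."""
        return [sum(z * b for z, b in zip(row, self.beta)) for row in self.Z]


entry = EncyclopediaEntry(
    day=1,
    load=[1.0, 2.0],
    description={"day_type": "monday", "cluster": "winter"},
    meteo={"T": [5.0, 6.0]},
    Z=[[1.0, 0.0], [0.0, 1.0]],
    beta=[1.0, 2.0],
)
```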

19 Forecasting procedure. Forecasting with the encyclopedia: construction of a set of forecasting experts, then aggregation of the experts.

20 Expert associated with the strategy M. Forecasting experts. Strategy: $M$ is a function, data-dependent or not, from $\mathbb{N}$ to $\mathbb{N}$ such that $M(d) < d$ for every $d \in \mathbb{N}$ (purely non-anticipative). Plug-in: to the strategy $M$ we associate the expert $\tilde{Y}^M_t$, the prediction of the signal of day $t$ using strategy $M$:
$$\tilde{Y}^M_t = G_{M(t)}(Z_t) = Z_t \hat{\beta}_{M(t)}.$$

21 Examples of time-dependent strategies.
tm1: refer to the day before (the coefficients used for prediction are those computed for the previous day): $M(d) = d - 1$, $\tilde{Y}^{tm1}_t = Z_t \hat{\beta}_{t-1}$.
tm7: refer to one week before: $M(d) = d - 7$, $\tilde{Y}^{tm7}_t = Z_t \hat{\beta}_{t-7}$.
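The plug-in rule and the two time-dependent strategies can be sketched as below, assuming the past coefficients are kept in a dictionary indexed by day and $Z_t$ is a list of rows; `expert_forecast`, `tm1` and `tm7` are hypothetical names. The function refuses any strategy that is not non-anticipative.

```python
from typing import Callable, Dict, List

Strategy = Callable[[int], int]


def tm1(d: int) -> int:
    """Refer to the day before: M(d) = d - 1."""
    return d - 1


def tm7(d: int) -> int:
    """Refer to one week before: M(d) = d - 7."""
    return d - 7


def expert_forecast(t: int, Z_t: List[List[float]],
                    betas: Dict[int, List[float]], strategy: Strategy) -> List[float]:
    """Plug-in expert: Y_tilde^M_t = Z_t beta_hat_{M(t)}."""
    m = strategy(t)
    if not m < t:
        raise ValueError("strategy must be non-anticipative: M(t) < t")
    return [sum(z * b for z, b in zip(row, betas[m])) for row in Z_t]


betas = {d: [float(d)] for d in range(1, 10)}  # toy one-coefficient models
Z_t = [[1.0], [2.0]]
```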

22 Experts introducing meteorological scenarios.
T: find the day with the closest temperature indicators for the sup distance (over the half-hours of the day and over the indicators):
$$M(d) = \operatorname*{ArgMin}_{t < d} \sup_{k \in \{1,\dots,6\},\ i \in \{1,\dots,48\}} \big|T^k_d(i) - T^k_t(i)\big|.$$
T_m: find the day with the closest median temperature for the sup distance (over the half-hours of the day):
$$M(d) = \operatorname*{ArgMin}_{t < d} \sup_{i \in \{1,\dots,48\}} \big|T^3_d(i) - T^3_t(i)\big|.$$
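A sketch of both temperature experts, assuming `indicators[t][k][i]` holds indicator $k$ at half-hour $i$ of day $t$ (zero-based, so the median indicator $T^3$ is index 2); the helper names are hypothetical. Restricting the search to $t < d$ keeps the strategy non-anticipative, as required by $M(d) < d$.

```python
def sup_distance(a, b, ks):
    """sup over the selected indicators k and all half-hours i of |a[k][i] - b[k][i]|."""
    return max(abs(a[k][i] - b[k][i]) for k in ks for i in range(len(a[k])))


def closest_day(d, indicators, ks=None):
    """M(d) = argmin over past days t < d of the sup distance.

    ks=None uses every indicator (expert T); ks=[2] uses only the median (expert T_m).
    """
    if ks is None:
        ks = range(len(indicators[d]))
    past = [t for t in indicators if t < d]
    if not past:
        raise ValueError("no past day available")
    return min(past, key=lambda t: sup_distance(indicators[d], indicators[t], ks))


# Toy data: 2 indicators, 2 half-hours per day (instead of 6 and 48).
indicators = {
    1: [[3.0, 3.0], [5.0, 5.0]],
    2: [[0.0, 0.0], [7.0, 7.0]],
    3: [[0.0, 0.0], [5.0, 5.0]],
}
```

On this toy data, using every indicator selects day 2, while using only the second indicator selects day 1: the choice of indicators changes the analog day.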

23 Experts introducing the climatic configuration of the day, constrained by the type of day.
N s/j: find the day with the closest cloud-cover indicators (min, max, median, std) for the sup distance, among days of the same type as $Y_t$:
$$M(d) = \operatorname*{ArgMin}_{t < d,\ J(t) = J(d)} \sup_{i,k} \big|N^k_d(i) - N^k_t(i)\big|,$$
where $J(\cdot)$ denotes the type of day.
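The type constraint only changes the candidate set of the arg-min. A sketch, assuming `cloud[t][k][i]` holds cloud-cover indicator $k$ at half-hour $i$ of day $t$ and `day_type[t]` plays the role of $J(t)$; both layouts and the helper names are hypothetical.

```python
def sup_distance(a, b):
    """sup over indicators k and half-hours i of |a[k][i] - b[k][i]|."""
    return max(abs(x - y) for ca, cb in zip(a, b) for x, y in zip(ca, cb))


def closest_same_type_day(d, cloud, day_type):
    """M(d) = argmin over past days t < d with J(t) = J(d) of the sup distance."""
    candidates = [t for t in cloud if t < d and day_type[t] == day_type[d]]
    if not candidates:
        raise ValueError("no past day of the same type")
    return min(candidates, key=lambda t: sup_distance(cloud[d], cloud[t]))


cloud = {1: [[0.0, 1.0]], 2: [[5.0, 5.0]], 3: [[1.0, 1.0]], 4: [[0.0, 1.0]]}
day_type = {1: "weekday", 2: "sunday", 3: "sunday", 4: "sunday"}
```

Day 1 matches day 4 exactly but has a different type, so the constrained expert returns day 3 instead.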

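24 MAPE error. For day $t$, the MAPE and MISE prediction errors over the interval $[0,T]$ are defined by
$$\mathrm{MAPE}(Y,\tilde{Y}^M_t)(T) = \frac{1}{T}\sum_{i=1}^{T} \left|\frac{\tilde{Y}^M_t(i) - Y_t(i)}{Y_t(i)}\right|, \qquad \mathrm{MISE}(Y,\tilde{Y}^M_t)(T) = \frac{1}{T}\sum_{i=1}^{T} \big(\tilde{Y}^M_t(i) - Y_t(i)\big)^2.$$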
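Both criteria translate directly into code. In this sketch `T` is the number of half-hours evaluated (all of them by default), and MAPE is returned as a fraction, so 0.007 corresponds to the 0.7% reported later.

```python
def mape(y, y_pred, T=None):
    """Mean absolute percentage error over the first T points (as a fraction)."""
    T = len(y) if T is None else T
    return sum(abs((y_pred[i] - y[i]) / y[i]) for i in range(T)) / T


def mise(y, y_pred, T=None):
    """Mean squared error over the first T points."""
    T = len(y) if T is None else T
    return sum((y_pred[i] - y[i]) ** 2 for i in range(T)) / T


y_obs = [100.0, 200.0]
y_hat = [101.0, 198.0]
```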

25 Prediction evaluation. Table: average, median and standard deviation (std) of the prediction error for the experts Naive, Yday, Week, T_med, T_med/W, T_med/N, T, T/G, T/D, T/C, N, N/G, N/D, N/C, W, W/G, W/D, W/C.

26 Prediction evaluation: comparing experts. Figure: frequencies of best performance for the experts Yday, Week, Tm, Tm/N, Tm/W, T, T/g, T/d, T/c, N, N/g, N/d, N/c, W, W/g, W/d, W/c, computed over one year of data, from September 1st, 2009 to August 31st, 2010.

27 Prediction evaluation: comparing experts by day. Figure: ranking of predictor performances per day, giving the percentage of times each predictor (tm1, tm7, Ts, Ts/N, Tm, Tm/N, T/g, N/g, T/j, N/j, T/c, N/c) is the best, by day of the week (1: Monday, ..., 7: Sunday).

28 Prediction evaluation: comparing experts by month. Figure: best predictor performances per month, giving the percentage of times each predictor (tm1, tm7, Ts, Ts/N, Tm, Tm/N, T/g, N/g, T/j, N/j, T/c, N/c) is the best, by month.

29 Aggregation of predictors: exponential weights (inspired by various theoretical results, see Lecué, Rigollet, Stoltz, Tsybakov, ...):
$$\tilde{Y}^{wgt}_d = \frac{\sum_{m=1}^{M} w^m_d\, \tilde{Y}^m_d}{\sum_{m=1}^{M} w^m_d}, \qquad w^m_d = \exp\Big(-\frac{1}{T\theta}\sum_{i=1}^{T} \big(\tilde{Y}^m_d(i) - Y_d(i)\big)^2\Big),$$
where $\theta$ is a parameter (often called the temperature in physics applications, see the discussion below) and $T = T_{pred}$.
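A sketch of the exponential-weights aggregation, assuming the weights are computed on the observed part of day $d$ (half-hours $1,\dots,T$ with $T = T_{pred}$) and then applied to each expert's whole curve; `exp_weights_aggregate` is a hypothetical name. Subtracting the smallest loss before exponentiating does not change the normalized weights but avoids underflow when the losses are large.

```python
import math


def exp_weights_aggregate(y_obs, expert_preds, theta, T):
    """Return (aggregated curve, normalized weights) for exponential weighting.

    y_obs: observed values Y_d(1..T); expert_preds: one full predicted curve per expert.
    """
    losses = [sum((p[i] - y_obs[i]) ** 2 for i in range(T)) / (T * theta)
              for p in expert_preds]
    lmin = min(losses)
    w = [math.exp(-(loss - lmin)) for loss in losses]  # stabilized exp(-loss)
    total = sum(w)
    weights = [wm / total for wm in w]
    n = len(expert_preds[0])
    agg = [sum(wm * p[i] for wm, p in zip(weights, expert_preds)) for i in range(n)]
    return agg, weights


experts = [[1.0, 1.0, 5.0], [3.0, 3.0, 9.0]]
agg, weights = exp_weights_aggregate([1.0, 1.0], experts, theta=1.0, T=2)
```

A small $\theta$ concentrates the weight on the expert that was best before $T_{pred}$; a very large $\theta$ tends to a plain average of the experts.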

30 Forecasting (MAPE = 0.7%). Figure: observed load Y, approximation Yapx and prediction time tpred.

31 Sparse methods

33 High-dimensional linear models. $Y = X\beta + \epsilon$, where $\beta \in \mathbb{R}^p$ is the unknown parameter (to be estimated), $\epsilon = (\epsilon_1,\dots,\epsilon_n)$ is an unobserved vector of random errors, assumed i.i.d. $N(0,\sigma^2)$, and $X$ is a known $n \times p$ matrix. High dimension: $p \gg n$. Reference: M. Mougeot, D. P., K. Tribouley, J. R. Stat. Soc. Ser. B Stat. Methodol., vol. 74, 2012.

34 Conditions generally required to solve the problem: sparsity conditions on the vector $\beta$; restricted identity conditions on the matrix $X$.

35 Sparsity conditions

36 Restricted identity property. For $C \subset \{1,\dots,p\}$, denote by $X_C$ the matrix $X$ restricted to the columns in $C$, and by $M(C) := \frac{1}{n} X_C^{\top} X_C$ the associated Gram matrix. The restricted identity property means that $M(C)$ is almost the identity matrix for every $C$ that is small enough.

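37 Example 1: RIP. RIP($m_0,\nu$) assumes that there exist $0 \le \nu < 1$ and $m_0 \ge 1$ such that, for every $C$ with $|C| = m \le m_0$ and every $x \in \mathbb{R}^m$,
$$\|x\|^2_{\ell_2(m)}(1-\nu) \le x^{\top} M(C)\, x \le \|x\|^2_{\ell_2(m)}(1+\nu).$$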
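For a tiny design, the smallest $\nu$ for which RIP($m,\nu$) holds can be computed by brute force: it is the largest distance from 1 of an eigenvalue of $M(C)$, over all $C$ of size $m$ (by eigenvalue interlacing, smaller subsets cannot do worse). This sketch uses a hand-written cyclic Jacobi eigenvalue routine to stay dependency-free; all helper names are hypothetical, and the exhaustive loop over subsets is only feasible for very small $p$.

```python
import math
from itertools import combinations


def sym_eigenvalues(A, max_sweeps=50, tol=1e-24):
    """Eigenvalues of a small symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in A]
    m = len(a)
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(m) for q in range(m) if p != q)
        if off < tol:
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(m):  # columns p and q: A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(m):  # rows p and q: J^T (A J)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(m)]


def gram_submatrix(X, C):
    """M(C) = (1/n) X_C^T X_C for the columns listed in C."""
    n = len(X)
    return [[sum(X[i][j] * X[i][k] for i in range(n)) / n for k in C] for j in C]


def rip_constant(X, m):
    """Smallest nu such that every eigenvalue of M(C), |C| = m, lies in [1 - nu, 1 + nu]."""
    p = len(X[0])
    return max(max(abs(e - 1.0) for e in sym_eigenvalues(gram_submatrix(X, C)))
               for C in combinations(range(p), m))
```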

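38 Example 2: coherence condition. Let $M := \frac{1}{n} X^{\top} X$, with $M_{jj} = 1$ for all $j$. The coherence is
$$\tau_n = \sup_{l \ne m} |M_{lm}| = \sup_{1 \le l \ne m \le p} \Big|\frac{1}{n}\sum_{i=1}^{n} X_{il} X_{im}\Big|.$$
Coherence $\Rightarrow$ RIP($\nu/\tau_n, \nu$).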
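Unlike RIP, the coherence is cheap to compute: one pass over all pairs of columns after normalizing them so that $M_{jj} = 1$. By Gershgorin's circle theorem, every eigenvalue of $M(C)$ with $|C| = m$ lies within $(m-1)\tau_n$ of 1, which is why a small coherence gives RIP for $m_0$ of order $\nu/\tau_n$. The helper names below are hypothetical, and `coherence` assumes at least two columns.

```python
import math


def normalize_columns(X):
    """Rescale each column so that M_jj = (1/n) sum_i X_ij^2 = 1."""
    n, p = len(X), len(X[0])
    scales = [math.sqrt(sum(X[i][j] ** 2 for i in range(n)) / n) for j in range(p)]
    return [[X[i][j] / scales[j] for j in range(p)] for i in range(n)]


def coherence(X):
    """tau_n = max over j != k of |(1/n) sum_i X_ij X_ik| (columns assumed normalized)."""
    n, p = len(X), len(X[0])
    return max(
        abs(sum(X[i][j] * X[i][k] for i in range(n)) / n)
        for j in range(p) for k in range(j + 1, p)
    )


X_toy = normalize_columns([[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]])
tau = coherence(X_toy)
```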

39 Sparsity conditions. Either $\#\{l \in \{1,\dots,p\} : \beta_l \ne 0\} \le S$, or $\sum_l |\beta_l|^q \le M$ with $0 < q < 1$ (the ball $B_q(M)$). SMALL NUMBER OF BIG COEFFICIENTS.
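Both sparsity measures are one-liners. This sketch, with hypothetical helper names, computes the support size and the $\ell_q$ quantity bounded by $M$ in $B_q(M)$; a vector with a few big coefficients keeps both small.

```python
def support_size(beta, tol=0.0):
    """Number of coefficients with |beta_l| > tol."""
    return sum(1 for b in beta if abs(b) > tol)


def lq_radius(beta, q):
    """Sum of |beta_l|^q, the quantity bounded by M in the ball B_q(M), 0 < q < 1."""
    return sum(abs(b) ** q for b in beta)


beta = [4.0, 0.0, 0.0, 1.0, 0.0]
```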

40 Penalization for sparsity. Many penalizations have been introduced historically in the regression framework (to put identification constraints on $\beta$):
Ridge: $E(\beta,\lambda) = \|Y - X\beta\|^2 + \lambda \sum_j \beta_j^2$;
Lasso: $E(\beta,\lambda) = \|Y - X\beta\|^2 + \lambda \sum_j |\beta_j|$;
SCAD: $E(\beta,\lambda) = \|Y - X\beta\|^2 + \lambda \sum_j w_j\, g(\beta_j)$.
Solutions rely on convex optimization for the $\ell_1$ penalty and on non-convex optimization for SCAD: Fan & Lv (2008, 2010), Candès & Tao (2007), and many others.
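As one example of the convex route, the Lasso criterion $E(\beta,\lambda)$ above can be minimized by cyclic coordinate descent, a standard solver chosen here for illustration (the slide does not name one). Minimizing over a single $\beta_j$ gives a soft-thresholding step at level $\lambda/2$, because the squared loss is not halved in $E$. Ridge has the closed form $(X^{\top}X + \lambda I)^{-1}X^{\top}Y$, and SCAD is not shown.

```python
def soft_threshold(z, g):
    """Shrink z toward 0 by g, returning 0 inside [-g, g]."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize ||y - X beta||^2 + lam * sum_j |beta_j| by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    resid = list(y)  # y - X beta, with beta = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j removed).
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam / 2.0) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta
```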

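41 Two-step thresholding procedures. $Y = X\beta + \epsilon$, with $Y$ of size $(n \times 1)$ and $X$ of size $(n \times p)$.
Step 1, pre-selection: find $b$ leaders, giving $X_b$ of size $(n \times b)$, with $b < n \ll p$. Regression on the leaders: $\tilde{\beta} = (X_b^{\top} X_b)^{-1} X_b^{\top} Y$, of size $(1 \times b)$.
Step 2, denoising the coefficients: $\hat{\beta}$, of size $(1 \times \hat{S})$.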
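The two steps can be sketched as follows. The slide does not say how the leaders are chosen or how the coefficients are denoised, so this sketch assumes the leaders are the $b$ columns with the largest $|X_j^{\top} Y|$ and that denoising is hard thresholding at a level `lam`; `solve` and `two_step_threshold` are hypothetical helpers, and $X_b^{\top} X_b$ is assumed invertible (hence $b < n$).

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting (A nonsingular)."""
    m = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][k] * x[k] for k in range(r + 1, m))) / M[r][r]
    return x


def two_step_threshold(X, y, b, lam):
    """Step 1: OLS on the b leaders. Step 2: keep coefficients with |coef| > lam."""
    n, p = len(X), len(X[0])
    corr = [abs(sum(X[i][j] * y[i] for i in range(n))) for j in range(p)]
    leaders = sorted(range(p), key=lambda j: -corr[j])[:b]
    G = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in leaders] for j in leaders]
    rhs = [sum(X[i][j] * y[i] for i in range(n)) for j in leaders]
    beta = [0.0] * p
    for j, coef in zip(leaders, solve(G, rhs)):
        if abs(coef) > lam:
            beta[j] = coef
    return beta


# Toy design: 4 orthogonal +-1 columns plus 2 extra columns; y = 3 * col1 + 2 * col2.
X_toy = [[1.0, 1.0, 1.0, 1.0, 1.0, 0.0],
         [1.0, -1.0, 1.0, -1.0, 0.0, 1.0],
         [1.0, 1.0, -1.0, -1.0, 0.0, 0.0],
         [1.0, -1.0, -1.0, 1.0, 0.0, 0.0]]
y_toy = [5.0, -1.0, 1.0, -5.0]
beta_hat = two_step_threshold(X_toy, y_toy, b=3, lam=0.5)
```

Here the third leader (column 4) is a false positive from the pre-selection; the regression gives it a zero coefficient and the thresholding step removes it.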

42 Winter forecast. Figure: forecast (solid blue line) and observed (dashed dark line) electrical consumption for a winter week, from Monday February 1st to Sunday February 7th, 2010.

43 Spring forecast. Figure: forecast (solid blue line) and observed (dashed dark line) electrical consumption for a spring week, from Monday June 14th to Sunday June 20th, 2010.
