OPERA: Online Prediction by ExpeRt Aggregation

Size: px

Start display at page:

Download "OPERA: Online Prediction by ExpeRt Aggregation"

April Payne
6 years ago
Views:

Copenhagen University Yannig Goude, EDF R&D, LMO

1 OPERA: Online Prediction by ExpeRt Aggregation Pierre Gaillard, Department of Mathematical Sciences of Copenhagen University Yannig Goude, EDF R&D, LMO University of Paris-Sud Orsay UseR conference, Standford June 2016.

2 Context and methods

OPERA Online Prediction by ExpeRt Aggregation Developed

Paris-Sud Online Robust Aggregation, joined work of:

Normale Supérieure, Paris, France) Yannig Goude (EDF

Paris Sud) Pierre Gaillard Gilles Stoltz Should have

3 OPERA Online Prediction by ExpeRt Aggregation Developed by Pierre Gaillard during his PhD at EDF R&D/Université Paris-Sud Online Robust Aggregation, joined work of: Gilles Stoltz (CNRS-HEC Paris) Marie Devaine (Ecole Normale Supérieure, Paris, France) Yannig Goude (EDF R&D-Univ. Paris Sud) Pierre Gaillard Gilles Stoltz Should have been: OREL Online Robust Ensemble Learning OREO Online Robust Expert aggregation.. Marie Devaine Yannig Goude

4 Ensemble Learning What is it?

Ensemble Learning Popular methods Random Forest

Gradient Boosting Machines Friedman 2001

Roberts 1965) Leamer 1978 Robust aggregation of

5 Ensemble Learning Popular methods Random Forest Breiman 2001 Bootstrap aggregating Breiman 1994 Gradient Boosting Machines Friedman 2001 Stacking Wolpert 1992 Bayesian Model Averaging Roberts 1965) Leamer 1978 Robust aggregation of experts Vovk (1990); Littlestone and Warmuth (1994)

Ensemble Learning what people usually do to win ML competition During the nearly 3 years of the Netifx competition, there were two main factors which improved the overall

The meta-modelling then got us to the first position.

that they can contribute meaningfully in the ensemble.

6 Ensemble Learning what people usually do to win ML competition During the nearly 3 years of the Netifx competition, there were two main factors which improved the overall accuracy: The quality of the individual algorithms and the ensemble idea. BellKor's Pragmatic Chaos Team 1 million $ The XGBoost model got us to top 10. The meta-modelling then got us to the first position. Marios Michailidis, Mathias Müller, HJ van Veen $ The idea was always to get models that are individually good on their own but have as little correlation as possible so that they can contribute meaningfully in the ensemble. Andreas Merentitis, Alexander Bauer, Nurlanbek Duishoev $ In our final model, we had XGBoost as an ensemble model, which included 20 XGBoost models, 5 random forests, 6 randomized decision tree models, 3 regularized greedy forests, 3 logistic regression models, 5 ANN models, 3 elastic net models and 1 SVM model $ Darius, Davut, Song I've tried ensembling XGBoost and RNN models but nothing could beat the simpler linear model. Alejandro Mosquera $

7 Online robust aggregation algorithms Framework We want to forecast a sequence of observations Observations and predictions are made in a sequential fashion predictions of are based on past observations/predictions No stochastic assumptions are made

8 Online robust aggregation algorithms recipe data stream (sleeping) experts: human, machine learning methods loss function experts aggregation algorithms

9 Online robust aggregation algorithms Framework At each instant t, expert j outputs a forecast The aggregation algorithm determines his prediction based on Past observations Current and past expert forecasts At each instant, the aggregation algorithm computes a weight vector and forms

10 Robust Sequential Expert Aggregation Exponentially Weighted Aggregation Loss of the expert j at time i Parameters Weights update Oracle bounds Prediction, Learning, and Games Nicolò Cesa-Bianchi et Gábor Lugosi

11 Robust Sequential Expert Aggregation ML-Poly Parameters More weights to an expert with high regret Focus on worst cases for the aggregation: cases with positive regret Confidency to «stable» expert (low error variance) update Oracle bounds

12 Robust Sequential Expert Aggregation ML-Poly Empirical studies (here load forecasting) show that optimal predictions are obtained with therotical calibration for ML-Poly NO CALIBRATION NEEDED!

13 The package OPERA

Opera oracle calculation Y: the data stream to predict experts: the set of experts model: oracle, 'expert best fixed (constant over time) expert oracle, 'convex' best fixed convex combination,

14 Opera oracle calculation Y: the data stream to predict experts: the set of experts model: oracle, 'expert best fixed (constant over time) expert oracle, 'convex' best fixed convex combination, 'linear best fixed linear combination of expert, 'shifting for all number m of switches the sequence of experts with at most m shifts that would have performed the best to predict the sequence of observations in Y. loss.type: the loss function, 'square', 'absolute', 'percentage', or 'pinball (quantile reg.) awake: a matrix specifying the activation coefficients of the experts, lie in [0,1] lambda: a positive number used by the 'linear' oracle only. A possible $L_2$ regularization parameter for computing the linear oracle (if the design matrix is not identifiable) See

Opera the mixture fonction Y: the data stream to predict experts: the set of experts model: aggregation algorithm, 'EWA Exponential Weight Aggregation, 'FS Fixed Share, 'Ridge Ridge regression

15 Opera the mixture fonction Y: the data stream to predict experts: the set of experts model: aggregation algorithm, 'EWA Exponential Weight Aggregation, 'FS Fixed Share, 'Ridge Ridge regression 'Mlpol', Polynomial Potential aggregation, 'OGD Online Gradient Descent loss.type: the loss function, 'square', 'absolute', 'percentage', or 'pinball (quantile reg.) loss.gradient: should it take the gradient of the loss or not coefficients: prior weights of the experts (not possible for 'MLpol') awake: a matrix specifying the activation coefficients of the experts, lie in [0,1] parameters: optional parameters for the aggregation rule See

16 Opera Online Prediction by ExpeRt Aggregation Package + electricity load data to test it

17 An example on GEFCOM14 Electricity Load Data set Y: electricity demand on the US grid, daily mean, / X: lag load, temperature 25 weather stations, calendar information

18 Experts 50 experts from R Linear lasso, lars2, lars, enet, foba, icr, leapbackward, leapforward, leapseq, lm, lmstepaic, spikeslab, glm, BstLm, glm, glmboost, glmnet, glmstepaic Generalised Additive Models bagearth, bagearthgcv, bsttree, earth, gamloess, gamspline, gcvearth Projection based pcr, ppr, pls, plsrglm, simpls Regression tree: Gbm, blackboost, ctree, ctree2, rpart1se, rpart2, treebag, xgbtree Kernel Kernelpls, svmlinear, svmpoly, svmradial, svmradialsigma, svmradialcost, knn, kknn and their stacked version

19 Experts Calibration 50 experts from R Calibrate default parameter of the caret::train function, by cross validation 5-fold

20 Opera S3 methods for class oracle

21 Opera S3 methods for class oracle

22 Opera S3 methods for class oracle

23 Opera S3 methods for class mixture

24 Opera S3 methods for class mixture

25 Opera S3 methods for class mixture

26 Bibliography Kuhn, M. The caret Package 2009 Devaine, M.; Gaillard, P.; Goude, Y. & Stoltz, G. Forecasting electricity consumption by aggregating specialized experts - A review of the sequential aggregation of specialized experts, with an application to Slovakian and French country-wide one-day-ahead (half-)hourly predictions Machine Learning, 2013, 90, Gaillard, P. & Goude, Y. Forecasting Electricity Consumption by Aggregating Experts; How to Design a Good Set of Experts Modeling and Stochastic Learning for Forecasting in High Dimensions, Springer International Publishing, 2015, 217, Gaillard, P. Contributions à l'agrégation séquentielle robuste d'experts~: travaux sur l'erreur d'approximation et la prévision en loi. Applications à la prévision pour les marchés de l'énergie Université Paris-Sud 11, 2015 Gaillard, P.; Goude, Y. & Nedellec, R. Additive models and robust aggregation for GEFCom2014 probabilistic electric load and electricity price forecasting International Journal of Forecasting, Elsevier, 2016, 32, THANKS!

Forecasting the electricity consumption by aggregating specialized experts

Forecasting the electricity consumption by aggregating specialized experts Pierre Gaillard (EDF R&D, ENS Paris) with Yannig Goude (EDF R&D) Gilles Stoltz (CNRS, ENS Paris, HEC Paris) June 2013 WIPFOR Goal