PREDICTING SOLAR GENERATION FROM WEATHER FORECASTS. Chenlin Wu, Yuhan Lou


Background
Smart grid: increasing the contribution of renewables to grid energy.
Solar generation: intermittent and non-dispatchable.
Goal: create automatic prediction models that predict future solar power intensity given weather forecasts.

Data Source
NREL National Solar Radiation Database, 1991-2010: hourly weather and solar intensity data for 20 years.
Station: ST LOUIS LAMBERT INT'L ARPT, MO.
Input (combination of 9 weather metrics): Date, Time, Opaque Sky Cover, Dry-bulb Temperature, Dew-point Temperature, Relative Humidity, Station Pressure, Wind Speed, Liquid Precipitation Depth.
Output: amount of solar radiation (Wh/m²) received in a collimated beam on a surface normal to the sun.

Method
Regression: learn a mapping from an input space X = Rⁿ of n-dimensional vectors to an output space Y = R of real-valued targets.
Models compared:
- Linear least-squares regression
- Support vector regression (SVR) with multiple kernel functions
- Gaussian processes

Linear Model
y = f(X) = Xᵀa + e, where
  y ∈ Rⁿ: measurements (solar intensity)
  Xᵀ ∈ Rⁿˣ⁽ᵖ⁺¹⁾: each row is a p-dimensional input (plus an intercept term)
  a ∈ Rᵖ⁺¹: unknown coefficients
  e ∈ Rⁿ: random noise
Loss function (squared error): ‖y − ŷ‖², so we minimize ‖y − ŷ‖² = minimize ‖y − Xᵀa‖².
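A minimal NumPy sketch of this least-squares fit (the project itself worked in MATLAB; the arrays below are random placeholders for the weather inputs and solar intensities):

```python
import numpy as np

def fit_linear(X, y):
    """Solve a = argmin ||y - X_aug a||^2, with an intercept column prepended."""
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    a, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return a

def predict_linear(a, X):
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    return X_aug @ a

# Placeholder data: n points, p = 9 weather metrics.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(100, 9)), rng.normal(size=100)
a = fit_linear(X_train, y_train)
mse = np.mean((predict_linear(a, X_train) - y_train) ** 2)
```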

Linear Model
Prediction model generated from the training set:
SolarIntensity = 171.02 - 9.9119·Date - 33.881·Time - 50.509·SkyCover + 580.45·Temp - 522.17·DewPoint + 102.89·Humidity + 33.533·Pressure + 33.715·WindSpeed - 3.7929·Precipitation
Applied to the test set, prediction mean square error: 217.6391

Support Vector Regression
Given training data {(x₁, y₁), (x₂, y₂), …, (xₙ, yₙ)}
Linear ε-SVR model: f(x) = ⟨w, x⟩ + b = wᵀx + b
minimize (1/2)‖w‖² + C Σᵢ (ξᵢ + ξᵢ*)
subject to
  yᵢ − f(xᵢ) ≤ ε + ξᵢ
  f(xᵢ) − yᵢ ≤ ε + ξᵢ*
  ξᵢ, ξᵢ* ≥ 0
Loss function (ε-insensitive): |ξ|_ε = 0 if |ξ| ≤ ε, and |ξ| − ε otherwise.

Dual Problem
Construct a Lagrange function from the objective function and the corresponding constraints.
Optimal solution: w = Σᵢ (αᵢ − αᵢ*) xᵢ, thus f(x) = Σᵢ (αᵢ − αᵢ*) ⟨xᵢ, x⟩ + b
Dual optimization problem:
maximize −(1/2) Σᵢ,ⱼ (αᵢ − αᵢ*)(αⱼ − αⱼ*) ⟨xᵢ, xⱼ⟩ − ε Σᵢ (αᵢ + αᵢ*) + Σᵢ yᵢ (αᵢ − αᵢ*)
subject to Σᵢ (αᵢ − αᵢ*) = 0 and αᵢ, αᵢ* ∈ [0, C]

Kernel Trick for SVR
The kernel trick maps observations from a general set S (input space) into an inner product space V (high-dimensional feature space): Φ: Rⁿ → Rᵐ, m ≫ n
w = Σᵢ (αᵢ − αᵢ*) φ(xᵢ)
f(x) = Σᵢ (αᵢ − αᵢ*) k(xᵢ, x) + b, where k(xᵢ, x) = ⟨φ(xᵢ), φ(x)⟩

SVR Model
Using the LIBSVM library for MATLAB.
Cross-validation to select the kernel function and parameters. Optimal selection:
- Radial basis function (RBF) kernel: K(x, z) = exp(−γ‖x − z‖²)
- Cost C = 17.1828 (trade-off)
- Epsilon ε = 0.001 (the width of the ε-insensitive zone)
- Gamma γ = 0.177828
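For illustration, a sketch of the same ε-SVR fit using scikit-learn's SVR (which also wraps LIBSVM) with the hyper-parameters reported above; the data here are random placeholders for the 9-dimensional weather inputs:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 9)), rng.normal(size=200)  # placeholders

# RBF-kernel epsilon-SVR with the selected cost, epsilon and gamma.
svr = SVR(kernel="rbf", C=17.1828, epsilon=0.001, gamma=0.177828)
svr.fit(X_train, y_train)
y_pred = svr.predict(X_train)
mse = np.mean((y_pred - y_train) ** 2)
```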

Data
Total number of data points: 24 (h) × 365 (d) × 20 (y) = 175,200
20% of the data used for training (around 4 years): 35,064 points
  Input: 35064 × 9 matrix; Output: 35064 × 1 vector
10% of the data used for testing: 17,532 points

Prediction Error
Linear regression MSE: 215.7884
SVM regression MSE: 130.1537
Improvement: 39.68%

Principal Component Analysis (PCA)
Some weather metrics correlate strongly, such as temperature and time of day.
Applying PCA removes this redundant information.

PCA
The eigenvalues represent the distribution of the source data's energy among the eigenvectors, which form a basis for the data. The smallest eigenvalue, λ₉ = 0.0034, shows that the ninth component carries very little of that energy.

Errors Applying PCA

Feature set dimension   Training error   Test error
9                       122.0676         130.1537
8                       122.1306         130.1335
7                       131.7175         137.3897
6                       138.8723         142.9340
5                       145.5919         149.2520
4                       244.2318         248.5261
3                       255.2125         256.5611
2                       258.0127         258.0480
1                       321.8330         316.8262
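A short scikit-learn sketch of the dimensionality reduction used here, keeping 8 components (random placeholder data standing in for the 9 weather metrics):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 9))        # placeholder for the 9 weather metrics

pca = PCA(n_components=8)             # drop the weakest component, as in the table
X_reduced = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # share of the data's energy per component
```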

Gaussian Processes
Given a training set D = {(xᵢ, yᵢ) | i = 1, …, n}
GP regression model: yᵢ = f(xᵢ) + εᵢ, where the noise εᵢ ~ N(0, σ²)
Assume a zero-mean GP prior distribution over functions f. In particular,
(f(x₁), …, f(xₙ)) ~ N(0, K),
where K is the covariance matrix built from the covariance function (kernel), which specifies the covariance between pairs of random variables:
K_{p,q} = Cov(f(x_p), f(x_q)) = k(x_p, x_q)

Gaussian Processes
To make predictions y* at test points X*, where y* = f(X*) + ε*.
According to the GP prior, the joint distribution of f and f* is
[f; f*] ~ N(0, [K(X, X), K(X, X*); K(X*, X), K(X*, X*)])
From the i.i.d. noise assumption: [ε; ε*] ~ N(0, [σ²I, 0; 0, σ²I])
It follows that
p(y* | D, X*) = N(μ*, Σ*), where
μ* = K(X*, X) [K(X, X) + σ²I]⁻¹ y
Σ* = K(X*, X*) − K(X*, X) [K(X, X) + σ²I]⁻¹ K(X, X*)
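These predictive equations translate almost directly into code. A NumPy sketch, assuming a squared-exponential covariance as a stand-in for K:

```python
import numpy as np

def sq_exp(A, B, c=1.0, ell=1.0):
    # Placeholder covariance: k(x, x') = c^2 exp(-||x - x'||^2 / (2 ell^2)).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return c ** 2 * np.exp(-0.5 * d2 / ell ** 2)

def gp_predict(X, y, X_star, sigma=0.1):
    # mu* = K(X*,X)[K(X,X)+s^2 I]^-1 y
    # Sigma* = K(X*,X*) - K(X*,X)[K(X,X)+s^2 I]^-1 K(X,X*)
    K = sq_exp(X, X) + sigma ** 2 * np.eye(len(X))
    K_s = sq_exp(X_star, X)
    K_ss = sq_exp(X_star, X_star)
    mu = K_s @ np.linalg.solve(K, y)
    Sigma = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mu, Sigma
```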

GP Model
MATLAB GPML library.
Use 5% of the data to train and 5% to test.
Apply PCA to reduce the input dimension to eight.
Choose the ARD covariance function: K(x_p, x_q) = c² exp(−(1/2)(x_p − x_q)ᵀ P⁻¹ (x_p − x_q)), with hyper-parameters θ = {c, P}.
Optimize the marginal likelihood p(y | X, θ) = N(0, K(X, X) + σ²I):
θ_ML = argmax_θ p(y | X, θ)
Optimal hyper-parameters: σ = 0.1475, c = 0.45175, P₁₁ = 25.2541, P₂₂ = 2.2544, P₃₃ = 1.4701, P₄₄ = 53.5211, P₅₅ = 0.3354, P₆₆ = 1.6194, P₇₇ = 6.8729, P₈₈ = 4.2863
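A sketch of this ARD covariance in NumPy (rather than the GPML toolbox actually used), under the assumption that the reported Pᵢᵢ values are the diagonal entries of P:

```python
import numpy as np

def ard_kernel(A, B, c, p_diag):
    # K(x_p, x_q) = c^2 exp(-1/2 (x_p - x_q)^T P^-1 (x_p - x_q)), P = diag(p_diag),
    # so each of the 8 PCA-reduced input dimensions gets its own length scale.
    diff = A[:, None, :] - B[None, :, :]
    quad = (diff ** 2 / p_diag).sum(-1)
    return c ** 2 * np.exp(-0.5 * quad)

# Hyper-parameters reported above (assumed to be the diagonal of P).
c = 0.45175
p_diag = np.array([25.2541, 2.2544, 1.4701, 53.5211, 0.3354, 1.6194, 6.8729, 4.2863])
```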

Sparse Pseudo-input GP (SPGP)
GPs are prohibitive for large data sets because of the inversion of the covariance matrix.
Consider a model parameterized by a pseudo data set D̄ of size m ≪ n, where n is the number of real data points.
This reduces the training cost from O(n³) to O(m²n), and the prediction cost from O(n²) to O(m²).

Sparse Pseudo-input GP (SPGP)
Pseudo data set D̄: X̄ = {x̄ᵢ}, f̄ = {f̄ᵢ}, i = 1, …, m
Prior on pseudo targets: p(f̄ | X̄) = N(0, K_M)
Likelihood: p(y | x, X̄, f̄) = N(k_xᵀ K_M⁻¹ f̄, K_xx − k_xᵀ K_M⁻¹ k_x + σ²)
Posterior distribution over f̄:
p(f̄ | D, X̄) = N(K_M Q_M⁻¹ K_MN (Λ + σ²I)⁻¹ y, K_M Q_M⁻¹ K_M),
where Q_M = K_M + K_MN (Λ + σ²I)⁻¹ K_NM
Given a new input x*, the predictive distribution is
p(y* | x*, D, X̄) = ∫ p(y* | x*, X̄, f̄) p(f̄ | D, X̄) df̄ = N(μ*, σ*²), where
μ* = k*ᵀ Q_M⁻¹ K_MN (Λ + σ²I)⁻¹ y
σ*² = K** − k*ᵀ (K_M⁻¹ − Q_M⁻¹) k* + σ²
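A NumPy sketch of these SPGP predictive equations, assuming a generic covariance function kern (e.g. the ARD kernel above) and pseudo-inputs X_bar taken as a random subset of the training points:

```python
import numpy as np

def spgp_predict(X, y, X_bar, X_star, kern, sigma=0.1):
    K_M = kern(X_bar, X_bar)                       # m x m
    K_MN = kern(X_bar, X)                          # m x n
    # Lambda = diag(K_NN - K_NM K_M^-1 K_MN)
    lam = np.diag(kern(X, X)) - (K_MN * np.linalg.solve(K_M, K_MN)).sum(0)
    d_inv = 1.0 / (lam + sigma ** 2)               # (Lambda + sigma^2 I)^-1, diagonal
    Q_M = K_M + (K_MN * d_inv) @ K_MN.T
    K_star = kern(X_bar, X_star)                   # m x n_star
    # mu* = k*^T Q_M^-1 K_MN (Lambda + sigma^2 I)^-1 y
    mu = K_star.T @ np.linalg.solve(Q_M, (K_MN * d_inv) @ y)
    # sigma*^2 = K** - k*^T (K_M^-1 - Q_M^-1) k* + sigma^2
    var = (np.diag(kern(X_star, X_star))
           - (K_star * np.linalg.solve(K_M, K_star)).sum(0)
           + (K_star * np.linalg.solve(Q_M, K_star)).sum(0)
           + sigma ** 2)
    return mu, var
```

Only the m × m matrices K_M and Q_M are factored here, which is where the O(m²n) training cost quoted above comes from.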

SPGP
Further apply SPGP on the same training data set (5%), using a random subset of the training points as pseudo-inputs.
Comparison:

                      Full-size GP   1/4-size SPGP   1/8-size SPGP
Running time (sec)    42.218         12.642          3.131
Mean square error     128.0459       129.0493        129.9594

The 1/8-size SPGP is 92.6% faster and matches the full GP's performance, which makes it possible to train on 20% and test on 10% of the data set.

Prediction Error
SVM regression MSE: 130.1335
SPGP regression MSE: 122.9167
Improvement: 5.46%

24-hour Prediction

Method   Prediction error
LR       191.5258
SVR      93.2988
GP       90.2835

Summary
Employed machine learning techniques to automatically model the prediction of solar generation from weather forecasts.
Used the NREL National Solar Radiation Database to train and test the models.
Applied and compared three different models for the regression problem.
Gaussian processes achieved the lowest prediction error of the three methods.
Limitations: SVM and GP are both time-consuming. The computational complexity of the GP is high due to matrix inversion, which can be mitigated by the SPGP method.

THANKS!