PREDICTING SOLAR GENERATION FROM WEATHER FORECASTS. Chenlin Wu Yuhan Lou

1 PREDICTING SOLAR GENERATION FROM WEATHER FORECASTS Chenlin Wu Yuhan Lou

2 Background
Smart grid: increasing the contribution of renewables to grid energy.
Solar generation: intermittent and non-dispatchable.
Goal: create automatic prediction models that predict future solar power intensity from weather forecasts.

3 Data Source
NREL National Solar Radiation Database: hourly weather and solar intensity data for 20 years.
Station: ST LOUIS LAMBERT INT'L ARPT, MO
Input (combination of 9 weather metrics): Date, Time, Opaque Sky Cover, Dry-bulb Temperature, Dew-point Temperature, Relative Humidity, Station Pressure, Wind Speed, Liquid Precipitation Depth
Output: amount of solar radiation (Wh/m²) received in a collimated beam on a surface normal to the sun

4 Method
Regression: learn a mapping from an input space $X = \mathbb{R}^n$ of n-dimensional vectors to an output space $Y = \mathbb{R}$ of real-valued targets.
Linear least squares regression
Support vector regression (SVR) using multiple kernel functions
Gaussian processes

5 Linear Model
$$y = f(X) = X^T a + e$$
where
$y \in \mathbb{R}^n$: measurement (solar intensity)
$X^T \in \mathbb{R}^{n \times (p+1)}$: each row is a p-dimensional input
$a \in \mathbb{R}^{p+1}$: unknown coefficients
$e \in \mathbb{R}^n$: random noise
Loss function (squared error): $\|y - \hat{y}\|^2$
$$\min_a \|y - \hat{y}\|^2 = \min_a \|y - X^T a\|^2$$
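The least squares fit on this slide can be sketched in a few lines. This is a minimal, dependency-free illustration and not the authors' code: the helper names (`solve_linear_system`, `fit_least_squares`, `predict`) and the toy data are hypothetical, and the extra coefficient in $a \in \mathbb{R}^{p+1}$ is assumed to be an intercept. The fit solves the normal equations $(D^T D) a = D^T y$, where $D$ is the input matrix with a leading column of ones.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on an augmented copy so the inputs are left untouched.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_least_squares(inputs, targets):
    """Return coefficients a minimizing ||y - D a||^2, with D = [1, inputs]."""
    D = [[1.0] + list(row) for row in inputs]
    k = len(D[0])
    # Normal equations: (D^T D) a = D^T y.
    DtD = [[sum(r[i] * r[j] for r in D) for j in range(k)] for i in range(k)]
    Dty = [sum(r[i] * y for r, y in zip(D, targets)) for i in range(k)]
    return solve_linear_system(DtD, Dty)


def predict(coeffs, row):
    """Evaluate the fitted linear model on one input row."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))


# Hypothetical toy data generated from y = 1 + 2*x1 - 0.5*x2 (no noise).
toy_inputs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 3.0], [3.0, 1.0]]
toy_targets = [1 + 2 * x1 - 0.5 * x2 for x1, x2 in toy_inputs]
toy_coeffs = fit_least_squares(toy_inputs, toy_targets)
```

On noise-free data the recovered coefficients match the generating ones up to rounding. For larger or ill-conditioned problems a QR-based solver is usually preferred over the normal equations.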

6 Linear Model Generated Prediction Model from training set: SolarIntensity = *Date *Time *SkyCover *Temp *DewPoint *Humidity *Pressure *Wind Speed *Precipitation Applying on the Test set, Prediction mean square error:

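7 Support Vector Regression
Given training data $\{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$
Linear ε-SVR model: $f(x) = \langle w, x \rangle + b = w^T x + b$
$$\text{minimize} \quad \frac{1}{2}\|w\|^2 + C \sum_i (\xi_i + \xi_i^*)$$
$$\text{subject to} \quad y_i - f(x_i) \le \varepsilon + \xi_i, \quad f(x_i) - y_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0$$
Loss function (ε-insensitive):
$$|\xi|_\varepsilon = \begin{cases} 0 & \text{if } |\xi| \le \varepsilon \\ |\xi| - \varepsilon & \text{otherwise} \end{cases}$$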
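A small sketch may help make the ε-insensitive loss concrete: residuals inside the tube of half-width ε cost nothing, and larger residuals are penalized linearly. The function names and numbers below are hypothetical. The objective is written with the slack variables eliminated, using the fact that at their smallest feasible values $\xi_i + \xi_i^*$ equals the ε-insensitive loss of the residual.

```python
def epsilon_insensitive_loss(residual, epsilon):
    """Return 0 if |residual| <= epsilon, otherwise |residual| - epsilon."""
    return max(0.0, abs(residual) - epsilon)


def svr_primal_objective(w, b, data, C, epsilon):
    """Evaluate 1/2 ||w||^2 + C * sum_i loss(y_i - f(x_i)) for linear f."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    total_loss = 0.0
    for x, y in data:
        prediction = sum(wi * xi for wi, xi in zip(w, x)) + b
        total_loss += epsilon_insensitive_loss(y - prediction, epsilon)
    return regularizer + C * total_loss


# Hypothetical example: the first point lies inside the tube, the second outside.
example_data = [([1.0], 1.05), ([2.0], 3.0)]
example_value = svr_primal_objective([1.0], 0.0, example_data, C=10.0, epsilon=0.1)
```

This only evaluates the objective for a given $(w, b)$. Finding the minimizer needs a quadratic programming solver such as the one inside LIBSVM.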

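8 Dual Problem
Construct a Lagrange function from the objective function and the corresponding constraints.
Optimal solution: $w = \sum_i (\alpha_i - \alpha_i^*) x_i$, thus $f(x) = \sum_i (\alpha_i - \alpha_i^*) \langle x_i, x \rangle + b$
Dual optimization problem:
$$\text{maximize} \quad -\frac{1}{2} \sum_{i,j} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle x_i, x_j \rangle - \varepsilon \sum_i (\alpha_i + \alpha_i^*) + \sum_i y_i (\alpha_i - \alpha_i^*)$$
$$\text{subject to} \quad \sum_i (\alpha_i - \alpha_i^*) = 0 \quad \text{and} \quad \alpha_i, \alpha_i^* \in [0, C]$$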
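The dual expansion says the prediction depends on the training inputs only through inner products. The sketch below checks that $\sum_i (\alpha_i - \alpha_i^*) \langle x_i, x \rangle + b$ agrees with $w^T x + b$ when $w = \sum_i (\alpha_i - \alpha_i^*) x_i$. The dual coefficients are made-up values that satisfy the equality constraint, not the output of a solver, and the function names are hypothetical.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def weight_vector(alphas, alpha_stars, xs):
    """Recover w = sum_i (alpha_i - alpha_i*) x_i."""
    dim = len(xs[0])
    return [
        sum((a - a_s) * x[d] for a, a_s, x in zip(alphas, alpha_stars, xs))
        for d in range(dim)
    ]


def predict_dual(alphas, alpha_stars, xs, b, x):
    """Evaluate f(x) = sum_i (alpha_i - alpha_i*) <x_i, x> + b."""
    return sum(
        (a - a_s) * dot(xi, x) for a, a_s, xi in zip(alphas, alpha_stars, xs)
    ) + b


# Hypothetical coefficients: the differences 0.5, -0.7, 0.2 sum to zero.
train_xs = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
alphas = [0.5, 0.0, 0.2]
alpha_stars = [0.0, 0.7, 0.0]
bias = 0.3
query = [2.0, -1.0]

w = weight_vector(alphas, alpha_stars, train_xs)
primal_value = dot(w, query) + bias
dual_value = predict_dual(alphas, alpha_stars, train_xs, bias, query)
```

Training points with $\alpha_i = \alpha_i^* = 0$ drop out of the sum; the remaining points are the support vectors.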

9 Kernel Trick for SVR
The kernel trick is a way of mapping observations from a general set S (input space) into an inner product space V (high-dimensional feature space):
$$\phi: \mathbb{R}^n \to \mathbb{R}^m, \quad m \gg n$$
$$\omega = \sum_i (\alpha_i - \alpha_i^*) \phi(x_i)$$
$$f(x) = \sum_i (\alpha_i - \alpha_i^*) k(x_i, x) + b$$
where $k(x_i, x) = \langle \phi(x_i), \phi(x) \rangle$.
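One way to see why the trick works is a kernel whose feature map can be written out by hand. The sketch below uses the degree-2 polynomial kernel $k(x, z) = \langle x, z \rangle^2$ on $\mathbb{R}^2$, chosen purely for illustration (the slides do not use it), and checks that it equals the inner product of the explicit features $\phi(x) = (x_1^2, \sqrt{2} x_1 x_2, x_2^2)$ in $\mathbb{R}^3$.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel, computed in the 2-D input space."""
    return dot(x, z) ** 2


def poly2_features(x):
    """Explicit feature map into R^3 whose inner product equals poly2_kernel."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


x = [1.0, 2.0]
z = [3.0, -1.0]
kernel_value = poly2_kernel(x, z)
feature_value = dot(poly2_features(x), poly2_features(z))
```

Both values agree up to rounding, but the kernel never builds the feature vectors. For the RBF kernel the feature space is infinite-dimensional, so evaluating the kernel directly is the only practical option.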

10 SVR Model
Using the LIBSVM library for MATLAB.
Cross-validation to select an appropriate kernel function and parameters.
Optimal selection: Radial Basis Function (RBF) kernel: $K(x, z) = \exp(-\gamma \|x - z\|^2)$
Cost = (trade-off)
Epsilon ε = (the width of the ε-insensitive zone)
Gamma γ =
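For reference, the RBF kernel is easy to evaluate directly. The sketch assumes the usual form $K(x, z) = \exp(-\gamma \|x - z\|^2)$ with $\gamma > 0$. The function names and the value of `gamma` are hypothetical; the tuned parameters from the slide are not reproduced here.

```python
import math


def rbf_kernel(x, z, gamma):
    """Return exp(-gamma * ||x - z||^2) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def gram_matrix(points, gamma):
    """Return the matrix of pairwise kernel values K[i][j] = k(x_i, x_j)."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]


# Hypothetical inputs; gamma = 0.5 is an arbitrary illustrative choice.
sample_points = [[0.0, 0.0], [1.0, 1.0], [3.0, 0.0]]
sample_gram = gram_matrix(sample_points, gamma=0.5)
```

A larger `gamma` makes the kernel decay faster with distance, so each training point influences a smaller neighbourhood. That is why the slide selects it by cross-validation together with the cost and ε.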

11 Data
Total number of data points: 24(h)*365(d)*20(y) = 175,200
20% of data to train (around 4 years of training data)
Training data set size: 35,064
Input: matrix
Output: vector
10% of data to test
Test data set size: 17,532
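The reported set sizes are slightly larger than 20% and 10% of 175,200. They match exactly if the 20-year record includes five leap days. The slide does not state which years are covered, so the leap-day count below is an assumption; the arithmetic itself is exact.

```python
HOURS_PER_DAY = 24
YEARS = 20
LEAP_DAYS = 5  # assumption: five leap days fall within the 20-year span

hours_without_leap_days = HOURS_PER_DAY * 365 * YEARS
hours_with_leap_days = HOURS_PER_DAY * (365 * YEARS + LEAP_DAYS)

# Integer arithmetic keeps the 20% and 10% splits exact.
train_size = hours_with_leap_days * 20 // 100
test_size = hours_with_leap_days * 10 // 100
```

Here `hours_without_leap_days` is 175,200 and `hours_with_leap_days` is 175,320, which gives a training size of 35,064 and a test size of 17,532.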

12 Prediction error Linear regression SVM regression MSE: MSE: Improved 39.68%
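The comparison on this slide rests on test-set mean square error. Below is a minimal sketch of both quantities, assuming "Improved" means the relative reduction in MSE (the slide does not define it). The function names and numbers are made up for illustration and are not the study's results.

```python
def mean_squared_error(targets, predictions):
    """Average of squared differences between targets and predictions."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def relative_improvement(baseline_mse, new_mse):
    """Percentage reduction of new_mse relative to baseline_mse."""
    return (baseline_mse - new_mse) / baseline_mse * 100.0


# Hypothetical values, not taken from the slides.
example_mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
example_gain = relative_improvement(100.0, 60.0)
```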

13 Principal Component Analysis (PCA)
Some weather metrics correlate strongly, such as temperature and time of day.
Apply PCA to remove redundant information.

14 PCA
The eigenvalues represent the distribution of the source data's energy among the eigenvectors, where the eigenvectors form a basis for the data.
$\lambda_9 = 0.0034$
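The energy interpretation can be illustrated on two strongly correlated features, where the covariance matrix is 2×2 and its eigenvalues have a closed form. The data below are made up and the helper names are hypothetical. With nine features one would normally call a numerical eigensolver instead.

```python
import math


def covariance_2d(points):
    """Sample covariance entries (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    return sxx, sxy, syy


def eigenvalues_sym_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], largest first."""
    centre = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return [centre + radius, centre - radius]


def energy_fractions(eigenvalues):
    """Share of total variance carried by each principal direction."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


# Hypothetical, nearly collinear measurements of two features.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
sxx, sxy, syy = covariance_2d(points)
eigenvalues = eigenvalues_sym_2x2(sxx, sxy, syy)
fractions = energy_fractions(eigenvalues)
```

Almost all of the variance falls on the first eigenvalue, so dropping the second direction loses very little information. This is the same reasoning that justifies discarding a component with a very small eigenvalue.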

15 Errors applying PCA Feature Set dimension Training error Test error

16 Gaussian Processes
Given training set $D = \{(x_i, y_i) \mid i = 1, \dots, n\}$.
GP regression model: $y_i = f(x_i) + \varepsilon_i$, where the noise is i.i.d. Gaussian, $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$.
Assume a zero-mean GP prior distribution over inference functions $f$. In particular, $[f(x_1), \dots, f(x_n)]^T \sim \mathcal{N}(0, K)$, where $K$ is the covariance function, or kernel, which specifies the covariance between pairs of random variables:
$$K_{p,q} = \mathrm{Cov}(f(x_p), f(x_q)) = K(x_p, x_q)$$

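17 Gaussian Processes
To make predictions $y_*$ at test points $X_*$, where $y_* = f(X_*) + \varepsilon_*$.
According to the GP prior, the joint distribution of $f$ and $f_*$ is
$$\begin{bmatrix} f \\ f_* \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix} K(X, X) & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right)$$
From the i.i.d. noise assumption:
$$\begin{bmatrix} \varepsilon \\ \varepsilon_* \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix} \sigma^2 I & 0 \\ 0 & \sigma^2 I \end{bmatrix}\right)$$
It follows that $p(y_* \mid D, X_*) = \mathcal{N}(\mu_*, \Sigma_*)$, where
$$\mu_* = K(X_*, X) [K(X, X) + \sigma^2 I]^{-1} y$$
$$\Sigma_* = K(X_*, X_*) + \sigma^2 I - K(X_*, X) [K(X, X) + \sigma^2 I]^{-1} K(X, X_*)$$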
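The predictive equations can be exercised on a tiny one-dimensional problem. The sketch below is an illustration under stated assumptions, not the GPML implementation. It uses a squared-exponential kernel with unit signal variance, returns only the per-point predictive variance (the diagonal of the predictive covariance, including the $\sigma^2$ noise term for $y_*$), and solves the linear systems by plain Gaussian elimination. All names and data are hypothetical.

```python
import math


def sq_exp_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_predict(train_x, train_y, test_x, noise_var, kernel=sq_exp_kernel):
    """Return predictive means and variances of y* at each test input."""
    n = len(train_x)
    # K(X, X) + sigma^2 I
    Ky = [
        [kernel(train_x[i], train_x[j]) + (noise_var if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    alpha = solve(Ky, train_y)  # [K + sigma^2 I]^{-1} y
    means, variances = [], []
    for xs in test_x:
        k_star = [kernel(xs, xi) for xi in train_x]  # K(x*, X)
        v = solve(Ky, k_star)  # [K + sigma^2 I]^{-1} K(X, x*)
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        variances.append(
            kernel(xs, xs) + noise_var - sum(k * w for k, w in zip(k_star, v))
        )
    return means, variances


# Hypothetical training data sampled from sin(x).
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [math.sin(x) for x in train_x]
means, variances = gp_predict(train_x, train_y, [1.0, 100.0], noise_var=1e-6)
```

Near a training input the mean reproduces the observed target and the variance shrinks towards the noise level. Far from the data the mean falls back to the zero prior and the variance returns to the prior variance plus noise. Each call to `solve` costs $O(n^3)$, which is the bottleneck the sparse method later in the deck addresses.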

18 GP Model
MATLAB GPML library.
Use 5% of the data to train and 5% to test.
Apply PCA to reduce the input dimension to eight.
Choose the ARD covariance function:
$$K(x_p, x_q) = c^2 \exp\left(-\tfrac{1}{2} (x_p - x_q)^T P^{-1} (x_p - x_q)\right)$$
with hyper-parameters $\theta = \{c, P\}$.
Optimize the marginal likelihood:
$$p(\theta \mid X, y) \propto p(y \mid X, \theta) = \mathcal{N}(0, K(X, X) + \sigma^2 I)$$
$$\theta_{ML} = \arg\max_\theta \, p(y \mid X, \theta)$$
Optimal hyper-parameters selection: σ = , c = , P_11 = , P_22 = , P_33 = , P_44 = , P_55 = , P_66 = , P_77 = , P_88 =
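The ARD covariance can be written compactly if $P$ is taken to be diagonal. The slide lists only $P_{11}, \dots, P_{88}$, which suggests a diagonal $P$, but that remains an assumption. The function name and parameter values are hypothetical and do not reproduce the fitted hyper-parameters.

```python
import math


def ard_kernel(xp, xq, c, p_diag):
    """c^2 * exp(-1/2 * sum_d (xp_d - xq_d)^2 / P_dd) for diagonal P."""
    quadratic_form = sum(
        (a - b) ** 2 / p for a, b, p in zip(xp, xq, p_diag)
    )
    return c ** 2 * math.exp(-0.5 * quadratic_form)


# Hypothetical settings: the second dimension has a huge P_dd, so it is
# effectively ignored by the kernel.
same_point = ard_kernel([0.0, 0.0], [0.0, 0.0], c=2.0, p_diag=[1.0, 1.0])
ignored_dim = ard_kernel([0.0, 0.0], [0.0, 5.0], c=2.0, p_diag=[1.0, 1e12])
relevant_dim = ard_kernel([0.0, 0.0], [5.0, 0.0], c=2.0, p_diag=[1.0, 1e12])
```

A large $P_{dd}$ flattens the kernel along dimension $d$, so maximizing the marginal likelihood over $P$ acts as a soft form of feature selection. This is where the name automatic relevance determination comes from.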

19 Sparse Pseudo-input GP (SPGP)
GPs are prohibitive for large data sets due to the inversion of the covariance matrix.
Consider a model parameterized by a pseudo data set $\bar{D}$ of size $m \ll n$, where $n$ is the number of real data points.
Reduce training cost from $O(n^3)$ to $O(m^2 n)$, and prediction cost from $O(n^2)$ to $O(m^2)$.
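A quick calculation shows what the change in asymptotic cost means for a concrete pseudo-input budget. The sizes below are hypothetical and the counts ignore constant factors, so the ratios describe scaling only and do not predict wall-clock time.

```python
def training_cost_ratio(n, m):
    """Ratio of full-GP O(n^3) to SPGP O(m^2 n) training operation counts."""
    return (n ** 3) / (m ** 2 * n)


def prediction_cost_ratio(n, m):
    """Ratio of full-GP O(n^2) to SPGP O(m^2) per-test-point counts."""
    return (n ** 2) / (m ** 2)


# Hypothetical sizes: 8,000 training points with one eighth as pseudo-inputs.
n_points = 8000
m_pseudo = n_points // 8
train_ratio = training_cost_ratio(n_points, m_pseudo)
predict_ratio = prediction_cost_ratio(n_points, m_pseudo)
```

Both ratios reduce to $(n/m)^2$, so using one eighth of the points as pseudo-inputs cuts the leading-order cost by a factor of 64.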

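20 Sparse Pseudo-input GP (SPGP)
Pseudo data set $\bar{D}$: $\bar{X} = \{\bar{x}_i\}_{i=1}^m$, $\bar{f} = \{\bar{f}_i\}_{i=1}^m$
Prior on pseudo targets: $p(\bar{f} \mid \bar{X}) = \mathcal{N}(0, K_M)$
Likelihood:
$$p(y \mid x, \bar{X}, \bar{f}) = \mathcal{N}(K_x^T K_M^{-1} \bar{f}, \; K_{xx} - K_x^T K_M^{-1} K_x + \sigma^2)$$
Posterior distribution over $\bar{f}$:
$$p(\bar{f} \mid D, \bar{X}) = \mathcal{N}(K_M Q_M^{-1} K_{MN} (\Lambda + \sigma^2 I)^{-1} y, \; K_M Q_M^{-1} K_M)$$
where $Q_M = K_M + K_{MN} (\Lambda + \sigma^2 I)^{-1} K_{NM}$.
Given a new input $x_*$, the predictive distribution is
$$p(y_* \mid x_*, D, \bar{X}) = \int d\bar{f} \; p(y_* \mid x_*, \bar{X}, \bar{f}) \, p(\bar{f} \mid D, \bar{X}) = \mathcal{N}(\mu_*, \Sigma_*)$$
where
$$\mu_* = K_*^T Q_M^{-1} K_{MN} (\Lambda + \sigma^2 I)^{-1} y$$
$$\Sigma_* = K_{**} - K_*^T (K_M^{-1} - Q_M^{-1}) K_* + \sigma^2$$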

21 SPGP
Further apply SPGPs on the same training data set (5%).
Use a random subset of the training points as pseudo-inputs.
Compare the result (columns: Full-size GP, 1/4-size SPGP, 1/8-size SPGP; rows: Running Time (sec), Mean Square Error).
1/8-size SPGP: 92.6% faster, matches full GP performance.
SPGP makes it possible to train on 20% and test on 10% of the data set.

22 Prediction error SVM regression SPGP regression MSE: MSE: Improved 5.46%

23 24-hour prediction Predicting Error LR SVR GP

24 Summary
Employed machine learning techniques to automatically model the mapping from weather forecasts to solar generation.
Used the NREL National Solar Radiation Database to train and test the models.
Applied and compared three different models to solve the regression problem.
Gaussian processes achieved the lowest prediction error among the three methods.
Limitations: SVM and GP are both time-consuming. The computational complexity of GP is high due to matrix inversion, which can be addressed by the SPGP method.

25 THANKS!
