PREDICTING SOLAR GENERATION FROM WEATHER FORECASTS. Chenlin Wu, Yuhan Lou


Background
Smart grid: increasing the contribution of renewables to grid energy.
Solar generation: intermittent and non-dispatchable.
Goal: create automatic prediction models that predict future solar power intensity given weather forecasts.

Data Source
NREL National Solar Radiation Database, 1991-2010: hourly weather and solar intensity data for 20 years.
Station: ST LOUIS LAMBERT INT'L ARPT, MO.
Input (combination of 9 weather metrics): Date, Time, Opaque Sky Cover, Dry-bulb Temperature, Dew-point Temperature, Relative Humidity, Station Pressure, Wind Speed, Liquid Precipitation Depth.
Output: amount of solar radiation (Wh/m²) received in a collimated beam on a surface normal to the sun.

Method
Regression: learn a mapping from an input space X = Rⁿ of n-dimensional vectors to an output space Y = R of real-valued targets.
Models compared:
- Linear least-squares regression
- Support vector regression (SVR) with multiple kernel functions
- Gaussian processes

Linear Model
y = f(X) = Xᵀa + e, where
  y ∈ Rⁿ: measurements (solar intensity)
  Xᵀ ∈ Rⁿˣ⁽ᵖ⁺¹⁾: each row is a p-dimensional input (plus an intercept term)
  a ∈ Rᵖ⁺¹: unknown coefficients
  e ∈ Rⁿ: random noise
Loss function (squared error): ‖y − ŷ‖², so we minimize ‖y − ŷ‖² = minimize ‖y − Xᵀa‖².
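A minimal NumPy sketch of this least-squares fit (the project itself worked in MATLAB; the arrays below are random placeholders for the weather inputs and solar intensities):

```python
import numpy as np

def fit_linear(X, y):
    """Solve a = argmin ||y - X_aug a||^2, with an intercept column prepended."""
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    a, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return a

def predict_linear(a, X):
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    return X_aug @ a

# Placeholder data: n points, p = 9 weather metrics.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(100, 9)), rng.normal(size=100)
a = fit_linear(X_train, y_train)
mse = np.mean((predict_linear(a, X_train) - y_train) ** 2)
```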

Linear Model
Prediction model generated from the training set:
SolarIntensity = 171.02 - 9.9119·Date - 33.881·Time - 50.509·SkyCover + 580.45·Temp - 522.17·DewPoint + 102.89·Humidity + 33.533·Pressure + 33.715·WindSpeed - 3.7929·Precipitation
Applied to the test set, prediction mean square error: 217.6391

Support Vector Regression
Given training data {(x₁, y₁), (x₂, y₂), …, (xₙ, yₙ)}
Linear ε-SVR model: f(x) = ⟨w, x⟩ + b = wᵀx + b
minimize (1/2)‖w‖² + C Σᵢ (ξᵢ + ξᵢ*)
subject to
  yᵢ − f(xᵢ) ≤ ε + ξᵢ
  f(xᵢ) − yᵢ ≤ ε + ξᵢ*
  ξᵢ, ξᵢ* ≥ 0
Loss function (ε-insensitive): |ξ|_ε = 0 if |ξ| ≤ ε, and |ξ| − ε otherwise.

Dual Problem
Construct a Lagrange function from the objective function and the corresponding constraints.
Optimal solution: w = Σᵢ (αᵢ − αᵢ*) xᵢ, thus f(x) = Σᵢ (αᵢ − αᵢ*) ⟨xᵢ, x⟩ + b
Dual optimization problem:
maximize −(1/2) Σᵢ,ⱼ (αᵢ − αᵢ*)(αⱼ − αⱼ*) ⟨xᵢ, xⱼ⟩ − ε Σᵢ (αᵢ + αᵢ*) + Σᵢ yᵢ (αᵢ − αᵢ*)
subject to Σᵢ (αᵢ − αᵢ*) = 0 and αᵢ, αᵢ* ∈ [0, C]

Kernel Trick for SVR
The kernel trick maps observations from a general set S (input space) into an inner product space V (high-dimensional feature space): Φ: Rⁿ → Rᵐ, m ≫ n
w = Σᵢ (αᵢ − αᵢ*) φ(xᵢ)
f(x) = Σᵢ (αᵢ − αᵢ*) k(xᵢ, x) + b, where k(xᵢ, x) = ⟨φ(xᵢ), φ(x)⟩

SVR Model
Using the LIBSVM library for MATLAB.
Cross-validation to select the kernel function and parameters. Optimal selection:
- Radial basis function (RBF) kernel: K(x, z) = exp(−γ‖x − z‖²)
- Cost C = 17.1828 (trade-off)
- Epsilon ε = 0.001 (the width of the ε-insensitive zone)
- Gamma γ = 0.177828
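For illustration, a sketch of the same ε-SVR fit using scikit-learn's SVR (which also wraps LIBSVM) with the hyper-parameters reported above; the data here are random placeholders for the 9-dimensional weather inputs:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 9)), rng.normal(size=200)  # placeholders

# RBF-kernel epsilon-SVR with the selected cost, epsilon and gamma.
svr = SVR(kernel="rbf", C=17.1828, epsilon=0.001, gamma=0.177828)
svr.fit(X_train, y_train)
y_pred = svr.predict(X_train)
mse = np.mean((y_pred - y_train) ** 2)
```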

Data
Total number of data points: 24 (h) × 365 (d) × 20 (y) = 175,200
20% of the data used for training (around 4 years): 35,064 points
  Input: 35064 × 9 matrix; Output: 35064 × 1 vector
10% of the data used for testing: 17,532 points

Prediction Error
Linear regression MSE: 215.7884
SVM regression MSE: 130.1537
Improvement: 39.68%

Principal Component Analysis (PCA)
Some weather metrics correlate strongly, such as temperature and time of day.
Applying PCA removes this redundant information.

PCA
The eigenvalues represent the distribution of the source data's energy among the eigenvectors, which form a basis for the data. The smallest eigenvalue, λ₉ = 0.0034, shows that the ninth component carries very little of that energy.

Errors Applying PCA

Feature set dimension   Training error   Test error
9                       122.0676         130.1537
8                       122.1306         130.1335
7                       131.7175         137.3897
6                       138.8723         142.9340
5                       145.5919         149.2520
4                       244.2318         248.5261
3                       255.2125         256.5611
2                       258.0127         258.0480
1                       321.8330         316.8262
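A short scikit-learn sketch of the dimensionality reduction used here, keeping 8 components (random placeholder data standing in for the 9 weather metrics):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 9))        # placeholder for the 9 weather metrics

pca = PCA(n_components=8)             # drop the weakest component, as in the table
X_reduced = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # share of the data's energy per component
```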

Gaussian Processes
Given a training set D = {(xᵢ, yᵢ) | i = 1, …, n}
GP regression model: yᵢ = f(xᵢ) + εᵢ, where the noise εᵢ ~ N(0, σ²)
Assume a zero-mean GP prior distribution over functions f. In particular,
(f(x₁), …, f(xₙ)) ~ N(0, K),
where K is the covariance matrix built from the covariance function (kernel), which specifies the covariance between pairs of random variables:
K_{p,q} = Cov(f(x_p), f(x_q)) = k(x_p, x_q)

Gaussian Processes
To make predictions y* at test points X*, where y* = f(X*) + ε*.
According to the GP prior, the joint distribution of f and f* is
[f; f*] ~ N(0, [K(X, X), K(X, X*); K(X*, X), K(X*, X*)])
From the i.i.d. noise assumption: [ε; ε*] ~ N(0, [σ²I, 0; 0, σ²I])
It follows that
p(y* | D, X*) = N(μ*, Σ*), where
μ* = K(X*, X) [K(X, X) + σ²I]⁻¹ y
Σ* = K(X*, X*) − K(X*, X) [K(X, X) + σ²I]⁻¹ K(X, X*)
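These predictive equations translate almost directly into code. A NumPy sketch, assuming a squared-exponential covariance as a stand-in for K:

```python
import numpy as np

def sq_exp(A, B, c=1.0, ell=1.0):
    # Placeholder covariance: k(x, x') = c^2 exp(-||x - x'||^2 / (2 ell^2)).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return c ** 2 * np.exp(-0.5 * d2 / ell ** 2)

def gp_predict(X, y, X_star, sigma=0.1):
    # mu* = K(X*,X)[K(X,X)+s^2 I]^-1 y
    # Sigma* = K(X*,X*) - K(X*,X)[K(X,X)+s^2 I]^-1 K(X,X*)
    K = sq_exp(X, X) + sigma ** 2 * np.eye(len(X))
    K_s = sq_exp(X_star, X)
    K_ss = sq_exp(X_star, X_star)
    mu = K_s @ np.linalg.solve(K, y)
    Sigma = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mu, Sigma
```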

GP Model
MATLAB GPML library.
Use 5% of the data to train and 5% to test.
Apply PCA to reduce the input dimension to eight.
Choose the ARD covariance function: K(x_p, x_q) = c² exp(−(1/2)(x_p − x_q)ᵀ P⁻¹ (x_p − x_q)), with hyper-parameters θ = {c, P}.
Optimize the marginal likelihood p(y | X, θ) = N(0, K(X, X) + σ²I):
θ_ML = argmax_θ p(y | X, θ)
Optimal hyper-parameters: σ = 0.1475, c = 0.45175, P₁₁ = 25.2541, P₂₂ = 2.2544, P₃₃ = 1.4701, P₄₄ = 53.5211, P₅₅ = 0.3354, P₆₆ = 1.6194, P₇₇ = 6.8729, P₈₈ = 4.2863
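A sketch of this ARD covariance in NumPy (rather than the GPML toolbox actually used), under the assumption that the reported Pᵢᵢ values are the diagonal entries of P:

```python
import numpy as np

def ard_kernel(A, B, c, p_diag):
    # K(x_p, x_q) = c^2 exp(-1/2 (x_p - x_q)^T P^-1 (x_p - x_q)), P = diag(p_diag),
    # so each of the 8 PCA-reduced input dimensions gets its own length scale.
    diff = A[:, None, :] - B[None, :, :]
    quad = (diff ** 2 / p_diag).sum(-1)
    return c ** 2 * np.exp(-0.5 * quad)

# Hyper-parameters reported above (assumed to be the diagonal of P).
c = 0.45175
p_diag = np.array([25.2541, 2.2544, 1.4701, 53.5211, 0.3354, 1.6194, 6.8729, 4.2863])
```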

Sparse Pseudo-input GP (SPGP)
GPs are prohibitive for large data sets because of the inversion of the covariance matrix.
Consider a model parameterized by a pseudo data set D̄ of size m ≪ n, where n is the number of real data points.
This reduces the training cost from O(n³) to O(m²n), and the prediction cost from O(n²) to O(m²).

Sparse Pseudo-input GP (SPGP)
Pseudo data set D̄: X̄ = {x̄ᵢ}, f̄ = {f̄ᵢ}, i = 1, …, m
Prior on pseudo targets: p(f̄ | X̄) = N(0, K_M)
Likelihood: p(y | x, X̄, f̄) = N(k_xᵀ K_M⁻¹ f̄, K_xx − k_xᵀ K_M⁻¹ k_x + σ²)
Posterior distribution over f̄:
p(f̄ | D, X̄) = N(K_M Q_M⁻¹ K_MN (Λ + σ²I)⁻¹ y, K_M Q_M⁻¹ K_M),
where Q_M = K_M + K_MN (Λ + σ²I)⁻¹ K_NM
Given a new input x*, the predictive distribution is
p(y* | x*, D, X̄) = ∫ p(y* | x*, X̄, f̄) p(f̄ | D, X̄) df̄ = N(μ*, σ*²), where
μ* = k*ᵀ Q_M⁻¹ K_MN (Λ + σ²I)⁻¹ y
σ*² = K** − k*ᵀ (K_M⁻¹ − Q_M⁻¹) k* + σ²
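A NumPy sketch of these SPGP predictive equations, assuming a generic covariance function kern (e.g. the ARD kernel above) and pseudo-inputs X_bar taken as a random subset of the training points:

```python
import numpy as np

def spgp_predict(X, y, X_bar, X_star, kern, sigma=0.1):
    K_M = kern(X_bar, X_bar)                       # m x m
    K_MN = kern(X_bar, X)                          # m x n
    # Lambda = diag(K_NN - K_NM K_M^-1 K_MN)
    lam = np.diag(kern(X, X)) - (K_MN * np.linalg.solve(K_M, K_MN)).sum(0)
    d_inv = 1.0 / (lam + sigma ** 2)               # (Lambda + sigma^2 I)^-1, diagonal
    Q_M = K_M + (K_MN * d_inv) @ K_MN.T
    K_star = kern(X_bar, X_star)                   # m x n_star
    # mu* = k*^T Q_M^-1 K_MN (Lambda + sigma^2 I)^-1 y
    mu = K_star.T @ np.linalg.solve(Q_M, (K_MN * d_inv) @ y)
    # sigma*^2 = K** - k*^T (K_M^-1 - Q_M^-1) k* + sigma^2
    var = (np.diag(kern(X_star, X_star))
           - (K_star * np.linalg.solve(K_M, K_star)).sum(0)
           + (K_star * np.linalg.solve(Q_M, K_star)).sum(0)
           + sigma ** 2)
    return mu, var
```

Only the m × m matrices K_M and Q_M are factored here, which is where the O(m²n) training cost quoted above comes from.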

SPGP
Further apply SPGP on the same training data set (5%), using a random subset of the training points as pseudo-inputs.
Comparison:

                      Full-size GP   1/4-size SPGP   1/8-size SPGP
Running time (sec)    42.218         12.642          3.131
Mean square error     128.0459       129.0493        129.9594

The 1/8-size SPGP is 92.6% faster and matches the full GP's performance, which makes it possible to train on 20% and test on 10% of the data set.

Prediction Error
SVM regression MSE: 130.1335
SPGP regression MSE: 122.9167
Improvement: 5.46%

24-hour Prediction

Method   Prediction error
LR       191.5258
SVR      93.2988
GP       90.2835

Summary
Employed machine learning techniques to automatically model the prediction of solar generation from weather forecasts.
Used the NREL National Solar Radiation Database to train and test the models.
Applied and compared three different models for the regression problem.
Gaussian processes achieved the lowest prediction error of the three methods.
Limitations: SVM and GP are both time-consuming. The computational complexity of the GP is high due to matrix inversion, which can be mitigated by the SPGP method.

THANKS!