Modelling and forecasting of offshore wind power fluctuations with Markov-Switching models

02433 - Hidden Markov Models
Pierre-Julien Trombe, Martin Wæver Pedersen, Henrik Madsen
Course week 10
MWP, compiled June 7, 2011

Offshore vs Onshore Wind power
- The magnitude of the fluctuations differs considerably between onshore and offshore wind power production.
- Because there is no smoothing effect in offshore environments, fluctuations can become larger.
- Fluctuations at large offshore wind farms have a significant impact on the control and management strategies of their power output.
- Critical issue: keeping a balance between power demand and power generation.
- The presence of different dynamics in the power fluctuations indicates the need for regime-switching models, which can be formulated as HMMs.

Classical regime-switching approach
In the classical regime-switching approach, switches occur between three predefined intervals of the power range: low, medium and high power. This approach can be refined by combining HMMs and dynamical models.
[Figure: wind power [% of Pn] against time, Aug 05 to Aug 13, 2005, with regions of the power range labelled "High power?", "Medium power?" and "Low power?".]
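The classical approach amounts to a fixed lookup from power level to regime. A minimal sketch follows; the cut points (one third and two thirds of Pn) and the function name are hypothetical, since the slide does not state where the interval boundaries lie.

```python
def classify_regime(power_pct, low_cut=100 / 3, high_cut=200 / 3):
    """Map a power value in % of Pn to a predefined regime label.

    low_cut and high_cut are hypothetical boundaries, not values from the slides.
    """
    if not 0.0 <= power_pct <= 100.0:
        raise ValueError("power must lie in [0, 100] % of Pn")
    if power_pct < low_cut:
        return "low"
    if power_pct < high_cut:
        return "medium"
    return "high"


series = [5.0, 40.0, 72.5, 100.0, 30.0]  # toy values, not from the data set
regimes = [classify_regime(p) for p in series]
print(regimes)
```

The regime here is a deterministic function of the observed power alone, which is what the HMM-based refinement relaxes.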

Offshore wind power data
Case study: Horns Rev I offshore wind farm, installed capacity 160 MW.
- Time series of wind power averaged over 10-minute intervals.
- 49536 data points (16 Feb 2005 to 26 Jan 2006).
- 7885 (16%) missing data points.
- Bounded on [0, 100] % of nominal power Pn, where Pn is the nominal production capacity (installed capacity).
- 3023 (6.1%) data points lie at maximum production.
- 870 (1.8%) data points lie at zero production.
[Figure: histogram of the power data; x-axis: Power [% of Pn], y-axis: Frequency.]
The power production fluctuates differently at maximum (100% of Pn) or minimum (0% of Pn) capacity than at mid capacity, as the histogram indicates.

Heteroscedasticity of data
The variance of the data is not homogeneous. This can be accounted for with AutoRegressive Conditional Heteroscedasticity (ARCH) models or Generalized ARCH (GARCH) models.
[Figure: squared residuals against time, Sept 05 to Dec 05.]

Model specification
To model the stochastic behavior of a given wind power time series $x_t$, a MS(s)-AR(r)-GARCH(p, q) model is proposed as follows:
\[
x_t = \theta_0^{(C_t)} + \sum_{i=1}^{r} \theta_i^{(C_t)} x_{t-i} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_t^2)
\]
\[
\sigma_t^2 = a_0^{(C_t)} + \sum_{j=1}^{q} a_j^{(C_t)} \varepsilon_{t-j}^2 + \sum_{i=1}^{p} b_i^{(C_t)} \sigma_{t-i}^2
\]
with $C_t$ a first-order Markov chain, which governs the switches between the different parameter values of the dynamical models (AR or GARCH) in each regime. As always, $\Gamma$ is the transition probability matrix of $C_t$.
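To make the role of Γ concrete, the sketch below computes two summaries of a two-state chain: its stationary distribution and the expected number of consecutive steps spent in a regime. The values γ11 = 0.98 and γ22 = 0.95 are the true values listed for the synthetic data in these slides; the function names are illustrative only.

```python
def stationary_two_state(g11, g22):
    """Stationary distribution of a two-state chain with self-transition
    probabilities g11 and g22 (solves delta = delta * Gamma)."""
    p12 = 1.0 - g11  # probability of leaving state 1
    p21 = 1.0 - g22  # probability of leaving state 2
    total = p12 + p21
    if total == 0.0:
        raise ValueError("chain never switches; stationary distribution is not unique")
    return (p21 / total, p12 / total)


def expected_duration(g_ii):
    """Mean number of consecutive steps spent in a state (geometric sojourn time)."""
    return 1.0 / (1.0 - g_ii)


delta = stationary_two_state(0.98, 0.95)
print(delta)
print(expected_duration(0.98), expected_duration(0.95))
```

With these values the chain spends roughly five sevenths of the time in state 1, and a visit to state 1 lasts about 50 steps on average against about 20 for state 2.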

Estimation scheme
Numerical maximization of the likelihood is computationally infeasible in general. A Bayesian estimation approach with Markov Chain Monte Carlo (MCMC) is simpler.
MCMC algorithm: loop until convergence over
- Sample the state sequence (Data Augmentation).
- Sample the transition probabilities (Dirichlet distribution).
- Sample the state-dependent AR and GARCH parameters (Griddy-Gibbs sampler).
- Compute the Bayes estimates (posterior distributions).
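The second step of the loop can be sketched as follows. Given a sampled state sequence, each row of Γ is drawn from a Dirichlet distribution whose parameters are the prior weights plus the observed transition counts. The uniform prior (all weights equal to 1), the 0-based state labels, the omission of the initial state's contribution and the function name are assumptions made for this illustration.

```python
import random


def sample_transition_matrix(states, m, rng, prior=1.0):
    """Draw Gamma from its Dirichlet posterior given a 0-based state sequence.

    Row j is Dirichlet(prior + n_j0, ..., prior + n_j,m-1), where n_jk counts
    the transitions j -> k. prior=1.0 (uniform) is an assumption.
    """
    counts = [[0] * m for _ in range(m)]
    for prev, curr in zip(states, states[1:]):
        counts[prev][curr] += 1

    gamma = []
    for j in range(m):
        # A Dirichlet draw is a vector of independent Gamma draws, normalised.
        draws = [rng.gammavariate(prior + counts[j][k], 1.0) for k in range(m)]
        total = sum(draws)
        gamma.append([d / total for d in draws])
    return gamma


rng = random.Random(1)
states = [0] * 50 + [1] * 20 + [0] * 30  # toy state sequence
gamma = sample_transition_matrix(states, m=2, rng=rng)
for row in gamma:
    print([round(g, 3) for g in row])
```

Because the toy sequence switches only twice, the sampled diagonal entries concentrate near 1.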

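Data augmentation (1/2)
For a given $t \in \{1, \dots, T\}$ and for all states $i$ in the finite space $\{1, \dots, m\}$:
\[
\begin{aligned}
\Pr(C_t = i \mid C_{-t}, X^{(T)} = x^{(T)}, \Theta)
&= \frac{\Pr(C^{(T)}, X^{(T)} = x^{(T)}, \Theta)}{\Pr(C_{-t}, X^{(T)} = x^{(T)}, \Theta)} \\
&= \frac{\Pr(X^{(T)} = x^{(T)} \mid C^{(T)}, \Theta)\,\Pr(C^{(T)}, \Theta)}{\Pr(X^{(T)} = x^{(T)} \mid C_{-t}, \Theta)\,\Pr(C_{-t}, \Theta)} \\
&= \frac{\Pr(X^{(T)} = x^{(T)} \mid C^{(T)}, \Theta)\,\Pr(C_t = i \mid C_{-t}, \Theta)}{\Pr(X^{(T)} = x^{(T)} \mid C_{-t}, \Theta)},
\end{aligned}
\]
where $C_{-t}$ denotes the states at all times except time $t$, and $C^{(T)}$ is the full state sequence with $C_t = i$. The denominator does not depend on $i$, so we get
\[
\Pr(C_t = i \mid C_{-t}, X^{(T)} = x^{(T)}, \Theta) \propto \Pr(X^{(T)} = x^{(T)} \mid C^{(T)}, \Theta)\,\Pr(C_t = i \mid C_{-t}, \Theta).
\]
So $C_t$ can be sampled using the Gibbs sampler.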

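Data augmentation (2/2)
For a given sample of the state sequence $C^{(T)}$, $\Pr(X^{(T)} = x^{(T)} \mid C^{(T)}, \Theta)$ is the likelihood function
\[
\begin{aligned}
\Pr(X^{(T)} = x^{(T)} \mid C^{(T)}, \Theta)
&= \prod_{t=1}^{T} \Pr(X_t = x_t \mid X^{(t-1)} = x^{(t-1)}, C^{(T)}, \Theta) \\
&= \prod_{t=1}^{T} \frac{1}{\sqrt{2\pi\sigma_t^2}} \exp\left( -\frac{\left( x_t - \theta_0^{(C_t)} - \sum_{i=1}^{r} \theta_i^{(C_t)} x_{t-i} \right)^2}{2\sigma_t^2} \right),
\end{aligned}
\]
and the Markov property leads to
\[
\Pr(C_t = i \mid C_{-t}, \Theta) = \Pr(C_t = i \mid C_{t-1} = j, C_{t+1} = k) = \frac{\gamma_{ji}\,\gamma_{ik}}{\sum_{l=1}^{m} \gamma_{jl}\,\gamma_{lk}}.
\]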
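The single-site update derived above can be sketched as below: for each candidate state, the likelihood of the full series is multiplied by the two neighbouring transition probabilities, and the new state is drawn from the normalised weights. The function name, the 0-based state labels and the treatment of the end points (the missing neighbour factor is simply dropped, which corresponds to a uniform initial distribution) are assumptions. The toy likelihood in the demonstration is a plain Gaussian with state-dependent means, up to an additive constant, standing in for the AR-GARCH likelihood.

```python
import math
import random


def _log(p):
    """log(p) that returns -inf for a zero probability."""
    return math.log(p) if p > 0.0 else -math.inf


def gibbs_update_state(t, states, gamma, loglik, rng):
    """Resample states[t] in place from its full conditional and return it.

    states : list of 0-based regime labels
    gamma  : m x m matrix, gamma[j][k] = Pr(C_t = k | C_{t-1} = j)
    loglik : callable giving log Pr(X = x | C, Theta) for a full state list
    """
    m, n = len(gamma), len(states)
    log_w = []
    for i in range(m):
        states[t] = i
        lw = loglik(states)
        if t > 0:
            lw += _log(gamma[states[t - 1]][i])
        if t < n - 1:
            lw += _log(gamma[i][states[t + 1]])
        log_w.append(lw)
    top = max(log_w)
    if top == -math.inf:
        raise ValueError("all candidate states have zero probability")
    weights = [math.exp(lw - top) for lw in log_w]  # stabilised weights
    states[t] = rng.choices(range(m), weights=weights)[0]
    return states[t]


# Toy demonstration (values are illustrative, not from the slides).
obs = [0.1, 0.0, 5.2, 4.9, 0.2]
means = [0.0, 5.0]


def toy_loglik(state_seq):
    return sum(-0.5 * (x - means[c]) ** 2 for x, c in zip(obs, state_seq))


gamma = [[0.95, 0.05], [0.05, 0.95]]
rng = random.Random(0)
states = [0] * len(obs)
for _ in range(20):  # 20 full sweeps over the sequence
    for t in range(len(obs)):
        gibbs_update_state(t, states, gamma, toy_loglik, rng)
print(states)
```

Each call evaluates the full likelihood once per candidate state, so a sweep costs on the order of m T likelihood evaluations; with GARCH terms the variance recursion has to be rerun from time t onwards for each candidate.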

Simulation of synthetic data
Simulating data from a MS(2)-AR(2)-GARCH(1,1) process.
[Figure, two panels against time, t = 0 to 2000. Top: the log-variance. Bottom: the simulated $C_t$, i.e. the switching sequence, taking the values 1 and 2.]
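A simulation of this kind can be sketched directly from the model specification. The parameter values below are the true values listed in the synthetic-data results table of these slides; the start-up values (zero history, initial variance equal to a0 of the first regime, chain started in the first regime), the seed and the 0-based regime labels are arbitrary assumptions.

```python
import math
import random

# True values from the synthetic-data results table; index 0 is regime 1.
THETA = [(0.1, 0.3, -0.1), (0.5, 0.7, 0.25)]  # (theta_0, theta_1, theta_2)
A0 = [0.05, 0.25]
A1 = [0.2, 0.1]
B1 = [0.65, 0.8]
GAMMA = [[0.98, 0.02], [0.05, 0.95]]


def simulate(n, rng):
    """Simulate n points of a MS(2)-AR(2)-GARCH(1,1) process.

    Returns (x, sigma2, states). Start-up values are arbitrary assumptions.
    """
    x_hist = [0.0, 0.0]  # [x_{t-2}, x_{t-1}]
    eps_prev, s2_prev, c = 0.0, A0[0], 0
    x, sigma2, states = [], [], []
    for _ in range(n):
        # Stay in the current regime with probability gamma_cc, else switch.
        if rng.random() >= GAMMA[c][c]:
            c = 1 - c
        # GARCH(1,1) variance recursion with regime-dependent coefficients.
        s2 = A0[c] + A1[c] * eps_prev ** 2 + B1[c] * s2_prev
        eps = rng.gauss(0.0, math.sqrt(s2))
        th0, th1, th2 = THETA[c]
        x_new = th0 + th1 * x_hist[1] + th2 * x_hist[0] + eps
        x.append(x_new)
        sigma2.append(s2)
        states.append(c)
        x_hist = [x_hist[1], x_new]
        eps_prev, s2_prev = eps, s2
    return x, sigma2, states


x, sigma2, states = simulate(2000, random.Random(42))
switches = sum(a != b for a, b in zip(states, states[1:]))
print(len(x), switches)
print(min(math.log(s) for s in sigma2), max(math.log(s) for s in sigma2))
```

Plotting the logarithm of `sigma2` and `states` against time gives the same kind of two-panel view as the figure.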

Results, synthetic data

Parameter    True value   Prior bounds   Mean     Std. Dev.
θ_0^(1)      0.1          [-0.3, 0.5]    0.080    0.050
θ_1^(1)      0.3          [-0.2, 0.7]    0.227    0.057
θ_2^(1)      -0.1         [-0.5, 0.3]    -0.074   0.057
a_0^(1)      0.05         [0, 0.2]       0.067    0.012
b_1^(1)      0.65         [0.2, 1]       0.547    0.044
a_1^(1)      0.2          [0, 0.5]       0.307    0.047
θ_0^(2)      0.5          [0.1, 0.9]     0.478    0.052
θ_1^(2)      0.7          [0.2, 1.3]     0.731    0.060
θ_2^(2)      0.25         [-0.3, 0.7]    0.183    0.056
a_0^(2)      0.25         [0, 0.35]      0.319    0.026
b_1^(2)      0.8          [0, 0.7]       0.670    0.020
a_1^(2)      0.1          [0, 0.5]       0.233    0.033
γ_11         0.98         [0, 1]         0.977    0.005
γ_22         0.95         [0, 1]         0.958    0.009

Posterior distributions of the AR coefficients
[Figure: histograms of the posterior samples of the AR coefficients. Columns: intercept, lag 1, lag 2. Rows: state 1, state 2.]

Results: Regime characterization on wind power data
Regime 1 has little process noise, regime 2 is the most common and has moderate process noise, and regime 3 accounts for spikes in the data.
[Figure "Inference on the regimes": four panels against time (200 to 1400), each on a 0 to 1 scale: Wind power, Pr. regime 1, Pr. regime 2, Pr. regime 3.]

Results: Point Forecast
The performance of the point forecast is evaluated as the root mean square error (R) of the conditional expectation of the one-step-ahead observation, $\mu_{t+1} = E(X_{t+1} \mid X^{(t)} = x^{(t)})$, i.e.
\[
R = \sqrt{\frac{1}{T-1} \sum_{t=1}^{T-1} (X_{t+1} - \mu_{t+1})^2},
\]
where R has the unit percentage of nominal capacity, i.e. [% of Pn].

No.   Model                     R [% of Pn]   R [% of Pn]
                                in sample     out-of-sample
1     Persistence (naive)       4.62          4.49
2     AR(3)-GARCH(1,1)          4.65          4.48
3     MS(3)-AR(3)-GARCH(1,1)    4.49          4.29

The model allowing switches between different dynamical models (no. 3) has the lowest R of the three, both in sample and out-of-sample (4.29 out-of-sample, against 4.49 and 4.48 for the two other modelling approaches).
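As a small worked instance of the error measure, the sketch below evaluates R for the persistence forecast, which predicts the next value to equal the current one. The four data values are toy numbers rather than wind power observations, and the function names are illustrative.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def persistence_forecast(x):
    """One-step-ahead persistence forecasts: mu_{t+1} = x_t."""
    return x[:-1]


x = [10.0, 12.0, 11.0, 15.0]  # toy values in % of Pn
r = rmse(x[1:], persistence_forecast(x))
print(round(r, 3))  # 2.646
```

The forecast errors are 2, -1 and 4, so the mean squared error is 21 / 3 = 7 and R is the square root of 7. Any other model is scored the same way by swapping in its own one-step-ahead conditional means.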

Exercise
a) Download and plot the wind data set (winddata.txt) from the course website.
b) Fit AR models with varying numbers of lags to the data using R's built-in arima function.
c) Fit AR models with varying numbers of lags to the data by writing your own code. Ensure that your optimal likelihood value is similar to the one obtained with the arima function.
d) Fit a two-state MSAR model to the data by extending your code for fitting AR models. Assume γ_11 = γ_22 = 0.95. Compare the AIC for this model with the AIC for the simple AR model.
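As a possible starting point for part d), the sketch below evaluates the likelihood of a Markov-switching AR(1) model with the scaled forward recursion and defines the AIC. It is written in Python for illustration and is not a solution in R. Conditioning on the first observation, the constant noise standard deviation per state, the uniform initial state distribution, the toy data and all function names are assumptions. Because the likelihood is conditional, its value for a single state may differ slightly from what arima reports.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2


def msar1_loglik(x, theta0, theta1, sd, gamma, delta):
    """Log-likelihood of an MS-AR(1) model, conditional on x[0].

    theta0[i], theta1[i], sd[i] : intercept, lag-1 coefficient, noise s.d. in state i
    gamma : transition matrix; delta : state distribution at the first modelled time
    """
    m = len(delta)
    phi = list(delta)  # scaled forward probabilities
    loglik = 0.0
    for t in range(1, len(x)):
        if t > 1:
            # Propagate the state distribution one step: phi <- phi * Gamma.
            phi = [sum(phi[j] * gamma[j][i] for j in range(m)) for i in range(m)]
        dens = [math.exp(normal_logpdf(x[t], theta0[i] + theta1[i] * x[t - 1], sd[i]))
                for i in range(m)]
        alpha = [phi[i] * dens[i] for i in range(m)]
        scale = sum(alpha)  # Pr(x_t | x_1, ..., x_{t-1})
        loglik += math.log(scale)
        phi = [a / scale for a in alpha]
    return loglik


def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2 k."""
    return -2.0 * loglik + 2.0 * n_params


x = [1.0, 1.2, 0.9, 1.5, 1.1, 3.0, 3.4, 2.9]  # toy series, not the wind data
gamma = [[0.95, 0.05], [0.05, 0.95]]
ll = msar1_loglik(x, [0.5, 1.5], [0.5, 0.5], [0.3, 0.3], gamma, [0.5, 0.5])
print(ll, aic(ll, 6))
```

With a single state and Γ = [[1]], the recursion reduces to the ordinary conditional AR(1) log-likelihood, which gives a convenient check before adding the second state. The parameters would then be passed to a numerical optimiser.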

End of example