A Bayesian Perspective on Residential Demand Response Using Smart Meter Data

Datong-Paul Zhou, Maximilian Balandat, and Claire Tomlin
University of California, Berkeley
{datong.zhou, balandat, tomlin}@eecs.berkeley.edu
September 30, 2016
History of Residential Demand Response

Residential Demand Response Background
- Reliable grid operation depends on adequate supply and flexible resources.
- 2009: CAISO proposes the Proxy Demand Resource product to facilitate the participation of existing retail demand programs in the ISO market.
  - Proxy Demand Resources (PDRs) offer bids that reflect their flexibility to adjust load in response to market schedules.
- CPUC Resolution E-4728 (July 2015) approves an auction mechanism for demand response capacity, the Demand Response Auction Mechanism (DRAM).
  - (Residential) Demand Response Providers that aggregate customers capable of reducing load participate in the ISO day-ahead and real-time ancillary services markets.
- Utilities can obtain resource adequacy through residential resources.
- Reduction potential is benchmarked against consumption baselines.
History of Residential Demand Response

Measuring Reduction in Consumption
Estimate the demand reduction with suitable baselines (counterfactuals).

Figure: Courtesy of Maximilian Balandat
Previous Analysis

Previous Work¹
- Short-term load forecasting at the individual household level.
  Figure: Mean absolute percentage error (MAPE) by method (BL, DT, KNN, L1, L2, OLS, SVR).
- K-means clustering of daily electricity load shapes.
  Figure: Twelve k-means cluster centroids of daily load shapes (mean consumption by hour of day; cluster sizes x724 to x5306).
- Estimation of counterfactual consumption to estimate DR reduction.
- Measured user variability with entropy.
- Targeting: users with high variability appear to reduce more during DR events.

¹ D. Zhou, M. Balandat, and C. Tomlin. "Residential Demand Response Targeting Using Observational Data," 55th Conference on Decision and Control, December 2016 (to appear).
Agenda

Today's Talk
- Overview of machine learning tools for short-term load forecasting (STLF)
- Inclusion of physically motivated latent variables to improve prediction accuracy
  - Key idea: estimate whether a user currently has high or low consumption.
  - If consumption at time t is high, it is probably high at t + 1 as well; if it is low at t, it is probably low at t + 1.
- Bayesian perspective on STLF: encode this consumption state as an unobserved, latent variable
  - Conditional Gaussian mixture model
  - Hidden Markov model
- Forecasting algorithms to estimate DR reduction
- Numerical results of a DR case study
Machine Learning Methods

Machine Learning for Short-Term Load Forecasting
Regression problem: y = f(X, \beta), where X is the covariate matrix.

Ordinary least squares:
\hat{\beta}_{OLS} = \arg\min_{\beta} \| y - X\beta \|_2^2

Lasso (L1) and ridge (L2) regression:
\hat{\beta}_{L1} = \arg\min_{\beta} \| y - X\beta \|_2^2 + \lambda \| \beta \|_1, \quad \hat{y}_{L1} = X \hat{\beta}_{L1}
\hat{\beta}_{L2} = \arg\min_{\beta} \| y - X\beta \|_2^2 + \lambda \| \beta \|_2^2, \quad \hat{y}_{L2} = X \hat{\beta}_{L2}

k-nearest neighbors regression: \hat{y}_{KNN} = (y_1 + \cdots + y_k)/k
Decision tree regression: cross-validation on maximum depth and minimum samples per node
Support vector regression: cross-validation on slack variables
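As a quick illustration, the ridge problem above has the closed-form solution \hat{\beta}_{L2} = (X^\top X + \lambda I)^{-1} X^\top y, which can be sketched in a few lines of NumPy. This is a toy example on synthetic data; `ridge_fit` is a hypothetical helper, not code from this work:

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge estimate: (X^T X + lam * I)^{-1} X^T y.
    # lam = 0 recovers the OLS solution.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Toy data (synthetic): y = 2*x1 - 1*x2 exactly, no noise.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([2.0, -1.0])

beta_ols = ridge_fit(X, y, 0.0)   # recovers [2, -1]
beta_l2 = ridge_fit(X, y, 1.0)    # coefficients shrunk toward zero
```

For lam > 0 the penalty shrinks the coefficient norm, trading a little bias for lower variance, which is why the talk cross-validates the regularization strength.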
Bayesian Perspective: Latent Variable Models

Latent Variable Models: Mixtures

Mixture of linear regression models:
- Consider K linear regression models, each governed by its own weight vector w_k.
- Assume a common noise variance \sigma^2.
- Mixture distribution with mixing proportions \{\pi_k\}:
P(y \mid w, \sigma^2, \pi) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(y \mid w_k^\top x, \sigma^2)

Idea: interpret electricity consumption as the result of behavioral archetypes k = 1, \ldots, K.

The Expectation-Maximization algorithm yields the update rules:
\gamma_{nk} := \mathbb{E}[z_{nk}] = \frac{\pi_k \, \mathcal{N}(y_n \mid w_k^\top x_n, \sigma^2)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(y_n \mid w_j^\top x_n, \sigma^2)}
\pi_k = \frac{1}{N} \sum_{n=1}^{N} \gamma_{nk}, \quad w_k = (X^\top D_k X)^{-1} X^\top D_k Y, \quad D_k = \mathrm{diag}(\gamma_{1k}, \ldots, \gamma_{Nk})
\sigma^2 = \frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma_{nk} \, (y_n - w_k^\top x_n)^2
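The E- and M-steps above can be sketched in NumPy. This is a minimal illustration on synthetic data with two archetypes; the function name, the toy data, and the small stabilizing epsilon are assumptions of this sketch, not the authors' implementation:

```python
import numpy as np

def em_mixture_of_regressions(X, y, K, n_iter=50, seed=0):
    # EM for a K-component mixture of linear regressions with a
    # shared noise variance, following the update rules above.
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W = rng.normal(size=(K, d))          # per-component weights w_k
    pi = np.full(K, 1.0 / K)             # mixing proportions pi_k
    sigma2 = float(np.var(y))            # shared noise variance
    for _ in range(n_iter):
        # E-step: responsibilities gamma_{nk}
        resid = y[:, None] - X @ W.T                                  # (N, K)
        dens = np.exp(-resid**2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
        gamma = pi * dens + 1e-16        # tiny floor avoids 0/0
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: mixing proportions, weighted least squares, variance
        pi = gamma.mean(axis=0)
        for k in range(K):
            D = gamma[:, k]
            W[k] = np.linalg.solve(X.T @ (D[:, None] * X), X.T @ (D * y))
        sigma2 = float(np.sum(gamma * (y[:, None] - X @ W.T)**2) / N)
    return W, pi, sigma2, gamma

# Toy usage: two synthetic archetypes with slopes +2 and -2.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 3.0, size=200)
X = np.column_stack([np.ones_like(x), x])            # intercept + slope
y = np.where(rng.random(200) < 0.5, 2.0 * x, -2.0 * x) + 0.1 * rng.normal(size=200)
W, pi, sigma2, gamma = em_mixture_of_regressions(X, y, K=2)
```

Each row of `gamma` is a probability distribution over the K archetypes for one observation, which is what the prediction methods on the next slide reuse.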
Bayesian Perspective: Latent Variable Models

Latent Variable Models: Mixtures (cont'd.)

Mixture models for prediction:
- The i.i.d. assumption of mixture models cannot capture the temporal correlation in time series data.
- Prediction method 1: convex combination of the learners:
\hat{y} = \sum_{k=1}^{K} \hat{\pi}_k \hat{w}_k^\top x
- Prediction method 2: if model i's covariates are spatially separated from model j's, weight the learners by the responsibilities of the nearest training point:
j = \arg\min_{1 \le i \le N} \| x - x_i \|_2, \quad \hat{y} = \sum_{k=1}^{K} \gamma_{jk} \hat{w}_k^\top x
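Both prediction methods translate directly into code. In this hedged sketch, `W` holds the fitted weights \hat{w}_k as rows, `pi` the mixing proportions, and `gamma` the training responsibilities; all names and the toy inputs are illustrative:

```python
import numpy as np

def predict_mixture(x, W, pi):
    # Method 1: convex combination of the K learners,
    # weighted by the estimated mixing proportions.
    return float(pi @ (W @ x))

def predict_nearest(x, X_train, W, gamma):
    # Method 2: find the nearest training covariate and reuse
    # its responsibilities as the combination weights.
    j = int(np.argmin(np.linalg.norm(X_train - x, axis=1)))
    return float(gamma[j] @ (W @ x))

# Toy inputs: two learners, two training points.
W = np.array([[1.0, 0.0], [0.0, 1.0]])
pi = np.array([0.5, 0.5])
X_train = np.array([[0.0, 0.0], [2.0, 4.0]])
gamma = np.array([[1.0, 0.0], [0.0, 1.0]])
x = np.array([2.0, 4.0])

y1 = predict_mixture(x, W, pi)             # 3.0
y2 = predict_nearest(x, X_train, W, gamma) # 4.0 (nearest point trusts learner 2)
```

Method 2 recovers some locality that the global mixing proportions discard, which matters when the archetypes occupy different regions of covariate space.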
Bayesian Perspective: Latent Variable Models

Latent Variable Models: Hidden Markov Models
- Data often is not i.i.d.
- Used for sequential data, e.g. time series, speech recognition, DNA sequences.

Figure: Hidden Markov model with hidden states q_0, \ldots, q_T and observations y_0, \ldots, y_T
Figure: Three-state latent variable with a Gaussian emission model
Bayesian Perspective: Latent Variable Models

Latent Variable Models: Hidden Markov Models (cont'd.)

Elements of HMMs
- Hidden layer: transitions between states are governed by a Markov process:
a_{ij} = P(q_t = j \mid q_{t-1} = i), \quad a_{ij} > 0, \quad 1 \le i, j \le M, \quad t = 1, 2, \ldots, T

Figure: Markov transition diagram with 24-hour periodicity; hours 6-19 are split into high-consumption (H) and low-consumption (L) states.

- Observables: hidden states emit observations according to some distribution, e.g. a Gaussian:
P(y_t = y \mid q_t = q) = \frac{1}{\sigma_q \sqrt{2\pi}} \exp\left( -\frac{(y - \mu_q)^2}{2\sigma_q^2} \right)
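The two HMM ingredients, a transition step and a Gaussian emission density, can be sketched with the standard library alone (function names are illustrative, not from the talk):

```python
import math
import random

def emission_pdf(y, mu_q, sigma_q):
    # Gaussian emission density P(y_t = y | q_t = q) from the slide.
    return math.exp(-(y - mu_q)**2 / (2.0 * sigma_q**2)) / (sigma_q * math.sqrt(2.0 * math.pi))

def step(q, A, rng):
    # Sample the next hidden state from transition row A[q],
    # i.e. a_{qj} = P(q_t = j | q_{t-1} = q).
    return rng.choices(range(len(A[q])), weights=A[q])[0]

# Toy usage: a deterministic two-state chain that alternates.
A = [[0.0, 1.0], [1.0, 0.0]]
rng = random.Random(0)
next_state = step(0, A, rng)   # always 1 under this A
```

Rolling `step` forward and drawing from `emission_pdf`'s distribution at each state is exactly the generative story the HMM assumes for a consumption time series.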
Bayesian Perspective: Latent Variable Models

Latent Variable Models: Hidden Markov Models (cont'd.)

Parameter estimation: maximum likelihood estimation via the EM algorithm.
Use Bayes' rule and the conditional independencies of the graphical model:
P(q_t \mid y) = \frac{\alpha(q_t) \, P(y_t \mid q_t) \, \beta(q_t)}{P(y)}
P(q_t, q_{t+1} \mid y) = \frac{\alpha(q_t) \, \beta(q_{t+1}) \, a_{q_t, q_{t+1}} \, P(y_t \mid q_t) \, P(y_{t+1} \mid q_{t+1})}{P(y)}
Compute \alpha(q_t), \beta(q_t) with the well-known \alpha-\beta (forward-backward) recursion.

Inference: filtering, prediction, smoothing
Filtering: P(q_t \mid y_0, \ldots, y_t) = \frac{\alpha(q_t) \, P(y_t \mid q_t)}{P(y_0, \ldots, y_t)}
Prediction: P(q_{t+1} \mid y_0, \ldots, y_t) = \frac{\alpha(q_{t+1})}{P(y_0, \ldots, y_t)}
Smoothing: P(q_t \mid y_0, \ldots, y_n) = \frac{\alpha(q_t) \, P(y_t \mid q_t) \, \beta(q_t)}{P(y_0, \ldots, y_n)}, \quad 0 < t < n
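The \alpha-\beta recursion can be sketched with per-step rescaling for numerical stability. Note this sketch follows the common convention in which \alpha already includes the current emission (so the scaled \alpha is the filtering distribution), which differs slightly from the factorization on the slide; `forward_backward` is an illustrative name:

```python
import numpy as np

def forward_backward(A, pi0, B):
    # Scaled alpha-beta (forward-backward) recursion.
    # A: (M, M) transition matrix, pi0: (M,) initial distribution,
    # B: (T, M) emission likelihoods with B[t, q] = P(y_t | q_t = q).
    # Returns the smoothed posteriors P(q_t | y_0, ..., y_{T-1}).
    T, M = B.shape
    alpha = np.zeros((T, M))
    c = np.zeros(T)                      # per-step scaling factors
    alpha[0] = pi0 * B[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):                # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    beta = np.ones((T, M))
    for t in range(T - 2, -1, -1):       # backward pass
        beta[t] = (A @ (B[t + 1] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta                 # smoothed state posteriors
    return gamma / gamma.sum(axis=1, keepdims=True)

# Toy usage: two states, three observations.
A = np.array([[0.9, 0.1], [0.2, 0.8]])
pi0 = np.array([0.6, 0.4])
B = np.array([[0.5, 0.1], [0.3, 0.4], [0.2, 0.6]])
posteriors = forward_backward(A, pi0, B)
```

The product of the scaling factors `c` gives the data likelihood P(y_0, ..., y_{T-1}), which is what the EM algorithm monitors for convergence.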
Experiments on Data

HMM Case Study
Application of the HMM to the electricity consumption of residential customers:
- Fit an HMM on the electricity consumption of 273 customers.
- Use the estimated latent variable as an additional covariate for parametric regression on electricity consumption.

Covariates:
- 5 previous hourly consumptions
- 5 previous hourly ambient air temperatures
- Model 1: hour of day, one-hot encoded
- Model 2: hour of day interacted with the latent variable, one-hot encoded
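The two hour-of-day encodings can be sketched as follows, assuming a binary high/low latent state (the helper name and the binary-state assumption are illustrative, not the paper's exact feature pipeline):

```python
import numpy as np

def hour_features(hour, state=None):
    # Model 1 (state is None): plain one-hot hour-of-day, length 24.
    # Model 2 (state in {0, 1}): hour-of-day interacted with the
    # binary latent state, length 48, so each hour gets a separate
    # coefficient per consumption regime.
    v = np.zeros(24 if state is None else 48)
    v[hour if state is None else 24 * state + hour] = 1.0
    return v

m1 = hour_features(5)             # Model 1 encoding of hour 5
m2 = hour_features(5, state=1)    # Model 2: hour 5 in the high state
```

Interacting the hour with the latent state is what lets the regression learn different hourly profiles for high- and low-consumption regimes instead of one averaged profile.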
Experiments on Data

Results of Case Study (cont'd.)

Comparison of prediction accuracy via the mean absolute percentage error (MAPE):
\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \frac{|\hat{y}_t - y_t|}{y_t}

Figure: MAPEs across 273 users for the different ML methods with and without the latent variable (OLS, Mix., OLS+, KNN, KNN+, DT, DT+, SVR, SVR+; green: mean, red: median).
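The MAPE definition above is a one-liner (assuming nonzero actuals y_t, which holds for consumption data):

```python
def mape(y_hat, y):
    # Mean absolute percentage error: mean of |y_hat_t - y_t| / y_t,
    # per the definition above; y_t must be nonzero.
    return sum(abs(p - a) / a for p, a in zip(y_hat, y)) / len(y)

# Toy check: predictions off by 10% in both directions.
err = mape([110.0, 90.0], [100.0, 100.0])   # 0.1, i.e. 10%
```

Being scale-free, MAPE lets forecast errors be compared across households with very different consumption levels, which is why it is the headline metric here.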
Experiments on Data

Results of Case Study (cont'd.)

Estimated reductions during DR hours by hour of day.

Figure: Boxplots of estimated hourly DR reductions (HMM vs. placebo tests) for hours 8-19. Top row: high consumption state; bottom row: low consumption state; left column: automated users; right column: non-automated users.
Thank You!

Questions?