A Bayesian Perspective on Residential Demand Response Using Smart Meter Data


Datong Paul Zhou, Maximilian Balandat, and Claire Tomlin
University of California, Berkeley
[datong.zhou, balandat, tomlin]@eecs.berkeley.edu
September 30, 2016

History of Residential Demand Response

Residential Demand Response Background
- Reliable grid operation depends on adequate supply and flexible resources.
- 2009: CAISO proposes the Proxy Demand Resource product to facilitate the participation of existing retail demand programs in the ISO market.
  - Proxy Demand Resources (PDRs) offer bids that reflect their flexibility to adjust load in response to market schedules.
- CPUC Resolution E-4728 (July 2015) approves an auction mechanism for demand response capacity, the Demand Response Auction Mechanism (DRAM).
  - (Residential) Demand Response Providers, which aggregate customers capable of reducing load, participate in the ISO day-ahead and real-time ancillary services markets.
- Utilities can obtain resource adequacy through residential resources.
- Reduction potential is benchmarked using consumption baselines.

History of Residential Demand Response

Measuring Reduction in Consumption
- Estimate the demand reduction by using suitable baselines (counterfactuals).

Figure: Courtesy of Maximilian Balandat

Previous Analysis

Previous Work [1]
- Short-term load forecasting at the individual household level.
  [Figure: Mean Absolute Percentage Error (MAPE) by forecasting method: BL, DT, KNN, L1, L2, OLS, SVR]
- K-means clustering of daily electricity load shapes.
  [Figure: Twelve K-means cluster centroids of daily load shapes, consumption by hour of day]
- Estimation of counterfactual consumption to estimate DR reduction.
- Measured the variability of users with entropy.
- Targeting: users with high variability seem to reduce more during DR events.

[1] D. Zhou, M. Balandat, and C. Tomlin. "Residential Demand Response Targeting Using Machine Learning with Observational Data," 55th Conference on Decision and Control, December 2016 (to appear).

Agenda

Today's Talk
- Overview of machine learning tools for short-term load forecasting (STLF).
- Inclusion of physically motivated latent variables to improve prediction accuracy.
  - Key idea: estimate whether a user is in a high- or low-consumption state.
  - If consumption at time t is high, it is probably high at t+1 as well; if it is low, it is probably low at t+1 as well.
- Bayesian perspective on STLF: encode this consumption state as an unobserved, latent variable.
  - Conditional Gaussian Mixture Model
  - Hidden Markov Model
- Forecasting algorithms to estimate DR reduction.
- Numerical results of a DR case study.

Machine Learning Methods

Machine Learning for Short-Term Load Forecasting
- Regression problem: $y = f(X, \beta)$, where $X$ is the covariate matrix.
- Ordinary Least Squares:
  $\hat{\beta}_{\text{OLS}} = \arg\min_\beta \|y - X\beta\|_2^2$
- Lasso (L1) and Ridge (L2) regression:
  $\hat{\beta}_{L1} = \arg\min_\beta \|y - X\beta\|_2^2 + \lambda \|\beta\|_1$, with prediction $\hat{y}_{L1} = X\hat{\beta}_{L1}$
  $\hat{\beta}_{L2} = \arg\min_\beta \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2$, with prediction $\hat{y}_{L2} = X\hat{\beta}_{L2}$
- k-Nearest Neighbors regression: $\hat{y}_{\text{KNN}} = (y_1 + \dots + y_k)/k$
- Decision Tree regression: cross-validation on maximum depth and minimum samples per node.
- Support Vector Regression: cross-validation on slack variables.
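
As an illustration of this model zoo, a minimal sketch (assuming scikit-learn; the data, dimensions, and hyperparameter grids are placeholders, not the paper's setup) that fits each method and scores it by MAPE:

```python
# Hedged sketch: fit the regression methods above and compare MAPE.
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV, LassoCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))                          # placeholder covariates
y = 2.0 + X @ (0.3 * rng.normal(size=10)) + 0.1 * rng.normal(size=1000)  # positive "load"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "OLS": LinearRegression(),
    "L1":  LassoCV(alphas=np.logspace(-3, 1, 9)),        # lambda chosen by CV
    "L2":  RidgeCV(alphas=np.logspace(-3, 3, 13)),
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "DT":  GridSearchCV(DecisionTreeRegressor(),
                        {"max_depth": [3, 5, 10], "min_samples_leaf": [5, 20]}),
    "SVR": GridSearchCV(SVR(), {"C": [0.1, 1.0, 10.0]}), # CV on slack penalty
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    mape = 100 * np.mean(np.abs((model.predict(X_te) - y_te) / y_te))
    print(f"{name}: MAPE = {mape:.1f}%")
```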

Bayesian Perspective: Latent Variable Models

Latent Variable Models: Mixtures

Mixture of Linear Regression Models
- Consider $K$ linear regression models, each governed by its own weight vector $w_k$.
- Assume a common noise variance $\sigma^2$.
- Mixture distribution with mixing proportions $\{\pi_k\}$:
  $P(y \mid w, \sigma^2, \pi) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(y \mid w_k^\top x, \sigma^2)$
- Idea: interpret electricity consumption as the result of behavioral archetypes $k = 1, \dots, K$.
- The Expectation-Maximization algorithm yields the update rules:
  E-step (responsibilities):
  $\gamma_{nk} := \mathbb{E}[z_{nk} \mid x_n, \theta^{\text{old}}] = \dfrac{\pi_k \, \mathcal{N}(y_n \mid w_k^\top x_n, \sigma^2)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(y_n \mid w_j^\top x_n, \sigma^2)}$
  M-step:
  $\pi_k = \frac{1}{N} \sum_{n=1}^{N} \gamma_{nk}, \qquad w_k = \left[X^\top D X\right]^{-1} X^\top D Y, \qquad D = \mathrm{diag}(\gamma_{1k}, \dots, \gamma_{Nk})$
  $\sigma^2 = \frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma_{nk} \, (y_n - w_k^\top x_n)^2$
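
These updates translate almost line-for-line into code. A minimal numpy sketch (not the authors' implementation; random initialization and a fixed iteration count are simplifying assumptions):

```python
# Hedged EM sketch for a mixture of K linear regressions with shared
# noise variance sigma^2. X: (N, d) covariates, y: (N,) targets.
import numpy as np

def em_mixture_of_regressions(X, y, K=3, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W = rng.normal(size=(K, d))           # one weight vector per archetype
    pi = np.full(K, 1.0 / K)              # mixing proportions pi_k
    sigma2 = np.var(y)                    # common noise variance
    for _ in range(n_iter):
        # E-step: responsibilities gamma_{nk}; the shared Gaussian
        # normalizer cancels, so only the exponent matters.
        resid = y[:, None] - X @ W.T                       # (N, K)
        log_post = np.log(pi) - 0.5 * resid**2 / sigma2
        log_post -= log_post.max(axis=1, keepdims=True)    # stability
        gamma = np.exp(log_post)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: pi_k, weighted least squares for w_k, shared sigma^2
        pi = gamma.mean(axis=0)
        for k in range(K):
            XD = X * gamma[:, k:k + 1]    # rows weighted by D = diag(gamma_.k)
            W[k] = np.linalg.solve(XD.T @ X, XD.T @ y)
        sigma2 = np.sum(gamma * (y[:, None] - X @ W.T)**2) / N
    return W, pi, sigma2, gamma
```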

Bayesian Perspective: Latent Variable Models

Latent Variable Models: Mixtures (cont'd)

Mixture Models for Prediction
- The i.i.d. assumption of mixture models cannot capture the temporal correlation in time series data.
- Prediction method 1: convex combination of learners:
  $\hat{y} = \sum_{k=1}^{K} \hat{\pi}_k \, \hat{w}_k^\top x$
- Prediction method 2: if the models' covariates are spatially separated, reuse the responsibilities of the nearest training point:
  $j = \arg\min_{1 \le i \le N} \|x - x_i\|_2, \qquad \hat{y} = \sum_{k=1}^{K} \gamma_{jk} \, \hat{w}_k^\top x$
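
Both rules are one-liners given the EM output; a hedged sketch (variable names W, pi, gamma carry over from the previous snippet and are assumptions, not the authors' code):

```python
# The two mixture prediction rules above.
import numpy as np

def predict_convex(W, pi, x):
    """Method 1: convex combination of the K learners."""
    return pi @ (W @ x)

def predict_nearest(W, gamma, X_train, x):
    """Method 2: responsibilities of the nearest training covariate x_j."""
    j = np.argmin(np.linalg.norm(X_train - x, axis=1))
    return gamma[j] @ (W @ x)
```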

Bayesian Perspective: Latent Variable Models

Latent Variable Models: Hidden Markov Models
- Data often is not i.i.d.
- Used for sequential data, e.g. time series, speech recognition, DNA sequences.

Figure: Hidden Markov Model: a chain of hidden states $q_0, q_1, \dots, q_T$, each emitting an observation $y_0, y_1, \dots, y_T$.

Figure: Three-state latent variable with a Gaussian emission model.

Bayesian Perspective: Latent Variable Models

Latent Variable Models: Hidden Markov Models (cont'd)

Elements of HMMs
- Hidden layer: transitions between states are governed by a Markov process:
  $a_{ij} = P(q_t = j \mid q_{t-1} = i), \quad a_{ij} > 0, \quad 1 \le i, j \le M, \quad t = 0, 1, 2, \dots, N$

Figure: Markov transition diagram with 24-hour periodicity; nighttime hours 0-5 and 20-23 are single states, daytime hours 6-19 are split into high (H) and low (L) consumption states.

- Observables: hidden states emit observations according to some distribution, e.g. a Gaussian:
  $P(y_t = y \mid q_t = q) = \dfrac{1}{\sigma_q \sqrt{2\pi}} \exp\left(-\dfrac{(y - \mu_q)^2}{2\sigma_q^2}\right)$
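
In practice, a Gaussian-emission HMM of this kind can be fit with an off-the-shelf package. A minimal sketch (assuming the hmmlearn package and synthetic placeholder data, not the study's smart meter dataset; it also fits an unconstrained transition matrix rather than the 24-hour-periodic structure in the diagram above):

```python
# Hedged sketch: fit a 2-state Gaussian HMM to one user's hourly load
# and decode a high/low latent consumption state.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
y = np.abs(rng.normal(loc=1.0, scale=0.5, size=24 * 60))   # fake hourly load

model = GaussianHMM(n_components=2, covariance_type="diag", n_iter=100)
model.fit(y.reshape(-1, 1))                    # Baum-Welch (EM) on the series

states = model.predict(y.reshape(-1, 1))       # Viterbi path: high/low state
posteriors = model.predict_proba(y.reshape(-1, 1))  # smoothed P(q_t | y)
print("emission means per state:", model.means_.ravel())
```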

Bayesian Perspective: Latent Variable Models

Latent Variable Models: Hidden Markov Models (cont'd)

Parameter Estimation
- Maximum likelihood estimation via the EM algorithm.
- Use Bayes' rule and the conditional independencies of the graphical model:
  $P(q_t \mid y) = \dfrac{\alpha(q_t) \, P(y_t \mid q_t) \, \beta(q_t)}{P(y)}$
  $P(q_t, q_{t+1} \mid y) = \dfrac{\alpha(q_t) \, \beta(q_{t+1}) \, a_{q_t, q_{t+1}} \, P(y_t \mid q_t) \, P(y_{t+1} \mid q_{t+1})}{P(y)}$
- Compute $\alpha(q_t)$ and $\beta(q_t)$ with the famous $\alpha$-$\beta$ (forward-backward) recursion.

Inference: Filtering, Prediction, Smoothing
  Filtering: $P(q_t \mid y_0, \dots, y_t) = \dfrac{\alpha(q_t) \, P(y_t \mid q_t)}{P(y_0, \dots, y_t)}$
  Prediction: $P(q_{t+1} \mid y_0, \dots, y_t) = \dfrac{\alpha(q_{t+1})}{P(y_0, \dots, y_t)}$
  Smoothing: $P(q_t \mid y_0, \dots, y_n) = \dfrac{\alpha(q_t) \, P(y_t \mid q_t) \, \beta(q_t)}{P(y_0, \dots, y_n)}, \quad 0 < t < n$
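
A compact numpy sketch of the scaled forward-backward recursion (illustrative, not the authors' code; this common implementation folds the emission term $P(y_t \mid q_t)$ into $\alpha$, a convention equivalent to the slides' up to that factor):

```python
# Hedged sketch of the alpha-beta (forward-backward) recursion.
# A: (M, M) transition matrix a_ij; B: (T, M) emission likelihoods
# B[t, j] = P(y_t | q_t = j); pi0: (M,) initial state distribution.
import numpy as np

def forward_backward(A, B, pi0):
    T, M = B.shape
    alpha = np.zeros((T, M)); beta = np.zeros((T, M)); c = np.zeros(T)
    # Forward pass (filtering), rescaled by c_t for numerical stability
    alpha[0] = pi0 * B[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    # Backward pass with matching rescaling
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[t + 1] * beta[t + 1]) / c[t + 1]
    gamma = alpha * beta                 # smoothed posteriors P(q_t | y_0..y_T)
    return gamma, np.log(c).sum()        # posteriors and log P(y)
```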

Experiments on Data

HMM Case Study
Application of the HMM to the electricity consumption of residential customers:
- Fit an HMM to the electricity consumption of 273 customers.
- Use the estimated latent variable as an additional covariate for parametric regression on electricity consumption.

Covariates
- 5 previous hourly consumptions
- 5 previous hourly ambient air temperatures
- Model 1: hour of day, one-hot encoded
- Model 2: hour of day interacted with the latent variable, one-hot encoded
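
One plausible way to assemble these two covariate designs (a hedged sketch assuming pandas, an hourly DatetimeIndex, and illustrative column names 'load', 'temp', and a binary decoded HMM state 'state'; the slides do not specify this code):

```python
# Hedged sketch: lagged covariates plus one-hot hour of day, optionally
# interacted with the latent high/low consumption state (Model 2).
import pandas as pd

def build_covariates(df: pd.DataFrame, interact_state: bool) -> pd.DataFrame:
    X = pd.DataFrame(index=df.index)
    for lag in range(1, 6):                        # 5 previous hours
        X[f"load_lag{lag}"] = df["load"].shift(lag)
        X[f"temp_lag{lag}"] = df["temp"].shift(lag)
    hour = pd.get_dummies(df.index.hour, prefix="h").set_index(df.index)
    if interact_state:                             # Model 2: hour x latent state
        hour = hour.mul(df["state"], axis=0).add_suffix("_hi").join(
               hour.mul(1 - df["state"], axis=0).add_suffix("_lo"))
    return X.join(hour).dropna()                   # Model 1 when False

# Usage: X1 = build_covariates(df, False); X2 = build_covariates(df, True)
```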

Experiments on Data

Results of Case Study (cont'd)

Comparison of Prediction Accuracy
- Mean Absolute Percentage Error (MAPE):
  $\mathrm{MAPE} = \dfrac{1}{N} \sum_{t=1}^{N} \dfrac{|\hat{y}_t - y_t|}{y_t}$

Figure: MAPEs across 273 users for different ML methods, with and without the latent variable (OLS, Mix., OLS+, KNN, KNN+, DT, DT+, SVR, SVR+, where "+" marks the latent-variable variant). Green: mean, red: median.
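
For completeness, a direct numpy transcription of this error metric (the figure reports it in percent, i.e. multiplied by 100):

```python
# MAPE exactly as defined above.
import numpy as np

def mape(y_hat: np.ndarray, y: np.ndarray) -> float:
    return float(np.mean(np.abs(y_hat - y) / y))
```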

Experiments on Data

Results of Case Study (cont'd)

Estimated Reductions During DR Hours by Hour of Day

Figure: Boxplots of estimated hourly DR reductions (HMM estimates vs. placebo tests) for hours 8-19. Top row: high consumption state; bottom row: low consumption state; left column: automated users; right column: non-automated users.

Thank You!

The End. Questions?