A Bayesian Perspective on Residential Demand Response Using Smart Meter Data

Size: px
Start display at page:

Download "A Bayesian Perspective on Residential Demand Response Using Smart Meter Data"

Transcription

1 A Bayesian Perspective on Residential Demand Response Using Smart Meter Data Datong-Paul Zhou, Maximilian Balandat, and Claire Tomlin University of California, Berkeley [datong.zhou, balandat, September 30, / 15

2 History of Residential Demand Response Residential Demand Response Background Reliable grid operation is dependent on adequate supply and flexible resources 2 / 15

3 History of Residential Demand Response Residential Demand Response Background Reliable grid operation is dependent on adequate supply and flexible resources 2009: CAISO proposes Proxy Demand Resource Product to facilitate the participation of existing retail demand programs in the ISO market Proxy Demand Resources (PDRs) offer bids to reflect flexibility to adjust load in response to market schedules 2 / 15

4 History of Residential Demand Response Residential Demand Response Background Reliable grid operation is dependent on adequate supply and flexible resources 2009: CAISO proposes Proxy Demand Resource Product to facilitate the participation of existing retail demand programs in the ISO market Proxy Demand Resources (PDRs) offer bids to reflect flexibility to adjust load in response to market schedules CPUC Resolution E-4728 (July 2015) approves an auction mechanism for demand response capacity, called the demand response auction mechanism (DRAM) (Residential) Demand Response Providers with the ability to aggregate customers capable of reducing load participate in the ISO day-ahead, real-time ancillary services market 2 / 15

5 History of Residential Demand Response Residential Demand Response Background Reliable grid operation is dependent on adequate supply and flexible resources 2009: CAISO proposes Proxy Demand Resource Product to facilitate the participation of existing retail demand programs in the ISO market Proxy Demand Resources (PDRs) offer bids to reflect flexibility to adjust load in response to market schedules CPUC Resolution E-4728 (July 2015) approves an auction mechanism for demand response capacity, called the demand response auction mechanism (DRAM) (Residential) Demand Response Providers with the ability to aggregate customers capable of reducing load participate in the ISO day-ahead, real-time ancillary services market Utilities can obtain resource adequacy through residential resources Benchmarking of reduction potential by using consumption baselines 2 / 15

6 History of Residential Demand Response Measuring Reduction in Consumption Estimate the demand reduction by using suitable baselines (counterfactuals) Figure: Courtesy of Maximilian Balandat 3 / 15

7 Previous Analysis Previous Work 1 Short-term load forecasting on the individual household level 70 Mean Absolute Percentage Error by Method MAPE [%] BL DT KNN L1 L2 OLS SVR 1 D. Zhou, M. Balandat, and C. Tomlin. Residential Demand Response Targeting Using Observational Data, 55th Conference on Decision and Control, December 2016 (to appear). 4 / 15

8 Previous Analysis Previous Work 1 Short-term load forecasting on the individual household level K-means clustering of daily electricity load shapes MAPE [%] Mean Absolute Percentage Error by Method BL DT KNN L1 L2 OLS SVR Consumption Consumption 0.06 # 1 # # x x4360 x Consumption # 4 # # 6 x x2136 x # 7 # 8 # x x2690 x Consumption # # 11 # 12 x724 x x Hour of Day Hour of Day Hour of Day 1 D. Zhou, M. Balandat, and C. Tomlin. Residential Demand Response Targeting Using Observational Data, 55th Conference on Decision and Control, December 2016 (to appear). 4 / 15

9 Previous Analysis Previous Work 1 Short-term load forecasting on the individual household level K-means clustering of daily electricity load shapes MAPE [%] Mean Absolute Percentage Error by Method BL DT KNN L1 L2 OLS SVR Consumption Consumption 0.06 # 1 # # x x4360 x Consumption # 4 # # 6 x x2136 x # 7 # 8 # x x2690 x Consumption # # 11 # 12 x724 x x Hour of Day Hour of Day Hour of Day Estimation of counterfactual consumption to estimate DR reduction Measured the variability of users with entropy Targeting: Users with high variability seem to reduce more during DR events 1 D. Zhou, M. Balandat, and C. Tomlin. Residential Demand Response Targeting Using Observational Data, 55th Conference on Decision and Control, December 2016 (to appear). 4 / 15

10 Agenda Today s Talk Agenda Overview of Machine Learning tools for short-term load forecasting 5 / 15

11 Agenda Today s Talk Agenda Overview of Machine Learning tools for short-term load forecasting Inclusion of physically motivated latent variables to improve prediction accuracy Key idea: Estimate whether a user has high or low consumption If consumption at t is high, it is probably high at t + 1, as well If consumption at t is low, it is probably low at t + 1, as well 5 / 15

12 Agenda Today s Talk Agenda Overview of Machine Learning tools for short-term load forecasting Inclusion of physically motivated latent variables to improve prediction accuracy Key idea: Estimate whether a user has high or low consumption If consumption at t is high, it is probably high at t + 1, as well If consumption at t is low, it is probably low at t + 1, as well Bayesian Perspective in STLF: Encode this consumption variable as an unobserved, latent variable Conditional Gaussian Mixture Model Hidden Markov Model 5 / 15

13 Agenda Today s Talk Agenda Overview of Machine Learning tools for short-term load forecasting Inclusion of physically motivated latent variables to improve prediction accuracy Key idea: Estimate whether a user has high or low consumption If consumption at t is high, it is probably high at t + 1, as well If consumption at t is low, it is probably low at t + 1, as well Bayesian Perspective in STLF: Encode this consumption variable as an unobserved, latent variable Conditional Gaussian Mixture Model Hidden Markov Model Forecasting Algorithms to estimate DR Reduction Numerical Results of DR Case Study 5 / 15

14 Machine Learning Methods Machine Learning for Short-Term Load Forecasting Regression Problem: y = f (X, β), where X is the covariate matrix 6 / 15

15 Machine Learning Methods Machine Learning for Short-Term Load Forecasting Regression Problem: y = f (X, β), where X is the covariate matrix Ordinary Least Squares ˆβ OLS = arg min β y X β 2 2 Ridge (L2) and Lasso (L1) Regression ˆβ L1 = arg min y X β λ β 1 ŷ L1 = X ˆβ L1 y ˆβ L1 = arg min y X β λ β 2 ŷ L2 = X ˆβ L2 y 6 / 15

16 Machine Learning Methods Machine Learning for Short-Term Load Forecasting Regression Problem: y = f (X, β), where X is the covariate matrix Ordinary Least Squares ˆβ OLS = arg min β y X β 2 2 Ridge (L2) and Lasso (L1) Regression ˆβ L1 = arg min y X β λ β 1 ŷ L1 = X ˆβ L1 y ˆβ L1 = arg min y X β λ β 2 ŷ L2 = X ˆβ L2 y k-nearest Neighbors Regression ŷ KNN = (y y k )/k Decision Tree Regression: Cross-validation on max. depth and min. samples per node Support Vector Regression: Cross-validation on slack variables 6 / 15

17 Bayesian Perspective: Latent Variable Models Latent Variable Models - Mixtures Mixture of Linear Regression Models Consider K linear regression models, each governed by own weight w k Assume a common noise variance σ 2 Mixture distribution with mixing proportions {π k }: K P(y w, σ 2, π) = π k N (y w k x, σ 2 ) k=1 7 / 15

18 Bayesian Perspective: Latent Variable Models Latent Variable Models - Mixtures Mixture of Linear Regression Models Consider K linear regression models, each governed by own weight w k Assume a common noise variance σ 2 Mixture distribution with mixing proportions {π k }: K P(y w, σ 2, π) = π k N (y w k x, σ 2 ) k=1 Idea: Interpret electricity consumption as a result of behavioral archetypes k = 1,..., K Expectation-Maximization Algorithm yields update rules: γ nk := E[z nk ] = E [ l(z x n, θ old ) ] = π kn (y n w k x n, σ 2 ) K j=1 π jn (y n w j x n, σ 2 ) π k = 1 N σ 2 = 1 N N γ nk, w k = [ X DX ] 1 X DY, D = diag(γ 1k,..., γ nk ) n=1 N K γ nk (y n w k x n ) 2 n=1 k=1 7 / 15

19 Bayesian Perspective: Latent Variable Models Latent Variable Models - Mixtures (cont d.) Mixture Models for Prediction IID assumption of Mixture Models cannot capture the temporal correlation in time series data Prediction Method 1: Convex combination of learners: ŷ = K ˆπ k ŵ k x k=1 8 / 15

20 Bayesian Perspective: Latent Variable Models Latent Variable Models - Mixtures (cont d.) Mixture Models for Prediction IID assumption of Mixture Models cannot capture the temporal correlation in time series data Prediction Method 1: Convex combination of learners: ŷ = K ˆπ k ŵ k x k=1 Prediction Method 2: If model i s covariates are spatially separated from model j: j = arg min x i x 2 1 i N K ŷ = γ jk ŵ k x k=1 8 / 15

21 Bayesian Perspective: Latent Variable Models Latent Variable Models - Hidden Markov Models Data often is not i.i.d. Use on sequential data, e.g. time series, speech recognition, DNA sequences 9 / 15

22 Bayesian Perspective: Latent Variable Models Latent Variable Models - Hidden Markov Models Data often is not i.i.d. Use on sequential data, e.g. time series, speech recognition, DNA sequences q 0 q 1 q 2... q T 1 q T y 0 y 1 y 2... y T 1 y T Figure: Hidden Markov Model. Hidden States q, Observations y 9 / 15

23 Bayesian Perspective: Latent Variable Models Latent Variable Models - Hidden Markov Models Data often is not i.i.d. Use on sequential data, e.g. time series, speech recognition, DNA sequences q 0 q 1 q 2... q T 1 q T y 0 y 1 y 2... y T 1 y T Figure: Hidden Markov Model. Hidden States q, Observations y Figure: 3 state latent variable, Gaussian emission model 9 / 15

24 Bayesian Perspective: Latent Variable Models Latent Variable Models - Hidden Markov Models (cont d.) Elements of HMMs Hidden Layer: Transition between states is governed by a Markov Process a ij = P(q t = j q t 1 = i), a ij > 0, 1 i, j M, t = 0, 1, 2,..., N 10 / 15

25 Bayesian Perspective: Latent Variable Models Latent Variable Models - Hidden Markov Models (cont d.) Elements of HMMs Hidden Layer: Transition between states is governed by a Markov Process a ij = P(q t = j q t 1 = i), a ij > 0, 1 i, j M, t = 0, 1, 2,..., N H... 19H L... 19L Figure: Markov Transition Diagram, 24 hour periodicity 10 / 15

26 Bayesian Perspective: Latent Variable Models Latent Variable Models - Hidden Markov Models (cont d.) Elements of HMMs Hidden Layer: Transition between states is governed by a Markov Process a ij = P(q t = j q t 1 = i), a ij > 0, 1 i, j M, t = 0, 1, 2,..., N H... 19H L... 19L Figure: Markov Transition Diagram, 24 hour periodicity Observables: Hidden states emit observations according to some distribution, e.g. Gaussian: 1 P(y t = y q t = q) = exp ( (y µ q) 2 ) σ q 2π 2σ 2 q 10 / 15

27 Bayesian Perspective: Latent Variable Models Latent Variable Models - Hidden Markov Models (cont d.) Parameter Estimation Maximum Likelihood Estimation and EM Algorithm Use Bayes Rule and conditional independencies of graphical model: P(q t y) =: α(q t)p(y t q t )β(q t ) P(y) P(q t, q t+1 y) = α(q t)β(q t+1 )a qt,q t+1 P(y t q t )P(y t+1 q t+1 ) P(y) Compute α(q t ), β(q t ) with the famous α-β-recursion 11 / 15

28 Bayesian Perspective: Latent Variable Models Latent Variable Models - Hidden Markov Models (cont d.) Parameter Estimation Maximum Likelihood Estimation and EM Algorithm Use Bayes Rule and conditional independencies of graphical model: P(q t y) =: α(q t)p(y t q t )β(q t ) P(y) P(q t, q t+1 y) = α(q t)β(q t+1 )a qt,q t+1 P(y t q t )P(y t+1 q t+1 ) P(y) Compute α(q t ), β(q t ) with the famous α-β-recursion Inference: Filtering, Predicting, Smoothing P(q t y 0,..., y t ) = α(q t)p(y t q t ) P(y 0,..., y t ) P(q t+1 y 0,..., y t ) = α(q t+1) P(y 0,..., y t ) P(q t y 0,..., y n ) = α(q t)p(y t q t )β(q t ), 0 < n < t P(y 0,..., y n ) 11 / 15

29 Experiments on Data HMM Case Study Application of HMM on electricity consumption of residential customers Fit a HMM on the electricity consumption of 273 customers Use estimated latent variable as an additional covariate for parametric regression on electricity consumption 12 / 15

30 Experiments on Data HMM Case Study Application of HMM on electricity consumption of residential customers Fit a HMM on the electricity consumption of 273 customers Use estimated latent variable as an additional covariate for parametric regression on electricity consumption 12 / 15

31 Experiments on Data HMM Case Study Application of HMM on electricity consumption of residential customers Fit a HMM on the electricity consumption of 273 customers Use estimated latent variable as an additional covariate for parametric regression on electricity consumption Covariates 5 previous hourly consumptions 5 previous hourly ambient air temperatures Model 1: Hour of Day, one-hot encoded Model 2: Hour of Day interacted with latent variable, one-hot encoded 12 / 15

32 Experiments on Data HMM Results of Case Study (cont d.) Comparison of Prediction Accuracy Mean Absolute Percentage Error (MAPE): MAPE = 1 N ŷ t y t N y t t=1 13 / 15

33 Experiments on Data HMM Results of Case Study (cont d.) Comparison of Prediction Accuracy Mean Absolute Percentage Error (MAPE): MAPE = 1 N ŷ t y t N y t 80 Green: Mean Red: Median 60 t=1 MAPEs by Model OLS Mix. OLS+ KNNKNN+ DT DT+ SVR SVR+ Figure: MAPEs across 273 users for different ML methods and with / without latent variable 13 / 15

34 Experiments on Data HMM Results of Case Study (cont d.) Estimated Reductions During DR Hours by Hour of Day Estimated Hourly DR Reductions with HMM vs. Placebo Tests, High State Automated Users Non-Automated Users Hour of Day Hour of Day Estimated Hourly DR Reductions with HMM vs. Placebo Tests, Low State Automated Users Non-Automated Users Figure: Boxplots for estimated DR reduction by hour of the day. Top row: High consumption, Bottom Row: Low consumption, Left Column: Automated users, Right Column: Non-automated users 14 / 15

35 Thank You! The End Questions? 15 / 15

A Bayesian Perspective on Residential Demand Response Using Smart Meter Data

A Bayesian Perspective on Residential Demand Response Using Smart Meter Data Fifty-fourth Annual Allerton Conference Allerton House, UIUC, Illinois, USA September 27-3, 216 A Bayesian Perspective on Residential Demand Response Using Smart Meter Data Datong Zhou, Maximilian Balandat,

More information

arxiv: v1 [cs.sy] 12 Aug 2016

arxiv: v1 [cs.sy] 12 Aug 2016 A Bayesian Perspective on Residential Demand Response Using Smart Meter Data Datong Zhou, Maximilian Balandat, and Claire Tomlin arxiv:168.3862v1 [cs.sy] 12 Aug 216 Abstract The widespread deployment of

More information

Residential Demand Response Targeting Using Machine Learning with Observational Data

Residential Demand Response Targeting Using Machine Learning with Observational Data Residential Demand Response Targeting Using Machine Learning with Observational Data Datong Zhou, Maximilian Balandat, and Claire Tomlin providers receive for helping to fulfill Resource Adequacy requirements).

More information

STA 414/2104: Machine Learning

STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

Linear Dynamical Systems

Linear Dynamical Systems Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations

More information

Linear Dynamical Systems (Kalman filter)

Linear Dynamical Systems (Kalman filter) Linear Dynamical Systems (Kalman filter) (a) Overview of HMMs (b) From HMMs to Linear Dynamical Systems (LDS) 1 Markov Chains with Discrete Random Variables x 1 x 2 x 3 x T Let s assume we have discrete

More information

Master 2 Informatique Probabilistic Learning and Data Analysis

Master 2 Informatique Probabilistic Learning and Data Analysis Master 2 Informatique Probabilistic Learning and Data Analysis Faicel Chamroukhi Maître de Conférences USTV, LSIS UMR CNRS 7296 email: chamroukhi@univ-tln.fr web: chamroukhi.univ-tln.fr 2013/2014 Faicel

More information

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010 Hidden Markov Models Aarti Singh Slides courtesy: Eric Xing Machine Learning 10-701/15-781 Nov 8, 2010 i.i.d to sequential data So far we assumed independent, identically distributed data Sequential data

More information

Probabilistic Machine Learning

Probabilistic Machine Learning Probabilistic Machine Learning Bayesian Nets, MCMC, and more Marek Petrik 4/18/2017 Based on: P. Murphy, K. (2012). Machine Learning: A Probabilistic Perspective. Chapter 10. Conditional Independence Independent

More information

Statistical learning. Chapter 20, Sections 1 4 1

Statistical learning. Chapter 20, Sections 1 4 1 Statistical learning Chapter 20, Sections 1 4 Chapter 20, Sections 1 4 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete

More information

Learning from Sequential and Time-Series Data

Learning from Sequential and Time-Series Data Learning from Sequential and Time-Series Data Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/? Sequential and Time-Series Data Many real-world applications

More information

Dynamic Approaches: The Hidden Markov Model

Dynamic Approaches: The Hidden Markov Model Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message

More information

Regularized Regression A Bayesian point of view

Regularized Regression A Bayesian point of view Regularized Regression A Bayesian point of view Vincent MICHEL Director : Gilles Celeux Supervisor : Bertrand Thirion Parietal Team, INRIA Saclay Ile-de-France LRI, Université Paris Sud CEA, DSV, I2BM,

More information

Algorithmisches Lernen/Machine Learning

Algorithmisches Lernen/Machine Learning Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Hidden Markov Models Barnabás Póczos & Aarti Singh Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed

More information

order is number of previous outputs

order is number of previous outputs Markov Models Lecture : Markov and Hidden Markov Models PSfrag Use past replacements as state. Next output depends on previous output(s): y t = f[y t, y t,...] order is number of previous outputs y t y

More information

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing Hidden Markov Models By Parisa Abedi Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed data Sequential (non i.i.d.) data Time-series data E.g. Speech

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

Recent Advances in Bayesian Inference Techniques

Recent Advances in Bayesian Inference Techniques Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian

More information

Advanced Data Science

Advanced Data Science Advanced Data Science Dr. Kira Radinsky Slides Adapted from Tom M. Mitchell Agenda Topics Covered: Time series data Markov Models Hidden Markov Models Dynamic Bayes Nets Additional Reading: Bishop: Chapter

More information

Note Set 5: Hidden Markov Models

Note Set 5: Hidden Markov Models Note Set 5: Hidden Markov Models Probabilistic Learning: Theory and Algorithms, CS 274A, Winter 2016 1 Hidden Markov Models (HMMs) 1.1 Introduction Consider observed data vectors x t that are d-dimensional

More information

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,

More information

Forecasting Wind Ramps

Forecasting Wind Ramps Forecasting Wind Ramps Erin Summers and Anand Subramanian Jan 5, 20 Introduction The recent increase in the number of wind power producers has necessitated changes in the methods power system operators

More information

Non-Parametric Bayes

Non-Parametric Bayes Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian

More information

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated

More information

Mixtures of Gaussians. Sargur Srihari

Mixtures of Gaussians. Sargur Srihari Mixtures of Gaussians Sargur srihari@cedar.buffalo.edu 1 9. Mixture Models and EM 0. Mixture Models Overview 1. K-Means Clustering 2. Mixtures of Gaussians 3. An Alternative View of EM 4. The EM Algorithm

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models CI/CI(CS) UE, SS 2015 Christian Knoll Signal Processing and Speech Communication Laboratory Graz University of Technology June 23, 2015 CI/CI(CS) SS 2015 June 23, 2015 Slide 1/26 Content

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin

More information

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a

More information

Robust Building Energy Load Forecasting Using Physically-Based Kernel Models

Robust Building Energy Load Forecasting Using Physically-Based Kernel Models Article Robust Building Energy Load Forecasting Using Physically-Based Kernel Models Anand Krishnan Prakash 1, Susu Xu 2, *,, Ram Rajagopal 3 and Hae Young Noh 2 1 Energy Science, Technology and Policy,

More information

Computer Vision Group Prof. Daniel Cremers. 6. Mixture Models and Expectation-Maximization

Computer Vision Group Prof. Daniel Cremers. 6. Mixture Models and Expectation-Maximization Prof. Daniel Cremers 6. Mixture Models and Expectation-Maximization Motivation Often the introduction of latent (unobserved) random variables into a model can help to express complex (marginal) distributions

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Expectation Maximization (EM) and Mixture Models Hamid R. Rabiee Jafar Muhammadi, Mohammad J. Hosseini Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2 Agenda Expectation-maximization

More information

L11: Pattern recognition principles

L11: Pattern recognition principles L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft

More information

Lecture 3: Statistical Decision Theory (Part II)

Lecture 3: Statistical Decision Theory (Part II) Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

p L yi z n m x N n xi

p L yi z n m x N n xi y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen

More information

Linear Models for Regression

Linear Models for Regression Linear Models for Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

Chapter 14 Combining Models

Chapter 14 Combining Models Chapter 14 Combining Models T-61.62 Special Course II: Pattern Recognition and Machine Learning Spring 27 Laboratory of Computer and Information Science TKK April 3th 27 Outline Independent Mixing Coefficients

More information

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Final Overview. Introduction to ML. Marek Petrik 4/25/2017 Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,

More information

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume

More information

Expectation Maximization (EM)

Expectation Maximization (EM) Expectation Maximization (EM) The EM algorithm is used to train models involving latent variables using training data in which the latent variables are not observed (unlabeled data). This is to be contrasted

More information

CS Homework 3. October 15, 2009

CS Homework 3. October 15, 2009 CS 294 - Homework 3 October 15, 2009 If you have questions, contact Alexandre Bouchard (bouchard@cs.berkeley.edu) for part 1 and Alex Simma (asimma@eecs.berkeley.edu) for part 2. Also check the class website

More information

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA Contents in latter part Linear Dynamical Systems What is different from HMM? Kalman filter Its strength and limitation Particle Filter

More information

Machine Learning Techniques for Computer Vision

Machine Learning Techniques for Computer Vision Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM

More information

Machine Learning for Signal Processing Bayes Classification

Machine Learning for Signal Processing Bayes Classification Machine Learning for Signal Processing Bayes Classification Class 16. 24 Oct 2017 Instructor: Bhiksha Raj - Abelino Jimenez 11755/18797 1 Recap: KNN A very effective and simple way of performing classification

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Based on slides by Richard Zemel

Based on slides by Richard Zemel CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we

More information

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K

More information

PATTERN RECOGNITION AND MACHINE LEARNING

PATTERN RECOGNITION AND MACHINE LEARNING PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality

More information

Clustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.

Clustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014. Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

IEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm

IEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

Probabilistic Graphical Models Homework 2: Due February 24, 2014 at 4 pm

Probabilistic Graphical Models Homework 2: Due February 24, 2014 at 4 pm Probabilistic Graphical Models 10-708 Homework 2: Due February 24, 2014 at 4 pm Directions. This homework assignment covers the material presented in Lectures 4-8. You must complete all four problems to

More information

Overview of Statistical Tools. Statistical Inference. Bayesian Framework. Modeling. Very simple case. Things are usually more complicated

Overview of Statistical Tools. Statistical Inference. Bayesian Framework. Modeling. Very simple case. Things are usually more complicated Fall 3 Computer Vision Overview of Statistical Tools Statistical Inference Haibin Ling Observation inference Decision Prior knowledge http://www.dabi.temple.edu/~hbling/teaching/3f_5543/index.html Bayesian

More information

Predicting the Electricity Demand Response via Data-driven Inverse Optimization

Predicting the Electricity Demand Response via Data-driven Inverse Optimization Predicting the Electricity Demand Response via Data-driven Inverse Optimization Workshop on Demand Response and Energy Storage Modeling Zagreb, Croatia Juan M. Morales 1 1 Department of Applied Mathematics,

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Expectation Maximization (EM) and Mixture Models Hamid R. Rabiee Jafar Muhammadi, Mohammad J. Hosseini Spring 203 http://ce.sharif.edu/courses/9-92/2/ce725-/ Agenda Expectation-maximization

More information

Graphical Models for Collaborative Filtering

Graphical Models for Collaborative Filtering Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,

More information

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012 Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write

More information

Lecture 3: Pattern Classification

Lecture 3: Pattern Classification EE E6820: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 1 2 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mixtures

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

ECE521 Lecture 19 HMM cont. Inference in HMM

ECE521 Lecture 19 HMM cont. Inference in HMM ECE521 Lecture 19 HMM cont. Inference in HMM Outline Hidden Markov models Model definitions and notations Inference in HMMs Learning in HMMs 2 Formally, a hidden Markov model defines a generative process

More information

Human-Oriented Robotics. Temporal Reasoning. Kai Arras Social Robotics Lab, University of Freiburg

Human-Oriented Robotics. Temporal Reasoning. Kai Arras Social Robotics Lab, University of Freiburg Temporal Reasoning Kai Arras, University of Freiburg 1 Temporal Reasoning Contents Introduction Temporal Reasoning Hidden Markov Models Linear Dynamical Systems (LDS) Kalman Filter 2 Temporal Reasoning

More information

Chapter 17: Undirected Graphical Models

Chapter 17: Undirected Graphical Models Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 218 Outlines Overview Introduction Linear Algebra Probability Linear Regression 1

More information

Lecture 10. Announcement. Mixture Models II. Topics of This Lecture. This Lecture: Advanced Machine Learning. Recap: GMMs as Latent Variable Models

Lecture 10. Announcement. Mixture Models II. Topics of This Lecture. This Lecture: Advanced Machine Learning. Recap: GMMs as Latent Variable Models Advanced Machine Learning Lecture 10 Mixture Models II 30.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ Announcement Exercise sheet 2 online Sampling Rejection Sampling Importance

More information

A Sparse Linear Model and Significance Test. for Individual Consumption Prediction

A Sparse Linear Model and Significance Test. for Individual Consumption Prediction A Sparse Linear Model and Significance Test 1 for Individual Consumption Prediction Pan Li, Baosen Zhang, Yang Weng, and Ram Rajagopal arxiv:1511.01853v3 [stat.ml] 21 Feb 2017 Abstract Accurate prediction

More information

Collaborative topic models: motivations cont

Collaborative topic models: motivations cont Collaborative topic models: motivations cont Two topics: machine learning social network analysis Two people: " boy Two articles: article A! girl article B Preferences: The boy likes A and B --- no problem.

More information

Hidden Markov Models in Language Processing

Hidden Markov Models in Language Processing Hidden Markov Models in Language Processing Dustin Hillard Lecture notes courtesy of Prof. Mari Ostendorf Outline Review of Markov models What is an HMM? Examples General idea of hidden variables: implications

More information

But if z is conditioned on, we need to model it:

But if z is conditioned on, we need to model it: Partially Unobserved Variables Lecture 8: Unsupervised Learning & EM Algorithm Sam Roweis October 28, 2003 Certain variables q in our models may be unobserved, either at training time or at test time or

More information

Household Energy Disaggregation based on Difference Hidden Markov Model

Household Energy Disaggregation based on Difference Hidden Markov Model 1 Household Energy Disaggregation based on Difference Hidden Markov Model Ali Hemmatifar (ahemmati@stanford.edu) Manohar Mogadali (manoharm@stanford.edu) Abstract We aim to address the problem of energy

More information

Learning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014

Learning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014 Learning with Noisy Labels Kate Niehaus Reading group 11-Feb-2014 Outline Motivations Generative model approach: Lawrence, N. & Scho lkopf, B. Estimating a Kernel Fisher Discriminant in the Presence of

More information

Shankar Shivappa University of California, San Diego April 26, CSE 254 Seminar in learning algorithms

Shankar Shivappa University of California, San Diego April 26, CSE 254 Seminar in learning algorithms Recognition of Visual Speech Elements Using Adaptively Boosted Hidden Markov Models. Say Wei Foo, Yong Lian, Liang Dong. IEEE Transactions on Circuits and Systems for Video Technology, May 2004. Shankar

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

Regression. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh.

Regression. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh. Regression Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh September 24 (All of the slides in this course have been adapted from previous versions

More information

Lecture 4: Probabilistic Learning

Lecture 4: Probabilistic Learning DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

Introduction to Machine Learning. Introduction to ML - TAU 2016/7 1

Introduction to Machine Learning. Introduction to ML - TAU 2016/7 1 Introduction to Machine Learning Introduction to ML - TAU 2016/7 1 Course Administration Lecturers: Amir Globerson (gamir@post.tau.ac.il) Yishay Mansour (Mansour@tau.ac.il) Teaching Assistance: Regev Schweiger

More information

Switching-state Dynamical Modeling of Daily Behavioral Data

Switching-state Dynamical Modeling of Daily Behavioral Data Switching-state Dynamical Modeling of Daily Behavioral Data Randy Ardywibowo Shuai Huang Cao Xiao Shupeng Gui Yu Cheng Ji Liu Xiaoning Qian Texas A&M University University of Washington IBM T.J. Watson

More information

Bayesian Networks BY: MOHAMAD ALSABBAGH

Bayesian Networks BY: MOHAMAD ALSABBAGH Bayesian Networks BY: MOHAMAD ALSABBAGH Outlines Introduction Bayes Rule Bayesian Networks (BN) Representation Size of a Bayesian Network Inference via BN BN Learning Dynamic BN Introduction Conditional

More information

Hidden Markov Models Part 2: Algorithms

Hidden Markov Models Part 2: Algorithms Hidden Markov Models Part 2: Algorithms CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Hidden Markov Model An HMM consists of:

More information

Bayesian Machine Learning - Lecture 7

Bayesian Machine Learning - Lecture 7 Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1

More information

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical

More information

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation

More information

Linear Models in Machine Learning

Linear Models in Machine Learning CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

More information

Probabilistic Reasoning in Deep Learning

Probabilistic Reasoning in Deep Learning Probabilistic Reasoning in Deep Learning Dr Konstantina Palla, PhD palla@stats.ox.ac.uk September 2017 Deep Learning Indaba, Johannesburgh Konstantina Palla 1 / 39 OVERVIEW OF THE TALK Basics of Bayesian

More information

Posterior Regularization

Posterior Regularization Posterior Regularization 1 Introduction One of the key challenges in probabilistic structured learning, is the intractability of the posterior distribution, for fast inference. There are numerous methods

More information

Bayesian Networks Inference with Probabilistic Graphical Models

Bayesian Networks Inference with Probabilistic Graphical Models 4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning

More information

Introduction to Graphical Models

Introduction to Graphical Models Introduction to Graphical Models The 15 th Winter School of Statistical Physics POSCO International Center & POSTECH, Pohang 2018. 1. 9 (Tue.) Yung-Kyun Noh GENERALIZATION FOR PREDICTION 2 Probabilistic

More information

Towards a Bayesian model for Cyber Security

Towards a Bayesian model for Cyber Security Towards a Bayesian model for Cyber Security Mark Briers (mbriers@turing.ac.uk) Joint work with Henry Clausen and Prof. Niall Adams (Imperial College London) 27 September 2017 The Alan Turing Institute

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

39th Annual ISMS Marketing Science Conference University of Southern California, June 8, 2017

39th Annual ISMS Marketing Science Conference University of Southern California, June 8, 2017 Permuted and IROM Department, McCombs School of Business The University of Texas at Austin 39th Annual ISMS Marketing Science Conference University of Southern California, June 8, 2017 1 / 36 Joint work

More information