Discovery Through Situational Awareness

Similar documents
Hunting for Anomalies in PMU Data

Data Mining Based Anomaly Detection In PMU Measurements And Event Detection

6.036 midterm review. Wednesday, March 18, 15

Course in Data Science

Applying Machine Learning for Gravitational-wave Burst Data Analysis

Introduction to Machine Learning

Learning with multiple models. Boosting.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Predictive Analytics on Accident Data Using Rule Based and Discriminative Classifiers

Making Our Cities Safer: A Study In Neighbhorhood Crime Patterns

Delta Boosting Machine and its application in Actuarial Modeling Simon CK Lee, Sheldon XS Lin KU Leuven, University of Toronto

Predicting flight on-time performance

AN INTRODUCTION TO NEURAL NETWORKS. Scott Kuindersma November 12, 2009

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

Introduction to Machine Learning and Cross-Validation

Multi-Plant Photovoltaic Energy Forecasting Challenge with Regression Tree Ensembles and Hourly Average Forecasts

CLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition

Click Prediction and Preference Ranking of RSS Feeds

CS 6375 Machine Learning

Part I. Linear regression & LASSO. Linear Regression. Linear Regression. Week 10 Based in part on slides from textbook, slides of Susan Holmes

Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I

Intelligence and statistics for rapid and robust earthquake detection, association and location

A novel machine learning model to predict abnormal Runway Occupancy Times and observe related precursors

David John Gagne II, NCAR

How to do backpropagation in a brain

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Structure-Activity Modeling - QSAR. Uwe Koch

Support Vector Machine, Random Forests, Boosting Based in part on slides from textbook, slides of Susan Holmes. December 2, 2012

Math 6330: Statistical Consulting Class 5

A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery

Application and Challenges of Artificial Intelligence in Exploration

Predictive analysis on Multivariate, Time Series datasets using Shapelets

Anomaly Detection in Logged Sensor Data. Master s thesis in Complex Adaptive Systems JOHAN FLORBÄCK

Holdout and Cross-Validation Methods Overfitting Avoidance

From statistics to data science. BAE 815 (Fall 2017) Dr. Zifei Liu

Part I Week 7 Based in part on slides from textbook, slides of Susan Holmes

Unsupervised Anomaly Detection for High Dimensional Data

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

Chart types and when to use them

Understanding Travel Time to Airports in New York City Sierra Gentry Dominik Schunack

Nonlinear Classification

Data Mining. Preamble: Control Application. Industrial Researcher s Approach. Practitioner s Approach. Example. Example. Goal: Maintain T ~Td

Unsupervised Learning Methods

VBM683 Machine Learning

Boosting. Ryan Tibshirani Data Mining: / April Optional reading: ISL 8.2, ESL , 10.7, 10.13

Performance Evaluation

Gaussian Process Regression with K-means Clustering for Very Short-Term Load Forecasting of Individual Buildings at Stanford

Machine Learning. Lecture 9: Learning Theory. Feng Li.

Sparse Approximation and Variable Selection

DANIEL WILSON AND BEN CONKLIN. Integrating AI with Foundation Intelligence for Actionable Intelligence

Learning Theory Continued

ECE 5424: Introduction to Machine Learning

15 Introduction to Data Mining

CONTEMPORARY ANALYTICAL ECOSYSTEM PATRICK HALL, SAS INSTITUTE

INTRODUCTION TO DATA SCIENCE

Data science and engineering for local weather forecasts. Nikhil R Podduturi Data {Scientist, Engineer} November, 2016

IE598 Big Data Optimization Introduction

W vs. QCD Jet Tagging at the Large Hadron Collider

The Perceptron. Volker Tresp Summer 2016

Support Vector Machine. Industrial AI Lab.

CÁTEDRA ENDESA DE LA UNIVERSIDAD DE SEVILLA

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Will it rain tomorrow?

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Essence of Machine Learning (and Deep Learning) Hoa M. Le Data Science Lab, HUST hoamle.github.io

Hierarchical models for the rainfall forecast DATA MINING APPROACH

Support Vector Machines. CAP 5610: Machine Learning Instructor: Guo-Jun QI

A Community Gridded Atmospheric Forecast System for Calibrated Solar Irradiance

ECE 5984: Introduction to Machine Learning

Style-aware Mid-level Representation for Discovering Visual Connections in Space and Time

1 History of statistical/machine learning. 2 Supervised learning. 3 Two approaches to supervised learning. 4 The general learning procedure

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17

Brief Introduction of Machine Learning Techniques for Content Analysis

Learning Air Traffic Controller Workload. from Past Sector Operations

Real Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report

Bagging and Other Ensemble Methods

Advanced Statistical Methods: Beyond Linear Regression

Predicting MTA Bus Arrival Times in New York City

Species Distribution Modeling

Classification. The goal: map from input X to a label Y. Y has a discrete set of possible values. We focused on binary Y (values 0 or 1).

Support Vector Machines

The Union of Intersections (UoI) Method for Interpretable Data Driven Discovery and Prediction Kristofer E. Bouchard 1,2,3*, Alejandro Bujan 3,

Exploratory quantile regression with many covariates: An application to adverse birth outcomes

Bayesian Networks Inference with Probabilistic Graphical Models

ECS289: Scalable Machine Learning

CSCI-567: Machine Learning (Spring 2019)

Lecture 4: Feed Forward Neural Networks

Electrical and Computer Engineering Department University of Waterloo Canada

Machine Learning 4771

ANN based techniques for prediction of wind speed of 67 sites of India

Big Data Analytics. Special Topics for Computer Science CSE CSE Feb 24

Gapfilling of EC fluxes

Training the linear classifier

The perceptron learning algorithm is one of the first procedures proposed for learning in neural network models and is mostly credited to Rosenblatt.

Algorithms for Classification: The Basic Methods

Dynamic Data-Driven Adaptive Sampling and Monitoring of Big Spatial-Temporal Data Streams for Real-Time Solar Flare Detection

Lecture 2: Linear Algebra Review

A short introduction to supervised learning, with applications to cancer pathway analysis Dr. Christina Leslie

Statistical aspects of prediction models with high-dimensional data

Reward-modulated inference

Transcription:

Discovery Through Situational Awareness BRETT AMIDAN JIM FOLLUM NICK BETZSOLD TIM YIN (UNIVERSITY OF WYOMING) SHIKHAR PANDEY (WASHINGTON STATE UNIVERSITY) Pacific Northwest National Laboratory February 21, 2018 1

Project Scope Work is being performed under the DOE GMLC (Grid Modernization Laboratory Consortium) program Create a tool that applies statistical and machine-learning algorithms in context of big data analytics to investigate and implement anomaly and event detection algorithms in near real-time Current Focus Working with the Eastern Interconnect Initial focus on phase angle pair analyses Provide the EI partners with a frequent (i.e. daily or weekly) report of the findings

Power Grid Statistical Analytics: Our Historical Journey Aircraft safety Morning Report w/ NASA Analytics Using State Estimator Data w/ EI Data Investigations Using PMU Data (uncovering data quality issues, etc.) GMLC and Beyond Machine learning basis Many additional data streams Predictive analytics DISAT Data Integrity Situational Awareness Tool (PMU Data Analytics w/ BPA)

Pre-Processing Steps Read raw Phasor Measurement Unit (PMU) data Develop and then use data quality filters to clean poor quality data Frequency Remove bad data 1 day of 60 Hz PMU data (54 PMUs) = 26 GB February 21, 2018 b.amidan@pnnl.gov 4

Feature Extraction (Data Signatures) Regression fits through the data calculate estimates of value, slope, curvature (acceleration), and noise. Can be calculated in the presence of missing or data quality flagged values. Summaries of these features are used in the analyses. February 21, 2018 b.amidan@pnnl.gov 5

Multivariate Baselining Baseline captures what normal behavior is expected to be Group similar behavior Time periods that group together indicate normal grid behavior Variables that group together indicate highly correlated variables and may be candidates for feature reduction Identify data that does not belong with the normal behavior Time period contains data that is unusual (possible abnormal grid behavior) Variable is unlike other variables, or something has happened to indicate a behavioral change in the variable February 21, 2018 6

Creating a Baseline Unsupervised Learning Training Data: Historical PMU Data Baselining Learning Algorithm Real Time PMU Data Model Class 1 Class 2 Class 3 Class 4 Class 5 February 21, 2018 7

Identifying Data Driven Atypical Events Using multivariate statistical techniques to establish baselines of typical behavior, atypical moments in time can be discovered and the variables responsible can be identified. February 21, 2018 8

Recent Atypicality Example Processed 8 months of PMU data (15 PMUs) Focused on phase angle pairs Spatial plot showing phase angle pairs (red means contributing to the atypicality) February 21, 2018 9

Phase Angle Pairs Clustering Unsupervised learning (clustering) used to determine which variables are most similar during Time Period A. Proximity on tree indicates similarity February 21, 2018 b.amidan@pnnl.gov 10

Phase Angle Pairs Clustering Time Period B (two months later) Phase Angle Pair #2 is no longer like Pair #1. Why? February 21, 2018 b.amidan@pnnl.gov 11

Supervised Learning Training Data: Historical PMU Data Baselining Learning Algorithm Class 1 Class 2 Class 3 Class 4 Weather Normal Voltage Drop Fault Class 5 Maintenance Real Time PMU Data Predictive Model Weather Normal Voltage Drop Fault Maintenance February 21, 2018 12

Specificity 40 50 60 70 80 90 100 Event Classification A list of 199 frequency events was obtained during an 8 month time frame. The following supervised learning techniques were studied: DT Decision Tree RF Random Forest SVM Support Vector Machine ANN Neural Network GLM GLM Net (Lasso) AB Adaptive Boosting GBM Gradient Boosting SVM Example Frequency Event RF AB GBM DT GLM Sensitivity probability of correctly identifying frequency events. Specificity probability of correctly identifying non-frequency events. ANN 96 97 98 99 100 Sensitivity February 21, 2018 13

Next Steps Put our anomaly detection and oscillation algorithms into an EPG prototypical tool to be used on Eastern Interconnect PMU Data Review findings and tune algorithms Continue machine learning approach to find events and patterns, including other power grid data and other data streams like weather, social media, etc. Use spatial statistical techniques to take advantage of spatial relationships Apply predictive analytics February 21, 2018 14