An Overview of Outlier Detection Techniques and Applications

Machine Learning Rhein-Neckar Meetup
An Overview of Outlier Detection Techniques and Applications
Ying Gu, connygy@gmail.com
28.02.2016

Anomaly/Outlier Detection
What are anomalies/outliers? Data points that are considerably different from the remainder of the data. On a scatter plot of the data, they lie far away from the other points.
Goal of anomaly detection: find objects that are different from most other objects.
Also known as: anomaly detection, deviation detection, outlier detection, novelty detection.

Applications
Fraud detection, intrusion detection, ecosystem disturbances, medicine.

Definition
Definition of Hawkins [Hawkins 1980]: an outlier is an observation that differs so much from other observations as to arouse suspicion that it was generated by a different mechanism.
Statistics-based intuition: normal data objects follow a generating mechanism, e.g. some given statistical process; abnormal objects deviate from this generating mechanism.

Causes of Anomalies
Data from different classes.
Natural variation.
Data measurement and collection errors.

Types of Anomalies
Point anomalies and contextual anomalies.
In the example, the two anomalies x1 and x2 are easily identifiable. x4 could also be an anomaly with respect to its direct neighborhood, whereas x3 seems to be one of the normal instances.

Example of Contextual Anomalies
Measurements of the average monthly temperatures in Germany over ten years, from 2001 to 2010 (N = 120 measurements). In a histogram of these temperatures, one obvious anomaly is a very hot month in the 22-23 °C bin.
Take the context into account: plotting the measurements against time shows not only the previously mentioned extreme outliers, but also two very mild winters in 2006 and 2007, which the histogram could not reveal.
We could add the month as a numerical attribute. A point anomaly detection algorithm would then be able to detect a few more outliers besides the extreme temperature values. October 2003 turns out to be an anomaly, with only 5.9 °C on average, which is only half of the usual average temperature for that month.
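The idea of using the month as context can be sketched as follows. This is only an illustration: the function name `contextual_scores` and all temperature values are made up for the example, except the 5.9 °C reading for October 2003, which is quoted from the slide. Each reading is scored against all readings for the same calendar month, so a value that looks ordinary in the overall histogram can still stand out within its month.

```python
from collections import defaultdict
from statistics import mean, stdev

def contextual_scores(readings):
    """Score each (year, month, temp) reading as a z-score computed
    over all readings that share the same calendar month (the context)."""
    by_month = defaultdict(list)
    for _, month, temp in readings:
        by_month[month].append(temp)
    stats = {m: (mean(v), stdev(v)) for m, v in by_month.items()}
    scores = []
    for year, month, temp in readings:
        mu, sigma = stats[month]
        z = (temp - mu) / sigma if sigma > 0 else 0.0
        scores.append(((year, month), z))
    return scores

# Hypothetical monthly averages in °C for 2001..2010 (illustrative only).
years = range(2001, 2011)
january = [1.0, 0.2, -0.5, 1.5, 2.0, -1.0, 4.5, 3.0, -2.0, -3.5]
july = [18.5, 17.9, 19.2, 17.2, 18.1, 21.5, 17.4, 18.3, 18.0, 19.9]
october = [10.2, 10.8, 5.9, 11.1, 10.5, 11.3, 10.0, 10.9, 10.4, 10.7]

readings = (
    [(y, 1, t) for y, t in zip(years, january)]
    + [(y, 7, t) for y, t in zip(years, july)]
    + [(y, 10, t) for y, t in zip(years, october)]
)

scores = contextual_scores(readings)
# The reading that deviates most from its own month's typical value.
most_unusual = max(scores, key=lambda s: abs(s[1]))
print(most_unusual)
```

In this toy data, 5.9 °C lies in the middle of the overall temperature range, so a context-free method would not flag it; within the October readings it is the strongest deviation.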

Types of Anomalies
Collective anomalies: a combination of multiple instances forms an anomaly, whereas each single contributing instance is not necessarily an anomaly in itself. They are the most complex type of anomaly.

Approaches to Anomaly Detection
Model-based techniques: build a model of the data; anomalies are the objects that do not fit the model well. Examples: statistical methods, classification.
Proximity-based techniques: define a proximity measure between objects; anomalous objects are those that are distant from most of the other objects. Also called distance-based outlier detection techniques. Example: k-NN based methods.
Clustering-based techniques: outlier detection can be regarded as the complement of clustering. Example: k-means.

The Use of Class Labels
Supervised anomaly detection: uses normal instances and anomalies for training and testing. The model typically classifies the instances into two classes.
Semi-supervised anomaly detection: uses only normal data for training. The model should be able to detect deviations from that norm in the test data.
Unsupervised anomaly detection: uses no labeling information at all. The algorithm takes only intrinsic information of the data into account in order to detect anomalous instances that differ from the majority.
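A minimal sketch of the semi-supervised setting, assuming a very simple model of "normal" (mean and standard deviation of a single attribute). The function names and all numbers are hypothetical and chosen only for illustration. The model is fitted on normal data only, and test instances are scored by how far they deviate from that norm.

```python
from statistics import mean, stdev

def fit_normal_model(train_normal):
    """Learn the norm from training data that contains only normal instances."""
    return mean(train_normal), stdev(train_normal)

def deviation_score(model, x):
    """Distance of x from the learned mean, in units of standard deviations."""
    mu, sigma = model
    return abs(x - mu) / sigma

# Hypothetical training data: normal instances only.
train_normal = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2]
model = fit_normal_model(train_normal)

# Hypothetical test data: one typical value and one clear deviation.
test_scores = {x: deviation_score(model, x) for x in (20.1, 25.0)}
print(test_scores)
```

An unsupervised method would instead have to estimate the norm from data that may already contain anomalies, which is why it relies on anomalies being a small minority.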

Output
Binary class label: normal or anomalous (early years).
Numerical output (anomaly score): can be converted to a binary output; used by semi-supervised and unsupervised algorithms.
The interpretation of the outlier score depends on the algorithm. For some algorithms only the ordering of the scores is meaningful (ordinal scale). For other algorithms there is a reference point, e.g. 1.0 is normal behavior.
Probability (not possible with most algorithms): allows the scores to be compared across different data sets.
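Converting an anomaly score into a binary label can be sketched in two common ways: with a fixed threshold, or by flagging the n highest-scoring instances. The second works even when only the ordering of the scores is meaningful. The function names and score values below are hypothetical.

```python
def label_by_threshold(scores, threshold):
    """Flag every instance whose score exceeds a fixed threshold."""
    return ["anomalous" if s > threshold else "normal" for s in scores]

def label_top_n(scores, n):
    """Flag the n highest-scoring instances; uses only the score ordering."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(order[:n])
    return ["anomalous" if i in flagged else "normal" for i in range(len(scores))]

# Hypothetical anomaly scores for five instances.
scores = [0.1, 2.5, 0.3, 0.2, 1.9]
by_threshold = label_by_threshold(scores, threshold=1.0)
by_rank = label_top_n(scores, n=1)
print(by_threshold)
print(by_rank)
```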

Approach - Detecting Outliers in Gaussian Distribution
The Gaussian distribution has a nice property:
$P(|x - \mu| > \sigma) \approx 31.73\%$
$P(|x - \mu| > 2\sigma) \approx 4.55\%$
$P(|x - \mu| > 3\sigma) \approx 0.27\%$
In practice, define: if $|x - \mu| > 2\sigma$, then $x$ is an anomaly. This is why outliers should make up no more than about 5% of the data.
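The tail probabilities above follow from the Gaussian error function, and the 2σ rule is straightforward to apply. The sketch below assumes one-dimensional data and estimates μ and σ from the data itself; the function names and the sample values are illustrative.

```python
import math
from statistics import mean, pstdev

def two_sided_tail(k):
    """P(|x - mu| > k * sigma) for a Gaussian distribution."""
    return math.erfc(k / math.sqrt(2))

def sigma_outliers(values, k=2.0):
    """Return the values that lie more than k standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the sample
    return [x for x in values if abs(x - mu) > k * sigma]

for k in (1, 2, 3):
    print(k, round(100 * two_sided_tail(k), 2), "%")

# Hypothetical measurements with one obvious outlier.
values = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 15.0]
flagged = sigma_outliers(values)
print(flagged)
```

Note that the outlier itself inflates the estimated mean and standard deviation, so with many or very extreme outliers this simple rule can miss some of them.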

Approach - k-NN Distance-based
Based on the distances to the neighbors. Two scores are possible:
The distance to the k-th nearest neighbor.
The average distance to all k nearest neighbors:
$\mathrm{score}(x) = \frac{1}{|N_k(x)|} \sum_{o \in N_k(x)} d(x, o)$
where $N_k(x)$ is the k-nearest-neighbor set of $x$ and $d(x, o)$ is the distance between $x$ and $o$.
The averaged score is preferable, because it results in a much more robust local density estimation.
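Both k-NN scores can be sketched with a brute-force neighbor search. This is a minimal illustration assuming Euclidean distance and a small data set; the function name and the points are made up, and a real implementation would use a spatial index for speed.

```python
import math

def knn_scores(points, k):
    """Return two score lists: the distance to the k-th nearest neighbor,
    and the average distance to the k nearest neighbors, for each point."""
    kth, avg = [], []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        kth.append(nearest[-1])
        avg.append(sum(nearest) / len(nearest))
    return kth, avg

# Hypothetical 2D data: a tight cluster and one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
kth, avg = knn_scores(points, k=2)

# Index of the point with the highest average-distance score.
top = max(range(len(points)), key=lambda i: avg[i])
print(points[top], round(avg[top], 2))
```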

Evaluation
ROC curve (Receiver Operating Characteristic): the ROC curve can also be used to evaluate anomaly detection algorithms. The computation is the same as the typical ROC computation, with TPR the true positive rate and FPR the false positive rate. Note: the different rates are obtained by varying the threshold on the outlier score.
AUC (area under the curve) is the integral of the ROC curve. It is the probability that an anomaly detection algorithm will assign a randomly chosen normal instance a lower score than a randomly chosen outlying instance. Hence the AUC is a good quality measure: the higher the value, the more likely it is that anomalies are detected.
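The two views of the AUC described above, area under the ROC curve and probability of ranking an outlier above a normal instance, can be checked against each other on a small example. The sketch below uses made-up scores and labels; the function names are illustrative.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by lowering the score threshold step by step.
    labels[i] is True if instance i is an outlier."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= t)
        points.append((fp / neg, tp / pos))
    return points

def auc_trapezoid(points):
    """Integrate the ROC curve with the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def auc_pairwise(scores, labels):
    """Fraction of (outlier, normal) pairs where the outlier scores higher;
    ties count as one half."""
    out = [s for s, y in zip(scores, labels) if y]
    norm = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if o > n else 0.5 if o == n else 0.0
               for o in out for n in norm)
    return wins / (len(out) * len(norm))

# Hypothetical outlier scores; True marks the real outliers.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [True, False, True, False, False, False, False, False]

auc_area = auc_trapezoid(roc_points(scores, labels))
auc_prob = auc_pairwise(scores, labels)
print(auc_area, auc_prob)
```

Both computations agree on this example: one normal instance is scored above one of the two outliers, so 11 of the 12 (outlier, normal) pairs are ranked correctly.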

Example of ROC Curves
The blue ROC curve originates from the LOF algorithm applied to the 2D artificial dataset. The red curve represents a random guessing approach, and the green curve is the result of a perfect algorithm.

Time Series Anomaly Detection
"Anomalies are defined not by their own characteristics, but in contrast to what is normal." -- Ted Dunning
Methods: statistical methods; turning the time series into points.
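One common way to turn a time series into points is a sliding window: each window of consecutive values becomes one multi-dimensional point, which a point anomaly detector can then score. The slide does not say which transformation was meant, so this is an assumption; the function name, the window width, and the series are illustrative.

```python
def sliding_windows(series, width):
    """Turn a series into overlapping windows; each window is one point
    in a width-dimensional space."""
    return [tuple(series[i:i + width]) for i in range(len(series) - width + 1)]

# Hypothetical series with a short spike.
series = [1, 2, 1, 2, 9, 2, 1, 2]
windows = sliding_windows(series, width=3)
print(windows)
```

Windows that contain the spike end up far from the other points, so a distance-based score would rank them highest.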

Research
Speeding up the algorithms: big data (Cloudera Hadoop, Apache Flink).
Evolving algorithms: time can change everything (1997, 2015).

Projects (DFKI)
1. Finding fake documents
2. Intrusion detector (with Telecom)
3. Indoor location tracking (with VW)
4. Machine learning for self-propelled harvesters (with CLAAS)
   a. Fault detection and maintenance studies
   b. Productivity studies

Outlier Detection with RapidMiner

Dataset 1
DFKI-artificial-3000
Download: http://www.madm.eu/_media/downloads/dfki-artificial-3000unsupervised-ad.zip
3000 records, 2 attributes
outlier_label: outlier (37), normal (2963)

Dataset 2
Diagnose breast cancer
Download: https://dataverse.harvard.edu/api/access/datafile/2711924?format=original
30 attributes, 367 records
Label: o = outlier (malignant, 10), n = normal (benign, 357)