Machine Learning Approaches to Network Anomaly Detection


Machine Learning Approaches to Network Anomaly Detection. Tarem Ahmed, Boris Oreshkin and Mark Coates. tarem.ahmed@mail.mcgill.ca, boris.oreshkin@mail.mcgill.ca, coates@ece.mcgill.ca. USENIX SysML, Cambridge, MA, April 2007. Research supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) through the Agile All-Photonics Research Network (AAPN) and MITACS projects.

Introduction. Network data are voluminous, high-dimensional, and high-rate. What is a network anomaly? A rare, short-lived event. ML-based network anomaly detection methods are more general than model-based or signature-based methods.

Our Methodology. Show the applicability of ML approaches to network anomaly detection. Two example algorithms: the One-Class Neighbor Machine (OCNM) [Muñoz 06] and Kernel-based Online Anomaly Detection (KOAD) [Ahmed 07]. Two example datasets: Transports Quebec and Abilene. (Figure: log-frequency histograms of hashed srcip at three timesteps.)

One-Class Neighbor Machine (OCNM). The region of normality should correspond to a Minimum Volume Set (MVS). OCNM, proposed in [Muñoz 06], estimates the MVS. It requires the choice of a sparsity measure g; an example is the k-th nearest-neighbour distance. It sorts the list of g values and identifies the fraction µ of points lying inside the MVS. (Figure: anomalous vs. normal points on a 2-D Isomap of the CHIN-LOSA backbone flow, srcip feature.)
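The OCNM idea with the k-th nearest-neighbour sparsity measure can be sketched as follows. This is a minimal illustration with assumed function names, not the authors' implementation:

```python
import numpy as np

def knn_distance(X, k):
    """Sparsity measure g: distance from each point to its k-th nearest neighbour."""
    # Pairwise Euclidean distances; column 0 after sorting is the zero self-distance.
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    d.sort(axis=1)
    return d[:, k]

def ocnm_mvs(X, k=3, mu=0.9):
    """Boolean mask: True = inside the estimated minimum volume set.

    Sort the g values and keep the fraction mu with the smallest g.
    """
    g = knn_distance(X, k)
    cutoff = np.quantile(g, mu)
    return g <= cutoff

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
X[0] = [10.0, 10.0]                  # plant an obvious outlier
inside = ocnm_mvs(X, k=3, mu=0.9)
print(inside[0])                     # the outlier falls outside the MVS
```

Points whose k-th neighbour is far away sit in sparse regions, so they are excluded from the MVS and flagged as anomalous.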

Kernel-based Online Anomaly Detection (KOAD): Introduction. Sequence of multivariate measurements {x_t}, t = 1, ..., T. Choose a feature space H with an associated kernel k(x_i, x_j) = <phi(x_i), phi(x_j)>, where phi: R^n -> H. Then the feature vectors corresponding to normal traffic measurements should cluster.

Kernel-based Online Anomaly Detection (KOAD): Dictionary. It should be possible to describe the region of normality in feature space using a sparse dictionary D = {x~_j}, j = 1, ..., m. The feature vector phi(x_t) is said to be approximately linearly independent of {phi(x~_j)}, j = 1, ..., m, if [Engel 04]:

    delta_t = min_a || sum_{j=1}^{m} a_j phi(x~_j) - phi(x_t) ||^2 > nu    (1)

where delta_t is the dictionary approximation error and nu is the threshold.
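The minimization in Eq. (1) has a closed form: delta_t = k(x_t, x_t) - k_t^T K^{-1} k_t, where K is the kernel matrix of the dictionary and k_t the vector of kernel evaluations against x_t (this follows the KRLS construction of [Engel 04]). A sketch, with a Gaussian kernel and assumed variable names:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def delta(x_t, dictionary, sigma=1.0):
    """delta_t = min_a || sum_j a_j phi(x~_j) - phi(x_t) ||^2
               = k(x_t, x_t) - k_t^T K^{-1} k_t  (closed form)."""
    K = np.array([[gaussian_kernel(xi, xj, sigma) for xj in dictionary]
                  for xi in dictionary])
    k_t = np.array([gaussian_kernel(xj, x_t, sigma) for xj in dictionary])
    a = np.linalg.solve(K, k_t)           # optimal combination coefficients
    return gaussian_kernel(x_t, x_t, sigma) - k_t @ a

D = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
print(delta(np.array([0.1, 0.0]), D))   # small: near the normal region
print(delta(np.array([5.0, 5.0]), D))   # near 1: far from the dictionary
```

With a Gaussian kernel, k(x, x) = 1, so delta_t ranges from 0 (perfectly represented) to 1 (orthogonal to the dictionary span).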

Kernel-based Online Anomaly Detection (KOAD): Algorithm. At timestep t, with arriving input vector x_t: evaluate delta_t according to (1) and compare it with thresholds nu1 and nu2, where nu1 < nu2. If delta_t > nu2, infer that x_t is far from normality: raise Red1. If nu1 < delta_t <= nu2, raise Orange; resolve it l timesteps later, after a usefulness test. If delta_t < nu1, infer that x_t is close to normality: Green. Delete obsolete dictionary elements, using exponential forgetting. For details of KOAD see [Ahmed 07].
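The three-way decision rule above can be sketched as a small function (a hypothetical helper; the dictionary update, usefulness test, and forgetting are omitted):

```python
def koad_color(delta_t, nu1, nu2):
    """Map the approximation error delta_t of Eq. (1) to KOAD's decision."""
    assert nu1 < nu2
    if delta_t < nu1:
        return "green"    # close to normality: no action
    if delta_t > nu2:
        return "red"      # far from normality: immediate Red1 alarm
    return "orange"       # ambiguous: resolve l timesteps later

print(koad_color(0.01, 0.1, 0.5))  # green
print(koad_color(0.3,  0.1, 0.5))  # orange
print(koad_color(0.9,  0.1, 0.5))  # red
```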

Dataset 1: Transports Quebec Cameras monitored

Sample Images (normal) Camera 1 Camera 2 Camera 3 Camera 4 Camera 5 Camera 6

Sample Images (anomalous) Camera 1 Camera 2 Camera 3 Camera 4 Camera 5 Camera 6

Feature Extraction: Discrete Wavelet Transform. (Figure: wavelet coefficients alpha_i(n), in dB, vs. timestep.) At each timestep, at each camera, get a 6-D wavelet feature vector.
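One plausible way to obtain a 6-D wavelet feature vector is the log-energy of the detail coefficients at six scales. The talk does not specify the wavelet family or the exact feature, so the Haar decomposition and dB log-energy below are assumptions for illustration:

```python
import numpy as np

def haar_step(s):
    """One level of the Haar DWT: returns (approximation, detail)."""
    s = s.reshape(-1, 2)
    return (s[:, 0] + s[:, 1]) / np.sqrt(2), (s[:, 0] - s[:, 1]) / np.sqrt(2)

def wavelet_feature(signal, levels=6):
    """Log-energy (dB) of the detail coefficients at each of `levels` scales."""
    feat = []
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        feat.append(10 * np.log10(np.mean(detail ** 2) + 1e-12))
    return np.array(feat)

sig = np.sin(np.linspace(0, 8 * np.pi, 256))   # stand-in for an image scanline
print(wavelet_feature(sig).shape)              # (6,)
```

A 256-sample input halves to 4 samples after six levels, yielding one energy per scale.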

Transports Quebec Results. (Figures: alpha_i in dB, OCNM distance, and KOAD delta vs. timestep for Camera 1 and Camera 6, with a change of camera position and traffic jams marked.) Use n = 3 out of c = 6 voting at the central monitoring unit.
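The n-of-c fusion rule amounts to declaring a network-wide anomaly when at least n of the c per-camera detectors raise an alarm. A minimal sketch (hypothetical helper name):

```python
def vote(alarms, n=3):
    """n-of-c fusion: True when at least n of the per-camera alarms are raised."""
    return sum(alarms) >= n

# c = 6 cameras, n = 3 required for a network-wide alarm
print(vote([True, True, True, False, False, False]))   # True
print(vote([True, False, False, False, False, False])) # False
```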

Transports Quebec ROC. KOAD: Gaussian kernel, with varying standard deviation for the kernel function. OCNM: identify a varying fraction (5%-50%) of outliers. (Figure: ROC curves, P_D vs. P_FA, for KOAD and OCNM.)

Dataset 2: Abilene. Data collection: 11 core routers, 121 backbone flows; 4 main packet header fields collected: (srcip, dstip, srcport, dstport). Data processing: construct histograms of the headers and calculate header entropies for each backbone flow, at each timestep. Variations in the entropies (distributions) reveal many anomalies [Lakhina 05]. (Figures: Abilene weathermap; log-frequency histograms of hashed srcip at three timesteps.)
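The entropy time series of [Lakhina 05] can be sketched as: for each timestep, histogram one header field (e.g. srcip) and compute its empirical entropy; a sudden concentration or dispersion of the distribution shows up as a dip or spike. A minimal illustration with synthetic values:

```python
import numpy as np
from collections import Counter

def header_entropy(values):
    """Empirical (Shannon) entropy, in bits, of a list of header-field values."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

normal = [f"10.0.0.{i % 50}" for i in range(1000)]   # traffic spread over many srcips
attack = ["10.0.0.1"] * 1000                         # one dominant source, e.g. a DoS

print(header_entropy(normal))   # high: dispersed distribution
print(header_entropy(attack))   # 0.0: fully concentrated distribution
```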

Abilene Results. (Figure: KOAD delta, magnitude of the PCA residual, and OCNM Euclidean distance vs. timestep; PCA with 10 components, OCNM with 5% outliers.)

Conclusions and Future Work. Preliminary results indicate the potential of ML approaches. Parameters are currently set using supervised learning. Computations must be distributed. Online operation: complexity must be independent of time.

Acknowledgements, References. Acknowledgements: Thanks to Sergio Restrepo for processing the Transports Quebec data and to Anukool Lakhina for providing the Abilene dataset. References: [Ahmed 07] T. Ahmed, M. Coates, and A. Lakhina, "Multivariate online anomaly detection using kernel recursive least squares," in Proc. IEEE Infocom, Anchorage, AK, May 2007, to appear. [Engel 04] Y. Engel, S. Mannor, and R. Meir, "The kernel recursive least-squares algorithm," IEEE Trans. Signal Processing, vol. 52, no. 8, pp. 2275-2285, Aug. 2004. [Lakhina 05] A. Lakhina, M. Crovella, and C. Diot, "Mining anomalies using traffic feature distributions," in Proc. ACM SIGCOMM, Philadelphia, PA, Aug. 2005. [Muñoz 06] A. Muñoz and J. Moguerza, "Estimation of high-density regions using one-class neighbor machines," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 3, pp. 476-480, Mar. 2006.