Roberto Perdisci^+, Guofei Gu^, Wenke Lee^ presented by Roberto Perdisci. ^Georgia Institute of Technology, Atlanta, GA, USA

Using an Ensemble of One-Class SVM Classifiers to Harden Payload-Based Anomaly Detection Systems
Roberto Perdisci^+, Guofei Gu^, Wenke Lee^
^ Georgia Institute of Technology, Atlanta, GA, USA
+ University of Cagliari, ITALY
presented by Roberto Perdisci

Outline
- Anomaly Detection in Computer Networks
- PAYL, a PAYLoad-based Anomaly Detector
- Polymorphic Blending Attack
- Hardening Payload-based Anomaly Detection
- Payload Analysis using 2_ν-grams
- Combining Multiple One-Class Classifiers
- Experimental Results
- Conclusion

Anomaly Detection in Computer Networks
Problem definition
- Classify computer network traffic
- Distinguish between normal traffic and attacks
- No labelled dataset
Assumptions
- The vast majority of the network traffic is normal
- Network attacks can be distinguished from normal traffic using suitable metrics
This is an outlier detection problem.

PAYL: PAYLoad-based Anomaly Detector
- Developed at Columbia University, NY
- Based on the occurrence frequency of n-grams (sequences of n bytes) in the payload
[Figure: 4-gram example on a packet. Labels: Header; payload bytes A B A C D A B C D F]
Training
- The frequency of n-grams is extracted for each payload in a (noisy) dataset of normal traffic
- A simple model is constructed by computing the average and standard deviation of the frequency of each n-gram
- 256^n possible n-grams = 256^n features
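The training step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PAYL's actual implementation: the function names `ngram_frequencies` and `train_model` are hypothetical, frequencies are taken as relative counts per payload, and the population standard deviation is used.

```python
from collections import Counter
from math import sqrt


def ngram_frequencies(payload: bytes, n: int = 1) -> dict:
    """Relative frequency of each n-gram (sequence of n bytes) in one payload."""
    total = len(payload) - n + 1
    if total <= 0:
        return {}
    counts = Counter(payload[i:i + n] for i in range(total))
    return {gram: c / total for gram, c in counts.items()}


def train_model(payloads, n: int = 1) -> dict:
    """Map each observed n-gram to (mean, standard deviation) of its frequency.

    Assumption: an n-gram absent from a payload contributes frequency 0.0.
    """
    freqs = [ngram_frequencies(p, n) for p in payloads]
    grams = set().union(*freqs)
    model = {}
    for g in grams:
        values = [f.get(g, 0.0) for f in freqs]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        model[g] = (mean, sqrt(var))
    return model
```

For the payload `A B A C D A B C D F` from the slide, `ngram_frequencies(b"ABACDABCDF", 4)` yields seven distinct 4-grams, each with frequency 1/7.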

PAYL
Operational phase
- The frequency of n-grams is extracted from the payload of each packet entering the network
- A simplified Mahalanobis distance is used to compare the packet under test to the model of normal traffic
- An alarm is raised if the distance is greater than a certain threshold
Problems
- PAYL assumes there is no correlation among features
- It uses 1-gram (or 2-gram) analysis because high values of n are impractical
  - if n is high -> curse of dimensionality
  - if n is low -> low amount of structural information
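As a rough illustration of the operational phase, the sketch below scores one payload against a model of per-feature means and standard deviations. It assumes the usual form of the simplified Mahalanobis distance, the sum over features of |x_i - mean_i| / (std_i + alpha), where `alpha` is a small smoothing constant that avoids division by zero. The function names, the dictionary layout and the default `alpha` are assumptions made for this example.

```python
def simplified_mahalanobis(freq: dict, model: dict, alpha: float = 0.001) -> float:
    """Sum over features of |x_i - mean_i| / (std_i + alpha).

    freq  : feature -> observed frequency in the packet under test
    model : feature -> (mean, std) learned from normal traffic
    Features never seen in training are treated as mean 0, std 0 (assumption).
    """
    distance = 0.0
    for feature in set(freq) | set(model):
        mean, std = model.get(feature, (0.0, 0.0))
        distance += abs(freq.get(feature, 0.0) - mean) / (std + alpha)
    return distance


def is_anomalous(freq: dict, model: dict, threshold: float, alpha: float = 0.001) -> bool:
    """Raise an alarm when the distance exceeds the chosen threshold."""
    return simplified_mahalanobis(freq, model, alpha) > threshold
```

Because each feature is scaled independently, the distance ignores any correlation among features, which is the first problem listed on the slide.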

Polymorphic Blending Attack
- Polymorphism is used by attackers to avoid signature-based detection
- 1-gram and 2-gram PAYL can easily detect standard and polymorphic attacks
  - normal HTTP requests are highly structured and contain mostly printable characters
  - the Executable Code, the Decryption Engine and the Encrypted Code contain lots of unusual characters (e.g., non-printable)
- A Polymorphic Blending Attack can evade PAYL
  - the encryption algorithm is designed to make the attack look like normal traffic
[Figure: two packet layouts. Labels: HTTP GET, Decryption Engine, Encrypted Code (both layouts); Substitution table, Padding]

Polymorphic Blending Attack
Attack strategy
- Estimate the frequency distribution of n-grams in normal traffic (e.g., by sniffing traffic sent towards the victim network)
- Encode the attack payload to approximate the learned distribution
- Add padding bytes to further adjust the distribution of n-grams in the attack payload
Can evade 1-gram and 2-gram PAYL
- The attack transformation T brings the attack pattern inside the decision surface

Analysis of Polymorphic Blending Attack
Why does the Blending Attack work?
- The model of normal traffic constructed by PAYL is too simple
- 1-gram and 2-gram analysis do not extract enough structural information
Shortcomings of the attack
- The Polymorphic Blending Attack uses a greedy algorithm to find a sub-optimal attack transformation
- The attack transformation is less and less likely to find a good solution for high values of n

Extracting structural information
We could use n-gram analysis with a high value of n, but...
- 256^n features! (if n=3 we have 16,777,216 features!)
- curse of dimensionality
- problems related to the computational cost and memory consumption of learning algorithms
Observation
- if n=2 we have 256^2 = 65,536 features
- in this case the classification problem is still tractable

2_ν-gram analysis
Definition
- 2_ν-gram = 2 bytes in the payload that are ν bytes apart from each other
- instead of measuring the occurrence frequency of n-grams, we measure the frequency of 2_ν-grams, with ν=0..(n-2)
[Figure: the payload A B A C D A B C D F shown three times, labelled 2_0-gram, 2_1-gram and 2_2-gram]
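The definition above can be made concrete with a short sketch. It assumes that "ν bytes apart" means ν bytes lie between the two bytes of the pair, so that ν=0 gives ordinary 2-grams and ν=n-2 pairs the first and last byte of an n-gram. The names `two_nu_gram_frequencies` and `payload_descriptions` are hypothetical.

```python
from collections import Counter


def two_nu_gram_frequencies(payload: bytes, nu: int) -> dict:
    """Relative frequency of byte pairs (payload[i], payload[i + nu + 1]).

    Assumption: nu counts the bytes skipped between the two bytes of the pair,
    so nu = 0 corresponds to standard 2-grams.
    """
    step = nu + 1
    total = len(payload) - step
    if total <= 0:
        return {}
    counts = Counter((payload[i], payload[i + step]) for i in range(total))
    return {pair: c / total for pair, c in counts.items()}


def payload_descriptions(payload: bytes, n: int) -> list:
    """One description of the payload per value nu = 0..(n-2), i.e. n-1 in total."""
    return [two_nu_gram_frequencies(payload, nu) for nu in range(n - 1)]
```

On the slide's payload `A B A C D A B C D F`, ν=0 produces nine pairs (with `A B` and `C D` each occurring twice), while ν=1 produces eight distinct pairs such as `A _ A` and `B _ C`.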

Combining multiple models
Intuition
- combining the structural information extracted using the 2_ν-gram analysis, ν=0..(n-2), approximately reconstructs the structural information extracted by n-gram analysis
In practice
- using 2_ν-gram analysis we obtain (n-2+1) different descriptions of the payload
- each description projects the payload into a 256^2-dimensional feature space
- construct one model of normal traffic for each value of ν=0..(n-2) using a One-Class SVM
- combine the output of the obtained (n-2+1) classifiers using the Majority Voting combination rule
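The combination step can be sketched as follows. To keep the example self-contained, each classifier is represented by a plain callable that returns True when it considers its description anomalous. In the approach described on the slide these would be One-Class SVM models, one per value of ν, and the stand-in callables are purely an assumption of this sketch. Requiring a strict majority (ties count as normal) is also an assumption.

```python
def majority_vote(descriptions, classifiers) -> bool:
    """Flag a payload as an attack if more than half of the classifiers say so.

    descriptions : one feature description of the payload per value of nu
    classifiers  : one callable per description, returning True for "anomalous"
    """
    if len(descriptions) != len(classifiers):
        raise ValueError("need exactly one classifier per description")
    votes = sum(1 for d, clf in zip(descriptions, classifiers) if clf(d))
    return votes > len(classifiers) / 2


# Hypothetical stand-ins: each "classifier" just thresholds a precomputed score.
stand_in_classifiers = [lambda s: s > 1.0, lambda s: s > 2.0, lambda s: s > 3.0]
scores = [1.5, 2.5, 0.0]  # made-up scores, one per classifier
print(majority_vote(scores, stand_in_classifiers))
```

With this rule an attacker has to fool more than half of the models at once, rather than a single one.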

Feature Reduction
- 256^2 = 65,536 features!
  - we need to reduce the dimensionality of each of the (n-2+1) feature spaces before constructing classifiers
- Payload-based anomaly detection with n-gram analysis is analogous to text classification
  - true if we consider the bag-of-words technique with the frequency of words as features
  - n-grams = words
  - payload = document
- We use a Feature Clustering algorithm proposed for text classification problems
  - Dhillon et al., A divisive information-theoretic feature clustering algorithm for text classification, JMLR 2003
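The sketch below shows only how a feature clustering, once computed, could be applied to shrink a description from 65,536 features to k. It does not implement the clustering algorithm of Dhillon et al. It assumes a precomputed mapping `cluster_of` from each feature to a cluster index, and that the frequencies of features in the same cluster are summed, as is common when clustered words are merged in text classification. All names are hypothetical.

```python
def reduce_features(freq: dict, cluster_of: dict, k: int) -> list:
    """Collapse a sparse frequency description into a dense k-dimensional vector.

    freq       : feature -> frequency (for example, 2_nu-gram frequencies)
    cluster_of : feature -> cluster index in range(k), computed beforehand
    Features without a cluster assignment are ignored (assumption).
    """
    reduced = [0.0] * k
    for feature, value in freq.items():
        cluster = cluster_of.get(feature)
        if cluster is not None:
            reduced[cluster] += value
    return reduced
```

For instance, with `freq = {(65, 66): 0.5, (66, 65): 0.25, (65, 67): 0.25}` and the first and third features assigned to cluster 0, the reduced vector for k=2 is `[0.75, 0.25]`.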

Summary
Our approach to making the Polymorphic Blending Attack harder to carry out:
- Extract more structural information from the payload
- Construct descriptions of the payload in different feature spaces
- Reduce the dimensionality of these feature spaces
- Construct a One-Class SVM classifier on each of the reduced feature spaces to model normal traffic
- Combine the output of the constructed classifiers

Experimental Results
Datasets
- HTTP requests towards www.cc.gatech.edu collected between October and November 2004
Training dataset
- 1 day of normal traffic (384,389 payloads)
Test datasets
- 4 days of normal traffic (1,315,433 payloads)
- Attack dataset (126 payloads)
  - 11 non-polymorphic Buffer Overflow attacks
  - 6 polymorphic attacks
  - 1 Polymorphic Blending Attack (trained to evade 1-gram and 2-gram PAYL)

Experimental Results
[Figure: results panels titled 1-gram PAYL; 2-gram PAYL; Multiple One-Class SVM (n=12, k=40)]
- DFP = false positives on the training dataset
- RFP = false positives on the test dataset
- DR = percentage of detected attack packets

Conclusion
- We introduced the 2_ν-gram analysis technique to extract information from the payload
- We used the analogy between payload-based anomaly detection and text classification for feature reduction
- We used an ensemble of classifiers to combine the structural information extracted with the 2_ν-gram technique
- This makes it harder for the Polymorphic Blending Attack to succeed

Related Work
- Wang et al. Anomalous Payload-based Network Intrusion Detection. RAID 2004.
- Fogla et al. Polymorphic Blending Attack. USENIX Security 2006.
- Dhillon et al. A divisive information-theoretic feature clustering algorithm for text classification. MIT Journal of Machine Learning Research, Vol. 3, 2003.
- Barreno et al. Can machine learning be secure? AsiaCCS'06.

Anomaly vs. Signature-based Detection
Signature-based IDS
- are the most deployed
- efficient pattern matching
- can detect known attacks
- low number of false positives (i.e., false alarms)
- not able to detect unknown (zero-day) attacks
Anomaly Detection
- can detect known and unknown attacks (in theory!)
- difficulties in precisely modelling the normal traffic
- may generate a higher number of false positives compared to signature-based IDS

Polymorphic Attack
- A standard Buffer Overflow attack (for example) looks like:
  [Figure: packet layout. Labels: HTTP GET, Executable Code]
  - these attacks can usually be detected using pattern matching (signature-based IDS)
- Polymorphism is used by attackers to avoid signature-based detection
  [Figure: packet layout. Labels: HTTP GET, Decryption Engine, Encrypted Code]
  - the Decryption Engine and the Encrypted Code change every time the attack is launched towards a new victim

Experimental Results
Single One-Class SVM classifiers
- RBF kernel (γ=0.5)
- k = number of Feature Clusters
- ν = parameter for the 2_ν-gram analysis
- AUC measured in the interval [0,0.1] of false positives (normalized)
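The restricted AUC mentioned above can be computed from ROC points with the trapezoidal rule. The sketch below assumes that "normalized" means dividing the area by the width of the interval (0.1), so that a perfect detector scores 1. The slide does not spell this out, and the function name `partial_auc` is hypothetical.

```python
def partial_auc(fpr, tpr, max_fpr: float = 0.1) -> float:
    """Area under the ROC curve for false positive rates in [0, max_fpr].

    fpr, tpr : ROC points sorted by increasing false positive rate
    The area is divided by max_fpr (assumed normalization), giving a value in [0, 1].
    """
    area = 0.0
    for i in range(1, len(fpr)):
        x0, y0, x1, y1 = fpr[i - 1], tpr[i - 1], fpr[i], tpr[i]
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the segment at max_fpr using linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area / max_fpr
```

Restricting the interval to low false positive rates reflects how an IDS is used in practice: with over a million normal payloads in the test set, even a 1% false positive rate means thousands of alarms.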

Advantages of our approach
- The attacker could evade our IDS if they were able to construct the attack transformation to approximate the distribution of (n/2+1)-grams in normal traffic
- However, the greedy attack transformation algorithm is unlikely to find a good solution if (n/2+1) is a sufficiently high value
- A new attack transformation algorithm specifically crafted to approximate the distribution of 2_ν-grams has to evade at least n/2 different models at the same time
- The overhead added to the operational phase is expected to be fairly low