Anomaly Detection
Brian Palmer

What is an anomaly?
- The normal behavior of a process is characterized by a model.
- Deviations from the model are called anomalies.

Example Applications
- Applications versus spyware: there is a model of what using the computer involves; if the system notices communication with strange hosts, it's an anomaly.
- Attacks on networks.

Detecting Anomalies
- Anomaly detection is useful when nothing is known about intrusions.
- Modern systems like IDES combine anomaly detection with databases of known intrusions.
- Two ways of modelling normal behaviour:
  - the model describes the allowed patterns (positive detection), or
  - the model describes the anomalous patterns (negative detection).
- Which do you think would be better?

Positive and Negative Detection
- Early research focused on positive detection: it seems smaller and simpler.
- Advantages of negative detection:
  - more common in nature,
  - carries the same amount of information,
  - easier to distribute.

Algorithms for Anomaly Detection
- N-gram auditing
- Finite automata
- System call "sense of self"
- Artificial Immune System

N-Gram Auditing
- Intrusions are correlated with abnormal behaviour.
- Events are logged into a symbolic audit trail. How to represent this?
- Simplify features: in a shell, for example, record cat <1>, not cat file1.c.
- Then slide an n-length window across the audit log and compare each n-gram with the positive training data, as sketched below.
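
A minimal sketch of the n-gram auditing idea, in Python. The helper names and the two-event window are mine; real systems use richer audit features than these toy shell events.

    # Sketch of n-gram auditing: learn the set of n-grams seen in normal
    # (positive) training traces, then flag any window in a live trace
    # that was never seen in training.

    def ngrams(trace, n):
        """Yield every length-n window of a list of audit events."""
        for i in range(len(trace) - n + 1):
            yield tuple(trace[i:i + n])

    def train(traces, n):
        """Collect all n-grams observed in the normal training traces."""
        normal = set()
        for trace in traces:
            normal.update(ngrams(trace, n))
        return normal

    def anomalies(trace, normal, n):
        """Return the windows of a live trace not seen during training."""
        return [g for g in ngrams(trace, n) if g not in normal]

    training = [["cat <1>", "ls", "cat <1>"], ["ls", "cat <1>", "ls"]]
    normal = train(training, n=2)
    print(anomalies(["cat <1>", "rm <1>", "ls"], normal, n=2))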

Finite Automata
- N-gram auditing is very simple; processes are complicated.
- Use finite automata instead.
- Hand-constructing them can be straightforward or hard. Much better to let the computer do the work (e.g., the Baum-Welch algorithm for HMMs).
- But HMM calculation is expensive.

Training a Finite Automaton
- This approach was outlined by Michael & Ghosh.
- It works on a training set of nonintrusive patterns.
- The automaton must accept every element in the training data.
- That's trivial: how can you do it easily? (Hint: one way is too weak, another is too strong.)

What is a finite automaton?
- A pair (S, f):
  - a finite set S of states,
  - a transition mapping f such that, given a sequence of elements l, f(s1, l) = s2 for some s1, s2 in S.
- A generalization of the DFA/NFA that we're probably all familiar with.

Using n-grams to construct it
- Each state is associated with one or more n-grams of audit information.
  - n is a parameter of the algorithm.
  - Most states have more than one n-gram.
- When we see a new n-gram, either create a new state or reuse an existing state.
- A transition is labelled by the last l elements of the current n-gram.
  - l is a parameter of the algorithm.

Deciding to create a state
- Ask one simple question: for the next n-gram, can the automaton already accommodate it?
- The answer comes in three forms (a training-loop sketch follows Form 3).

Creating a State: Form 1
- The current state has a transition, matching the last l elements of the previous n-gram, to a state associated with the new n-gram.
- Action: done.

Creating a State: Form 2
- The current state has a matching transition, but not to the correct state (or there is no matching state).
- Create a state for the new n-gram if it does not exist.
- Create a transition from the current state to the new n-gram's associated state, using the last l elements of the previous n-gram.

Creating a State: Form 3
- The current state has no outgoing edge corresponding to the last l elements of the previous n-gram.
- If there is already a state assigned to the next n-gram, add a transition to it as before.
- If not, assign the n-gram to a compatible state.
  - The authors quibble over what makes good compatibility, but go with the longest matching prefix.
- If there are no compatible states, create a new state.
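
A rough sketch of this training loop. I simplify the compatibility rule to "every n-gram gets its own state" (a dictionary lookup), rather than the paper's longest-matching-prefix heuristic, and keep at most one outgoing edge per label, overwriting on conflict; the paper's Form 2/3 handling is more careful.

    # Simplified Michael & Ghosh-style training loop (assumes len(trace) >= n).
    from collections import defaultdict

    def train_automaton(trace, n, l):
        state_of = {}                    # n-gram -> state id
        trans = defaultdict(dict)        # state id -> {label: state id}

        def state_for(gram):
            if gram not in state_of:
                state_of[gram] = len(state_of)   # Form 2/3: create a state
            return state_of[gram]

        grams = [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]
        current = state_for(grams[0])
        for prev, gram in zip(grams, grams[1:]):
            label = prev[-l:]            # last l elements of previous n-gram
            target = state_for(gram)
            if trans[current].get(label) != target:   # Form 1: skip if present
                trans[current][label] = target        # Form 2/3: add the edge
            current = target
        return state_of, trans

    states, edges = train_automaton(
        ["open", "read", "mmap", "mmap", "open"], n=3, l=1)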

Size of the automaton
- No n-gram has more than one state associated with it.
- Thus there are no more than k^n states for a program with k unique audit events.
- The total number of edges is bounded by k^(n+l).
- In practice the automaton is much smaller: with k = 20 distinct events and n = 3 the bound is 20^3 = 8000 states, but only n-grams that actually occur in training produce states.

Example
- (The original slide showed an example automaton as a figure.)

Confidence
- Rather than simply accepting or rejecting, we should attach a confidence value to the automaton's assertion.
- If the current state exists and there is a transition for the label l, then the confidence that the input is an anomaly is

      1 - P(taking this transition)

  where

      P(taking this transition) = (# of times the transition was taken in training) / (# of times the current state was encountered in training)

Confidence 2
- More absolute values:
  - If the current state exists but there is no matching transition, P(anomaly) = 1.
  - If the current state is not defined (i.e., the previous state had no correct transition), P(anomaly) = 0.
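
A small sketch of this scoring rule. The count tables `taken` and `visits` are my names for statistics assumed to have been recorded during training.

    # Confidence that the current step is anomalous, per the rule above.
    # taken[(state, label)] = times this transition was taken in training
    # visits[state]         = times this state was encountered in training

    def anomaly_confidence(state, label, trans, taken, visits):
        if state is None:                  # previous state had no correct transition
            return 0.0
        if label not in trans.get(state, {}):
            return 1.0                     # state exists, no matching transition
        p = taken[(state, label)] / visits[state]
        return 1.0 - p                     # rare transitions look more anomalous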

Performance
- Note that this is used for real-time detection.
- Very efficient: training is done in linear time.

Immune System
- The human immune response is the driving metaphor.
- T-cells are grown in the thymus and accustomed to self peptides.
- Those that react to the self peptides are censored; the others are released into the body.

Simple notion of self
- We want to tag known software runs with some identifier of self.
- Any anomaly should interfere with these signatures; normal runs of the program should not.
- But the program should be able to run on arbitrary data.

System calls
- System calls are easily tracked by the kernel for arbitrary programs.
  - They already require a context switch.
- They will be involved in any critical intrusion.
- Claim: they form a useful fingerprint for intrusions.

System call windows
- As in n-gram auditing, we use a k-element window and slide it over a trace of system calls.
- But rather than pay attention to the exact order, we record the calls in the tail of the window simply as valid successors of the call at its head.

Example: System Call Window
- Trace: open, read, mmap, mmap, open, getrlimit, mmap, close
- Let k = 3.
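
A sketch of building that successor table, following the sense-of-self scheme described above: each call is paired with the calls that follow it at offsets 1..k. The function name is mine; the trace is the slide's example.

    # Sense-of-self profile: for each system call, record which calls were
    # observed following it at each offset 1..k in the training trace.
    from collections import defaultdict

    def build_profile(trace, k):
        profile = defaultdict(set)         # (call, offset) -> set of successors
        for i, call in enumerate(trace):
            for offset in range(1, k + 1):
                if i + offset < len(trace):
                    profile[(call, offset)].add(trace[i + offset])
        return profile

    trace = ["open", "read", "mmap", "mmap", "open", "getrlimit", "mmap", "close"]
    profile = build_profile(trace, k=3)
    print(sorted(profile[("open", 1)]))    # calls seen right after open:
                                           # ['getrlimit', 'read']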

Mismatches
- The likelihood that a live run is anomalous is found by counting the number of mismatches in its sequences.
- The maximum number of mismatches for a sequence of length L with lookahead k is

      k(L - k) + (k - 1) + (k - 2) + ... + 1 = k(L - (k + 1)/2)

- So confidence of anomaly = (# of mismatches) / (maximum # of mismatches), as sketched below.

Immune System as Algorithm
- Sequences of events form strings in a universe U.
- We have a set RS of these strings; we can access only a sample S to train on.
- Candidate detectors are generated randomly and censored against S.
  - We'll discuss the form of these candidates later on.
- Those that fail to match anything in S are retained as active detectors.
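
A sketch of that mismatch score, reusing the profile built in the previous sketch (assumes len(trace) > k so the normalizer is positive):

    # Count (call, offset, successor) triples never seen in training and
    # normalize by the maximum possible mismatches, k * (L - (k + 1)/2).

    def anomaly_score(trace, profile, k):
        mismatches = 0
        for i, call in enumerate(trace):
            for offset in range(1, k + 1):
                if i + offset < len(trace):
                    if trace[i + offset] not in profile.get((call, offset), set()):
                        mismatches += 1
        L = len(trace)
        max_mismatches = k * (L - (k + 1) / 2)
        return mismatches / max_mismatches

    live = ["open", "open", "mmap", "read", "close"]
    print(anomaly_score(live, profile, k=3))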

Immune System Algorithm 2
- Each detector is generated independently, so it is probable that, given sufficiently many of them, they'll detect anomalies.
- This works when given only positive examples to train against. (Why is that important?)

Artificial Immune System
- Hofmeyr adapted this into an online, dynamic detector for network attacks.
- It is more like a real biological system: immature, mature, and memory detectors are present in the system.
- Immature detectors are deleted if they match a connection.
- Mature detectors that are sufficiently discerning are promoted to memory detectors, with extended lifetime but a lower threshold of activation.
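
A sketch of the negative-selection step just described: generate random candidates and keep only those that match nothing in the self sample S. The function signature is mine, and the matching predicate is left abstract; one concrete choice (r-chunks) is sketched after the Form of Detector slide.

    # Negative selection: censor random candidate detectors against self.
    import random

    def generate_detectors(S, alphabet, length, matches, wanted, max_tries=100000):
        detectors = []
        for _ in range(max_tries):
            if len(detectors) >= wanted:
                break
            candidate = tuple(random.choice(alphabet) for _ in range(length))
            if not any(matches(candidate, s) for s in S):   # censor against self
                detectors.append(candidate)
        return detectors

At detection time, a string is flagged as anomalous if any retained detector matches it.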

Artificial Immune System 2
- Activation thresholds work to prevent autoimmune disorders.
  - A metaphor for proteins' avidity thresholds.
- Temporal clumping works for single-host attacks.
- To handle distributed attacks, each successful detection lowers the activation threshold.
- Each of these decays back over time.

Form of Detector
- In this system, a detector is:
  - an element of the set, or
  - an r-chunk: a length-r string matched at a fixed position.
- R-chunks are strictly more powerful than rcb (r-contiguous bits) matching.
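
A sketch of r-chunk matching: a detector is a window of r symbols pinned at a fixed position, and it matches any string whose symbols agree with it there. The representation (pos, chunk) is my choice of encoding.

    # An r-chunk detector (pos, chunk) matches a string if the string's
    # symbols at positions pos..pos+r-1 equal the chunk.

    def rchunk_matches(detector, s):
        pos, chunk = detector
        return tuple(s[pos:pos + len(chunk)]) == tuple(chunk)

    d = (2, ("mmap", "open"))              # r = 2, pinned at position 2
    print(rchunk_matches(d, ["open", "read", "mmap", "open", "close"]))  # True

This predicate could serve as the matches argument in the earlier negative-selection sketch, with candidates generated as (pos, chunk) pairs.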

Different schemes
- Different choices are possible for most of the crucial parameters.

Similarity of Sequences
- How close are two sequences of events?
- What would be a good way to classify their similarity?
- We want a distance metric.

Sequence Similarity Metrics
- Hamming distance.
- Distance in n-space:
  - plot (x1, ..., xn) and (y1, ..., yn) as points in R^n;
  - requires same-size sequences, large dimensions, numerical approximation;
  - outliers grossly affect this.
- Longest common subsequence:
  - define Sim(X, Y) = |LCS(X, Y)| / max(|X|, |Y|);
  - also seems to be known as r-contiguous bits (rcb);
  - various variants of this: linear filter, scaling, r-chunk.

Summary
- I've gone over four basic approaches to detecting anomalies given positive training data.
- These tend to be very efficient, but specialized, at detecting oddities in systems.
- Useful in many areas of (computer) security.
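
A sketch of the LCS-based similarity above, using the standard dynamic program for longest-common-subsequence length:

    # Sim(X, Y) = |LCS(X, Y)| / max(|X|, |Y|), with the usual
    # O(|X|*|Y|) dynamic program for LCS length.

    def lcs_length(x, y):
        dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
        for i, xi in enumerate(x):
            for j, yj in enumerate(y):
                dp[i + 1][j + 1] = (dp[i][j] + 1 if xi == yj
                                    else max(dp[i][j + 1], dp[i + 1][j]))
        return dp[len(x)][len(y)]

    def sim(x, y):
        return lcs_length(x, y) / max(len(x), len(y))

    print(sim(["open", "read", "close"],
              ["open", "mmap", "read", "close"]))   # 0.75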

References
- Machine Learning Techniques for the Computer Security Domain of Anomaly Detection. PhD thesis, Purdue University, W. Lafayette, IN, August 2000.
- Stephanie Forrest, Steven A. Hofmeyr, Anil Somayaji, and Thomas Longstaff. A Sense of Self for Unix Processes.
- B. Bollobas, G. Das, D. Gunopulos, and H. Mannila. Time-Series Similarity Problems and Well-Separated Geometric Sets. Proc. of the 13th Annual ACM Symposium on Computational Geometry, to appear, 1998. http://citeseer.ist.psu.edu/bollobas98timeseries.html
- Steven A. Hofmeyr and Stephanie Forrest. Immunity by Design: An Artificial Immune System.