Sequential Feature Explanations for Anomaly Detection
1 Sequential Feature Explanations for Anomaly Detection. Md Amran Siddiqui, Alan Fern, Thomas G. Dietterich and Weng-Keen Wong. School of EECS, Oregon State University
2 Anomaly Detection. Anomalies: points generated by a process that is distinct from the process generating normal points. In this talk, Anomaly = Threat.
3 Anomaly Detectors. We focus on density-based anomaly detectors. Statistical outliers: points with low density values. Not all statistical outliers are anomalies of interest (statistics versus semantics).
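To make the density-based idea concrete, here is a minimal sketch of flagging low-density points as statistical outliers. It is illustrative only: the data are made up, a single multivariate Gaussian stands in for the learned density, and the cutoff of 10 points is arbitrary.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Fit a density to (made-up) data and flag the lowest-density points.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))          # mostly "normal" points
mean, cov = data.mean(axis=0), np.cov(data.T)
scores = multivariate_normal.pdf(data, mean=mean, cov=cov)
outliers = np.argsort(scores)[:10]        # 10 statistical outliers
```

Whether any of those 10 points is an anomaly of interest is the semantic question the analyst must answer.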
4-9 Anomaly Detection Pipeline (built up incrementally across slides). Data Points (Threats & Non-Threats) feed an Anomaly Detector, which splits them into Outliers (Threats & False Positives) and Non-Outliers (Non-Threats & Missed Threats). A Human Analyst reviews the Outliers and raises Alarms (Threats & False Positives); Non-Outliers are Discarded. Type 1 Missed Threats = the Anomaly Detector's false negatives; reduce them by improving the anomaly detector. Type 2 Missed Threats = the Analyst's false negatives; these can occur due to information overload and time constraints. How can we reduce type 2 errors?
10 Goal: reduce analyst effort for correctly detecting outliers that are threats.
11 How: provide the analyst with explanations of outlier points.
12 Why did the detector consider an object to be an outlier?
13 The analyst can then focus on information related to the explanation.
14 Sequential Feature Explanation (SFE): an ordering on the features of an outlier, prioritized by importance to the anomaly detector, e.g. (F2, F10, F37, F26, ...).
15 Protocol: incrementally reveal features in SFE order until the analyst makes a determination.
16-23 SFE Example: the analyst's belief about the normality of X is revised downward as features are revealed, eventually crossing a threshold (figure built up across slides).
24 How do we evaluate SFE quality?
25 Minimum Feature Prefix (MFP): the minimum number of features that must be revealed for the analyst to become confident that a threat is truly a threat (MFP = 4 in the example).
26 Optimizing MFP. Ideal objective: compute the SFE with minimum MFP. But we don't know the analyst's belief model or threshold!
27 Assumption 1: the analyst's beliefs are modeled by a learned density f(x).
28 Assumption 2: a distribution Pr(threshold) over possible thresholds.
30 Realizable objective: compute the SFE with minimum expected MFP under Assumptions 1 and 2.
31 This is an NP-hard problem. Not covered today: a branch-and-bound optimization procedure.
32-34 Greedy Optimization: Sequential Marginal. Choose the first feature i that minimizes f(x_i); choose the second feature j that minimizes f(x_i, x_j); and so on.
35-36 Greedy Optimization: Independent Marginal (computationally cheaper). Order features by increasing f(x_i), i.e. by the independent anomalousness of each feature.
37-38 Greedy Optimization: Independent Dropout, inspired by [Robnik et al., 2008] for computing supervised-learning explanations. Order features by decreasing f(x_{-i}), i.e. by how much more normal x looks after removing the feature.
39-41 Greedy Optimization: Sequential Dropout. Select the first feature i that maximizes f(x_{-i}); select the second feature j that maximizes f(x_{-i,-j}); and so on.
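The sequential marginal ordering can be sketched as follows. This is an illustrative implementation, not the authors' code: it assumes the learned density f is a single multivariate Gaussian (the talk actually uses an ensemble of GMMs), whose marginal over any feature subset is again Gaussian; the function name is made up.

```python
import numpy as np
from scipy.stats import multivariate_normal

def sequential_marginal_sfe(x, mean, cov):
    """Greedy SFE: at each step, add the feature whose inclusion gives
    the lowest marginal density f(x_S) of the revealed subset S."""
    chosen, remaining = [], set(range(len(x)))
    while remaining:
        def marginal(i):
            S = chosen + [i]
            return multivariate_normal.pdf(
                x[S], mean=mean[S], cov=cov[np.ix_(S, S)])
        best = min(remaining, key=marginal)
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

For example, a point that is extreme in only one feature should have that feature ranked first.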
42 Evaluating SFEs. Problem: evaluating an SFE requires access to an analyst, but we can't run large-scale experiments with real analysts.
43 Solution: construct a simulated analyst for anomaly detection benchmarks.
44 Evaluating Explanations. Start with anomaly detection benchmarks constructed from UCI supervised learning data sets [Emmott et al., 2013]. Each benchmark has known anomaly and normal classes.
45 Learn a classifier P(normal | x) to predict normal vs. anomalous for any feature subset; this classifier can serve as a simulated analyst.
46 Evaluation metric: the expected MFP of the simulated analyst, using a reasonable distribution over thresholds.
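The evaluation protocol — reveal features in SFE order until the simulated analyst's belief in normality crosses the threshold — can be sketched as below. Here `analyst_prob_normal` is a stand-in for the learned P(normal | revealed features) classifier; all names are hypothetical.

```python
def minimum_feature_prefix(x, sfe, analyst_prob_normal, threshold):
    """Reveal features in SFE order until the simulated analyst's
    belief P(normal | revealed features) drops below the threshold;
    return how many features were needed (the MFP)."""
    revealed = []
    for k, feature in enumerate(sfe, start=1):
        revealed.append(feature)
        if analyst_prob_normal(x, revealed) < threshold:
            return k
    return len(sfe)  # analyst never flagged the point
```

A lower MFP means the explanation surfaces the incriminating features sooner, i.e. less analyst effort.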
47 Results of Explanations for EGMM: average MFP for Random, OptOracle, IndDO, SeqDO, IndMarg, SeqMarg (bar chart). We use an ensemble of GMMs (EGMM) as the learned density f(x).
48 Oracle Experiments. Explanation evaluation depends on two factors: (1) the quality of f(x), i.e. how well f(x) matches the true analyst; and (2) the quality of the explanation computation. To assess (2), we run experiments that replace f(x) with a ground-truth analyst.
49 Results of Explanations for the Oracle Detector: average MFP for Random, OptOracle, IndDO*, SeqDO*, IndMarg*, SeqMarg* (bar chart).
50-51 Results on the KDDCup99 Dataset: average MFP for Random, IndDO, SeqDO, IndMarg, SeqMarg (bar chart).
52-55 Key Observations from the Experiments. All methods significantly beat random. The marginal methods are no worse, and sometimes better, than the dropout methods. Independent marginal is nearly as good as sequential marginal, but sequential is significantly better in the oracle experiments. The weaker signals the dropout methods produce when making early decisions make them less robust compared to the marginal methods.
56-60 Summary. Reducing the effort an analyst needs to detect threats can reduce the analyst's miss rate. We proposed sequential feature explanations to guide analyst investigation, proposed an evaluation framework for explanations, and designed and evaluated four greedy explanation methods. Preferred method: sequential marginal.
61 Future Work. Further evaluations: additional anomaly detectors (e.g. with PCA applied), larger feature spaces. Evaluate non-greedy algorithms (branch-and-bound). Anomaly exoneration. Alternative types of explanations.
62 Questions
63 SFE Calculation. We assume that for every feature subset s there exists a particular threshold τ such that, for any instance x, f(x_s) < τ implies x is an anomaly. To find the optimal SFE, we first define the MFP of an SFE E for an instance x:

MFP(x, E, τ(E)) = min { i : f(x_{E_{1:i}}) < τ_i(E) }

where f(·) is the density function and τ(E) is the set of thresholds, with τ_i(E) a random variable corresponding to the feature subset E_{1:i}.
64 SFE Calculation (continued). Expected MFP: MFP(x, E) = E_{τ(E)}[ MFP(x, E, τ(E)) ]. The objective for the optimal SFE of x is argmin_E MFP(x, E). This objective is hard to optimize, so we introduce two families of greedy methods, Marginal and Dropout, which approximately minimize it when computing an SFE.
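Since τ(E) is random, the expected MFP can be estimated by averaging over sampled threshold vectors. A minimal Monte-Carlo sketch, with hypothetical names and toy inputs; `density_prefix` stands in for f evaluated on a prefix of revealed features:

```python
def expected_mfp(x, sfe, density_prefix, threshold_samples):
    """Monte-Carlo estimate of the expected MFP: average the MFP over
    sampled threshold vectors tau, where tau[i] is the threshold for
    the length-(i+1) prefix of the SFE."""
    total = 0
    for tau in threshold_samples:
        mfp = len(sfe)  # worst case: all features must be revealed
        for i in range(len(sfe)):
            if density_prefix(x, sfe[:i + 1]) < tau[i]:
                mfp = i + 1
                break
        total += mfp
    return total / len(threshold_samples)
```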
65 Explanation Algorithms (f(x) is the learned normal density).
Sequential Marginal: choose the first feature i that minimizes f(x_i); choose the second feature j that minimizes f(x_i, x_j); and so on.
Independent Marginal (computationally cheaper): order features by increasing f(x_i), i.e. by the independent anomalousness of each feature.
Independent Dropout (inspired by [Robnik et al., 2008] from supervised learning): order features by decreasing f(x_{-i}), i.e. by how much more normal x looks after removing the feature.
Sequential Dropout: select the first feature i that maximizes f(x_{-i}); select the second feature j that maximizes f(x_{-i,-j}); and so on.
Machine Learning & Data Mining CMS/CS/CNS/EE 155 Lecture 2: Perceptron & Gradient Descent Announcements Homework 1 is out Due Tuesday Jan 12 th at 2pm Via Moodle Sign up for Moodle & Piazza if you haven
More informationPriors in Dependency network learning
Priors in Dependency network learning Sushmita Roy sroy@biostat.wisc.edu Computa:onal Network Biology Biosta2s2cs & Medical Informa2cs 826 Computer Sciences 838 hbps://compnetbiocourse.discovery.wisc.edu
More informationPerformance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project
Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore
More informationCS534 Machine Learning - Spring Final Exam
CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the
More informationMachine Learning And Applications: Supervised Learning-SVM
Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More informationOutline. What is Machine Learning? Why Machine Learning? 9/29/08. Machine Learning Approaches to Biological Research: Bioimage Informa>cs and Beyond
Outline Machine Learning Approaches to Biological Research: Bioimage Informa>cs and Beyond Robert F. Murphy External Senior Fellow, Freiburg Ins>tute for Advanced Studies Ray and Stephanie Lane Professor
More informationUVA CS 4501: Machine Learning. Lecture 11b: Support Vector Machine (nonlinear) Kernel Trick and in PracCce. Dr. Yanjun Qi. University of Virginia
UVA CS 4501: Machine Learning Lecture 11b: Support Vector Machine (nonlinear) Kernel Trick and in PracCce Dr. Yanjun Qi University of Virginia Department of Computer Science Where are we? è Five major
More informationDeep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści
Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, 2017 Spis treści Website Acknowledgments Notation xiii xv xix 1 Introduction 1 1.1 Who Should Read This Book?
More informationCSC 411 Lecture 3: Decision Trees
CSC 411 Lecture 3: Decision Trees Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 03-Decision Trees 1 / 33 Today Decision Trees Simple but powerful learning
More informationMachine Learning and Data Mining. Bayes Classifiers. Prof. Alexander Ihler
+ Machine Learning and Data Mining Bayes Classifiers Prof. Alexander Ihler A basic classifier Training data D={x (i),y (i) }, Classifier f(x ; D) Discrete feature vector x f(x ; D) is a con@ngency table
More informationHomework 2 Solutions Kernel SVM and Perceptron
Homework 2 Solutions Kernel SVM and Perceptron CMU 1-71: Machine Learning (Fall 21) https://piazza.com/cmu/fall21/17115781/home OUT: Sept 25, 21 DUE: Oct 8, 11:59 PM Problem 1: SVM decision boundaries
More informationAlgorithms for Classification: The Basic Methods
Algorithms for Classification: The Basic Methods Outline Simplicity first: 1R Naïve Bayes 2 Classification Task: Given a set of pre-classified examples, build a model or classifier to classify new cases.
More informationClass 4: Classification. Quaid Morris February 11 th, 2011 ML4Bio
Class 4: Classification Quaid Morris February 11 th, 211 ML4Bio Overview Basic concepts in classification: overfitting, cross-validation, evaluation. Linear Discriminant Analysis and Quadratic Discriminant
More informationThe Perceptron Algorithm, Margins
The Perceptron Algorithm, Margins MariaFlorina Balcan 08/29/2018 The Perceptron Algorithm Simple learning algorithm for supervised classification analyzed via geometric margins in the 50 s [Rosenblatt
More informationUnsupervised Anomaly Detection for High Dimensional Data
Unsupervised Anomaly Detection for High Dimensional Data Department of Mathematics, Rowan University. July 19th, 2013 International Workshop in Sequential Methodologies (IWSM-2013) Outline of Talk Motivation
More informationCS 380: ARTIFICIAL INTELLIGENCE MACHINE LEARNING. Santiago Ontañón
CS 380: ARTIFICIAL INTELLIGENCE MACHINE LEARNING Santiago Ontañón so367@drexel.edu Summary so far: Rational Agents Problem Solving Systematic Search: Uninformed Informed Local Search Adversarial Search
More informationData Processing Techniques
Universitas Gadjah Mada Department of Civil and Environmental Engineering Master in Engineering in Natural Disaster Management Data Processing Techniques Hypothesis Tes,ng 1 Hypothesis Testing Mathema,cal
More informationEvaluation. Andrea Passerini Machine Learning. Evaluation
Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain
More informationLearning from Observations. Chapter 18, Sections 1 3 1
Learning from Observations Chapter 18, Sections 1 3 Chapter 18, Sections 1 3 1 Outline Learning agents Inductive learning Decision tree learning Measuring learning performance Chapter 18, Sections 1 3
More informationStatistical Methods for Data Mining
Statistical Methods for Data Mining Kuangnan Fang Xiamen University Email: xmufkn@xmu.edu.cn Support Vector Machines Here we approach the two-class classification problem in a direct way: We try and find
More informationa. See the textbook for examples of proving logical equivalence using truth tables. b. There is a real number x for which f (x) < 0. (x 1) 2 > 0.
For some problems, several sample proofs are given here. Problem 1. a. See the textbook for examples of proving logical equivalence using truth tables. b. There is a real number x for which f (x) < 0.
More informationCS 380: ARTIFICIAL INTELLIGENCE
CS 380: ARTIFICIAL INTELLIGENCE MACHINE LEARNING 11/11/2013 Santiago Ontañón santi@cs.drexel.edu https://www.cs.drexel.edu/~santi/teaching/2013/cs380/intro.html Summary so far: Rational Agents Problem
More informationNatural Language Processing with Deep Learning. CS224N/Ling284
Natural Language Processing with Deep Learning CS224N/Ling284 Lecture 4: Word Window Classification and Neural Networks Christopher Manning and Richard Socher Overview Today: Classifica(on background Upda(ng
More informationDifferen'al Privacy with Bounded Priors: Reconciling U+lity and Privacy in Genome- Wide Associa+on Studies
Differen'al Privacy with Bounded Priors: Reconciling U+lity and Privacy in Genome- Wide Associa+on Studies Florian Tramèr, Zhicong Huang, Erman Ayday, Jean- Pierre Hubaux ACM CCS 205 Denver, Colorado,
More informationQuestion of the Day. Machine Learning 2D1431. Decision Tree for PlayTennis. Outline. Lecture 4: Decision Tree Learning
Question of the Day Machine Learning 2D1431 How can you make the following equation true by drawing only one straight line? 5 + 5 + 5 = 550 Lecture 4: Decision Tree Learning Outline Decision Tree for PlayTennis
More informationData Mining. Dimensionality reduction. Hamid Beigy. Sharif University of Technology. Fall 1395
Data Mining Dimensionality reduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 42 Outline 1 Introduction 2 Feature selection
More informationarxiv: v2 [cs.ai] 26 Aug 2016
AD A Meta-Analysis of the Anomaly Detection Problem ANDREW EMMOTT, Oregon State University SHUBHOMOY DAS, Oregon State University THOMAS DIETTERICH, Oregon State University ALAN FERN, Oregon State University
More informationUVA CS / Introduc8on to Machine Learning and Data Mining. Lecture 9: Classifica8on with Support Vector Machine (cont.
UVA CS 4501-001 / 6501 007 Introduc8on to Machine Learning and Data Mining Lecture 9: Classifica8on with Support Vector Machine (cont.) Yanjun Qi / Jane University of Virginia Department of Computer Science
More information