Support Measure Data Description for Group Anomaly Detection


1 Support Measure Data Description for Group Anomaly Detection. Jorge Guevara Díaz (IME-USP, Brazil), Stephane Canu (LITIS-INSA, France), Roberto Hirata Jr. (IME-USP, Brazil). November 03, 2015.

2-3 Outline: 1. Who am I? 2. Introduction 3. Hilbert space embedding for distributions 4. SMDD models 5. Experiments 6. Conclusions and further research.

4 Who am I? Three papers in local Peruvian conferences. Lorito: an isolated-word recognition system written from scratch in Java.

5 Who am I? Speech Miner: a speech recognition system using MFCC and HMMs, built with HTK. SOM-TSP: a SOM neural network to solve the TSP, in Java. FNN: a neural network for digit classification, in Java.

6 Who am I? (image-only slide)

7-8 Who am I? Similarity between fuzzy sets using kernels: the link between fuzzy systems and kernels; a theory of positive definite kernels on fuzzy sets; kernels induced by fuzzy distances. A data description model for sets of distributions. Fuzzy kernel hypothesis testing. TSK kernels on fuzzy sets for classification of low-quality datasets. Group anomaly detection using SMDD.

9 Who am I? (image-only slide)

10 Outline (section divider: 2. Introduction).

11 Introduction. Definition (Anomaly): "An anomaly is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism" (Hawkins, D., Identification of Outliers, Chapman and Hall). Figure: rare starfish found, one in a million!

12 Some applications include (Varun Chandola, Arindam Banerjee, and Vipin Kumar, "Anomaly detection: A survey," ACM Comput. Surv.): fraud detection, e.g., abnormal buying patterns; medicine, e.g., unusual symptoms or abnormal tests; sports, e.g., outstanding players; measurement errors, e.g., abnormal values; cyber-intrusion detection; industrial damage detection; image processing; textual anomaly detection. Figures: (a) unusual contraction, (b) ceramic defects, (c) traffic jam recognition.

13 Techniques: generative models, e.g., HMM, GMM; unsupervised methods, e.g., clustering, distance-based, density-based; discriminative models, e.g., SVM, neural networks; information-theoretic methods; geometric methods. Figure: 2D anomalies (Chandola, Banerjee, and Kumar, "Anomaly detection: A survey," ACM Comput. Surv.).

14-15 Problem. Given a data set of the form $T = \{s_i\}_{i=1}^N$, where $s_i = \{x_1^{(i)}, x_2^{(i)}, \dots, x_{L_i}^{(i)}\} \sim \mathbb{P}_i$ and each $\mathbb{P}_i$ is defined on $(\mathbb{R}^D, \mathcal{B}(\mathbb{R}^D))$, try to detect anomalies or group anomalies from $T$ (slide 15 illustrates a more complex scenario).

16 Introduction. Examples of datasets of this form $T = \{s_i\}_{i=1}^N$: (a) clusters of galaxies, (b) SIFT, (c) USPS. Figures from Chang-Dong et al., "Multi-Exemplar Affinity Propagation".

17 Group anomaly: a group of points with unexpected behavior with respect to a dataset of groups of points. Point-based group anomalies: aggregations of anomalous points. Distribution-based group anomalies: anomalous aggregations of non-anomalous points. (A toy generator for both kinds is sketched below.)

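To make the two kinds concrete, here is a minimal illustrative sketch, not from the talk, that builds a toy dataset $T$ of groups from a 2-D Gaussian mixture; all function names and parameters are assumptions:

```python
# Minimal sketch of a toy dataset T of groups; all parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
MEANS = np.array([[0.0, 0.0], [4.0, 4.0]])  # mixture component means

def normal_group(n=50):
    # Non-anomalous group: points drawn from both mixture components.
    comp = rng.integers(0, 2, size=n)
    return MEANS[comp] + rng.normal(scale=0.5, size=(n, 2))

def point_based_anomaly(n=50):
    # Aggregation of individually anomalous points (a far-away cluster).
    return rng.normal(loc=[10.0, -10.0], scale=0.5, size=(n, 2))

def distribution_based_anomaly(n=50):
    # Anomalous aggregation of non-anomalous points: all mass on one component.
    return MEANS[np.zeros(n, dtype=int)] + rng.normal(scale=0.5, size=(n, 2))

T = [normal_group() for _ in range(48)] + [point_based_anomaly(),
                                           distribution_based_anomaly()]
```
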
18 Previous work. Feature-engineering approach: feature extraction from each group (Chan et al., "Modeling multiple time series for anomaly detection," Data Mining, IEEE; Keogh et al., "HOT SAX: efficiently finding the most unusual time series subsequence," Data Mining, IEEE); clustering of point anomalies. However, these methods ignore distribution-based group anomalies. Generative approach: flexible genre models (L. Xiong et al., "Group Anomaly Detection using Flexible Genre Models," NIPS); hierarchical probabilistic models (L. Xiong et al., "Hierarchical Probabilistic Models for Group Anomaly Detection," AISTATS). However, these procedures rely on parametric assumptions. Discriminative approach: support measure machines (Muandet et al., "One-Class Support Measure Machines for Group Anomaly Detection," UAI); our work is nonparametric, with performance depending on the kernel choice.

19 Outline (section divider: 3. Hilbert space embedding for distributions).

20 RKHS, where the magic happens. Figure: kernel mapping, from Shawe-Taylor et al., "Kernel Methods for Pattern Analysis," Cambridge University Press.

21 RKHS, where the magic happens. Main ingredient: a real-valued symmetric positive definite kernel $k$, i.e., $\sum_{i,j=1}^{N} c_i c_j k(x_i, x_j) \ge 0$ for all $c_1, \dots, c_N \in \mathbb{R}$ and all $x_1, \dots, x_N$.

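As a sanity check, the positive-definiteness condition can be verified numerically: the Gram matrix of a valid kernel has no negative eigenvalues. A minimal sketch, assuming a Gaussian kernel:

```python
# Sketch: verify sum_{i,j} c_i c_j k(x_i, x_j) >= 0 via the Gram matrix spectrum.
import numpy as np

def gaussian_gram(X, Y, gamma=1.0):
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    return np.exp(-gamma * sq)

X = np.random.default_rng(1).normal(size=(30, 2))
K = gaussian_gram(X, X)
# PSD <=> smallest eigenvalue >= 0 (up to numerical noise).
assert np.linalg.eigvalsh(K).min() > -1e-10
```
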
22 Kernel methods: SVM, kernel PCA, Gaussian processes, SVDD, MKL, SMDD, kernel regression, kernel two-sample test, kernel spectral clustering.

23 Kernel methods (cont.). Representer theorem: $f^* = \operatorname{argmin}_{f \in \mathcal{H}} \operatorname{Cost}((x_1, y_1, f(x_1)), \dots, (x_N, y_N, f(x_N))) + \Omega(\|f\|_{\mathcal{H}})$, (4) whose solution has the form $f(\cdot) = \sum_{i=1}^{N} \alpha_i k(\cdot, x_i)$. (5)

24 Remembering the problem: detecting group anomalies from the set $T = \{s_i\}_{i=1}^N$.

25 Hilbert space embedding for distributions: a framework for embedding probability measures into an RKHS $\mathcal{H}$. Definition: the embedding of probability measures $\mathbb{P} \in \mathcal{P}$ into $\mathcal{H}$ is given by the mapping $\mu : \mathcal{P} \to \mathcal{H}$, $\mathbb{P} \mapsto \mu_{\mathbb{P}} = \mathbb{E}_{\mathbb{P}}[k(X, \cdot)] = \int_{\mathbb{R}^D} k(x, \cdot)\, d\mathbb{P}(x)$.

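In practice $\mu_{\mathbb{P}}$ is replaced by the empirical mean map $\mu_{\text{emp}} = \frac{1}{L}\sum_{l} k(x_l, \cdot)$ computed from a group's sample. A minimal sketch, assuming a Gaussian kernel:

```python
# Sketch: evaluate the empirical mean map mu_emp(z) = (1/L) sum_l k(x_l, z).
import numpy as np

def mean_map(sample, z, gamma=1.0):
    # sample: (L, D) points of one group; z: (M, D) evaluation points.
    sq = ((sample[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq).mean(axis=0)  # shape (M,)
```
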
26-30 Hilbert space embedding for distributions. Properties: if $\mathbb{E}_{\mathbb{P}}[k(X, X)] < \infty$, with $k$ measurable, then $\mu_{\mathbb{P}} \in \mathcal{H}$; the reproducing property $\langle f, \mu_{\mathbb{P}} \rangle = \langle f, \mathbb{E}_{\mathbb{P}}[k(X, \cdot)] \rangle = \mathbb{E}_{\mathbb{P}}[f(X)]$ holds for all $f \in \mathcal{H}$; $\mu_{\mathbb{P}}$ is the representative function of $\mathbb{P}$; if $k$ is characteristic, then $\mu : \mathcal{P} \to \mathcal{H}$ is injective; the term $\|\mu_{\mathbb{P}} - \mu_{\text{emp}}\|$ is bounded, where $\mu_{\text{emp}}$ is an empirical estimator of $\mu_{\mathbb{P}}$.

31 Kernel on probability measures. The mapping $\mathcal{P} \times \mathcal{P} \to \mathbb{R}$, $(\mathbb{P}, \mathbb{Q}) \mapsto \langle \mathbb{P}, \mathbb{Q} \rangle_{\mathcal{P}} = \langle \mu_{\mathbb{P}}, \mu_{\mathbb{Q}} \rangle_{\mathcal{H}}$ defines an inner product on $\mathcal{P}$. The real-valued kernel on $\mathcal{P} \times \mathcal{P}$ defined by $k(\mathbb{P}, \mathbb{Q}) = \langle \mathbb{P}, \mathbb{Q} \rangle_{\mathcal{P}} = \langle \mu_{\mathbb{P}}, \mu_{\mathbb{Q}} \rangle_{\mathcal{H}} = \int_{\mathbb{R}^D} \int_{\mathbb{R}^D} k(x, x')\, d\mathbb{P}(x)\, d\mathbb{Q}(x')$ (6) is positive definite.

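Given samples $s_p \sim \mathbb{P}$ and $s_q \sim \mathbb{Q}$, the kernel (6) is estimated by averaging all pairwise kernel values between the two groups. A minimal sketch, assuming a Gaussian base kernel:

```python
# Sketch: empirical k(P, Q) = <mu_P, mu_Q>_H, estimated from two group samples.
import numpy as np

def k_meas(s_p, s_q, gamma=1.0):
    sq = ((s_p[:, None, :] - s_q[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq).mean()

def gram_on_groups(groups, gamma=1.0):
    # Gram matrix over a list of groups: K[i, j] = k(P_i, P_j).
    N = len(groups)
    K = np.empty((N, N))
    for i in range(N):
        for j in range(i, N):
            K[i, j] = K[j, i] = k_meas(groups[i], groups[j], gamma)
    return K
```
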
32 Outline (section divider: 4. SMDD models).

33 Minimum volume sets (MV-sets). An MV-set is a set satisfying an optimization criterion. Definition (MV-set for probability measures): let $(\mathcal{P}, \mathcal{A}, \mathcal{E})$ be a probability space, where $\mathcal{P}$ is the space of all probability measures $\mathbb{P}$ on $(\mathbb{R}^D, \mathcal{B}(\mathbb{R}^D))$, $\mathcal{A}$ is a suitable $\sigma$-algebra on $\mathcal{P}$, and $\mathcal{E}$ is a probability measure on $(\mathcal{P}, \mathcal{A})$. The MV-set is $G^*_\alpha = \operatorname{argmin}_{G \in \mathcal{A}} \{\rho(G) \mid \mathcal{E}(G) \ge \alpha\}$, (7) where $\rho$ is a reference measure on $\mathcal{A}$ and $\alpha \in [0, 1]$. The MV-set $G^*_\alpha$ describes a fraction $\alpha$ of the mass concentration of $\mathcal{E}$.

34-37 Minimum volume sets (MV-sets). Assume $\{\mathbb{P}_i\}_{i=1}^N$ is an i.i.d. sample distributed according to $\mathcal{E}$, and each $\mathbb{P}_i$ is unknown. Three examples of classes of volume sets: $\hat{G}_1(R, c) = \{\mathbb{P}_i \in \mathcal{P} \mid \|\mu_{\mathbb{P}_i} - c\|^2_{\mathcal{H}} \le R^2\}$, (8) $\hat{G}_2(R, c) = \{\mathbb{P}_i \in \mathcal{P} \mid \|\mu_{\mathbb{P}_i} - c\|^2_{\mathcal{H}} \le R^2,\ \|\mu_{\mathbb{P}}\|^2_{\mathcal{H}} = 1\}$, (9) $\hat{G}_3(\mathcal{K}) = \{\mathbb{P}_i \in \mathcal{P} \mid \mathbb{P}_i(\|k(X_i, \cdot) - c\|^2_{\mathcal{H}} \le R^2) \ge 1 - \kappa_i\}$. (10)

38 First SMDD model, built on $\hat{G}_1(R, c) = \{\mathbb{P}_i \in \mathcal{P} \mid \|\mu_{\mathbb{P}_i} - c\|^2_{\mathcal{H}} \le R^2\}$. Given the mean functions $\{\mu_{\mathbb{P}_i}\}_{i=1}^N$ of $\{\mathbb{P}_i\}_{i=1}^N$, the SMDD is: $\min_{c \in \mathcal{H},\, R \in \mathbb{R}^+,\, \xi \in \mathbb{R}^N} R^2 + \lambda \sum_{i=1}^N \xi_i$ subject to $\|\mu_{\mathbb{P}_i} - c\|^2_{\mathcal{H}} \le R^2 + \xi_i$ and $\xi_i \ge 0$, for $i = 1, \dots, N$.

39 Proposition (dual form). The dual of the previous problem is: $\max_{\alpha \in \mathbb{R}^N} \sum_{i=1}^N \alpha_i k(\mathbb{P}_i, \mathbb{P}_i) - \sum_{i,j=1}^N \alpha_i \alpha_j k(\mathbb{P}_i, \mathbb{P}_j)$ subject to $0 \le \alpha_i \le \lambda$, $i = 1, \dots, N$, and $\sum_{i=1}^N \alpha_i = 1$, where $k(\mathbb{P}_i, \mathbb{P}_j) = \langle \mu_{\mathbb{P}_i}, \mu_{\mathbb{P}_j} \rangle_{\mathcal{H}}$ and $\alpha$ is a Lagrange multiplier vector with non-negative components $\alpha_i$.

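This dual is a small quadratic program over the group Gram matrix. A minimal sketch using scipy's SLSQP solver, assuming the Gram matrix K[i, j] = k(P_i, P_j) was built as in the earlier sketch:

```python
# Sketch: solve max_a a.diag(K) - a'Ka  s.t.  0 <= a_i <= lam, sum(a) = 1.
import numpy as np
from scipy.optimize import minimize

def smdd_dual(K, lam=0.2):
    N = K.shape[0]
    obj = lambda a: -(a @ np.diag(K) - a @ K @ a)  # negated: scipy minimizes
    cons = ({"type": "eq", "fun": lambda a: a.sum() - 1.0},)
    res = minimize(obj, np.full(N, 1.0 / N),
                   bounds=[(0.0, lam)] * N, constraints=cons)
    return res.x  # the multipliers alpha
```
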
40-42 Proposition (representer theorem). $c(\cdot) = \sum_i \alpha_i \mu_{\mathbb{P}_i}$, $i \in \{i \in I \mid 0 < \alpha_i \le \lambda\}$, where $I = \{1, 2, \dots, N\}$. Furthermore: all $\mathbb{P}_i$ with $\alpha_i = 0$ are inside the MV-set $\hat{G}_\alpha$; all $\mathbb{P}_i$ with $\alpha_i = \lambda$ are the training errors; all $\mathbb{P}_i$ with $0 < \alpha_i < \lambda$ are the support measures. Theorem: let $\eta$ be the Lagrange multiplier of the constraint $\sum_{i=1}^N \alpha_i = 1$; then $R^2 = \eta + \|c\|^2_{\mathcal{H}}$. If a test measure $\mathbb{P}_t$ is described by this SMDD model, then the following must hold: $\|\mu_{\mathbb{P}_t} - c\|^2_{\mathcal{H}} = k(\mathbb{P}_t, \mathbb{P}_t) - 2\sum_i \alpha_i k(\mathbb{P}_i, \mathbb{P}_t) + \sum_{i,j} \alpha_i \alpha_j k(\mathbb{P}_i, \mathbb{P}_j) \le R^2$. (11)

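At test time, Eq. (11) reduces to a few Gram-matrix operations. A minimal sketch, assuming alpha comes from the dual sketch above and k_t[i] = k(P_i, P_t):

```python
# Sketch: flag P_t when ||mu_{P_t} - c||^2_H exceeds R^2 (Eq. 11).
import numpy as np

def dist2_to_center(alpha, K, k_t, k_tt):
    return k_tt - 2.0 * alpha @ k_t + alpha @ K @ alpha

def radius2(alpha, K, lam=0.2, tol=1e-6):
    # R^2 equals the distance of any support measure (0 < alpha_i < lam).
    sv = np.flatnonzero((alpha > tol) & (alpha < lam - tol))[0]
    return dist2_to_center(alpha, K, K[:, sv], K[sv, sv])

def is_group_anomaly(alpha, K, k_t, k_tt, lam=0.2):
    return dist2_to_center(alpha, K, k_t, k_tt) > radius2(alpha, K, lam)
```
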
43 Second SMDD model. Mean maps with stationary kernels do not have constant norm: $\|\mu_{\mathbb{P}}\|_{\mathcal{H}} = \|\mathbb{E}_{\mathbb{P}}[k(X, \cdot)]\|_{\mathcal{H}} \le \mathbb{E}_{\mathbb{P}}[\|k(X, \cdot)\|_{\mathcal{H}}] = \epsilon$. The idea is to normalize the mean maps to lie on the surface of a hypersphere: $\tilde{k}(\mathbb{P}, \mathbb{Q}) = \langle \mu_{\mathbb{P}}, \mu_{\mathbb{Q}} \rangle_{\mathcal{H}} / \sqrt{\langle \mu_{\mathbb{P}}, \mu_{\mathbb{P}} \rangle_{\mathcal{H}} \langle \mu_{\mathbb{Q}}, \mu_{\mathbb{Q}} \rangle_{\mathcal{H}}}$; (12) the injectivity of $\mu : \mathcal{P} \to \mathcal{H}$ is preserved. Figure from Muandet et al., "One-class support measure machines for group anomaly detection".

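On the group Gram matrix, the normalization (12) is a one-liner. A minimal sketch:

```python
# Sketch: normalized Gram over groups (Eq. 12); puts all mean maps on the unit sphere.
import numpy as np

def normalize_gram(K):
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)
```
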
44 This can be modeled by the class of volume sets $\hat{G}_2(R, c) = \{\mathbb{P}_i \in \mathcal{P} \mid \|\mu_{\mathbb{P}_i} - c\|^2_{\mathcal{H}} \le R^2,\ \|\mu_{\mathbb{P}}\|^2_{\mathcal{H}} = 1\}$ (13) and formulated as the following optimization problem. Problem (M2): $\max_{\alpha \in \mathbb{R}^N} -\sum_{i,j=1}^N \alpha_i \alpha_j \tilde{k}(\mathbb{P}_i, \mathbb{P}_j)$ subject to $0 \le \alpha_i \le \lambda$, $i = 1, \dots, N$, and $\sum_{i=1}^N \alpha_i = 1$.

45 Third SMDD model: estimate the MV-set over the class of volume sets $\hat{G}_3(\mathcal{K}) = \{\mathbb{P}_i \in \mathcal{P} \mid \mathbb{P}_i(\|k(X_i, \cdot) - c\|^2_{\mathcal{H}} \le R^2) \ge 1 - \kappa_i\}$. (14) Given the mean functions $\{\mu_{\mathbb{P}_i}\}_{i=1}^N$ of $\{\mathbb{P}_i\}_{i=1}^N$ and $\{\kappa_i\}_{i=1}^N$, $\kappa_i \in [0, 1]$, the SMDD model is the chance-constrained problem: $\min_{c \in \mathcal{H},\, R \in \mathbb{R},\, \xi \in \mathbb{R}^N} R^2 + \lambda \sum_{i=1}^N \xi_i$ subject to $\mathbb{P}_i(\|k(X_i, \cdot) - c(\cdot)\|^2_{\mathcal{H}} \le R^2 + \xi_i) \ge 1 - \kappa_i$ and $\xi_i \ge 0$, for all $i = 1, \dots, N$.

46 Lemma (Markov's inequality): $\mathbb{P}_i(\|k(X_i, \cdot) - c(\cdot)\|^2_{\mathcal{H}} \ge R^2 + \xi_i) \le \mathbb{E}_{\mathbb{P}_i}[\|k(X_i, \cdot) - c(\cdot)\|^2_{\mathcal{H}}] / (R^2 + \xi_i)$ (15) holds for all $i = 1, 2, \dots, N$. Trace of the covariance operator $\Sigma_{\mathcal{H}} : \mathcal{H} \to \mathcal{H}$: $\operatorname{tr}(\Sigma_{\mathcal{H}}) = \mathbb{E}_{\mathbb{P}}[k(X, X)] - k(\mathbb{P}, \mathbb{P})$. (16) This allows the use of the identity $\mathbb{E}_{\mathbb{P}}[\|k(X, \cdot) - c(\cdot)\|^2_{\mathcal{H}}] = \operatorname{tr}(\Sigma_{\mathcal{H}}) + \|\mu_{\mathbb{P}} - c(\cdot)\|^2_{\mathcal{H}}$.

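The trace term (16) also has a direct empirical estimate. A minimal sketch for a Gaussian kernel, for which k(x, x) = 1 for every x:

```python
# Sketch: empirical tr(Sigma_H) = E_P[k(X, X)] - k(P, P), Gaussian kernel.
import numpy as np

def trace_cov(sample, gamma=1.0):
    sq = ((sample[:, None, :] - sample[None, :, :]) ** 2).sum(-1)
    k_pp = np.exp(-gamma * sq).mean()  # empirical k(P, P)
    return 1.0 - k_pp                  # E_P[k(X, X)] = 1 for this kernel
```
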
47 Deterministic form. Given the mean functions $\{\mu_{\mathbb{P}_i}\}_{i=1}^N$ of $\{\mathbb{P}_i\}_{i=1}^N$ and $\{\kappa_i\}_{i=1}^N$, $\kappa_i \in (0, 1]$, the SMDD model is: $\min_{c \in \mathcal{H},\, R \in \mathbb{R},\, \xi \in \mathbb{R}^N} R^2 + \lambda \sum_{i=1}^N \xi_i$ subject to $\|\mu_{\mathbb{P}_i} - c(\cdot)\|^2_{\mathcal{H}} \le (R^2 + \xi_i)\kappa_i - \operatorname{tr}(\Sigma^i_{\mathcal{H}})$ and $\xi_i \ge 0$, for all $i = 1, \dots, N$.

48 Proposition (dual form). The dual form is given by the following fractional programming problem. Problem (M1): $\max_{\alpha \in \mathbb{R}^N} \sum_{i=1}^N \alpha_i \langle \mu_{\mathbb{P}_i}, \mu_{\mathbb{P}_i} \rangle_{\mathcal{H}} + \sum_{i=1}^N \alpha_i \operatorname{tr}(\Sigma^i_{\mathcal{H}}) - \frac{\sum_{i,j=1}^N \alpha_i \alpha_j \langle \mu_{\mathbb{P}_i}, \mu_{\mathbb{P}_j} \rangle_{\mathcal{H}}}{\sum_{i=1}^N \alpha_i}$ subject to $0 \le \alpha_i \kappa_i \le \lambda$, $i = 1, \dots, N$, and $\sum_{i=1}^N \alpha_i \kappa_i = 1$, where $\langle \mu_{\mathbb{P}_i}, \mu_{\mathbb{P}_j} \rangle_{\mathcal{H}}$ is computed via $k(\mathbb{P}_i, \mathbb{P}_j)$.

49 Proposition (representer theorem). $c(\cdot) = \frac{\sum_i \alpha_i \mu_{\mathbb{P}_i}}{\sum_i \alpha_i}$, $i \in \{i \in I \mid 0 < \alpha_i \kappa_i \le \lambda\}$, (17) where $I = \{1, 2, \dots, N\}$. Furthermore: all $\mathbb{P}_i$ with $\alpha_i = 0$ are inside the MV-set $\hat{G}_\alpha$; all $\mathbb{P}_i$ with $\alpha_i \kappa_i = \lambda$ are the training errors; all $\mathbb{P}_i$ with $0 < \alpha_i \kappa_i < \lambda$ are the support measures. Theorem: let $\eta$ be the Lagrange multiplier of the constraint $\sum_{i=1}^N \alpha_i \kappa_i = 1$; then $R^2 = \eta$.

50 Group anomalies can be detected by checking whether $\|\mu_{\mathbb{P}_t} - c\|^2_{\mathcal{H}} + \operatorname{tr}(\Sigma^t_{\mathcal{H}}) > R^2$, where the left-hand side is computed as $k(\mathbb{P}_t, \mathbb{P}_t) - 2\sum_i \alpha_i k(\mathbb{P}_i, \mathbb{P}_t) + \sum_{i,j} \alpha_i \alpha_j k(\mathbb{P}_i, \mathbb{P}_j) + \operatorname{tr}(\Sigma^t_{\mathcal{H}})$. (18)

51 Outline (section divider: 5. Experiments).

52-54 Group anomaly detection in astronomical data: the Star KIC phenomenon (image slides).

55 Group anomaly detection in astronomical data. Figures: (a) cluster of galaxies, (b) a galaxy, (c) spectrum of a galaxy.

56 Results (four experiment rows of plots); each row shows AUC, ACC on non-anomalous groups, ACC on anomalous groups, and group means.

57 Outline (section divider: 6. Conclusions and further research).

58-63 Conclusions and further research. SMDD is a nonparametric kernel method with promising results on the group anomaly detection task. Performance depends on the kernel choice and the regularization parameter. More models can easily be induced based on the idea of MV-sets. Further research: create models for classification, regression, and other ML tasks; exploit the structural information of groups. References: Guevara, Jorge; Canu, Stephane; Hirata Jr., Roberto. "Support Measure Data Description for Group Anomaly Detection." ACM SIGKDD Workshop ODDx3, 2015. Guevara, Jorge; Canu, Stephane; Hirata Jr., Roberto. "Support Measure Data Description." Technical Report, IME-USP, 2014.

64 Any questions?

65 Point-based group anomaly detection over a Gaussian mixture distribution dataset. Red, blue, and magenta boxes: group anomalies; green and yellow boxes: non-anomalous groups.

66 Experimental setup: M4 = OCSMM, M5 = SVDD; kernel on probability measures: Gaussian kernel with $\gamma$ given by the median heuristic; 200 runs, training set = 50 groups, test set = 30 groups (20 group anomalies); metrics: AUC and ACC. Plots: (a) AUC, (b) ACC on non-anomalous groups, (c) ACC on anomalous groups, (d) group means. (A sketch of the median heuristic follows below.)

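One common convention for the median heuristic mentioned in the setup is gamma = 1 / (2 * median^2) over the pooled pairwise distances of the training points; the exact convention used in the talk is not stated, so this sketch is an assumption:

```python
# Sketch: median heuristic for the Gaussian kernel parameter gamma.
import numpy as np

def median_heuristic_gamma(groups):
    X = np.vstack(groups)  # pool all training points
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    d = np.sqrt(sq[np.triu_indices_from(sq, k=1)])  # pairwise distances
    return 1.0 / (2.0 * np.median(d) ** 2)
```
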
67 Distribution-based group anomaly detection over a Gaussian mixture distribution dataset. Plots (two rows): AUC, ACC on non-anomalous groups, ACC on anomalous groups, and group means.
