Sparse Gaussian Markov Random Field Mixtures for Anomaly Detection


Slide 1: Sparse Gaussian Markov Random Field Mixtures for Anomaly Detection. Tsuyoshi Idé ("Ide-san"), Ankush Khandelwal*, Jayant Kalagnanam. IBM Research, T. J. Watson Research Center (*currently with the University of Minnesota). Proceedings of the 2016 IEEE International Conference on Data Mining (ICDM 16), December 2016.

Slide 2: Summary. Gaussian mixture + anomaly detection. Newly added features: principled variable-wise scoring, and a mixture extension of sparse structure learning.

Slide 3: Define the variable-wise anomaly score as a conditional log-loss. For a new sample $x$, the anomaly score of the i-th variable is $a_i(x) = -\ln p(x_i \mid x_{-i}, \mathcal{D})$, where $x_i$ is the i-th variable, $x_{-i}$ is the rest, $\mathcal{D}$ is the training data, and $p(x_i \mid x_{-i}, \mathcal{D})$ is the conditional predictive distribution. Compare the overall score [Yamanishi+ 00], $a(x) = -\ln p(x \mid \mathcal{D})$, based on the predictive distribution for the whole vector $x$: $a(x)$ will be large if $x$ falls in a region where $p(x \mid \mathcal{D})$ is small.

Slide 4 (for reference): Why negative log p? Because it reproduces the Mahalanobis distance in the single-Gaussian case: for a Gaussian with mean $\mu$ and covariance $\Sigma$, the negative log-density equals half the squared Mahalanobis distance, up to an additive constant.
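Written out explicitly (standard multivariate-Gaussian algebra, not specific to the paper):

```latex
-\ln \mathcal{N}(x \mid \mu, \Sigma)
  = \underbrace{\tfrac{1}{2}\,(x-\mu)^{\top}\Sigma^{-1}(x-\mu)}_{\tfrac{1}{2} d_M^2(x)}
  \;+\; \tfrac{1}{2}\ln\det(2\pi\Sigma),
```

so ranking samples by the negative log-density is the same as ranking them by the Mahalanobis distance $d_M(x)$.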

Slide 5: Use a mixture of Gaussian Markov random fields (GMRFs) for the conditional distribution: each variable's conditional predictive distribution is a weighted sum of GMRF conditionals, with its own variable-wise mixture weight. A GMRF is characterized as the conditional distribution of a Gaussian: for mean $\mu$ and precision matrix $\Lambda$, $p(x_i \mid x_{-i}) = \mathcal{N}\!\big(x_i \,\big|\, \mu_i - \tfrac{1}{\Lambda_{ii}}\sum_{j\neq i}\Lambda_{ij}(x_j-\mu_j),\ \tfrac{1}{\Lambda_{ii}}\big)$. The GMRF describes the dependency among variables.
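To ground slides 3 and 5 together, here is a minimal Python sketch (standard Gaussian algebra under the identity above, not the paper's code) that scores every variable of a test point under a single GMRF with mean mu and precision Lam; in the full method the score would instead come from the variable-wise mixture of such conditionals.

```python
import numpy as np

def variablewise_scores(x, mu, Lam):
    """Score a_i(x) = -ln p(x_i | x_{-i}) for each variable, single GMRF.

    mu: mean vector; Lam: (sparse) precision matrix. Uses the conditional
    Gaussian identity quoted on slide 5.
    """
    d = x - mu
    Lam_d = Lam @ d                 # row i: Lam_ii * d_i + sum_{j!=i} Lam_ij * d_j
    diag = np.diag(Lam)
    resid = Lam_d / diag            # x_i minus the conditional mean of x_i
    # -ln N(x_i | conditional mean, variance 1/Lam_ii)
    return 0.5 * diag * resid**2 - 0.5 * np.log(diag / (2.0 * np.pi))

# Tiny usage example: variable 2 deviates strongly, so its score dominates.
mu = np.zeros(2)
Lam = np.array([[2.0, -0.8], [-0.8, 2.0]])   # precision = inverse covariance
print(variablewise_scores(np.array([0.1, 3.0]), mu, Lam))
```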

Slide 6: A mixture model with an i.i.d. assumption on the data is a practical compromise. Compressor data example: this is normal-operation data; it looks piecewise stationary (with heavy noise), and explicit time-series modeling looks too hard.

Slide 7: Tackling noisy data from complex systems: the strategy behind the inference algorithm. How to choose K? Start with a large enough K and let the algorithm decide on an optimal value. How to handle heavy noise and learn the model stably? Use a sparsity-enforcing prior to leverage sparse structure learning.

Slide 8: Two-step approach to GMRF mixture learning: model. Step 1: find the GMRF parameters. Priors: Gauss-Laplace on $(\mu_k, \Lambda_k)$, categorical on the assignments $\{z_k\}$ (one per sample). Step 2: find the variable-wise weights given the GMRF parameters. Priors: categorical-Dirichlet on the indicators $\{h_{ki}\}$ (one per sample). [Figure: the observation model for each step, annotated with its priors.]
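For concreteness, a Gauss-Laplace prior of the kind named here typically takes the following form; this is the standard sparse-GMRF construction, and the paper's exact hyperparameterization may differ:

```latex
p(\Lambda_k) \propto \exp\!\Big(-\frac{\rho}{2}\,\|\Lambda_k\|_1\Big), \qquad
p(\mu_k \mid \Lambda_k) = \mathcal{N}\!\big(\mu_k \,\big|\, m_0,\ (\lambda_0 \Lambda_k)^{-1}\big),
```

where the $\ell_1$ term drives entries of the precision matrix $\Lambda_k$ to exactly zero (sparse variable dependency) and $\rho, \lambda_0, m_0$ are hyperparameters.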

Slide 9: Two-step approach to GMRF mixture learning: inference. Use variational Bayes (VB) for both the 1st and 2nd steps. The 1st step achieves sparsity over both the variable dependency and the mixture components: the variable dependency, by (iteratively) solving a graphical lasso problem [Friedman+ 08]; the mixture components, by (iterative) point estimation for ARD (automatic relevance determination) [Corduneanu+ 01]. Details are in the paper.
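To make the alternation concrete, here is a minimal, EM-flavored Python sketch of the 1st step: responsibilities are computed per cluster, each M-step solves a graphical lasso on the responsibility-weighted covariance, and clusters whose weight collapses are pruned. This is a simplification of the paper's VB updates (the real algorithm point-estimates the cluster weights for ARD rather than thresholding them); function names and tolerances are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.covariance import graphical_lasso

def fit_sparse_gmrf_mixture(X, K=7, alpha=0.1, n_iter=30, prune_tol=1e-2, seed=0):
    """Sketch of step 1: sparse Gaussian mixture with cluster pruning.

    X: (n, d) data. Returns cluster weights, means, covariances, precisions.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    mus = X[rng.choice(n, size=K, replace=False)].copy()       # means at random samples
    covs = [np.cov(X.T) + 1e-3 * np.eye(d) for _ in range(K)]  # broad initial covariances
    pi = np.full(K, 1.0 / K)

    for _ in range(n_iter):
        # E-step: responsibilities under the current Gaussian components.
        log_r = np.column_stack([
            np.log(pi[k]) + multivariate_normal(mus[k], covs[k]).logpdf(X)
            for k in range(len(pi))
        ])
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # Prune clusters whose weight collapsed (crude stand-in for the
        # paper's point-estimated ARD update of the cluster weights).
        pi = r.mean(axis=0)
        keep = pi > prune_tol
        r = r[:, keep]
        pi = pi[keep] / pi[keep].sum()
        mus = mus[keep]
        covs = [c for c, kept in zip(covs, keep) if kept]

        # M-step: weighted means, then graphical lasso on each cluster's
        # responsibility-weighted covariance -> sparse precision matrix.
        precs = []
        for k in range(len(pi)):
            w = r[:, k] / r[:, k].sum()
            mus[k] = w @ X
            diff = X - mus[k]
            emp_cov = (w[:, None] * diff).T @ diff + 1e-6 * np.eye(d)
            cov_k, prec_k = graphical_lasso(emp_cov, alpha=alpha)
            covs[k] = cov_k
            precs.append(prec_k)
    return pi, mus, covs, precs
```

Starting from a deliberately large K, the pruning step is what lets the algorithm "decide on an optimal value" of K, as slide 7 puts it.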

Slide 10: Overview of the approach (for multivariate noisy sensor data). Initialize: randomly pick time-series blocks with a large enough K, and run graphical lasso on each block separately to initialize $\{(\mu_k, \Lambda_k)\}$. Step 1: iteratively update $\{(\mu_k, \Lambda_k)\}$ and remove clusters with zero weight, leaving fewer than K surviving models with adjusted parameters. Step 2: compute the variable-wise mixture weights and produce the anomaly-scoring model. [Figure: K initial blocks along the time axis; pruned clusters removed; the survivors feed the anomaly-scoring model.]
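The initialization box on this slide is also easy to make concrete: split the normal-operation series into K blocks and fit a sparse GMRF to each via graphical lasso. A sketch using scikit-learn's GraphicalLasso; equal-width contiguous blocks stand in for the slide's randomly picked ones.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def init_gmrf_components(X, K=7, alpha=0.1):
    """Initialize {(mu_k, Lambda_k)}: one sparse GMRF per time block.

    X: (T, d) normal-operation multivariate time series.
    Returns a list of (mean, sparse precision) pairs, one per block.
    """
    components = []
    for block in np.array_split(X, K):      # K contiguous time blocks
        gl = GraphicalLasso(alpha=alpha).fit(block)
        components.append((gl.location_, gl.precision_))
    return components
```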

Slide 11: Results on synthetic data (see the paper for a real application). Data generated from two patterns, A and B: training follows A-B-A-B; testing follows A-B-(anomaly). Results: the method successfully recovered the 2 major patterns starting from K=7, and achieved better anomaly-detection performance (in terms of AUC). [Figure: five variables $x_1, \dots, x_5$ plotted over (a) the training and (b) the testing sequence.]

Slide 12: Conclusion. We proposed a new outlier-detection method, the sparse GMRF mixture. The method can handle multiple operational states within the normal condition and produces variable-wise anomaly scores. We derived variational Bayes iterative equations based on the Gauss-delta posterior model.

Slide 13: Thank you!

Slide 14 (backup): Results: detecting pump surge failures. We computed the anomaly score for $x_{14}$, a flow-rate variable, based on the learned normal-state model, and compared the score between the pre-failure region (24 h) and several normal periods (black: normal; red: pre-failure period). The method clearly outperformed alternatives, including a neural network (autoencoder). [Figure: score-density plots for the proposed method, a single sparse-GMRF model, sparse PCA, and an autoencoder, with anomaly score on the horizontal axis; for the proposed method the pre-failure period has a clearly higher anomaly score, while for the alternatives the separation is not clear and detection is impossible.]

Slide 15 (backup): Leveraging the variational Bayes method for inference. Assumption: the posterior distribution factorizes. The posterior is determined so that the KL divergence between the factorized form and the full posterior is minimized; the full posterior is proportional to the complete likelihood.
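In symbols, this is the generic VB template (the paper's concrete factorization over the latent assignments and the GMRF parameters follows the same pattern):

```latex
q(\Theta) = \prod_j q_j(\theta_j), \qquad
q^{\star} = \arg\min_q \, \mathrm{KL}\big(q(\Theta) \,\|\, p(\Theta \mid \mathcal{D})\big),
\qquad p(\Theta \mid \mathcal{D}) \propto p(\mathcal{D}, \Theta),
```

which yields the usual coordinate update $\ln q_j(\theta_j) = \mathbb{E}_{q_{-j}}\!\left[\ln p(\mathcal{D}, \Theta)\right] + \mathrm{const}$.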

Slide 16 (backup): Two-step approach to GMRF mixture learning: inference. Use variational Bayes (VB) for the 1st and 2nd steps. The 1st step achieves sparsity over both the variable dependency and the mixture components: variable dependency, by (iteratively) solving a graphical lasso; mixture components, by point estimation for ARD (automatic relevance determination) [Corduneanu-Bishop 01]. [Figure: the VB iteration for the 1st step alternates between a point-estimated cluster-weight update and a graphical lasso solve [Friedman+ 08].]
