Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules. M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D.

Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules
M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D. Sánchez
18th September 2013

Motivation

Association rules make it possible to extract novel, useful and comprehensible knowledge from databases. They represent sets of items that frequently occur together in the transactions.

Previous approaches that use data mining techniques for fraud detection try to discover the usual profiles of legitimate customer behaviour and then search for anomalies using methodologies such as clustering.

New approaches have been developed based on obtaining different kinds of knowledge: peculiar rules, infrequent rules, exception rules, anomalous rules...

These rules have several advantages:
- They provide a comprehensive understanding of a type of information.
- In general, they are less numerous.

Objective

To automatically detect exceptional or anomalous behaviour that could help in fraud detection, by obtaining the common customer behaviour together with indicators (exceptions) that arise when the behaviour deviates from the usual one, and the anomalous deviations themselves (anomalies).

Overview

1. Brief Introduction to Association Rules
2. Exception Rules
   Our Proposal for Mining Exception Rules
3. Anomalous Rules
   Our Proposal for Mining Anomalous Rules
4. Algorithm and Implementation Issues
5. Experimental Evaluation
6. Conclusions and Future Research
7. References

Brief Introduction to Association Rules

Data is usually stored in datasets $D$ composed of transactions $t_i$ (rows) and attributes (columns). An item is a pair (attribute, value) or (attribute, interval).

D     i_1   i_2   ...   i_j   i_{j+1}   ...   i_m
t_1    1     0    ...    0       1      ...    0
t_2    0     1    ...    1       1      ...    1
...   ...   ...   ...   ...     ...     ...   ...
t_n    1     1    ...    0       1      ...    1

Association rules are expressions of the form $A \rightarrow B$, where $A$ and $B$ are non-empty sets of items with $A \cap B = \emptyset$. An association rule represents a relation between the joint occurrence of $A$ and $B$.

Brief Introduction to Association Rules

The support of an itemset $A$ is defined as the probability that a transaction contains the itemset:

\[ supp(A) = \frac{|\{t \in D : A \subseteq t\}|}{|D|} \]

For assessing the validity of an association rule, the most common measures are support (joint probability $P(A \cap B)$) and confidence (conditional probability $P(B \mid A)$):

\[ Supp(A \rightarrow B) = supp(A \cup B) = \frac{|\{t \in D : A \cup B \subseteq t\}|}{|D|} ; \qquad Conf(A \rightarrow B) = \frac{supp(A \cup B)}{supp(A)} \]

These must be $\geq minsupp$ and $\geq minconf$ respectively (thresholds imposed by the user); that is, the rule must be frequent and confident.
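As an illustration of these two measures, the following minimal Python sketch computes support and confidence over a list of transactions. The function names and the toy transactions are assumptions made for this example; they are not taken from the slides.

```python
def supp(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_support(a, b, transactions):
    """Supp(A -> B) = supp(A u B)."""
    return supp(frozenset(a) | frozenset(b), transactions)


def confidence(a, b, transactions):
    """Conf(A -> B) = supp(A u B) / supp(A)."""
    supp_a = supp(a, transactions)
    if supp_a == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return rule_support(a, b, transactions) / supp_a


# Hypothetical toy dataset: each transaction is a set of items.
D = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

s = rule_support({"bread"}, {"milk"}, D)   # 2 of 4 transactions
c = confidence({"bread"}, {"milk"}, D)     # 2 of the 3 "bread" transactions
```

A rule would then be kept only if `s` and `c` reach the user-chosen thresholds minsupp and minconf.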

Brief Introduction to Association Rules

An alternative framework is to measure the accuracy by means of the certainty factor:

\[
CF(A \rightarrow B) =
\begin{cases}
\dfrac{Conf(A \rightarrow B) - supp(B)}{1 - supp(B)} & \text{if } Conf(A \rightarrow B) > supp(B) \\[2ex]
\dfrac{Conf(A \rightarrow B) - supp(B)}{supp(B)} & \text{if } Conf(A \rightarrow B) < supp(B) \\[2ex]
0 & \text{otherwise.}
\end{cases}
\]

CF measures how our belief that $B$ is in a transaction changes when we are told that $A$ is in that transaction.

The certainty factor has better properties than confidence and other quality measures. In particular, it helps to reduce the number of rules obtained by filtering out those rules that correspond to statistical independence or negative dependence.

When $CF(A \rightarrow B) \geq mincf$ the rule is called certain.
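The piecewise definition can be written down directly. The sketch below is a plain transcription of the formula into Python; the function name and the sample values are chosen here for illustration only.

```python
def certainty_factor(conf, supp_b):
    """Certainty factor of A -> B, given Conf(A -> B) and supp(B).

    Positive when A raises the belief in B, negative when it lowers it,
    and 0 when A and B look statistically independent.
    """
    if conf > supp_b:
        # conf > supp_b implies supp_b < 1, so the denominator is non-zero
        return (conf - supp_b) / (1.0 - supp_b)
    if conf < supp_b:
        # conf < supp_b implies supp_b > 0, so the denominator is non-zero
        return (conf - supp_b) / supp_b
    return 0.0


# Illustrative values (not from the experiments in the slides):
positive = certainty_factor(0.9, 0.5)     # about 0.8: A makes B more likely
independent = certainty_factor(0.5, 0.5)  # 0.0: knowing A changes nothing
negative = certainty_factor(0.2, 0.5)     # about -0.6: A makes B less likely
```

With any positive mincf, the second and third rules would be discarded even though a confidence threshold of 0.5 would have accepted the second one.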

Exception Rules

Idea: An attribute interacting with another may change the consequent of an association rule [Suzuki et al., 1996].

Interpretation: $X$ strongly implies the fulfilment of $Y$, but there exists $E$ such that $X \wedge E$ implies $\neg Y$.

Example:
IF the patient takes antibiotics ($X$),
THEN the patient tends to recover ($Y$),
UNLESS staphylococcus appears ($E$).

This example shows how the presence of $E$ changes the usual behaviour of the rule $X \rightarrow Y$.

Our Proposal for Mining Exception Rules

Formally, let $D_X = \{t \in D : X \subseteq t\}$. An exception rule is a pair (csr, exc) satisfying:
- $X \rightarrow Y$ (csr, the common sense rule) is frequent and certain in $D$.
- $E \rightarrow \neg Y$ (exc) is certain in $D_X$.

Here the certainty factor is used instead of the confidence.

Advantages:
- The quantity of rule pairs (csr, exc) is reduced.
- Using CF instead of Conf, more reliable rules are obtained.
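The two conditions can be checked on a small dataset. The following Python sketch is one possible reading of the definition, assuming that $\neg Y$ holds in a transaction when the transaction does not contain all items of $Y$; the function names and the toy data (built around the antibiotics example) are hypothetical.

```python
def cf(conf, supp_b):
    """Certainty factor from a rule's confidence and its consequent's support."""
    if conf > supp_b:
        return (conf - supp_b) / (1.0 - supp_b)
    if conf < supp_b:
        return (conf - supp_b) / supp_b
    return 0.0


def is_exception_pair(x, y, e, transactions, minsupp, mincf):
    """True if (X -> Y, E -> not Y) satisfies both conditions of the definition."""
    x, y, e = frozenset(x), frozenset(y), frozenset(e)
    n = len(transactions)
    d_x = [t for t in transactions if x <= t]          # D_X
    if not d_x:
        return False
    # csr: X -> Y must be frequent and certain in D
    n_xy = sum(1 for t in d_x if y <= t)
    supp_y = sum(1 for t in transactions if y <= t) / n
    if n_xy / n < minsupp or cf(n_xy / len(d_x), supp_y) < mincf:
        return False
    # exc: E -> not Y must be certain in D_X
    d_xe = [t for t in d_x if e <= t]
    if not d_xe:
        return False
    conf_exc = sum(1 for t in d_xe if not y <= t) / len(d_xe)
    supp_not_y = (len(d_x) - n_xy) / len(d_x)
    return cf(conf_exc, supp_not_y) >= mincf


# Hypothetical data: 20 patients, 10 of whom take antibiotics.
D = (
    [frozenset({"antibiotics", "recovery"})] * 8
    + [frozenset({"antibiotics", "staphylococcus"})] * 2
    + [frozenset({"recovery"})] * 2
    + [frozenset({"other"})] * 8
)

found = is_exception_pair({"antibiotics"}, {"recovery"}, {"staphylococcus"},
                          D, minsupp=0.3, mincf=0.5)
```

In this toy data the common sense rule has support 0.4 and a certainty factor of about 0.6, and within $D_X$ every transaction containing staphylococcus lacks recovery, so the pair is accepted at mincf = 0.5 and rejected once mincf exceeds the certainty of the common sense rule.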

Anomalous Rules

Idea: Anomalous rules are rules that come to the surface when the dominant effect produced by a strong rule (csr) is removed [Berzal et al., 2004].

Interpretation: When $X$, then we have either $Y$ (usually) or $A$ (unusually).

This is captured by the set of rules: $X$ strongly implies $Y$, but in those cases where $X$ holds and $Y$ does not ($X \wedge \neg Y$), $X$ confidently implies $A$.

Example:
IF a patient has symptoms $X$,
THEN the patient has the disease $Y$,
IF NOT, the patient has the disease $A$.

This example shows that the anomalous rule tries to capture the deviation ($A$) from the usual behaviour ($X \rightarrow Y$).

Our Proposal for Mining Anomalous Rules

Formally, let $D_X = \{t \in D : X \subseteq t\}$. An anomalous rule is a triple (csr, anom, ref) satisfying:
- $X \rightarrow Y$ (csr) is frequent and certain in $D$.
- $\neg Y \rightarrow A$ (anom) is certain in $D_X$.
- $A \rightarrow \neg Y$ (ref) is certain in $D_X$.

Advantages:
- The quantity of rule triples (csr, anom, ref) is reduced. (More restrictive approach than that of Berzal et al.)
- Using CF instead of Conf, more reliable rules are obtained.
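A short Python sketch of the two conditions evaluated inside $D_X$ follows. It assumes that the common sense rule has already been validated in $D$ and that $\neg Y$ means "the transaction does not contain all items of $Y$"; the names and the toy data (modelled on the symptoms example) are hypothetical.

```python
def cf(conf, supp_b):
    """Certainty factor from a rule's confidence and its consequent's support."""
    if conf > supp_b:
        return (conf - supp_b) / (1.0 - supp_b)
    if conf < supp_b:
        return (conf - supp_b) / supp_b
    return 0.0


def is_anomaly(x, y, a, transactions, mincf):
    """Check the anom (not Y -> A) and ref (A -> not Y) conditions inside D_X."""
    x, y, a = frozenset(x), frozenset(y), frozenset(a)
    d_x = [t for t in transactions if x <= t]        # D_X
    not_y = [t for t in d_x if not y <= t]           # transactions of D_X without Y
    with_a = [t for t in d_x if a <= t]              # transactions of D_X with A
    if not not_y or not with_a:
        return False
    both = sum(1 for t in not_y if a <= t)           # not Y and A together
    cf_anom = cf(both / len(not_y), len(with_a) / len(d_x))
    cf_ref = cf(both / len(with_a), len(not_y) / len(d_x))
    return cf_anom >= mincf and cf_ref >= mincf


# Hypothetical data: every patient shows the symptoms and has a fever.
D = (
    [frozenset({"symptoms", "disease_Y", "fever"})] * 8
    + [frozenset({"symptoms", "disease_A", "fever"})] * 2
)

anomaly = is_anomaly({"symptoms"}, {"disease_Y"}, {"disease_A"}, D, mincf=0.9)
not_anomaly = is_anomaly({"symptoms"}, {"disease_Y"}, {"fever"}, D, mincf=0.9)
```

The item `fever` appears in every transaction of $D_X$, so the rule $\neg Y \rightarrow$ fever has confidence 1 but certainty factor 0 and is rejected, whereas `disease_A` passes both checks.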

Algorithm and Implementation Issues

ERSA and ARSA (Exception/Anomalous Rule Search Algorithm) are able to mine the set of common sense rules together with their associated exceptions or anomalies.

They are based on the Apriori algorithm and use a bit-string representation of items, which speeds up the logical operations.

Complexity: It depends on $|D| = n$, the number of items ($i$), and the number of csr obtained in the first part of the algorithm ($r$): $O(n \, r \, i \, 2^i)$.
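To show why a bit-string representation helps, the sketch below stores one bit vector per item, using Python integers as a stand-in for the Java BitSets mentioned in the slides. Bit k of an item's vector is set when transaction k contains the item, so the support of an itemset reduces to a bitwise AND followed by a bit count. All names and data here are illustrative assumptions, not the authors' implementation.

```python
def build_bitsets(transactions):
    """Map each item to an integer whose bit k is 1 iff transaction k has the item."""
    columns = {}
    for k, t in enumerate(transactions):
        for item in t:
            columns[item] = columns.get(item, 0) | (1 << k)
    return columns


def popcount(bits):
    """Number of 1-bits, i.e. number of matching transactions."""
    return bin(bits).count("1")


def cover(itemset, columns, full):
    """Bit vector of the transactions that contain every item of `itemset`."""
    bits = full
    for item in itemset:
        bits &= columns.get(item, 0)
    return bits


# Hypothetical toy dataset with four transactions (bit 0 is the first one).
D = [{"a", "b"}, {"a"}, {"a", "b", "c"}, {"c"}]
n = len(D)
full = (1 << n) - 1                      # all transactions: 0b1111
cols = build_bitsets(D)

ab = cover({"a", "b"}, cols, full)       # conjunction: a AND b
supp_ab = popcount(ab) / n
a_not_b = cols["a"] & (full ^ cols["b"])  # negation: a AND NOT b
```

Negation is taken relative to `full` so that no bits beyond the n transactions are set.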

ERSA algorithm

Input: Transactional database, minsupp, minconf or mincf
Output: Set of association rules with their associated exception rules.

1. Database Preprocessing
   1.1 Transformation of the transactional database into a boolean one.
   1.2 Database storage into a vector of BitSets.
2. Mining Process
   2.1 Mining Common Sense Rules
       - Searching the set of candidates (frequent itemsets) for the csr.
       - Storing the indexes of BitSet vectors and support of candidates.
       - csr extraction exceeding minsupp and minconf/mincf.
   2.2 Mining Exception Rules
       For every csr $X \rightarrow Y$ we compute the possible exceptions:
       For each item $E \in I$ (except those in the csr)
       - Compute $X \wedge E \wedge \neg Y$ and its support.
       - Compute $X \wedge E$ and its support.
       - Compute $supp_X(\neg Y)$.
       - If $CF_X(E \rightarrow \neg Y) \geq mincf$ then this is an exc.
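The exception-mining loop of the last step could look roughly as follows when items are stored as bit vectors. This is a sketch under stated assumptions rather than the ERSA implementation: Python integers replace Java BitSets, the csr is assumed to have been found already, and the function name, parameters and data are invented for the example.

```python
def popcount(bits):
    """Number of transactions flagged in a bit vector."""
    return bin(bits).count("1")


def cf(conf, supp_b):
    """Certainty factor from a rule's confidence and its consequent's support."""
    if conf > supp_b:
        return (conf - supp_b) / (1.0 - supp_b)
    if conf < supp_b:
        return (conf - supp_b) / supp_b
    return 0.0


def find_exceptions(x_bits, y_bits, item_bits, n, mincf, skip=()):
    """For one csr X -> Y, return the items E with CF_X(E -> not Y) >= mincf."""
    full = (1 << n) - 1
    not_y = full ^ y_bits
    n_x = popcount(x_bits)
    if n_x == 0:
        return []
    supp_x_not_y = popcount(x_bits & not_y) / n_x    # supp_X(not Y)
    found = []
    for item, e_bits in item_bits.items():
        if item in skip:                             # items already in the csr
            continue
        xe = x_bits & e_bits                         # X and E
        n_xe = popcount(xe)
        if n_xe == 0:
            continue
        conf = popcount(xe & not_y) / n_xe           # Conf_X(E -> not Y)
        value = cf(conf, supp_x_not_y)
        if value >= mincf:
            found.append((item, value))
    return found


# Hypothetical data: 10 transactions, all containing X; the first 8 contain Y.
n = 10
items = {
    "antibiotics": 0b1111111111,     # X
    "recovery": 0b0011111111,        # Y
    "staphylococcus": 0b1100000000,  # only in the two transactions without Y
    "fever": 0b1111111111,           # in every transaction
}
exceptions = find_exceptions(items["antibiotics"], items["recovery"], items,
                             n, mincf=0.8, skip=("antibiotics", "recovery"))
```

Only `staphylococcus` is reported here; `fever` co-occurs with every transaction, so its certainty factor is 0 and it is filtered out.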

Experimental Evaluation

Database: German-statlog, bank credit data from the UCI Machine Learning Repository.
- 1000 transactions
- 21 attributes: 18 categorical or numerical, 3 continuous (categorized into meaningful intervals)

Environment: 1.73 GHz Intel Core 2 Duo notebook with 1024 MB of main memory running Windows 7, using Java.

The maximum number of items in the antecedent or the consequent of the csr is limited to 3 in order to obtain more manageable rules.

Experimental Evaluation

Number of csr, exc and anom rules found for different thresholds in the German-statlog database.

          |    mincf = 0.8     |    mincf = 0.9     |    mincf = 0.95
minsupp   |  csr   exc   anom  |  csr   exc   anom  |  csr   exc   anom
0.08      |  674    66    326  |  309    11     39  |  270     6     10
0.1       |  384    27    208  |  137     4     12  |  123     3      5
0.12      |  226    11    142  |   62     1      3  |   57     0      2

Experimental Evaluation

Time in seconds for mining exception and anomalous rules for different thresholds in the German-statlog database.

          |  mincf = 0.8   |  mincf = 0.9   |  mincf = 0.95
minsupp   |  ERSA   ARSA   |  ERSA   ARSA   |  ERSA   ARSA
0.08      |   137    139   |   116    116   |   115    116
0.1       |    73     71   |    64     63   |    63     64
0.12      |    43     43   |    38     38   |    38     38

Experimental Evaluation

Some of the obtained rules are:

IF present employment since $\geq$ 7 years AND status & sex = single male
THEN people being liable to provide maintenance for = 1 (Supp = 0.105 & CF = 0.879)
EXCEPT when purpose = business (CF = 1).

IF property = real estate AND number of existing credits at this bank = 1
THEN age is between 18 and 25 (Supp = 0.082 & CF = 0.972)
OR property = car (unusually, with $CF_1 = 1$, $CF_2 = 1$).

Conclusions and Future Research

- We have given new proposals for mining exception and anomalous rules.
- We provide efficient algorithms for mining these kinds of rules.
- The implementations have been run on a bank credit database, obtaining a manageable set of interesting rules that should be analysed by an expert.

Future: Development of new approaches for exception and anomalous rules with uncertain data.

References

[Suzuki et al., 1996] E. Suzuki and M. Shimura. Exceptional knowledge discovery in databases based on information theory. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pages 275-278. AAAI Press, 1996.

[Berzal et al., 2004] F. Berzal, J.C. Cubero, N. Marín, and M. Gámez. Anomalous association rules. In IEEE ICDM Workshop Alternative Techniques for Data Mining and Knowledge Discovery, 2004.

[Delgado et al., 2011] M. Delgado, M.D. Ruiz, and D. Sánchez. New Approaches for Discovering Exception and Anomalous Rules. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, Vol. 19, No. 2, pp. 361-399, 2011.

Thank you. Any questions?