Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules
M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D. Sánchez
18th September 2013
Motivation
Association rules make it possible to identify novel, useful and comprehensive knowledge in databases. They represent the joint presence of a set of items in most of the transactions.
Previous approaches that apply data mining techniques to fraud detection try to discover the usual profiles of legitimate customer behaviour and then search for anomalies using different methodologies, such as clustering.
New approaches have been developed that obtain different kinds of knowledge: peculiar rules, infrequent rules, exception rules, anomalous rules...
These rules have several advantages:
- They provide a comprehensive understanding of a type of information.
- In general, they are less numerous.
Objective
To automatically detect exceptional or anomalous behaviour that could help in fraud detection. We obtain the common customer behaviour together with some indicators (exceptions) that appear when the behaviour deviates from the usual one, as well as the anomalous deviations themselves (anomalies).
Overview
1. Brief Introduction to Association Rules
2. Exception Rules
   - Our Proposal for Mining Exception Rules
3. Anomalous Rules
   - Our Proposal for Mining Anomalous Rules
4. Algorithm and Implementation Issues
5. Experimental Evaluation
6. Conclusions and Future Research
7. References
Brief Introduction to Association Rules
Data is usually stored in a dataset $D$ composed of transactions $t_i$ (rows) and attributes (columns). An item is a pair ⟨attribute, value⟩ or ⟨attribute, interval⟩.

  D     i_1   i_2   ...   i_j   i_{j+1}   ...   i_m
  t_1   1     0     ...   0     1         ...   0
  t_2   0     1     ...   1     1         ...   1
  ...   ...   ...   ...   ...   ...       ...   ...
  t_n   1     1     ...   0     1         ...   1

Association rules are expressions of the form $A \rightarrow B$, where $A$ and $B$ are non-empty sets of items with empty intersection. An association rule represents a relation between the joint occurrence of $A$ and $B$.
Brief Introduction to Association Rules
The support of an itemset $A$ is defined as the probability that a transaction contains the itemset:
\[ \mathrm{supp}(A) = \frac{|\{t \in D : A \subseteq t\}|}{|D|} \]
For assessing the validity of association rules, the most common measures are support (the joint probability $P(A \cap B)$) and confidence (the conditional probability $P(B \mid A)$):
\[ \mathrm{Supp}(A \rightarrow B) = \mathrm{supp}(A \cup B), \qquad \mathrm{Conf}(A \rightarrow B) = \frac{\mathrm{supp}(A \cup B)}{\mathrm{supp}(A)} \]
These must be $\geq minsupp$ and $\geq minconf$ respectively (thresholds imposed by the user); that is, the rule must be frequent and confident.
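As a minimal sketch of these two measures, the following Python computes support and confidence over transactions stored as sets of items. The function names and the toy database are illustrative assumptions, not part of the original slides, and returning 0.0 when the antecedent never occurs is a choice made only for this sketch.

```python
def supp(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_supp(a, b, transactions):
    """Supp(A -> B) = supp(A ∪ B)."""
    return supp(frozenset(a) | frozenset(b), transactions)


def rule_conf(a, b, transactions):
    """Conf(A -> B) = supp(A ∪ B) / supp(A).

    Returns 0.0 when A never occurs (an assumption of this sketch).
    """
    supp_a = supp(a, transactions)
    return rule_supp(a, b, transactions) / supp_a if supp_a else 0.0


def is_frequent_and_confident(a, b, transactions, minsupp, minconf):
    """True when the rule A -> B reaches both user-imposed thresholds."""
    return (rule_supp(a, b, transactions) >= minsupp
            and rule_conf(a, b, transactions) >= minconf)


# Hypothetical toy database: five transactions over the items a, b, c.
D = [frozenset(t) for t in ("ab", "abc", "ac", "bc", "ab")]

print(rule_supp({"a"}, {"b"}, D))   # support of a -> b
print(rule_conf({"a"}, {"b"}, D))   # confidence of a -> b
```

Here the rule a → b is supported by three of the five transactions and holds in three of the four transactions that contain a.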
Brief Introduction to Association Rules
An alternative framework is to measure the accuracy by means of the certainty factor:
\[
CF(A \rightarrow B) =
\begin{cases}
\dfrac{\mathrm{Conf}(A \rightarrow B) - \mathrm{supp}(B)}{1 - \mathrm{supp}(B)} & \text{if } \mathrm{Conf}(A \rightarrow B) > \mathrm{supp}(B) \\[2ex]
\dfrac{\mathrm{Conf}(A \rightarrow B) - \mathrm{supp}(B)}{\mathrm{supp}(B)} & \text{if } \mathrm{Conf}(A \rightarrow B) < \mathrm{supp}(B) \\[2ex]
0 & \text{otherwise.}
\end{cases}
\]
CF measures how our belief that $B$ is in a transaction changes when we are told that $A$ is in that transaction.
The certainty factor has better properties than confidence and other quality measures. In particular, it helps to reduce the number of rules obtained by filtering out those rules that correspond to statistical independence or negative dependence.
When $CF(A \rightarrow B) \geq mincf$, the rule is called certain.
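The piecewise definition translates directly into a small function. This is only a sketch: the name `certainty_factor` and the sample values are assumptions chosen for illustration.

```python
def certainty_factor(conf, supp_b):
    """Certainty factor of A -> B from Conf(A -> B) and supp(B).

    Positive when A raises the belief in B, negative when it lowers it,
    and 0 when Conf(A -> B) == supp(B) (statistical independence).
    """
    if conf > supp_b:
        return (conf - supp_b) / (1 - supp_b)
    if conf < supp_b:
        return (conf - supp_b) / supp_b
    return 0.0


# Hypothetical values: B appears in 60% of all transactions.
print(certainty_factor(0.9, 0.6))   # positive dependence
print(certainty_factor(0.6, 0.6))   # independence
print(certainty_factor(0.3, 0.6))   # negative dependence
```

A rule with confidence 0.6 looks acceptable under a confidence threshold of 0.5, yet its certainty factor is 0 when the consequent already has support 0.6, which is how CF filters out rules that only reflect independence.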
Exception Rules
Idea: an attribute interacting with another may change the consequent of an association rule [Suzuki et al., 1996].
Interpretation: $X$ strongly implies the fulfilment of $Y$, but there exists $E$ such that $X \wedge E$ implies $\neg Y$.
Example:
  IF the patient takes antibiotics ($X$), THEN the patient tends to recover ($Y$), UNLESS staphylococcus appears ($E$).
This example shows how the presence of $E$ changes the usual behaviour of the rule $X \rightarrow Y$.
Our Proposal for Mining Exception Rules
Formally, let $D_X = \{t \in D : X \subseteq t\}$. An exception rule is a pair (csr, exc), that is, a common sense rule and its exception, satisfying:
- $X \rightarrow Y$ (csr) is frequent and certain in $D$.
- $E \rightarrow \neg Y$ (exc) is certain in $D_X$.
Here the certainty factor is used instead of the confidence.
Advantages:
- The quantity of rule pairs (csr, exc) is reduced.
- Using CF instead of Conf, more reliable rules are obtained.
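The two conditions can be checked directly on a small database. The sketch below is not the authors' implementation: the helper names, the thresholds and the antibiotics toy data are assumptions, and $\neg Y$ is read as "the transaction does not contain all items of $Y$".

```python
def certainty_factor(conf, supp_b):
    if conf > supp_b:
        return (conf - supp_b) / (1 - supp_b)
    if conf < supp_b:
        return (conf - supp_b) / supp_b
    return 0.0


def frac(rows, pred):
    """Fraction of rows for which the predicate holds."""
    return sum(1 for t in rows if pred(t)) / len(rows)


def cf_rule(rows, ante, cons):
    """CF of ante -> cons measured over `rows`; None if ante never holds."""
    covered = [t for t in rows if ante(t)]
    if not covered:
        return None
    return certainty_factor(frac(covered, cons), frac(rows, cons))


def is_exception(x, y, e, transactions, minsupp, mincf):
    """True when (X -> Y, E -> not Y) is a (csr, exc) pair."""
    x, y = frozenset(x), frozenset(y)
    has_x = lambda t: x <= t
    has_y = lambda t: y <= t
    # csr: X -> Y must be frequent and certain in D.
    supp_xy = frac(transactions, lambda t: has_x(t) and has_y(t))
    cf_csr = cf_rule(transactions, has_x, has_y)
    if cf_csr is None or supp_xy < minsupp or cf_csr < mincf:
        return False
    # exc: E -> not Y must be certain in D_X.
    d_x = [t for t in transactions if has_x(t)]
    cf_exc = cf_rule(d_x, lambda t: e in t, lambda t: not has_y(t))
    return cf_exc is not None and cf_exc >= mincf


# Hypothetical toy data modelled on the antibiotics example.
D = ([frozenset({"antibiotics", "recovers"})] * 9
     + [frozenset({"antibiotics", "staphylococcus"})]
     + [frozenset({"no treatment"})] * 10)

print(is_exception({"antibiotics"}, {"recovers"}, "staphylococcus", D, 0.3, 0.8))
```

Restricting the second check to $D_X$ means the exception is judged only among the patients who took antibiotics, which is what distinguishes it from an unrelated rule about staphylococcus in the whole database.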
Anomalous Rules
Idea: anomalous rules are rules that come to the surface when the dominant effect produced by a strong rule (csr) is removed [Berzal et al., 2004].
Interpretation: when $X$ holds, we have either $Y$ (usually) or $A$ (unusually). This is captured by the set of rules: $X$ strongly implies $Y$, but in those cases where $X$ does not lead to $Y$, $X$ confidently implies $A$.
Example:
  IF a patient has symptoms $X$, THEN he has the disease $Y$; IF NOT, he has the disease $A$.
This example shows that the anomalous rule tries to capture the deviation ($A$) from the usual behaviour ($X \rightarrow Y$).
Our Proposal for Mining Anomalous Rules
Formally, let $D_X = \{t \in D : X \subseteq t\}$. An anomalous rule is a triple (csr, anom, ref) satisfying:
- $X \rightarrow Y$ (csr) is frequent and certain in $D$.
- $\neg Y \rightarrow A$ (anom) is certain in $D_X$.
- $A \rightarrow \neg Y$ (ref) is certain in $D_X$.
Advantages:
- The quantity of rule triples (csr, anom, ref) is reduced (a more restrictive approach than that of Berzal et al.).
- Using CF instead of Conf, more reliable rules are obtained.
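A sketch of how the three conditions might be tested together, under stated assumptions: the helper names, thresholds and patient data are hypothetical, and $\neg Y$ is taken to mean that the transaction does not contain all items of $Y$.

```python
def certainty_factor(conf, supp_b):
    if conf > supp_b:
        return (conf - supp_b) / (1 - supp_b)
    if conf < supp_b:
        return (conf - supp_b) / supp_b
    return 0.0


def frac(rows, pred):
    """Fraction of rows for which the predicate holds."""
    return sum(1 for t in rows if pred(t)) / len(rows)


def cf_rule(rows, ante, cons):
    """CF of ante -> cons measured over `rows`; None if ante never holds."""
    covered = [t for t in rows if ante(t)]
    if not covered:
        return None
    return certainty_factor(frac(covered, cons), frac(rows, cons))


def is_anomalous(x, y, a, transactions, minsupp, mincf):
    """True when (X -> Y, not Y -> A, A -> not Y) is a (csr, anom, ref) triple."""
    x, y = frozenset(x), frozenset(y)
    has_x = lambda t: x <= t
    not_y = lambda t: not y <= t
    has_a = lambda t: a in t
    # csr: X -> Y frequent and certain in D.
    supp_xy = frac(transactions, lambda t: has_x(t) and not not_y(t))
    cf_csr = cf_rule(transactions, has_x, lambda t: not not_y(t))
    if cf_csr is None or supp_xy < minsupp or cf_csr < mincf:
        return False
    d_x = [t for t in transactions if has_x(t)]
    cf_anom = cf_rule(d_x, not_y, has_a)   # not Y -> A in D_X
    cf_ref = cf_rule(d_x, has_a, not_y)    # A -> not Y in D_X
    return (cf_anom is not None and cf_anom >= mincf
            and cf_ref is not None and cf_ref >= mincf)


# Hypothetical toy data: symptoms usually mean disease Y, unusually disease A.
D = ([frozenset({"symptoms", "disease Y"})] * 9
     + [frozenset({"symptoms", "disease A"})]
     + [frozenset({"healthy"})] * 10)

print(is_anomalous({"symptoms"}, {"disease Y"}, "disease A", D, 0.3, 0.8))
```

Requiring both directions (anom and ref) is what makes the triple more restrictive: $A$ must appear whenever $Y$ fails, and $Y$ must fail whenever $A$ appears.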
Algorithm and Implementation Issues
ERSA and ARSA (Exception/Anomalous Rule Search Algorithm) mine the set of common sense rules (csr) together with their associated exceptions or anomalies.
- They are based on the Apriori algorithm and use a bit-string representation of items, which speeds up the logical operations.
- Complexity: it depends on $|D| = n$, the number of items ($i$), and the number of csr obtained in the first part of the algorithm ($r$): $O(n \, r \, i \, 2^i)$.
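To illustrate why a bit-string representation helps, the sketch below stores one bit vector per item (bit k is set when transaction k contains the item), so the cover of an itemset is a bitwise AND and its support is a bit count. Python integers stand in for the Java BitSets mentioned in the slides; the function names and toy data are assumptions.

```python
def build_bitsets(transactions):
    """Map each item to an int whose bit k is 1 iff transaction k contains it."""
    bitsets = {}
    for k, t in enumerate(transactions):
        for item in t:
            bitsets[item] = bitsets.get(item, 0) | (1 << k)
    return bitsets


def popcount(bits):
    """Number of bits set to 1."""
    return bin(bits).count("1")


def cover(itemset, bitsets, n):
    """Bit vector of the transactions that contain every item of `itemset`."""
    bits = (1 << n) - 1            # start with all n transactions
    for item in itemset:
        bits &= bitsets.get(item, 0)
    return bits


def supp(itemset, bitsets, n):
    return popcount(cover(itemset, bitsets, n)) / n


# Hypothetical toy database: five transactions over the items a, b, c.
D = [set("ab"), set("abc"), set("ac"), set("bc"), set("ab")]
B = build_bitsets(D)

print(bin(B["a"]), bin(B["b"]))        # one bit per transaction
print(supp({"a", "b"}, B, len(D)))     # conjunction via bitwise AND

# Negation restricted to the cover of a: transactions with a but without b.
a_not_b = cover({"a"}, B, len(D)) & ~cover({"b"}, B, len(D))
print(bin(a_not_b))
```

Each support computation touches one machine word per block of transactions instead of scanning every transaction, which is the kind of saving the bit-string representation is meant to provide.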
ERSA algorithm
Input: transactional database, minsupp, minconf or mincf
Output: set of association rules with their associated exception rules
1. Database preprocessing
   1.1 Transformation of the transactional database into a boolean one.
   1.2 Storage of the database in a vector of BitSets.
2. Mining process
   2.1 Mining common sense rules
       - Search for the set of candidates (frequent itemsets) for the csr.
       - Store the indexes of the BitSet vectors and the support of the candidates.
       - Extract the csr exceeding minsupp and minconf/mincf.
   2.2 Mining exception rules
       For every csr $X \rightarrow Y$, compute the possible exceptions. For each item $E \in I$ (except those in the csr):
       - Compute $X \wedge E \wedge \neg Y$ and its support.
       - Compute $X \wedge E$ and its support.
       - Compute $\mathrm{supp}_X(\neg Y)$.
       - If $CF_X(E \rightarrow \neg Y) \geq mincf$, then this is an exc.
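The exception-mining step (2.2) can be sketched as follows for a single, already extracted csr. This is an illustrative approximation rather than the ERSA implementation: Python integers replace the BitSets, all names and the toy records are hypothetical, and the sketch additionally skips every item whose attribute already appears in the csr, an assumption added here to avoid trivial exceptions such as another value of the consequent's attribute.

```python
def to_bitsets(records):
    """Steps 1.1-1.2: one bit vector (int) per (attribute, value) item."""
    bitsets = {}
    for k, rec in enumerate(records):
        for item in rec.items():
            bitsets[item] = bitsets.get(item, 0) | (1 << k)
    return bitsets


def popcount(bits):
    return bin(bits).count("1")


def cover(itemset, bitsets, n):
    bits = (1 << n) - 1
    for item in itemset:
        bits &= bitsets.get(item, 0)
    return bits


def certainty_factor(conf, supp_b):
    if conf > supp_b:
        return (conf - supp_b) / (1 - supp_b)
    if conf < supp_b:
        return (conf - supp_b) / supp_b
    return 0.0


def mine_exceptions(x, y, bitsets, n, mincf):
    """Step 2.2 for one csr X -> Y: items E with CF_X(E -> not Y) >= mincf."""
    x_bits = cover(x, bitsets, n)
    if x_bits == 0:
        return []
    x_not_y = x_bits & ~cover(y, bitsets, n)           # X and not Y
    supp_x_not_y = popcount(x_not_y) / popcount(x_bits)  # supp_X(not Y)
    used_attrs = {attr for attr, _ in set(x) | set(y)}
    found = []
    for e, e_bits in bitsets.items():
        if e[0] in used_attrs:       # assumption of this sketch, see lead-in
            continue
        xe = x_bits & e_bits         # X and E
        if xe == 0:
            continue
        conf_x = popcount(xe & x_not_y) / popcount(xe)  # Conf_X(E -> not Y)
        if certainty_factor(conf_x, supp_x_not_y) >= mincf:
            found.append(e)
    return found


# Hypothetical records modelled on the antibiotics example.
records = (
    [{"treatment": "antibiotics", "infection": "none", "outcome": "recovers"}] * 9
    + [{"treatment": "antibiotics", "infection": "staphylococcus",
        "outcome": "no recovery"}]
    + [{"treatment": "none", "infection": "none", "outcome": "no recovery"}] * 10
)
B = to_bitsets(records)
X = {("treatment", "antibiotics")}
Y = {("outcome", "recovers")}
print(mine_exceptions(X, Y, B, len(records), mincf=0.8))
```

Because the cover of $X$ is computed once per csr, each candidate $E$ costs only two bitwise ANDs and two bit counts.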
Experimental Evaluation
Database: German-statlog, credit bank data from the UCI Machine Learning repository.
- 1000 transactions
- 21 attributes: 18 categorical or numerical, 3 continuous (categorized into meaningful intervals)
Platform: 1.73 GHz Intel Core 2 Duo notebook with 1024 MB of main memory, running Windows 7, using Java.
The maximum number of items in the antecedent or the consequent of a csr is limited to 3 in order to obtain more manageable rules.
Experimental Evaluation
Number of csr, exc and anom rules found for different thresholds in the German-statlog database.

            mincf = 0.8        mincf = 0.9        mincf = 0.95
  minsupp   csr   exc   anom   csr   exc   anom   csr   exc   anom
  0.08      674   66    326    309   11    39     270   6     10
  0.1       384   27    208    137   4     12     123   3     5
  0.12      226   11    142    62    1     3      57    0     2
Experimental Evaluation
Time in seconds for mining exception and anomalous rules for different thresholds in the German-statlog database.

            mincf = 0.8     mincf = 0.9     mincf = 0.95
  minsupp   ERSA   ARSA     ERSA   ARSA     ERSA   ARSA
  0.08      137    139      116    116      115    116
  0.1       73     71       64     63       63     64
  0.12      43     43       38     38       38     38
Experimental Evaluation
Some of the obtained rules are:

  IF present employment since ≥ 7 years AND status & sex = single male
  THEN people being liable to provide maintenance for = 1 (Supp = 0.105 & CF = 0.879)
  EXCEPT when purpose = business (CF = 1).

  IF property = real estate AND number of existing credits at this bank = 1
  THEN age is between 18 and 25 (Supp = 0.082 & CF = 0.972)
  OR property = car (unusually, with $CF_1 = 1$, $CF_2 = 1$).
Conclusions and Future Research
- We have given new proposals for mining exception and anomalous rules.
- We provide efficient algorithms for mining these kinds of rules.
- The implementations have been run on a credit bank database, obtaining a manageable set of interesting rules that should be analysed by an expert.
Future: development of new approaches for exception and anomalous rules with uncertain data.
References
[Suzuki et al., 1996] E. Suzuki and M. Shimura. Exceptional knowledge discovery in databases based on information theory. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pages 275-278. AAAI Press, 1996.
[Berzal et al., 2004] F. Berzal, J.C. Cubero, N. Marín, and M. Gámez. Anomalous association rules. In IEEE ICDM Workshop Alternative Techniques for Data Mining and Knowledge Discovery, 2004.
[Delgado et al., 2011] M. Delgado, M.D. Ruiz, and D. Sánchez. New Approaches for Discovering Exception and Anomalous Rules. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, Vol. 19, No. 2, pages 361-399, 2011.
Thank you. Any questions?