Linear Programming-based Data Mining Techniques And Credit Card Business Intelligence
1 Linear Programming-based Data Mining Techniques And Credit Card Business Intelligence. Yong Shi, the Charles W. and Margre H. Durham Distinguished Professor of Information Technology, University of Nebraska, USA
2 Contents: Introduction (Data Mining, Process, Methodology, Mathematical Tools); Linear System Approaches (Linear Programming Methods, Multiple Criteria Linear Programming Methods); Credit Card Intelligence (Real-life Credit Card Portfolio Management, Many Others)
3 Data Mining: Introduction. A powerful information technology (IT) tool in today's competitive business world; an area at the intersection of human intervention, machine learning, mathematical modeling, and databases.
4 Introduction. Process: Selecting, Transforming, Mining, Interpreting.
5 Introduction. Methodology: Association, Clustering, Classification, Prediction, Sequential Patterns, Similar Time Sequences.
6 Introduction. Mathematical Tools: statistics, decision trees, neural networks, fuzzy logic, linear programming.
7 Introduction: Classification. Use a training data set with predetermined classes; develop a separation model with rules on the training set; apply the model to classify unknown objects; discover knowledge. [Diagram: training set → model → unknown objects → knowledge]
8 Linear System Approaches: Linear Programming. Linear programming has been used for classification in data mining. Given two attributes {a_1, a_2} and two groups {G_1, G_2}, with observations A_i = (A_i1, A_i2), we want to find a scalar b and a nonzero vector X = (x_1, x_2) such that the constraints A_i X ≥ b for A_i ∈ G_1 and A_i X ≤ b for A_i ∈ G_2 have the fewest violations.
9 Linear System Approaches. Let
α_i = the overlapping of the two-group (class) boundary for case A_i (external measurement);
α = the maximum overlapping of the two-group (class) boundary over all cases A_i (α_i < α);
β_i = the distance of case A_i from its adjusted boundary (internal measurement);
β = the minimum distance over all cases A_i to the adjusted boundary (β_i > β);
h_i = the penalty for α_i (cost of misclassification);
k_i = the penalty for β_i (cost of misclassification).
10 Linear System Approaches. [Table: cases 1, …, k in G_1 and k+1, …, n in G_2; columns a_1 and a_2 hold the attribute values A_11, A_12, …, A_k1, A_k2, A_k+1,1, A_k+1,2, …, A_n1, A_n2; a final column holds the linearly transformed scores A_1 X*, …, A_n X* against the boundary A_i X = b.]
11 Linear System Approaches. Example: consider a credit-rating problem with two variables and two cases, where a_1 = salary and a_2 = age. Let the boundary b = 10. [Table: for cases A_1 and A_2, the a_1 and a_2 values, the best coefficients x_1 and x_2 found, and the resulting LP scores against boundary b = 10.]
12 Linear System Approaches. Find the best (x_1*, x_2*) to compare A_i X* = a_1 x_1* + a_2 x_2* with b = 10. We see that A_1 X* = 7.2 is Bad (< 10) and A_2 X* = 10.4 is Good (> 10). [Diagram: Bad cases at distances β_i below and Good cases at distances β_i above the boundary A_i X* = 10; perfect separation (α = 0).]
13 Linear System Approaches. Example: overlapping. [Diagram: boundaries A_i X* = b − α and A_i X* = b + α around A_i X* = b; Bad and Good cases with distances β_i and overlaps α_i falling inside the band of width 2α.]
14 Linear System Approaches. Simple Models (Freed and Glover 1981): Minimize Σ_i h_i α_i subject to A_i X ≤ b + α_i for A_i ∈ Bad and A_i X ≥ b − α_i for A_i ∈ Good, where the A_i are given, X and b are unrestricted, and α_i ≥ 0.
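As a sketch, this first simple model can be set up directly with scipy.optimize.linprog; the toy data, the equal penalties h_i = 1, and the fixed boundary b = 10 are all illustrative assumptions (b is fixed in advance because, with b free, the all-zero solution is trivially optimal — one of the LP issues noted later in this deck).

```python
import numpy as np
from scipy.optimize import linprog

# Toy two-attribute data (hypothetical values, not from the slides' sample).
bad = np.array([[1.0, 1.0], [2.0, 1.0]])    # group "Bad"
good = np.array([[5.0, 5.0], [6.0, 4.0]])   # group "Good"
b = 10.0        # boundary fixed in advance, as in the earlier example
h = 1.0         # equal misclassification penalty h_i for every case

n_bad = len(bad)
n = n_bad + len(good)

# Decision variables: x1, x2 (unrestricted), then alpha_i >= 0 per case.
# Objective: minimize sum_i h_i * alpha_i.
c = np.concatenate([np.zeros(2), h * np.ones(n)])

# Constraints:
#   Bad:  A_i X <= b + alpha_i   ->   A_i X - alpha_i <= b
#   Good: A_i X >= b - alpha_i   ->  -A_i X - alpha_i <= -b
A_ub = np.zeros((n, 2 + n))
b_ub = np.zeros(n)
for i, a in enumerate(bad):
    A_ub[i, :2] = a
    A_ub[i, 2 + i] = -1.0
    b_ub[i] = b
for j, a in enumerate(good):
    i = n_bad + j
    A_ub[i, :2] = -a
    A_ub[i, 2 + i] = -1.0
    b_ub[i] = -b

bounds = [(None, None), (None, None)] + [(0, None)] * n
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
scores = np.vstack([bad, good]) @ res.x[:2]
print(res.fun, scores)
```

With separable toy data the optimal objective is 0: a vector X exists under which every Bad case scores at or below b and every Good case at or above it.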
15 Linear System Approaches. Simple Models (Freed and Glover 1981), alternatively: Maximize Σ_i β_i subject to A_i X ≤ b − β_i for A_i ∈ Bad and A_i X ≥ b + β_i for A_i ∈ Good, where the A_i are given, X and b are unrestricted, and β_i ≥ 0.
16 Linear System Approaches. Hybrid Model (Glover 1990): Minimize hα + Σ_i h_i α_i − kβ − Σ_i k_i β_i subject to A_i X = b + α + α_i − β − β_i for A_i ∈ Bad and A_i X = b − α − α_i + β + β_i for A_i ∈ Good, where the A_i are given, X and b are unrestricted, and α, α_i, β, β_i ≥ 0.
17 Linear System Approaches. Mixed Integer Model (Koehler and Erenguc 1990): Minimize Σ_i I_1i + Σ_i I_2i subject to A_i X ≤ b + M·I_1i for A_i ∈ Bad and A_i X ≥ b − M·I_2i for A_i ∈ Good, where the A_i are given, X ≠ 0 and b are unrestricted, I_1i = 1 if A_i X > b for A_i ∈ Bad (0 otherwise), and I_2i = 1 if A_i X < b for A_i ∈ Good (0 otherwise).
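For intuition about what the integer variables count, a brute-force sketch (made-up data, a hypothetical grid of coefficient vectors) evaluates the mixed-integer objective — the number of misclassified cases — directly, instead of solving the MIP:

```python
import itertools

# Hypothetical two-attribute cases; Bads should score below b, Goods above.
bad = [(1.0, 1.0), (2.0, 1.0)]
good = [(5.0, 5.0), (6.0, 4.0)]
b = 10.0

def misclassified(x):
    """MIP objective: I_1i counts Bads scoring above b, I_2i Goods below b."""
    errs = sum(1 for a in bad if a[0] * x[0] + a[1] * x[1] > b)
    errs += sum(1 for a in good if a[0] * x[0] + a[1] * x[1] < b)
    return errs

# Search a coarse grid of coefficient vectors instead of solving the MIP.
grid = [i / 4.0 for i in range(-8, 9)]
best_x = min(itertools.product(grid, grid), key=misclassified)
print(best_x, misclassified(best_x))
```

On this separable toy set the grid contains a perfect classifier, e.g. x = (1, 1), so the best count is 0; a real solver replaces the grid search with branch-and-bound over the I variables.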
18 Linear System Approaches. Three-group Model (Freed and Glover 1981): Minimize h_1 α_1 + h_2 α_2 subject to b_L1 ≤ A_i X ≤ b_U1 for A_i ∈ G_1; b_L2 ≤ A_i X ≤ b_U2 for A_i ∈ G_2; b_L3 ≤ A_i X ≤ b_U3 for A_i ∈ G_3; b_U1 + ε ≤ b_L2 + α_1; b_U2 + ε ≤ b_L3 + α_2; where the A_i and ε are given, X, the b_Lj (lower bounds) and the b_Uj (upper bounds) are unrestricted, and the α_j (group overlapping) ≥ 0.
19 Linear System Approaches. [Diagram: Three-group LP Model in the (a_1, a_2) plane, with projection direction X and intervals [b_L1, b_U1], [b_L2, b_U2], [b_L3, b_U3] for G_1, G_2, G_3.]
20 Linear System Approaches. Multi-group Model (Freed and Glover 1981): Minimize Σ_j h_j α_j subject to b_Lj ≤ A_i X ≤ b_Uj for A_i ∈ G_j, j = 1, …, m; b_Uj + ε ≤ b_Lj+1 + α_j, j = 1, …, m−1; where the A_i and ε are given, X, the b_Lj (lower bounds) and the b_Uj (upper bounds) are unrestricted, and the α_j (group overlapping) ≥ 0.
21 Linear System Approaches. Multi-group Model (Freed and Glover 1981), alternatively: Minimize Σ_j h_j α_j subject to b_Lj − α_j ≤ A_i X ≤ b_Uj + α_j for A_i ∈ G_j, j = 1, …, m; b_Uj ≤ b_Lj+1, j = 1, …, m−1; where the A_i are given, X, the b_Lj (lower bounds) and the b_Uj (upper bounds) are unrestricted, and the α_j (group overlapping) ≥ 0.
22 Linear System Approaches. Problems and Challenges of LP Approaches: Different normalizations give different solutions for X and b (the boundary). Choosing a proper value of b may lead to a nice separation. Integer variables may improve the misclassification rate of the LP model. The penalty costs of misclassification change the classifier results. The simple model has been verified as a useful alternative to the logistic discriminant (classification) model; both are better than the linear and quadratic discriminant functions. There is as yet no comparison between decision tree induction and LP approaches.
23 Linear System Approaches: Multi-Criteria Linear Programming. Multi-criteria linear programming simultaneously minimizes the total overlapping and maximizes the total distance from the boundary of the two groups: Minimize Σ_i α_i and Maximize Σ_i β_i subject to A_i X = b + α_i − β_i for A_i ∈ B and A_i X = b − α_i + β_i for A_i ∈ G, where the A_i are given, X and b are unrestricted, and α_i, β_i ≥ 0.
24 Linear System Approaches: Multi-Criteria Linear Programming. To utilize the computational power of commercial software for LP and non-LP problems, we can find the compromise solution of the separation problem (Yu 1973, Yu 1985, and Shi and Yu 1989): Let α* = the ideal value of −Σ_i α_i and β* = the ideal value of Σ_i β_i. Then define the regret function through the deviations:
d_α+ = −(α* + Σ_i α_i) if −Σ_i α_i > α*, and 0 otherwise;
d_α− = α* + Σ_i α_i if −Σ_i α_i < α*, and 0 otherwise;
d_β+ = Σ_i β_i − β* if Σ_i β_i > β*, and 0 otherwise;
d_β− = β* − Σ_i β_i if Σ_i β_i < β*, and 0 otherwise.
25 [Diagram: compromise solution in the (−Σ_i α_i, Σ_i β_i) plane, minimizing (d_α+ + d_α−)^p + (d_β+ + d_β−)^p relative to the ideal point (α*, β*).]
26 Linear System Approaches: Multi-Criteria Linear Programming. Thus the multi-criteria separation problem becomes (Shi and Peng 2001): Minimize (d_α+ + d_α−)^p + (d_β+ + d_β−)^p subject to α* + Σ_i α_i = d_α− − d_α+; β* − Σ_i β_i = d_β− − d_β+; A_i X = b + α_i − β_i for A_i ∈ B; A_i X = b − α_i + β_i for A_i ∈ G; where the A_i, α*, and β* are given, X and b are unrestricted, and α_i, β_i, d_α−, d_α+, d_β−, d_β+ ≥ 0.
27 R(d; ∞) ≤ R(d; 2) ≤ R(d; 1), where for p ≥ 1, R(d; p) = [Σ_k (d_k+ + d_k−)^p]^(1/p), and for p = ∞, R(d; ∞) = min_X max {d_k+ + d_k− : k = 1, …, q}.
28 Linear System Approaches: Multi-Criteria Linear Programming. Multi-criteria separation model for three groups (Shi, Peng, Xu and Tang 2001): Given groups (G_1, G_2, G_3), let b_1 = the boundary between G_1 and G_2; b_2 = the boundary between G_2 and G_3; α_i1 = the overlapping of G_1 and G_2 for case A_i; β_i1 = the distance of case A_i from its adjusted boundary between G_1 and G_2; α_i2 = the overlapping of G_2 and G_3 for case A_i; β_i2 = the distance of case A_i from its adjusted boundary between G_2 and G_3.
29 Linear System Approaches: Multi-Criteria Linear Programming. Minimize Σ_i (α_i1 + α_i2) and Maximize Σ_i (β_i1 + β_i2) subject to A_i X = b_1 − α_i1 + β_i1 for A_i ∈ G_1; A_i X = 0.5(b_1 + α_i1 − β_i1 + b_2 − α_i2 + β_i2) for A_i ∈ G_2; A_i X = b_2 + α_i2 − β_i2 for A_i ∈ G_3; b_1 + α_i1 < b_2 − α_i2; where the A_i are given, X, b_1 and b_2 are unrestricted, and α_i1, β_i1, α_i2, β_i2 ≥ 0.
30 Linear System Approaches: Multi-Criteria Linear Programming. [Diagram: Three-group MC Model, with boundaries A_i X = b_1 and A_i X = b_2, overlap bands from b_1 − α_1 to b_1 + α_1 and from b_2 − α_2 to b_2 + α_2, and distances β_i separating G_1, G_2, G_3.]
31 Linear System Approaches: Multi-Criteria Linear Programming. The compromise model for three groups can be: Minimize (d_α1+ + d_α1−)^p + (d_α2+ + d_α2−)^p + (d_β1+ + d_β1−)^p + (d_β2+ + d_β2−)^p subject to α*_1 + Σ_i α_i1 = d_α1− − d_α1+; β*_1 − Σ_i β_i1 = d_β1− − d_β1+; α*_2 + Σ_i α_i2 = d_α2− − d_α2+; β*_2 − Σ_i β_i2 = d_β2− − d_β2+; A_i X = b_1 − α_i1 + β_i1 for A_i ∈ G_1; A_i X = 0.5(b_1 + α_i1 − β_i1 + b_2 − α_i2 + β_i2) for A_i ∈ G_2; A_i X = b_2 + α_i2 − β_i2 for A_i ∈ G_3; b_1 + α_i1 < b_2 − α_i2; where the A_i, α*_1, α*_2, β*_1 and β*_2 are given, X, b_1 and b_2 are unrestricted, and d_αj+, d_αj−, d_βj+, d_βj−, α_i1, β_i1, α_i2, β_i2 ≥ 0.
32 Linear System Approaches: Multi-Criteria Linear Programming. Similarly, the multi-criteria classification model for four groups (Kou, Peng, Shi, Wise and Xu 2002): Given groups (G_1, G_2, G_3, G_4), we have: Minimize Σ_i (α_i1 + α_i2 + α_i3) and Maximize Σ_i (β_i1 + β_i2 + β_i3) subject to A_i X = b_1 − α_i1 + β_i1 for A_i ∈ G_1; A_i X = 0.5(b_1 + α_i1 − β_i1 + b_2 − α_i2 + β_i2) for A_i ∈ G_2; A_i X = 0.5(b_2 + α_i2 − β_i2 + b_3 − α_i3 + β_i3) for A_i ∈ G_3; A_i X = b_3 + α_i3 − β_i3 for A_i ∈ G_4; b_1 + α_i1 < b_2 − α_i2; b_2 + α_i2 < b_3 − α_i3; where the A_i are given, X, b_1, b_2 and b_3 are unrestricted, and α_i1, β_i1, α_i2, β_i2, α_i3, β_i3 ≥ 0.
33 Linear System Approaches: Multi-Criteria Linear Programming. Generally, given groups (G_1, G_2, …, G_s), the multi-criteria classification model for s groups is: Minimize Σ_i Σ_j α_ij and Maximize Σ_i Σ_j β_ij subject to A_i X = b_1 − α_i1 + β_i1 for A_i ∈ G_1; A_i X = 0.5(b_k−1 + α_i,k−1 − β_i,k−1 + b_k − α_ik + β_ik) for A_i ∈ G_k, k = 2, …, s−1; A_i X = b_s−1 + α_i,s−1 − β_i,s−1 for A_i ∈ G_s; b_k−1 + α_i,k−1 < b_k − α_ik, k = 2, …, s−1; where the A_i are given, X and the b_j are unrestricted, and α_ij, β_ij ≥ 0, j = 1, …, s−1.
34 Linear System Approaches: Multi-Criteria Linear Programming Algorithm. Step 1: Use ReadCHD to convert both Training and Verifying data into data matrices. Step 2: Use GroupDef to divide the observations within the Training data set into s groups: G1, G2, …, Gs. Step 3: Use sgmodel to perform the separation task on the Training data; here, PROC LP is called to solve the MCLP model for the best s-group classifier given the values of the control parameters. Step 4: Use Score to produce graphical representations of the training results; Steps 3-4 repeat until the best training result is found. Step 5: Use Predict to mine the s groups from the Verifying data set.
35 Credit Card Portfolio Management: Introduction. Data mining for credit card portfolio management decisions classifies the different cardholder behaviors in terms of their payments to the credit card companies, such as banks and mortgage loan firms. In reality, the common categories of credit card variables are balance, purchase, payment and cash advance. Some credit card companies may consider residence state and job security as special variables. In the case of FDC (First Data Corporation), there are 38 original variables drawn from the common variables over the past seven months. A set of derived variables is then internally generated from the 38 variables to perform precise data mining.
36 Individual Bankruptcy Filing ( )
37 Real-life Applications: Credit Card Portfolio Management. The objective of this research is to search for an alternative modeling approach (preferably a linear approach) that could outperform the current approaches (Shi, Wise, Luo and Lin 2001): (1) Behavior Score by FICO, (2) Credit Bureau Score by FICO, (3) FDC Proprietary Bankruptcy Score, (4) SE Decision Tree.
38 Research Methodology. Using the 65 variables (Char 1-65) in FDR and a small development sample, we want to determine the coefficients for an appropriate subset of the 65 derived variables, X = (x_1, ..., x_r), and a boundary value b to separate two groups, G (Goods) and B (Bads); that is, A_i X ≥ b for A_i ∈ G and A_i X ≤ b for A_i ∈ B, where the A_i are the vector values of the variables.
39 Two-group MC Model for the SAS Algorithm. Minimize d_α− + d_α+ + d_β− + d_β+ subject to α* + Σ_i α_i = d_α− − d_α+; β* − Σ_i β_i = d_β− − d_β+; A_i X = b − α_i + β_i for A_i ∈ G; A_i X = b + α_i − β_i for A_i ∈ B; where the A_i, α*, and β* are given, X and b are unrestricted, and α_i, β_i, d_α−, d_α+, d_β−, d_β+ ≥ 0.
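A minimal sketch of this two-group compromise model (the p = 1 case) with scipy.optimize.linprog, keeping the convention that Bads sit below and Goods above the boundary b; the toy data, the fixed boundary b = 10, and the ideal values α* = 0 and β* = 15 are assumptions chosen so the regret can reach zero on this set.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical two-attribute credit data.
bad = np.array([[1.0, 1.0], [2.0, 1.0]])    # group B
good = np.array([[5.0, 5.0], [6.0, 4.0]])   # group G
b = 10.0                            # boundary fixed in advance
alpha_star, beta_star = 0.0, 15.0   # chosen ideal values for this toy set

cases = np.vstack([bad, good])
n = len(cases)
# Variables: [x1, x2, alpha_1..n, beta_1..n, d_a-, d_a+, d_b-, d_b+]
nv = 2 + 2 * n + 4
c = np.zeros(nv)
c[-4:] = 1.0                        # minimize d_a- + d_a+ + d_b- + d_b+

A_eq = np.zeros((n + 2, nv))
b_eq = np.zeros(n + 2)
for i, a in enumerate(cases):
    is_bad = i < len(bad)
    A_eq[i, :2] = a
    # B: A_i X = b + alpha_i - beta_i  ->  A_i X - alpha_i + beta_i = b
    # G: A_i X = b - alpha_i + beta_i  ->  A_i X + alpha_i - beta_i = b
    A_eq[i, 2 + i] = -1.0 if is_bad else 1.0
    A_eq[i, 2 + n + i] = 1.0 if is_bad else -1.0
    b_eq[i] = b
# Goal constraints: alpha* + sum(alpha) = d_a- - d_a+ ;
#                   beta* - sum(beta)  = d_b- - d_b+
A_eq[n, 2:2 + n] = 1.0
A_eq[n, -4], A_eq[n, -3] = -1.0, 1.0
b_eq[n] = -alpha_star
A_eq[n + 1, 2 + n:2 + 2 * n] = -1.0
A_eq[n + 1, -2], A_eq[n + 1, -1] = -1.0, 1.0
b_eq[n + 1] = -beta_star

bounds = [(None, None)] * 2 + [(0, None)] * (nv - 2)
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
scores = cases @ res.x[:2]
print(res.fun, scores)
```

Here the minimal regret is 0 (e.g. X = (1, 1) gives Σα = 0 and Σβ = 15 = β*), which forces all α_i to zero, so the solved X separates the toy Goods and Bads around b = 10.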
40 Comparison Method. A comparison of the different methods can be made with the Kolmogorov-Smirnov (KS) value, which measures the largest separation between the cumulative distributions of Goods and Bads (Conover 1999): KS = max |Cumulative distribution of Goods − Cumulative distribution of Bads|.
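As an illustration, the KS value can be computed directly from two lists of scored cases; the scores below are made up, not from the FDC sample.

```python
def ks_statistic(good_scores, bad_scores):
    """Largest gap between the cumulative score distributions of Goods and Bads."""
    cuts = sorted(set(good_scores) | set(bad_scores))
    ks = 0.0
    for t in cuts:
        cum_good = sum(s <= t for s in good_scores) / len(good_scores)
        cum_bad = sum(s <= t for s in bad_scores) / len(bad_scores)
        ks = max(ks, abs(cum_good - cum_bad))
    return ks

# Made-up scores: Bads concentrate at the low end, Goods at the high end.
bads = [1, 2, 3, 6]
goods = [5, 7, 8, 9]
print(ks_statistic(goods, bads))   # → 0.75
```

A higher KS means the classifier's score pushes the two cumulative distributions further apart, which is why it is used here to rank the scoring methods.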
41 Comparison Results KS values on a sample of 1000 cases: (i) KS(Behavior Score) = 55.26; (ii) KS(Credit Bureau Score) = 45.55; (iii) KS(FDC Bankruptcy Score) = 59.16; (iv) KS(SE Score) = 60.22; (v) KS(MCLP) = 59.49
42 Comparison Results. On a sample of 2000 cases, KS(MCLP) = 60.1 outperforms the other methods. From sample sizes 3000 to 6000, the KS(MCLP) deviation is only 0.38 ( ).
43 MCLP Model Learning Experience. [Table: columns Versions, # of variables, Sample Size, KS value, Note; each version has an Optimal row and a Cross-Validation row: (M8); (M9) with 30 variables (log. reg.); (M10) with 29 variables (log. reg.); (M11) with 27 variables (expert).]
44 [Chart: KS(MCLP) on 1000 cases; cumulative distributions CUMGOOD and CUMBAD, 0.00% to 100.00%.]
45 [Chart: KS(MCLP) on 2000 cases; cumulative distributions CUMGOOD and CUMBAD, 0.00% to 100.00%.]
46 [Chart: KS(MCLP) on 6000 cases; cumulative distributions CUMGOOD and CUMBAD, 0.00% to 100.00%.]
47 Research Findings. The MCLP model is fully controlled by its formulation; a sample size of 3000 is stable enough for robustness in the separation process; the MCLP model can easily be adapted to multi-group separation problems.
48 Three-group MC Model. [Chart: 3-group cumulative distributions (Training): cumpct1, cumpct2, cumpct3 by score interval.]
49 Three-group MC Model. [Chart: 3-group cumulative distributions (Verifying): cumpct1, cumpct2, cumpct3 by score interval.]
50 Four-group MC Model. [Chart: 4-group cumulative distributions (Training): cumpct1 through cumpct4 by score interval.]
51 Four-group MC Model. [Chart: 4-group cumulative distributions (Verifying): cumpct1 through cumpct4 by score interval.]
52 Five-group MC Model. [Chart: 5-group MCLP separation (Verifying): cumpct1 through cumpct5 by score interval.]
53 Other Applications. Linear programming-based data mining technology can also be used for: (1) bank and firm bankruptcy analyses, (2) fraud management, (3) financial risk management, (4) medical clinic analyses, (5) marketing promotion, and many others.
More informationCS570 Data Mining. Anomaly Detection. Li Xiong. Slide credits: Tan, Steinbach, Kumar Jiawei Han and Micheline Kamber.
CS570 Data Mining Anomaly Detection Li Xiong Slide credits: Tan, Steinbach, Kumar Jiawei Han and Micheline Kamber April 3, 2011 1 Anomaly Detection Anomaly is a pattern in the data that does not conform
More informationAn overview of Boosting. Yoav Freund UCSD
An overview of Boosting Yoav Freund UCSD Plan of talk Generative vs. non-generative modeling Boosting Alternating decision trees Boosting and over-fitting Applications 2 Toy Example Computer receives telephone
More informationSupport vector machines Lecture 4
Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The
More informationCPSC 340: Machine Learning and Data Mining. More PCA Fall 2017
CPSC 340: Machine Learning and Data Mining More PCA Fall 2017 Admin Assignment 4: Due Friday of next week. No class Monday due to holiday. There will be tutorials next week on MAP/PCA (except Monday).
More informationML (cont.): SUPPORT VECTOR MACHINES
ML (cont.): SUPPORT VECTOR MACHINES CS540 Bryan R Gibson University of Wisconsin-Madison Slides adapted from those used by Prof. Jerry Zhu, CS540-1 1 / 40 Support Vector Machines (SVMs) The No-Math Version
More informationFirm Failure Timeline Prediction: Math Programming Approaches
2016 49th Hawaii International Conference on System Sciences Firm Failure Timeline Prediction: Math Programming Approaches Young U. Ryu School of Management The University of Texas at Dallas ryoung@utdallas.edu
More informationData classification (II)
Lecture 4: Data classification (II) Data Mining - Lecture 4 (2016) 1 Outline Decision trees Choice of the splitting attribute ID3 C4.5 Classification rules Covering algorithms Naïve Bayes Classification
More informationA Posteriori Corrections to Classification Methods.
A Posteriori Corrections to Classification Methods. Włodzisław Duch and Łukasz Itert Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland; http://www.phys.uni.torun.pl/kmk
More informationOperations Research Lecture 1: Linear Programming Introduction
Operations Research Lecture 1: Linear Programming Introduction Notes taken by Kaiquan Xu@Business School, Nanjing University 25 Feb 2016 1 Some Real Problems Some problems we may meet in practice or academy:
More informationA Hybrid Method of CART and Artificial Neural Network for Short-term term Load Forecasting in Power Systems
A Hybrid Method of CART and Artificial Neural Network for Short-term term Load Forecasting in Power Systems Hiroyuki Mori Dept. of Electrical & Electronics Engineering Meiji University Tama-ku, Kawasaki
More informationNon-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines
Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall
More informationNearest Neighbors Methods for Support Vector Machines
Nearest Neighbors Methods for Support Vector Machines A. J. Quiroz, Dpto. de Matemáticas. Universidad de Los Andes joint work with María González-Lima, Universidad Simón Boĺıvar and Sergio A. Camelo, Universidad
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationData Mining. 3.6 Regression Analysis. Fall Instructor: Dr. Masoud Yaghini. Numeric Prediction
Data Mining 3.6 Regression Analysis Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction Straight-Line Linear Regression Multiple Linear Regression Other Regression Models References Introduction
More informationImproved Classification and Discrimination by Successive Hyperplane and Multi-Hyperplane Separation
Improved Classification and Discrimination by Successive Hyperplane and Multi-Hyperplane Separation Fred Glover and Marco Better OptTek Systems, Inc. 2241 17 th Street Boulder, CO 80302 Abstract We propose
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Oct 18, 2016 Outline One versus all/one versus one Ranking loss for multiclass/multilabel classification Scaling to millions of labels Multiclass
More informationLecture 4: Feed Forward Neural Networks
Lecture 4: Feed Forward Neural Networks Dr. Roman V Belavkin Middlesex University BIS4435 Biological neurons and the brain A Model of A Single Neuron Neurons as data-driven models Neural Networks Training
More informationCLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition
CLUe Training An Introduction to Machine Learning in R with an example from handwritten digit recognition Ad Feelders Universiteit Utrecht Department of Information and Computing Sciences Algorithmic Data
More informationInternational Journal "Information Theories & Applications" Vol.14 /
International Journal "Information Theories & Applications" Vol.4 / 2007 87 or 2) Nˆ t N. That criterion and parameters F, M, N assign method of constructing sample decision function. In order to estimate
More informationAnnouncements Kevin Jamieson
Announcements My office hours TODAY 3:30 pm - 4:30 pm CSE 666 Poster Session - Pick one First poster session TODAY 4:30 pm - 7:30 pm CSE Atrium Second poster session December 12 4:30 pm - 7:30 pm CSE Atrium
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationOptimization Methods in Finance
Optimization Methods in Finance 1 PART 1 WELCOME 2 Welcome! My name is Friedrich Eisenbrand Assistants of the course: Thomas Rothvoß, Nicolai Hähnle How to contact me: Come to see me during office ours:
More informationAdversarial Machine Learning: Big Data Meets Cyber Security
Adversarial Machine Learning: Big Data Meets Cyber Security Bowei Xi Department of Statistics Purdue University With Murat Kantarcioglu Introduction Many adversarial learning problems in practice. Image
More informationText Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University
Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data
More informationRobust Pareto Design of GMDH-type Neural Networks for Systems with Probabilistic Uncertainties
. Hybrid GMDH-type algorithms and neural networks Robust Pareto Design of GMDH-type eural etworks for Systems with Probabilistic Uncertainties. ariman-zadeh, F. Kalantary, A. Jamali, F. Ebrahimi Department
More informationLinear classifiers Lecture 3
Linear classifiers Lecture 3 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin ML Methodology Data: labeled instances, e.g. emails marked spam/ham
More informationDECISION MAKING SUPPORT AND EXPERT SYSTEMS
325 ITHEA DECISION MAKING SUPPORT AND EXPERT SYSTEMS UTILITY FUNCTION DESIGN ON THE BASE OF THE PAIRED COMPARISON MATRIX Stanislav Mikoni Abstract: In the multi-attribute utility theory the utility functions
More informationAPPLYING FOR ADMISSION TO COURSES OR DEGREES HONOURS IN MATHEMATICAL STATISTICS
APPLYING FOR ADMISSION TO COURSES OR DEGREES Application forms may be obtained from http://www.wits.ac.za/prospective/postgraduate or from the Student Enrolment centre, ground floor, Senate House. The
More informationChapter 11. Regression with a Binary Dependent Variable
Chapter 11 Regression with a Binary Dependent Variable 2 Regression with a Binary Dependent Variable (SW Chapter 11) So far the dependent variable (Y) has been continuous: district-wide average test score
More informationDecision Trees (Cont.)
Decision Trees (Cont.) R&N Chapter 18.2,18.3 Side example with discrete (categorical) attributes: Predicting age (3 values: less than 30, 30-45, more than 45 yrs old) from census data. Attributes (split
More informationIntro. ANN & Fuzzy Systems. Lecture 15. Pattern Classification (I): Statistical Formulation
Lecture 15. Pattern Classification (I): Statistical Formulation Outline Statistical Pattern Recognition Maximum Posterior Probability (MAP) Classifier Maximum Likelihood (ML) Classifier K-Nearest Neighbor
More informationRadial Basis Functions Networks to hybrid neuro-genetic RBFΝs in Financial Evaluation of Corporations
Radial Basis Functions Networks to hybrid neuro-genetic RBFΝs in Financial Evaluation of Corporations Loukeris Nikolaos University of Essex Email: nikosloukeris@gmail.com Abstract:- Financial management
More information