Linear Programming-based Data Mining Techniques and Credit Card Business Intelligence. Yong Shi, the Charles W. and Margre H. Durham Distinguished Professor of Information Technology, University of Nebraska, USA
Contents
- Introduction: Data Mining - Process - Methodology - Mathematical Tools
- Linear System Approaches: Linear Programming Methods - Multiple Criteria Linear Programming Methods
- Credit Card Intelligence: Real-life Credit Card Portfolio Management - Many Others
Data Mining: Introduction. Data mining is a powerful information technology (IT) tool in today's competitive business world. It is an area at the intersection of human intervention, machine learning, mathematical modeling, and databases.
Introduction. Process: Selecting → Transforming → Mining → Interpreting
Introduction. Methodology: Association, Clustering, Classification, Prediction, Sequential Patterns, Similar Time Sequences
Introduction. Mathematical Tools: statistics, decision trees, neural networks, fuzzy logic, linear programming
Introduction: Classification. Use a training data set with predetermined classes; develop a separation model with rules on the training set; apply the model to classify unknown objects; discover knowledge. (Diagram: training set → model → unknown objects → knowledge.)
Linear System Approaches: Linear Programming. Linear programming has been used for classification in data mining. Given two attributes {a_1, a_2} and two groups {G_1, G_2}, with observations A_i = (A_i1, A_i2), we want to find a scalar b and a nonzero vector X = (x_1, x_2) such that the constraints A_i X ≥ b for A_i ∈ G_1 and A_i X ≤ b for A_i ∈ G_2 have the fewest violations.
Linear System Approaches. Let:
α_i = the overlapping of the two-group (class) boundary for case A_i (external measurement);
α = the maximum overlapping of the two-group (class) boundary over all cases A_i (α_i < α);
β_i = the distance of case A_i from its adjusted boundary (internal measurement);
β = the minimum distance over all cases A_i to the adjusted boundary (β_i > β);
h_i = the penalty for α_i (cost of misclassification);
k_i = the penalty for β_i (cost of misclassification).
Linear System Approaches. Data layout and linearly transformed scores (boundary A_i X = b):

Group  Case  a_1        a_2        LP Score
G_1    1     A_11       A_12       A_1 X*
G_1    2     A_21       A_22       A_2 X*
...
G_1    k     A_k1       A_k2       A_k X*
G_2    k+1   A_k+1,1    A_k+1,2    A_k+1 X*
G_2    k+2   A_k+2,1    A_k+2,2    A_k+2 X*
...
G_2    n     A_n1       A_n2       A_n X*
Linear System Approaches. Example: consider a credit-rating problem with two variables and two cases, where a_1 = salary and a_2 = age. Let the boundary be b = 10, and let the best coefficients found be x_1 = 0.4, x_2 = 0.6.

Case  a_1  a_2  Boundary b  x_1  x_2  LP Score
A_1   6    8    10          0.4  0.6  7.2
A_2   8    12   10          0.4  0.6  10.4
Linear System Approaches. Find the best (x_1*, x_2*) and compare A_i X* = a_1 x_1* + a_2 x_2* with b = 10. We see that A_1 X* = 7.2 is Bad (< 10) and A_2 X* = 10.4 is Good (> 10). (Diagram: perfect separation, α = 0 — Bad scores lie below the boundary A_i X* = 10 at distance β_i, Good scores above it.)
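The scoring arithmetic above can be checked with a short script (a sketch; only the numbers from the example are used):

```python
# Score each case with A_i X* = a_1 x_1* + a_2 x_2* and compare to the
# boundary b = 10; coefficients and cases are taken from the example above.
cases = {"A1": (6, 8), "A2": (8, 12)}  # (salary, age)
x_star = (0.4, 0.6)                    # best coefficients x_1*, x_2*
b = 10

scores = {}
for name, (a1, a2) in cases.items():
    scores[name] = a1 * x_star[0] + a2 * x_star[1]
    label = "Good" if scores[name] > b else "Bad"
    print(name, round(scores[name], 2), label)  # A1 -> 7.2 Bad, A2 -> 10.4 Good
```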
Linear System Approaches. Example: overlapping case. (Diagram: adjusted boundaries A_i X* = b − α and A_i X* = b + α around A_i X* = b, with external deviations α_i and internal deviations β_i for the Bad and Good groups.)
Linear System Approaches. Simple Models (Freed and Glover 1981):
Minimize Σ_i h_i α_i
Subject to
A_i X ≤ b + α_i, A_i ∈ Bad,
A_i X ≥ b − α_i, A_i ∈ Good,
where A_i are given, X and b are unrestricted, and α_i ≥ 0.
Linear System Approaches. Simple Models (Freed and Glover 1981), alternatively:
Maximize Σ_i β_i
Subject to
A_i X ≤ b − β_i, A_i ∈ Bad,
A_i X ≥ b + β_i, A_i ∈ Good,
where A_i are given, X and b are unrestricted, and β_i ≥ 0.
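The first simple model can be solved with any LP solver. Below is a minimal sketch using scipy (an assumption — the original work used SAS PROC LP): the toy Bad/Good data are made up, and b is fixed at 10 to avoid the trivial solution X = 0, b = 0 noted later in the slides.

```python
import numpy as np
from scipy.optimize import linprog

# Freed-Glover "minimize sum of exterior deviations" model (a sketch, not
# the original code).  b is fixed at 10; toy data are illustrations only.
bad  = np.array([[6.0,  8.0], [5.0, 7.0], [7.0,  6.0]])   # should score <= b
good = np.array([[8.0, 12.0], [9.0, 11.0], [10.0, 13.0]]) # should score >= b
b = 10.0
n_bad, n_good = len(bad), len(good)
n = n_bad + n_good

# Decision variables: x1, x2 (unrestricted), then alpha_1..alpha_n (>= 0).
c = np.concatenate([np.zeros(2), np.ones(n)])  # minimize sum of alpha_i

rows, rhs = [], []
for i, a in enumerate(bad):    # A_i X <= b + alpha_i
    row = np.zeros(2 + n); row[:2] = a; row[2 + i] = -1.0
    rows.append(row); rhs.append(b)
for i, a in enumerate(good):   # A_i X >= b - alpha_i
    row = np.zeros(2 + n); row[:2] = -a; row[2 + n_bad + i] = -1.0
    rows.append(row); rhs.append(-b)

bounds = [(None, None)] * 2 + [(0, None)] * n
res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
              bounds=bounds, method="highs")
print("total misclassification deviation:", res.fun)  # ~0: separable toy set
```

When the two groups are linearly separable relative to the chosen b, the optimal objective is zero; otherwise the positive α_i identify the overlapping cases.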
Linear System Approaches. Hybrid Model (Glover 1990):
Minimize h α + Σ_i h_i α_i − k β − Σ_i k_i β_i
Subject to
A_i X = b + α + α_i − β − β_i, A_i ∈ Bad,
A_i X = b − α − α_i + β + β_i, A_i ∈ Good,
where A_i are given, X and b are unrestricted, and α, α_i, β, β_i ≥ 0.
Linear System Approaches. Mixed Integer Model (Koehler and Erenguc 1990):
Minimize Σ_i I_1i + Σ_i I_2i
Subject to
A_i X ≤ b + M I_1i, A_i ∈ Bad,
A_i X ≥ b − M I_2i, A_i ∈ Good,
where A_i are given, M is a large constant, X and b are unrestricted, and
I_1i = 1 if A_i X > b for A_i ∈ Bad (misclassified), otherwise 0;
I_2i = 1 if A_i X < b for A_i ∈ Good (misclassified), otherwise 0.
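The mixed-integer idea — count misclassified cases directly with binary indicators — can be sketched with scipy's MILP interface (an assumption: this is not Koehler and Erenguc's code, the toy data are invented, b is fixed, and X is bounded so the big-M constant is valid):

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Minimize the number of misclassified cases.  I_i = 1 marks a violation;
# b is fixed at 10 and X is bounded to [-10, 10] so M = 1000 is a valid big-M.
bad  = np.array([[6.0,  8.0], [5.0, 7.0], [7.0,  6.0]])
good = np.array([[8.0, 12.0], [9.0, 11.0], [10.0, 13.0]])
b, M = 10.0, 1000.0
n = len(bad) + len(good)

c = np.concatenate([np.zeros(2), np.ones(n)])   # minimize sum of I_i

rows, ub = [], []
for i, a in enumerate(bad):    # A_i X <= b + M * I_i
    row = np.zeros(2 + n); row[:2] = a; row[2 + i] = -M
    rows.append(row); ub.append(b)
for i, a in enumerate(good):   # A_i X >= b - M * I_i
    row = np.zeros(2 + n); row[:2] = -a; row[2 + len(bad) + i] = -M
    rows.append(row); ub.append(-b)

constraints = LinearConstraint(np.array(rows), -np.inf, np.array(ub))
bounds = Bounds(np.concatenate([[-10.0, -10.0], np.zeros(n)]),
                np.concatenate([[10.0, 10.0], np.ones(n)]))
integrality = np.concatenate([np.zeros(2), np.ones(n)])  # I_i binary
res = milp(c=c, constraints=constraints, bounds=bounds,
           integrality=integrality)
print("misclassified cases:", round(res.fun))
```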
Linear System Approaches. Three-group Model (Freed and Glover 1981):
Minimize h_1 α_1 + h_2 α_2
Subject to
b_L1 ≤ A_i X ≤ b_U1, A_i ∈ G_1,
b_L2 ≤ A_i X ≤ b_U2, A_i ∈ G_2,
b_L3 ≤ A_i X ≤ b_U3, A_i ∈ G_3,
b_U1 + ε ≤ b_L2 + α_1,
b_U2 + ε ≤ b_L3 + α_2,
where A_i and ε are given, X, b_Lj (lower bounds) and b_Uj (upper bounds) are unrestricted, and α_j (group overlapping) ≥ 0.
Linear System Approaches. (Figure: Three-group LP Model — in the (a_1, a_2) plane, the projection A_i X orders the groups G_1, G_2, G_3 by the interval boundaries [b_L1, b_U1], [b_L2, b_U2], [b_L3, b_U3].)
Linear System Approaches. Multi-group Model (Freed and Glover 1981):
Minimize Σ_j h_j α_j
Subject to
b_Lj ≤ A_i X ≤ b_Uj, A_i ∈ G_j, j = 1, …, m,
b_Uj + ε ≤ b_Lj+1 + α_j, j = 1, …, m−1,
where A_i and ε are given, X, b_Lj (lower bounds) and b_Uj (upper bounds) are unrestricted, and α_j (group overlapping) ≥ 0.
Linear System Approaches. Multi-group Model (Freed and Glover 1981), alternatively:
Minimize Σ_j h_j α_j
Subject to
b_Lj − α_j ≤ A_i X ≤ b_Uj + α_j, A_i ∈ G_j, j = 1, …, m,
b_Uj ≤ b_Lj+1, j = 1, …, m−1,
where A_i are given, X, b_Lj (lower bounds) and b_Uj (upper bounds) are unrestricted, and α_j (group overlapping) ≥ 0.
Linear System Approaches. Problems and Challenges of LP Approaches:
- Different normalizations give different solutions for X and b (the boundary).
- Choosing a proper value of b can lead to a nice separation.
- Integer variables may improve the misclassification rate of the LP model.
- The penalty costs of misclassification change the classification results.
- The simple model has been verified as a useful alternative to the logistic discriminant (classification) model; both outperform the linear and quadratic discriminant functions.
- There is as yet no comparison between decision tree induction and LP approaches.
Linear System Approaches: Multi-Criteria Linear Programming. Multi-criteria linear programming simultaneously minimizes the total overlapping degree and maximizes the total distance from the boundary of the two groups:
Minimize Σ_i α_i and Maximize Σ_i β_i
Subject to
A_i X = b + α_i − β_i, A_i ∈ B,
A_i X = b − α_i + β_i, A_i ∈ G,
where A_i are given, X and b are unrestricted, and α_i, β_i ≥ 0.
Linear System Approaches: Multi-Criteria Linear Programming. To utilize the computational power of commercial software for LP and non-LP problems, we can find the compromise solution of the separation problem (Yu 1973, Yu 1985, and Shi and Yu 1989). Let
α* = the ideal value of −Σ_i α_i;
β* = the ideal value of Σ_i β_i.
Then we define the regret function by:
−d_α+ = Σ_i α_i + α*, if −Σ_i α_i > α*; otherwise it is 0;
d_α− = α* + Σ_i α_i, if −Σ_i α_i < α*; otherwise it is 0;
d_β+ = Σ_i β_i − β*, if Σ_i β_i > β*; otherwise it is 0;
d_β− = β* − Σ_i β_i, if Σ_i β_i < β*; otherwise it is 0.
(Figure: compromise solution — minimizing (d_α+ + d_α−)^p + (d_β+ + d_β−)^p pulls the point (−Σ_i α_i, Σ_i β_i) toward the ideal point (α*, β*).)
Linear System Approaches: Multi-Criteria Linear Programming. Thus, the multi-criteria separation problem becomes (Shi and Peng 2001):
Minimize (d_α+ + d_α−)^p + (d_β+ + d_β−)^p
Subject to
α* + Σ_i α_i = d_α− − d_α+,
β* − Σ_i β_i = d_β− − d_β+,
A_i X = b + α_i − β_i, A_i ∈ B,
A_i X = b − α_i + β_i, A_i ∈ G,
where A_i, α*, and β* are given, X and b are unrestricted, and α_i, β_i, d_α−, d_α+, d_β−, d_β+ ≥ 0.
For p ≥ 1, the regret function is R(d; p) = [Σ_k (d_k+ + d_k−)^p]^{1/p}, with R(d; ∞) ≤ R(d; 2) ≤ R(d; 1). For p = ∞, R(d; ∞) = min_X max { d_k+ + d_k− : k = 1, …, q }.
Linear System Approaches: Multi-Criteria Linear Programming. Multi-Criteria Separation Model for three groups (Shi, Peng, Xu and Tang 2001). Given groups (G_1, G_2, G_3), let
b_1 = the boundary between G_1 and G_2;
b_2 = the boundary between G_2 and G_3;
α_i1 = the overlapping of G_1 and G_2 for case A_i;
β_i1 = the distance of case A_i from its adjusted boundary between G_1 and G_2;
α_i2 = the overlapping of G_2 and G_3 for case A_i;
β_i2 = the distance of case A_i from its adjusted boundary between G_2 and G_3.
Linear System Approaches: Multi-Criteria Linear Programming.
Minimize Σ_i (α_i1 + α_i2) and Maximize Σ_i (β_i1 + β_i2)
Subject to
A_i X = b_1 − α_i1 + β_i1, A_i ∈ G_1,
A_i X = 0.5(b_1 + α_i1 − β_i1 + b_2 − α_i2 + β_i2), A_i ∈ G_2,
A_i X = b_2 + α_i2 − β_i2, A_i ∈ G_3,
b_1 + α_i1 < b_2 − α_i2,
where A_i are given, X, b_1 and b_2 are unrestricted, and α_i1, β_i1, α_i2, β_i2 ≥ 0.
Linear System Approaches: Multi-Criteria Linear Programming. (Figure: Three-group MC Model — boundaries A_i X = b_1 and A_i X = b_2 separate G_1, G_2, G_3, with overlaps α_1, α_2 and distances β_1, β_2 around the adjusted boundaries b_1 ± α_1 and b_2 ± α_2.)
Linear System Approaches: Multi-Criteria Linear Programming. The compromise model for three groups can be:
Minimize (d_α1+ + d_α1−)^p + (d_α2+ + d_α2−)^p + (d_β1+ + d_β1−)^p + (d_β2+ + d_β2−)^p
Subject to
α*_1 + Σ_i α_i1 = d_α1− − d_α1+,
β*_1 − Σ_i β_i1 = d_β1− − d_β1+,
α*_2 + Σ_i α_i2 = d_α2− − d_α2+,
β*_2 − Σ_i β_i2 = d_β2− − d_β2+,
A_i X = b_1 − α_i1 + β_i1, A_i ∈ G_1,
A_i X = 0.5(b_1 + α_i1 − β_i1 + b_2 − α_i2 + β_i2), A_i ∈ G_2,
A_i X = b_2 + α_i2 − β_i2, A_i ∈ G_3,
b_1 + α_i1 < b_2 − α_i2,
where A_i, α*_1, α*_2, β*_1, β*_2 are given, X, b_1 and b_2 are unrestricted, and d_αj+, d_αj−, d_βj+, d_βj−, α_i1, β_i1, α_i2, β_i2 ≥ 0.
Linear System Approaches: Multi-Criteria Linear Programming. Similarly, the Multi-Criteria Classification Model for four groups (Kou, Peng, Shi, Wise and Xu 2002). Given groups (G_1, G_2, G_3, G_4), we have:
Minimize Σ_i (α_i1 + α_i2 + α_i3) and Maximize Σ_i (β_i1 + β_i2 + β_i3)
Subject to
A_i X = b_1 − α_i1 + β_i1, A_i ∈ G_1,
A_i X = 0.5(b_1 + α_i1 − β_i1 + b_2 − α_i2 + β_i2), A_i ∈ G_2,
A_i X = 0.5(b_2 + α_i2 − β_i2 + b_3 − α_i3 + β_i3), A_i ∈ G_3,
A_i X = b_3 + α_i3 − β_i3, A_i ∈ G_4,
b_1 + α_i1 < b_2 − α_i2,
b_2 + α_i2 < b_3 − α_i3,
where A_i are given, X, b_1, b_2 and b_3 are unrestricted, and α_i1, β_i1, α_i2, β_i2, α_i3, β_i3 ≥ 0.
Linear System Approaches: Multi-Criteria Linear Programming. Generally, given groups (G_1, G_2, …, G_s), the Multi-Criteria Classification Model for s groups is:
Minimize Σ_i Σ_j α_ij and Maximize Σ_i Σ_j β_ij
Subject to
A_i X = b_1 − α_i1 + β_i1, A_i ∈ G_1,
A_i X = 0.5(b_k−1 + α_i,k−1 − β_i,k−1 + b_k − α_ik + β_ik), A_i ∈ G_k, k = 2, …, s−1,
A_i X = b_s−1 + α_i,s−1 − β_i,s−1, A_i ∈ G_s,
b_k−1 + α_i,k−1 < b_k − α_ik, k = 2, …, s−1,
where A_i are given, X and b_j are unrestricted, and α_ij, β_ij ≥ 0, j = 1, …, s−1.
Linear System Approaches: Multi-Criteria Linear Programming Algorithm.
Step 1: Use ReadCHD to convert both the Training and Verifying data sets into data matrices.
Step 2: Use GroupDef to divide the observations in the Training data set into s groups: G1, G2, …, and Gs.
Step 3: Use sgmodel to perform the separation task on the training data. Here PROC LP is called to solve the MCLP model for the best s-group classifier, given the values of the control parameters.
Step 4: Use Score to produce graphical representations of the training results. Steps 3-4 repeat until the best training result is found.
Step 5: Use Predict to mine the s groups from the Verifying data set.
Credit Card Portfolio Management: Introduction. Data mining for credit card portfolio management decisions classifies the different cardholder behaviors in terms of their payments to the credit card issuers, such as banks and mortgage loan firms. In reality, the common categories of credit card variables are balance, purchase, payment, and cash advance. Some credit card companies may consider residence state and job security as special variables. In the case of FDC (First Data Corporation), there are 38 original variables drawn from the common categories over the past seven months. A set of 65-80 derived variables is then internally generated from the 38 variables to perform precise data mining.
(Figure: Individual Bankruptcy Filings, 1980-2000.)
Real-life Applications: Credit Card Portfolio Management. The objective of this research is to find an alternative modeling approach (preferably a linear approach) that could outperform the current approaches (Shi, Wise, Luo and Lin 2001): (1) Behavior Score by FICO; (2) Credit Bureau Score by FICO; (3) FDC Proprietary Bankruptcy Score; (4) SE Decision Tree.
Research Methodology. Using the 65 derived variables (Char 1-65) in FDR and a small development sample, we want to determine the coefficients of an appropriate subset of the 65 derived variables, X = (x_1, …, x_r), and a boundary value b that separate the two groups G (Goods) and B (Bads); that is, A_i X ≥ b for A_i ∈ G and A_i X ≤ b for A_i ∈ B, where A_i are the vector values of the variables.
Two-group MC Model for the SAS Algorithm:
Minimize d_α− + d_α+ + d_β− + d_β+
Subject to
α* + Σ_i α_i = d_α− − d_α+,
β* − Σ_i β_i = d_β− − d_β+,
A_i X = b − α_i + β_i, A_i ∈ G,
A_i X = b + α_i − β_i, A_i ∈ B,
where A_i, α*, and β* are given, X and b are unrestricted, and α_i, β_i, d_α−, d_α+, d_β−, d_β+ ≥ 0.
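With p = 1 this compromise model is itself an ordinary LP, so it can be sketched outside SAS as well. The following is an illustration only (the toy Goods/Bads, the ideal values α* = 0 and β* = 10, and the scipy solver are all assumptions, not the original SAS/FDC setup):

```python
import numpy as np
from scipy.optimize import linprog

# Two-group MCLP compromise model with p = 1 (a sketch, not the original
# SAS code).  Variable layout: x1, x2, b, alpha_1..alpha_n, beta_1..beta_n,
# d_alpha-, d_alpha+, d_beta-, d_beta+.
bad  = np.array([[6.0,  8.0], [5.0, 7.0], [7.0,  6.0]])   # group B
good = np.array([[8.0, 12.0], [9.0, 11.0], [10.0, 13.0]]) # group G
alpha_star, beta_star = 0.0, 10.0    # assumed ideal values
n_bad, n_good = len(bad), len(good)
n = n_bad + n_good
nv = 3 + 2 * n + 4                   # number of LP variables

c = np.zeros(nv)
c[-4:] = 1.0                         # minimize d_a- + d_a+ + d_b- + d_b+

A_eq, b_eq = [], []
for i, a in enumerate(bad):          # A_i X = b + alpha_i - beta_i
    row = np.zeros(nv); row[:2] = a; row[2] = -1.0
    row[3 + i] = -1.0; row[3 + n + i] = 1.0
    A_eq.append(row); b_eq.append(0.0)
for i, a in enumerate(good):         # A_i X = b - alpha_i + beta_i
    row = np.zeros(nv); row[:2] = a; row[2] = -1.0
    row[3 + n_bad + i] = 1.0; row[3 + n + n_bad + i] = -1.0
    A_eq.append(row); b_eq.append(0.0)

row = np.zeros(nv)                   # alpha* + sum(alpha) = d_a- - d_a+
row[3:3 + n] = 1.0; row[-4] = -1.0; row[-3] = 1.0
A_eq.append(row); b_eq.append(-alpha_star)
row = np.zeros(nv)                   # beta* - sum(beta) = d_b- - d_b+
row[3 + n:3 + 2 * n] = 1.0; row[-2] = 1.0; row[-1] = -1.0
A_eq.append(row); b_eq.append(beta_star)

bounds = [(None, None)] * 3 + [(0, None)] * (2 * n + 4)
res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=bounds, method="highs")
print("total regret:", res.fun)      # ~0 when the groups are separable
```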
Comparison Method. A comparison of different methods can be based on the Kolmogorov-Smirnov (KS) value, which measures the largest separation between the cumulative distributions of Goods and Bads (Conover 1999): KS = max |Cum. distribution of Goods − Cum. distribution of Bads|.
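The KS computation can be sketched as follows (the score samples here are made-up illustrations; the slides report KS values scaled by 100):

```python
import numpy as np

# Kolmogorov-Smirnov separation: largest gap between the empirical
# cumulative distributions of the Good and Bad score samples.
def ks_separation(good_scores, bad_scores):
    grid = np.sort(np.concatenate([good_scores, bad_scores]))
    cdf_good = np.searchsorted(np.sort(good_scores), grid,
                               side="right") / len(good_scores)
    cdf_bad = np.searchsorted(np.sort(bad_scores), grid,
                              side="right") / len(bad_scores)
    return float(np.max(np.abs(cdf_good - cdf_bad)))

good = np.array([10.4, 10.2, 11.8, 9.9])   # illustrative scores only
bad = np.array([7.2, 6.2, 6.4, 10.1])
print(ks_separation(good, bad))            # 0.75 for these samples
```

A perfectly separating classifier yields KS = 1 (100 on the slides' scale); identical distributions yield KS = 0.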
Comparison Results KS values on a sample of 1000 cases: (i) KS(Behavior Score) = 55.26; (ii) KS(Credit Bureau Score) = 45.55; (iii) KS(FDC Bankruptcy Score) = 59.16; (iv) KS(SE Score) = 60.22; (v) KS(MCLP) = 59.49
Comparison Results. On a sample of 2000 cases, KS(MCLP) = 60.10 outperforms the other methods. From sample sizes 3000 to 6000, the KS(MCLP) deviation is only 0.38 (56.19 vs. 55.81).
MCLP Model Learning Experience:

Version  # of variables  Sample Size  KS value  Note
M8       64              1000         59.49     Optimal
M8       64              6000         51.67     Cross-Validation
M9       30 (log. reg.)  2000         60.10     Optimal
M9       30 (log. reg.)  6000         56.43     Cross-Validation
M10      29 (log. reg.)  3000         56.19     Optimal
M10      29 (log. reg.)  6000         55.81     Cross-Validation
M11      27 (expert)     3000         55.00     Optimal
M11      27 (expert)     6000         53.63     Cross-Validation
(Figures: cumulative distributions of CUMGOOD vs. CUMBAD — KS(MCLP) = 59.49 on 1000 cases; KS(MCLP) = 60.10 on 2000 cases; KS(MCLP) = 56.43 on 6000 cases.)
Research Findings: the MCLP model is fully controlled by its formulation; 3000 is a stable sample size for robustness of the separation process; and the MCLP model can easily be extended to multi-group separation problems.
(Figures: Three-group MC Model — 3-group cumulative distributions (cumpct1-3) over 20 score intervals, for training and verifying data. Four-group MC Model — 4-group cumulative distributions (cumpct1-4), training and verifying. Five-group MC Model — 5-group MCLP separation (cumpct1-5), verifying.)
Other Applications. Linear programming-based data mining technology can also be used for: (1) bank and firm bankruptcy analyses; (2) fraud management; (3) financial risk management; (4) medical clinic analyses; (5) marketing promotions; and many others.