Jedovnice 20 Computation of Information Value for Credit Scoring Models Martin Řezáč, Jan Koláček Dept. of Mathematics and Statistics, Faculty of Science, Masaryk University
Information value The special case of Kullback-Leibler divergence given by: where are densities of scores of bad and good clients. Finanční matematika v praxi I, Jedovnice 20 /30
Kernel estimate The kernel density estimates are defined by where and K is the Epanechnikov kernel. Finanční matematika v praxi I, Jedovnice 20 2/30
Empirical estimate of I val Finanční matematika v praxi I, Jedovnice 20 3/30
Empirical estimate of I val However in practice, there could occur computational problems. The Information value index becomes infinite in cases when some of n 0 j or n j are equal to 0. Choosing of the number of bins is also very important. In the literature and also in many applications in credit scoring, the value r=0 is preferred. Finanční matematika v praxi I, Jedovnice 20 4/30
Empirical estimate with supervised interval selection (ESIS), We want to avoid zero values of n 0 j or n j. We propose to require to have at least k, where k is a positive integer, observations of scores of both good and bad client in each interval. This is the basic idea of all proposed algorithms. Finanční matematika v praxi I, Jedovnice 20 5/30
Empirical estimate with supervised interval selection, the first ESIS: Set where is the empirical quantile function appropriate to the empirical cumulative distribution function of scores of bad clients. Finanční matematika v praxi I, Jedovnice 20 6/30
Empirical estimate with supervised interval selection Usage of quantile function of scores of bad clients is motivated by the assumption, that number of bad clients is less than number of good clients. If n 0 is not divisible by k, it is necessary to adjust our intervals, because we obtain number of scores of bad clients in the last interval, which is less than k. In this case, we have to merge the last two intervals. Furthermore we need to ensure, that the number of scores of good clients is as required in each interval. To do so, we compute n j for all actual intervals. If we obtain n j < k for jth interval, we merge this interval with its neighbor on the right side. This can be done for all intervals except the last one. If we have n < k for the j last interval, than we have to merge it with its neighbor on the left side, i.e. we merge the last two intervals. Finanční matematika v praxi I, Jedovnice 20 7/30
and Empirical estimate with supervised interval selection Very important is the choice of k. If we choose too small value, we get overestimated value of the Information value, and vice versa. As a reasonable compromise seems to be adjusted square root of number of bad clients given by The estimate of the Information value is given by where n and n 0 correspond to observed counts of good and bad clients j j in intervals created according to the described procedure. Finanční matematika v praxi I, Jedovnice 20 8/30
Simulation results Consider n clients, 00p B % of bad clients with and 00(-p B )% of good clients with f N(, ). : 0 Because of normality we know I val. Consider following values of parameters: n = 00 000, n = 000 μ 0 = 0 σ 0 = σ = μ = 0.5,,.5 p B = 0.02, 0.05, 0., 0.2 2 f : N( 0, 0) 0 Finanční matematika v praxi I, Jedovnice 20 9/30
Simulation results ) Scores of bad and good clients were generated according to given parameters. 2) Estimates ˆ, Iˆ, Iˆ were computed. 3) Square errors were computed. 4) Steps )-3) were repeated one thousand times. 5) was computed. I val, DEC val, KERN val, ESIS Finanční matematika v praxi I, Jedovnice 20 0/30
Simulation results n=00000, = 0.5 0 IV_decil 0,000546 0,00030 0,000224 0,00068 IV_kern 0,000487 0,000232 0,0003 0,000076 IV_esis 0,00090 0,000384 0,00028 0,00027 n=00000, =.0 0 IV_decil 0,006286 0,004909 0,004096 0,002832 IV_kern 0,003396 0,00697 0,00064 0,000646 IV_esis 0,00246 0,000973 0,000477 0,000568 n=00000, =.5 0 IV_decil 0,056577 0,04845 0,03484 0,02066 IV_kern 0,0956 0,00789 0,006796 0,004862 IV_esis 0,03045 0,00834 0,007565 0,027943 n=000, = 0.5 0 IV_decil 0,025574 0,04006 0,026536 0,009074 IV_kern 0,038634 0,07547 0,00928 0,004737 IV_esis 0,03833 0,02980 0,06280 0,008028 n=000, =.0 0 IV_decil 0,86663 0,084572 0,043097 0,029788 IV_kern 0,7382 0,07238 0,045344 0,0323 IV_esis 0,5088 0,07088 0,036503 0,023609 n=000, =.5 0 IV_decil,663859,037778 0,53580 0,200792 IV_kern 0,529367 0,349783 0,26692 0,96856 IV_esis 0,60993 0,3525 0,7293 0,94676 worst average best performance Finanční matematika v praxi I, Jedovnice 20 /30
and Adjusted empirical estimate with supervised interval selection (AESIS) It is obvious that the choice of parameter k is crucial. So the question is: Is the choice optimal (according to )? What effect has n 0 on the optimal k? And what effect, if any, has the difference of means? 0 Finanční matematika v praxi I, Jedovnice 20 2/30
Simulation results Consider 0000 clients, 00p B % of bad clients with 00(-p B )% of good clients with f : N(,). Set 0 0 and consider 0.5,and.5,. E(( Iˆ val I val ) 2 ) f : N( 0,) 0 and k p B p B 0 0 3/30
Simulation results Dependence of on k,. 0 The highlighted circles correspond to values of k, where minimal value of the is obtained. The diamonds correspond to values of k given by. 0 p B 0.5 3 45 6 84 2 7 24 32.5 7 0 4 9 k 0 p B 0.5 29 42 62 84 2 8 23 32.5 6 9 8 9 ˆ I val, AESIS 4/30
Simulation results n=000, = 0.5 0 IV_decil 0,025574 0,04006 0,026536 0,009074 IV_kern 0,038634 0,07547 0,00928 0,004737 IV_esis 0,03833 0,02980 0,06280 0,008028 IV_aesis 0,042409 0,030808 0,05558 0,007223 n=000, =.0 0 IV_decil 0,86663 0,084572 0,043097 0,029788 IV_kern 0,7382 0,07238 0,045344 0,0323 IV_esis 0,5088 0,07088 0,036503 0,023609 IV_aesis 0,2568 0,093932 0,043860 0,027467 n=000, =.5 0 The classical estimate had the lowest in case of weak model and extremely low number (namely 20) of bad clients. If the number of bad client is slightly higher, had the best performance. Considering models with higher predictive power, and had the best performance. IV_decil,663859,037778 0,53580 0,200792 IV_kern 0,529367 0,349783 0,26692 0,96856 IV_esis 0,60993 0,3525 0,7293 0,94676 IV_aesis 0,553650 0,205889 0,3587 0,089354 Finanční matematika v praxi I, Jedovnice 20 5/30
Simulation results n=00000, = 0.5 0 IV_decil 0,000546 0,00030 0,000224 0,00068 IV_kern 0,000487 0,000232 0,0003 0,000076 IV_esis 0,00090 0,000384 0,00028 0,00027 IV_aesis 0,000603 0,000253 0,00035 0,000079 n=00000, =.0 0 IV_decil 0,006286 0,004909 0,004096 0,002832 IV_kern 0,003396 0,00697 0,00064 0,000646 IV_esis 0,00246 0,000973 0,000477 0,000568 IV_aesis 0,002446 0,0057 0,000552 0,0003 n=00000, =.5 0 IV_decil 0,056577 0,04845 0,03484 0,02066 IV_kern 0,0956 0,00789 0,006796 0,004862 IV_esis 0,03045 0,00834 0,007565 0,027943 IV_aesis 0,00640 0,002688 0,002502 0,0472 We can see that the classical estimate was outperformed by all other estimates of the Information value in all considered cases when the number of clients was high (00000 in our case). The best performance was achieved by in case of weak scoring models, by in case of models with high performance and by for models with very high performance. Finanční matematika v praxi I, Jedovnice 20 5/30
Algorithm for the modified ESIS: ) 2) 3) q [] k q j F n smax max( q j, q j 0 ) q j 0 F 0 k n ESIS. 0, ˆ I val, ESIS where s max 4) Add to the sequence, i.e. q [ q, smax ] 5) Erase all scores s max 6) While n 0 and n are greater than 2*k, repeat step 2) 5) 7) q [min( score ), q] Finanční matematika v praxi I, Jedovnice 20 7/30
and AESIS. Simulation results Consider 000 and 0000 clients, 00p B % of bad clients with and 00(-p B )% of good clients with f : N(,). Set 0 0 and consider 0.5,and.5,. E(( Iˆ val I val ) 2 ) f : N( 0,) 0 n 000 n 0000 k p B 0.5 5 9 0 22 0 2 3 4 6 0.5 2 3 2 5 8 0 5 k pb 0.5 2 8 32 5 4 7 0 3.5 2 3 4 5 5 23 32 45 Finanční matematika v praxi I, Jedovnice 20 8/30
Simulation results Dependence of on k. n 000, p 0.2 B n 0000, p 0.2 B 0.5. 0 0 0 0.5. 0. 5 0 0 0 0.5 k pb n 3. 4 ˆ ˆ0 n 000 pb n k 3. 4 ˆ ˆ0 0 n 0000 p B 0.5 4 7 9 3 2 3 4 5.5 2 2 3 k 0 p B 0.5 5 9 0 22 2 3 4 6.5 2 2 2 ˆ I val, AESIS pb n k 3. 4 ˆ ˆ0 0 p B 0.5 3 20 28 39 5 8 5.5 3 5 6 9 k 0 p B 0.5 2 8 32 5 4 7 0 3.5 2 3 4 5 9/30
Simulation results n=00000, = 0.5 0 IV_decil 0,000546 0,00030 0,000224 0,00068 IV_kern 0,000487 0,000232 0,0003 0,000076 IV_esis 0,00090 0,000384 0,00028 0,00027 IV_aesis 0,000603 0,000253 0,00035 0,000079 IV_esis 0,000550 0,000273 0,000480 0,00079 IV_aesis 0,000938 0,00038 0,000203 0,00023 n=00000, =.0 0 IV_decil 0,006286 0,004909 0,004096 0,002832 IV_kern 0,003396 0,00697 0,00064 0,000646 IV_esis 0,00246 0,000973 0,000477 0,000568 IV_aesis 0,002446 0,0057 0,000552 0,0003 IV_esis 0,03965 0,04877 0,07427 0,034 IV_aesis 0,009763 0,004326 0,002494 0,00534 n=00000, =.5 0 IV_decil 0,056577 0,04845 0,03484 0,02066 IV_kern 0,0956 0,00789 0,006796 0,004862 IV_esis 0,03045 0,00834 0,007565 0,027943 IV_aesis 0,00640 0,002688 0,002502 0,0472 IV_esis 0,435297 0,29647 0,297 0,6950 IV_aesis 0,069952 0,043534 0,032264 0,023526 Finanční matematika v praxi I, Jedovnice 20 n=000, = 0.5 0 IV_decil 0,025574 0,04006 0,026536 0,009074 IV_kern 0,038634 0,07547 0,00928 0,004737 IV_esis 0,03833 0,02980 0,06280 0,008028 IV_aesis 0,042409 0,030808 0,05558 0,007223 IV_esis 0,0279 0,05727 0,00805 0,006886 IV_aesis 0,060838 0,027738 0,08239 0,00903 n=000, =.0 0 IV_decil 0,86663 0,084572 0,043097 0,029788 IV_kern 0,7382 0,07238 0,045344 0,0323 IV_esis 0,5088 0,07088 0,036503 0,023609 IV_aesis 0,2568 0,093932 0,043860 0,027467 IV_esis 0,289062 0,4470 0,5949 0,098609 IV_aesis 0,260596 0,29346 0,07732 0,036574 n=000, =.5 0 IV_decil,663859,037778 0,53580 0,200792 IV_kern 0,529367 0,349783 0,26692 0,96856 IV_esis 0,60993 0,3525 0,7293 0,94676 IV_aesis 0,553650 0,205889 0,3587 0,089354 IV_esis,50276,05837 0,922429 0,86092 IV_aesis 0,63465 0,29309 0,2057 0,3756 20/30
. Simulation results The new algorithm (ESIS.) ended disastrously in most cases. The only exception was a situation where n=000 a μ -μ 0 =0.5. This corresponds to a scoring model with very poor discriminatory power and a very small number of observed bad clients (20-200). AESIS. ranked among the average. Finanční matematika v praxi I, Jedovnice 20 2/30
ESIS.2 U původního ESIS často dochází ke slučování vypočtených intervalů ve druhé fázi algoritmu. Pro výpočet se používá jen F0.. Aby byla splněna podmínka n >k, je zřejmě nutné, aby hranice prvního intervalu byla větší než k. F n To vede k myšlence použít ke konstrukci intervalů nejprve F. a následně, od nějaké hodnoty skóre F0.. Jako vhodná hodnota skóre pro tento účel se jeví hodnota s 0, ve které se protínají hustoty skóre, rozdíl distribučních funkcí skóre nabývá své maximální hodnoty a také platí, že funkce f IV nabývá nulové hodnoty., s 0 Point of intersection of densities = = Point of maximal difference of CDFs Point of zero value of f IV Finanční matematika v praxi I, Jedovnice 20 22/30
ESIS.2, Algorithm for the modified ESIS: ) 2) 3) 4) s0 arg max F q j j k n F, j,, F ( 0 ) n k j k n0 n F0 j F0 ( s0),, n0 k k s q j s F 0 0 0, q [min( score ), q, q0, max( score ) ] 5) Merge intervals given by q where number of bads is less than k. 6) Merge intervals given by q 0 where number of goods is less than k. Finanční matematika v praxi I, Jedovnice 20 ˆ I val, ESIS 2 where 23/30
and AESIS.2 Simulation results Consider 000, 0000 and 00000 clients, 00p B % of bad clients with f0 : N( 0,) and 00(-p B )% of good clients with f : N(,). Set 0 0, 5 0.5,and. and consider. E(( Iˆ val I val ) 2 ) n 000 0 k p B 0.5 5 9 22 45 3 8 6.5 2 3 6 7 5 8 0 5 n 00000 n 0000 0 k pb 0.5 29 5 77 2 5 24 28 45.5 6 4 5 23 32 45 Finanční matematika v praxi I, Jedovnice 20 0 k 0.5 8 98 298 37 50 6 06 4.5 7 28 32 48 5 8 0 5 p B 24/30
Simulation results Dependence of on k. n 000, p 0.2 B 0.5. 0. 5 0 0 0 n 00000, p 0.05 B 0.5. 0 0 0 n 0000, p 0.2 B 0.5. 0. 5 0 0 0 0.5 pb n k ˆ ˆ0 ˆ I val, AESIS 2 2 n 0000 pb n k ˆ ˆ0 0 2 p B 0.5 38 60 85 20 5 23 32 45.5 8 3 8 26 k 0 p B 0.5 29 5 77 2 5 24 28 45.5 6 4 25/30
Simulation results n=000, n=000, =.0 0 = 0.5 0 IV_decil 0,025574 0,04006 0,026536 0,009074 IV_kern 0,038634 0,07547 0,00928 0,004737 IV_esis 0,03833 0,02980 0,06280 0,008028 IV_aesis 0,042409 0,030808 0,05558 0,007223 IV_esis 0,0279 0,05727 0,00805 0,006886 IV_aesis 0,060838 0,027738 0,08239 0,00903 IV_esis2 0,0382 0,025568 0,09098 0,009540 IV_esis2a 0,048697 0,027729 0,044 0,007988 IV_esis2b 0,09599 0,043529 0,026044 0,04985 IV_aesis2 0,0570 0,02658 0,043 0,007838 esis2 esis2a eis2b aesis2 algoritmus ESIS.2 ESIS.2 ESIS.2 ESIS.2 Finanční matematika v praxi I, Jedovnice 20 k k pb n 3. ˆ ˆ0 4 pb n k ˆ ˆ0 2 IV_decil 0,86663 0,084572 0,043097 0,029788 IV_kern 0,7382 0,07238 0,045344 0,0323 IV_esis 0,5088 0,07088 0,036503 0,023609 IV_aesis 0,2568 0,093932 0,043860 0,027467 IV_esis 0,289062 0,4470 0,5949 0,098609 IV_aesis 0,260596 0,29346 0,07732 0,036574 IV_esis2 0,7946 0,074200 0,04890 0,02286 IV_esis2a 0,2349 0,089678 0,04828 0,03476 IV_esis2b 0,357268 0,68690 0,097523 0,064290 IV_aesis2 0,209890 0,0964 0,050609 0,028864 n=000, =.5 0 worst average best performance IV_decil,663859,037778 0,53580 0,200792 IV_kern 0,529367 0,349783 0,26692 0,96856 IV_esis 0,60993 0,3525 0,7293 0,94676 IV_aesis 0,553650 0,205889 0,3587 0,089354 IV_esis,50276,05837 0,922429 0,86092 IV_aesis 0,63465 0,29309 0,2057 0,3756 IV_esis2 0,838666 0,244379 0,33832 0,6534 IV_esis2a 0,577075 0,8388 0,28374 0,067026 IV_esis2b 0,73756 0,260 0,202780 0,09247 IV_aesis2 0,57543 0,84840 0,25203 0,073825 27/30
Simulation results n=00000, = 0.5 n=00000, =.0 0 IV_decil 0,000546 0,00030 0,000224 0,00068 IV_kern 0,000487 0,000232 0,0003 0,000076 IV_esis 0,00090 0,000384 0,00028 0,00027 IV_aesis 0,000603 0,000253 0,00035 0,000079 IV_esis 0,000550 0,000273 0,000480 0,00079 IV_aesis 0,000938 0,00038 0,000203 0,00023 IV_esis2 0,000905 0,000353 0,000222 0,0003 IV_esis2a 0,00060 0,00023 0,00035 0,000072 IV_esis2b 0,0052 0,00049 0,000252 0,00039 IV_aesis2 0,000570 0,0002 0,0009 0,000064 0 IV_decil 0,006286 0,004909 0,004096 0,002832 IV_kern 0,003396 0,00697 0,00064 0,000646 IV_esis 0,00246 0,000973 0,000477 0,000568 IV_aesis 0,002446 0,0057 0,000552 0,0003 IV_esis 0,03965 0,04877 0,07427 0,034 IV_aesis 0,009763 0,004326 0,002494 0,00534 IV_esis2 0,00258 0,000905 0,000484 0,000285 IV_esis2a 0,002578 0,004 0,000582 0,00035 IV_esis2b 0,005844 0,002378 0,00366 0,000725 IV_aesis2 0,00237 0,000945 0,000497 0,000294 worst average best performance Finanční matematika v praxi I, Jedovnice 20 n=00000, =.5 0 IV_decil 0,056577 0,04845 0,03484 0,02066 IV_kern 0,0956 0,00789 0,006796 0,004862 IV_esis 0,03045 0,00834 0,007565 0,027943 IV_aesis 0,00640 0,002688 0,002502 0,0472 IV_esis 0,435297 0,29647 0,297 0,6950 IV_aesis 0,069952 0,043534 0,032264 0,023526 IV_esis2 0,0258 0,006407 0,002796 0,003459 IV_esis2a 0,006033 0,002356 0,00235 0,000727 IV_esis2b 0,0927 0,004857 0,002937 0,00373 IV_aesis2 0,008045 0,003735 0,00760 0,00244 26/30
. Simulation results n=000, = 0.5 0 r IV_kern 0,54537 0,07089 0,03724 0,08946 IV_esis 0,086877 0,062907 0,032206 0,027545 IV_aesis2 0,20468 0,06072 0,056524 0,03350 n=000, =.0 0 r IV_kern 0,7382 0,07238 0,045344 0,0323 IV_esis 0,5088 0,07088 0,036503 0,023609 IV_esis2 0,7946 0,074200 0,04890 0,02286 n=000, =.5 0 r IV_kern 0,235274 0,55459 0,8628 0,087492 IV_esis2a 0,256478 0,0847 0,057055 0,029789 IV_aesis2 0,25569 0,0825 0,055646 0,0328 n=00000, = 0.5 0 r IV_kern 0,00947 0,000928 0,000524 0,000306 IV_esis 0,00299 0,0009 0,0099 0,00075 IV_aesis2 0,002280 0,000844 0,000475 0,000255 n=00000 =.0 0 r IV_kern 0,003396 0,00697 0,00064 0,000646 IV_esis 0,00246 0,000973 0,000477 0,000568 IV_esis2 0,00258 0,000905 0,000484 0,000285 n=00000 =.5 0 r IV_kern 0,008694 0,004795 0,003020 0,0026 IV_esis2a 0,002682 0,00047 0,000549 0,000323 IV_aesis2 0,003576 0,00660 0,000782 0,000553 Relativní (r) jde o z předchozích tabulek vydělené příslušnou teoretickou hodnotou IV. Umožní lepší porovnání eliminuje vliv absolutní výše IV. Finanční matematika v praxi I, Jedovnice 20 28/30
Conclusions The classical way of computation of the information value, i.e. empirical estimate using deciles of scores, may lead to strongly biased results. We conclude that kernel estimates and empirical estimates with supervised interval selection (ESIS, AESIS, ESIS., ESIS.2 and AESIS.2) are much more appropriate to use. Finanční matematika v praxi I, Jedovnice 20 29/30
Děkuji za pozornost. Finanční matematika v praxi I, Jedovnice 20 30/30