Research Article Metric Divergence Measures and Information Value in Credit Scoring


Journal of Mathematics, Volume 2013, Article ID 848271, 10 pages

Guoping Zeng
Think Finance, 4150 International Plaza, Fort Worth, TX 76109, USA

Correspondence should be addressed to Guoping Zeng; guopingtx@yahoo.com

Received 13 August 2013; Accepted 4 September 2013

Academic Editor: Baoding Liu

Copyright © 2013 Guoping Zeng. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Recently, a series of divergence measures have emerged from information theory and statistics, and numerous inequalities have been established among them. However, none of them is a metric in topology. In this paper, we propose a class of metric divergence measures, namely, $L_p(P\|Q)$, $p \ge 1$, and study their mathematical properties. We then study an important divergence measure widely used in credit scoring, called information value. In particular, we explore the mathematical reasoning of weight of evidence and suggest a better alternative to weight of evidence. Finally, we propose using $L_p(P\|Q)$ as alternatives to information value to overcome its disadvantages.

1. Introduction

The information measure is an important concept in information theory and statistics. It is related to the system of measurement of information, or the amount of information based on the probabilities of the events that convey information. Divergence measures are an important type of information measure. They are commonly used to find an appropriate distance or difference between two probability distributions. Let

$$\Delta_n = \left\{ P = (p_1, p_2, \ldots, p_n) \;\middle|\; p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1 \right\}, \quad n \ge 2, \tag{1}$$

be the set of finite discrete probability distributions as in [1]. For all $P, Q \in \Delta_n$, the following divergence measures are well known in the literature of information theory and statistics.

Hellinger Discrimination [2]:

$$h(P\|Q) = \frac{1}{2} \sum_{i=1}^{n} \left( \sqrt{p_i} - \sqrt{q_i} \right)^2. \tag{2}$$

Shannon's Entropy [3]:

$$H(P) = -\sum_{i=1}^{n} p_i \log_2 (p_i), \tag{3}$$

which is sometimes referred to as a measure of uncertainty. The entropy $H(P)$ of a discrete random variable is defined in terms of its probability distribution $P$ and is a good measure of randomness or uncertainty. Note that in the original definition of Shannon's entropy the log is to the base 2, and entropy is expressed in bits in information theory. The log can be taken to any other base, and the entropy will then be a constant factor of the one in base 2 by the change-of-base formula of the logarithm function. Hence, without loss of generality, we will assume all the logs are natural logarithms.

Kullback and Leibler's Relative Information [4]:

$$D(P\|Q) = \sum_{i=1}^{n} p_i \ln\left( \frac{p_i}{q_i} \right). \tag{4}$$

Its symmetric form is the well-known J-divergence.
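As a rough illustration of definitions (3) and (4), the following Python sketch computes the Shannon entropy in two bases and the relative information in both directions. The function names and the two example distributions are illustrative assumptions, not part of the original article, and the code assumes the usual convention $0 \log 0 = 0$.

```python
import math

def shannon_entropy(p, base=math.e):
    """H(P) = -sum_i p_i log(p_i); terms with p_i = 0 are skipped (0 log 0 = 0)."""
    return -sum(x * math.log(x, base) for x in p if x > 0)

def relative_information(p, q):
    """D(P||Q) = sum_i p_i ln(p_i / q_i); assumes q_i > 0 wherever p_i > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

# Illustrative distributions (assumed, not taken from the article's tables).
P = [0.5, 0.5]
Q = [0.2, 0.8]

entropy_bits = shannon_entropy(Q, base=2)  # base-2 logarithm, in bits
entropy_nats = shannon_entropy(Q)          # natural logarithm, in nats
d_pq = relative_information(P, Q)
d_qp = relative_information(Q, P)

print(entropy_bits, entropy_nats)
print(d_pq, d_qp, d_pq + d_qp)  # the last value is the symmetric J-divergence
```

The two entropies differ only by the constant factor $\ln 2$, and the two directions of $D$ generally differ, which is why a symmetric form is introduced.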

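J-Divergence (Jeffreys [5], Kullback and Leibler [4]):

$$J(P\|Q) = D(P\|Q) + D(Q\|P) = \sum_{i=1}^{n} (p_i - q_i) \ln\left( \frac{p_i}{q_i} \right). \tag{5}$$

Triangular Discrimination [6]:

$$\Delta(P\|Q) = \sum_{i=1}^{n} \frac{(p_i - q_i)^2}{p_i + q_i}. \tag{6}$$

Symmetric Chi-Square Divergence (Dragomir et al. [7]). One has

$$\psi(P\|Q) = \chi^2(P\|Q) + \chi^2(Q\|P) = \sum_{i=1}^{n} \frac{(p_i - q_i)^2 (p_i + q_i)}{p_i q_i}, \tag{7}$$

where $\chi^2(P\|Q) = \sum_{i=1}^{n} (p_i - q_i)^2 / q_i$ is the well-known $\chi^2$-divergence (Pearson [8]).

Jensen-Shannon Divergence (Sibson [9], Burbea and Rao [10, 11]):

$$I(P\|Q) = \frac{1}{2} \left[ \sum_{i=1}^{n} p_i \ln\left( \frac{2 p_i}{p_i + q_i} \right) + \sum_{i=1}^{n} q_i \ln\left( \frac{2 q_i}{p_i + q_i} \right) \right]. \tag{8}$$

Arithmetic-Geometric Divergence (Taneja [12]):

$$T(P\|Q) = \sum_{i=1}^{n} \frac{p_i + q_i}{2} \ln\left( \frac{p_i + q_i}{2 \sqrt{p_i q_i}} \right). \tag{9}$$

Taneja's Divergence (Taneja [12]). One has

$$d(P\|Q) = 1 - \sum_{i=1}^{n} \left( \frac{\sqrt{p_i} + \sqrt{q_i}}{2} \right) \left( \sqrt{\frac{p_i + q_i}{2}} \right). \tag{10}$$

The information measures $J(P\|Q)$, $I(P\|Q)$, and $T(P\|Q)$ can be written as

$$\begin{aligned}
J(P\|Q) &= 4\left[ I(P\|Q) + T(P\|Q) \right], \\
I(P\|Q) &= \frac{1}{2} \left[ D\left( P \,\Big\|\, \frac{P+Q}{2} \right) + D\left( Q \,\Big\|\, \frac{P+Q}{2} \right) \right], \\
T(P\|Q) &= \frac{1}{2} \left[ D\left( \frac{P+Q}{2} \,\Big\|\, P \right) + D\left( \frac{P+Q}{2} \,\Big\|\, Q \right) \right].
\end{aligned} \tag{11}$$

Relative Information of Type s. Cressie and Read [13] considered the one-parametric generalization of the information measure $D(P\|Q)$, called the relative information of type $s$, given by

$$D_s(P\|Q) = [s(s-1)]^{-1} \left[ \sum_{i=1}^{n} p_i^{s} q_i^{1-s} - 1 \right], \quad s \neq 0, 1. \tag{12}$$

It also has some special cases:

(i) $\lim_{s \to 0} D_s(P\|Q) = D(Q\|P)$,
(ii) $\lim_{s \to 1} D_s(P\|Q) = D(P\|Q)$,
(iii) $D_{-1}(P\|Q) = (1/2)\chi^2(Q\|P)$,
(iv) $D_{1/2}(P\|Q) = 4h(P\|Q)$,
(v) $D_2(P\|Q) = (1/2)\chi^2(P\|Q)$,
(vi) $D_2(P\|Q) = D_{-1}(Q\|P)$,
(vii) $D_1(P\|Q) = D_0(Q\|P)$, where $D_0$ and $D_1$ denote the limits in (i) and (ii).

It is shown in [1] that $D_s(P\|Q)$ is nonnegative and convex in $P$ and $Q$.

J-Divergence of Type s [14]:

$$V_s(P\|Q) = D_s(P\|Q) + D_s(Q\|P) = [s(s-1)]^{-1} \left[ \sum_{i=1}^{n} \left( p_i^{s} q_i^{1-s} + q_i^{s} p_i^{1-s} \right) - 2 \right], \quad s \neq 0, 1. \tag{13}$$

It admits the following particular cases:

(i) $\lim_{s \to 0} V_s(P\|Q) = J(P\|Q)$,
(ii) $\lim_{s \to 1} V_s(P\|Q) = J(P\|Q)$,
(iii) $V_{-1}(P\|Q) = V_2(P\|Q) = (1/2)\psi(P\|Q)$,
(iv) $V_0(P\|Q) = V_1(P\|Q) = J(P\|Q)$,
(v) $V_{1/2}(P\|Q) = 8h(P\|Q)$.

Unified Generalization of Jensen-Shannon Divergence and Arithmetic-Geometric Mean Divergence [14]:

$$W_s(P\|Q) = \frac{1}{2} \left[ D_s\left( \frac{P+Q}{2} \,\Big\|\, P \right) + D_s\left( \frac{P+Q}{2} \,\Big\|\, Q \right) \right]. \tag{14}$$

It admits the following particular cases:

(i) $W_{-1}(P\|Q) = (1/4)\Delta(P\|Q)$,
(ii) $W_0(P\|Q) = I(P\|Q)$,
(iii) $W_{1/2}(P\|Q) = 4d(P\|Q)$,
(iv) $W_1(P\|Q) = T(P\|Q)$,
(v) $W_2(P\|Q) = (1/16)\psi(P\|Q)$.

Taneja proved [14] that all the 3 s-type information measures $D_s(P\|Q)$, $V_s(P\|Q)$, and $W_s(P\|Q)$ are nonnegative and convex in the pair $(P, Q)$. He also obtained inequalities regarding the various divergence measures:

$$\frac{1}{4}\Delta(P\|Q) \le I(P\|Q) \le h(P\|Q) \le 4d(P\|Q) \le \frac{1}{8}J(P\|Q) \le T(P\|Q) \le \frac{1}{16}\psi(P\|Q). \tag{15}$$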
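The chain of inequalities (15) and the identity $J = 4[I + T]$ from (11) can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the pair of distributions is an arbitrary strictly positive example, and one example does not replace the proofs in [14].

```python
import math

def kl(p, q):
    """Relative information D(P||Q) for strictly positive distributions."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

def midpoint(p, q):
    return [(a + b) / 2 for a, b in zip(p, q)]

def triangular(p, q):             # Delta(P||Q), equation (6)
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q))

def jensen_shannon(p, q):         # I(P||Q), written as in (11)
    m = midpoint(p, q)
    return 0.5 * (kl(p, m) + kl(q, m))

def hellinger(p, q):              # h(P||Q), equation (2)
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

def taneja_d(p, q):               # d(P||Q), equation (10)
    return 1 - sum((math.sqrt(a) + math.sqrt(b)) / 2 * math.sqrt((a + b) / 2)
                   for a, b in zip(p, q))

def j_divergence(p, q):           # J(P||Q), equation (5)
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))

def arithmetic_geometric(p, q):   # T(P||Q), written as in (11)
    m = midpoint(p, q)
    return 0.5 * (kl(m, p) + kl(m, q))

def symmetric_chi_square(p, q):   # psi(P||Q), equation (7)
    return sum((a - b) ** 2 * (a + b) / (a * b) for a, b in zip(p, q))

# Arbitrary strictly positive example (an assumption for illustration).
P = [0.5, 0.5]
Q = [0.2, 0.8]

chain = [
    triangular(P, Q) / 4,
    jensen_shannon(P, Q),
    hellinger(P, Q),
    4 * taneja_d(P, Q),
    j_divergence(P, Q) / 8,
    arithmetic_geometric(P, Q),
    symmetric_chi_square(P, Q) / 16,
]
print(chain)
print(all(x <= y for x, y in zip(chain, chain[1:])))  # is the chain (15) ordered?
```

For this pair the seven quantities lie close together, which shows how tight the chain can be for distributions that are not far apart.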

Here, we observe that $\Delta(P\|Q) > 0$ for $P \neq Q$. Hence, from inequalities (15), we see that $I(P\|Q)$, $h(P\|Q)$, $d(P\|Q)$, $J(P\|Q)$, $T(P\|Q)$, and $\psi(P\|Q)$ are all positive as well. We also note that all $p_i$, $q_i$ in the original definition of $\Delta_n$ in [1] are required to be positive. Yet, in reality some $p_i$, $q_i$ may be 0. In this case $I(P\|Q)$, $T(P\|Q)$, $\psi(P\|Q)$, and $\Delta(P\|Q)$ will be undefined. We have extended the definition of $\Delta_n$ to include the cases when $p_i = 0$ or $q_i = 0$. We assume that $0 \ln(0) = 0$, which is easily justified by continuity since $x \ln(x) \to 0$ as $x \to 0$. For convenience, we also assume $0 \ln(0/0) = 0$ and $t \ln(t/0) = 0$ for $t > 0$.

A problem with the above divergence measures is that none of them is a real distance, that is, a metric, in topology. In this paper, we will study a class of metric divergence measures $L_p(P\|Q)$. We then study the underlying mathematics of a special divergence measure called information value, which is widely used in credit scoring. We propose using $L_p(P\|Q)$ as alternatives to IV in order to overcome the disadvantages of information value.

The rest of this paper is organized as follows. In Section 2, after reviewing the metric space, we disprove that the above divergence measures are metrics. We then study a class of metric divergence measures $L_p(P\|Q)$. Section 3 is concerned with information value. We examine a rule of thumb and weight of evidence and suggest a better alternative to weight of evidence. We then propose using $L_p(P\|Q)$ as alternatives to IV to overcome the disadvantages of information value. Section 4 presents some numerical results. Finally, the paper is concluded in Section 5.

2. Metric Divergence Measures

2.1. Review of Metric Space

Definition 1. Suppose $d: M \times M \to \mathbb{R}$ is a real valued function and that for all $x, y, z$ of the set $M$

(1) $d(x, y) \ge 0$ (nonnegativity),
(2) $d(x, y) = 0$ if and only if $x = y$ (identity),
(3) $d(x, y) = d(y, x)$ (symmetry),
(4) $d(x, z) \le d(x, y) + d(y, z)$ (triangle inequality).

Such a distance function $d$ is called a metric on $M$, and the pair $(M, d)$ is called a metric space. If $d$ satisfies (1)-(3) but not necessarily (4), it is called a semimetric.

A metric space is a topological space in a natural manner, and therefore all definitions and theorems about general topological spaces also apply to a metric space. For instance, in a metric space one can define open and closed sets, convergence of sequences of points, compact space, and connected space.

Definition 2. A metric $d_1$ is said to be upper bounded by another metric $d_2$ if there exists a positive constant $c$ such that $d_1(x, y) \le c\, d_2(x, y)$ for all $x, y \in M$. In this case, $d_2$ is said to be lower bounded by $d_1$.

If $d_1$ is upper bounded by $d_2$, then convergence in the metric space $(M, d_2)$ implies convergence in the metric space $(M, d_1)$.

Definition 3. Two metrics $d_1$ and $d_2$ are equivalent if there exist positive constants $\alpha$ and $\beta$ such that $\alpha\, d_2(x, y) \le d_1(x, y) \le \beta\, d_2(x, y)$.

If two metrics $d_1$ and $d_2$ are equivalent, they will have the same convergence.

2.2. Nonmetric Divergence Measures

Proposition 4. None of the divergence measures $\Delta(P\|Q)$, $I(P\|Q)$, $h(P\|Q)$, $d(P\|Q)$, $J(P\|Q)$, $T(P\|Q)$, and $\psi(P\|Q)$ is a metric in topology. Indeed, none of them satisfies the triangle inequality.

Proof. We disprove them either numerically or analytically by constructing counterexamples in $\Delta_2$.

(a) Let $P = (0, 1)$, $Q = (1, 0)$, and $R = (0.5, 0.5)$. Then $\Delta(P\|Q) = 2$, $\Delta(P\|R) = \Delta(R\|Q) = 2/3$, and $\Delta(P\|R) + \Delta(R\|Q) = 4/3 < 2 = \Delta(P\|Q)$.

(b) Let $P = (0.5, 0.5)$, $Q = (0.2, 0.8)$, and $R = (0.3, 0.7)$. Then, evaluating (8), $I(P\|Q) \approx 0.0507$, $I(R\|Q) \approx 0.0067$, $I(P\|R) \approx 0.0210$, and $I(P\|R) + I(R\|Q) < I(P\|Q)$.

(c) Let $P = (0, 1)$, $Q = (1, 0)$, and $R = (0.5, 0.5)$. Then $h(P\|Q) = 1$, $h(P\|R) = h(R\|Q) = (1/2)(2 - \sqrt{2})$, and $h(P\|R) + h(R\|Q) = 2 - \sqrt{2} < 1 = h(P\|Q)$.

(d) Let $P = (0.2, 0.8)$, $Q = (0.4, 0.6)$, and $R = (0.3, 0.7)$. Then

$$\begin{aligned}
D(P\|R) &+ D(R\|Q) - D(P\|Q) \\
&= 0.2 \ln\frac{2}{3} + 0.8 \ln\frac{8}{7} + 0.3 \ln\frac{3}{4} + 0.7 \ln\frac{7}{6} - 0.2 \ln\frac{1}{2} - 0.8 \ln\frac{4}{3} \\
&= 0.2 \left( \ln\frac{2}{3} - \ln\frac{1}{2} \right) + 0.8 \left( \ln\frac{8}{7} - \ln\frac{4}{3} \right) + 0.3 \ln\frac{3}{4} + 0.7 \ln\frac{7}{6} \\
&= 0.2 \ln\frac{4}{3} + 0.8 \ln\frac{6}{7} + 0.3 \ln\frac{3}{4} + 0.7 \ln\frac{7}{6} \\
&= 0.2 \ln\frac{4}{3} - 0.3 \ln\frac{4}{3} + 0.8 \ln\frac{6}{7} - 0.7 \ln\frac{6}{7} \\
&= -0.1 \ln\frac{4}{3} + 0.1 \ln\frac{6}{7} = 0.1 \left( \ln\frac{6}{7} - \ln\frac{4}{3} \right) = 0.1 \ln\frac{9}{14} < 0.1 \ln 1 = 0.
\end{aligned} \tag{16}$$

Hence, $D(P\|R) + D(R\|Q) < D(P\|Q)$. Indeed, $D(P\|Q)$ is not symmetric either (see [15]).

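(e) Let $P = (0.2, 0.8)$, $Q = (0.4, 0.6)$, and $R = (0.3, 0.7)$. Then

$$\begin{aligned}
J(P\|R) + J(R\|Q) &= \left( 0.1 \ln\frac{3}{2} + 0.1 \ln\frac{8}{7} \right) + \left( 0.1 \ln\frac{4}{3} + 0.1 \ln\frac{7}{6} \right) = 0.1 \ln\frac{8}{3}, \\
J(P\|Q) &= 0.2 \ln 2 + 0.2 \ln\frac{4}{3} = 0.2 \ln\frac{8}{3}.
\end{aligned} \tag{17}$$

Hence, $J(P\|R) + J(R\|Q) < J(P\|Q)$.

(f) Let $P = (0.5, 0.5)$, $Q = (0.2, 0.8)$, and $R = (0.3, 0.7)$. Then, evaluating (9), $T(P\|Q) \approx 0.0533$, $T(P\|R) \approx 0.0214$, $T(R\|Q) \approx 0.0068$, and $T(P\|R) + T(R\|Q) < T(P\|Q)$.

(g) Let $P = (0.5, 0.5)$, $Q = (0.2, 0.8)$, and $R = (0.3, 0.7)$. Then $\psi(P\|Q) = 369/400$, $\psi(Q\|R) = 185/1680$, $\psi(P\|R) = 368/1050$, and $\psi(P\|R) + \psi(R\|Q) < \psi(P\|Q)$.

2.3. A Natural Metric Divergence Measure

If we pick up the common part of $D(P\|Q)$ and $J(P\|Q)$, we will obtain a metric divergence measure

$$Z(P\|Q) = \sum_{i=1}^{n} \left| \ln(p_i) - \ln(q_i) \right| = \sum_{i=1}^{n} \left| \ln\left( \frac{p_i}{q_i} \right) \right|. \tag{18}$$

Since $p_i \le 1$ and $|p_i - q_i| \le 1$ for all $1 \le i \le n$, both $D(P\|Q)$ and $J(P\|Q)$ are upper bounded by $Z(P\|Q)$; that is, $D(P\|Q) \le Z(P\|Q)$ and $J(P\|Q) \le Z(P\|Q)$.

2.4. $L_p$-Divergence

Recall that for a real number $p \ge 1$, the $l_p$-norm of a vector $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ is defined by

$$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}. \tag{19}$$

We will apply the $l_p$-metric from the $l_p$-norm to divergence measures to obtain the $l_p$-divergence. For convenience, we will use the upper case notation.

Definition 5. For two probability distributions $P$ and $Q$, one defines their $L_p$-divergence as

$$L_p(P\|Q) = \left( \sum_{i=1}^{n} |p_i - q_i|^p \right)^{1/p}. \tag{20}$$

Here, $p \ge 1$ is used for the superscript, the subscript, and the radical root. It should not be confused with the vector $P$.

In particular, when $p = 1$, we have the $L_1$ distance:

$$L_1(P\|Q) = \sum_{i=1}^{n} |p_i - q_i|. \tag{21}$$

When $p = 2$, we obtain the Euclidean distance:

$$L_2(P\|Q) = \sqrt{ \sum_{i=1}^{n} (p_i - q_i)^2 }. \tag{22}$$

When $p \to \infty$, we have

$$L_\infty(P\|Q) = \max_{1 \le i \le n} |p_i - q_i|. \tag{23}$$

It is known that $l_p$-norms are decreasing in $p$. Moreover, all $l_p$ metrics are equivalent.

Lemma 6. If $r > p \ge 1$, then the $l_p$-norms in $\mathbb{R}^n$ satisfy

$$\|x\|_r \le \|x\|_p \le n^{(1/p - 1/r)} \|x\|_r. \tag{24}$$

Corollary 7. If $r > p \ge 1$, then the $L_p$-divergences satisfy

$$L_r(P\|Q) \le L_p(P\|Q) \le n^{(1/p - 1/r)} L_r(P\|Q). \tag{25}$$

Theorem 8. $L_p$-divergences are all bounded by the constant 2 for $p \ge 1$; that is, $L_p(P\|Q) \le 2$. In particular, $L_\infty(P\|Q) \le 1$ and $L_2(P\|Q) \le \sqrt{2}$.

Proof. We first prove the general case. From Corollary 7, it is sufficient to prove that the $L_1$-divergence is bounded by 2. Let $P = (p_1, p_2, \ldots, p_n)$ and $Q = (q_1, q_2, \ldots, q_n)$ be two probability distributions. Without loss of generality, let us assume that $p_1 \ge q_1, p_2 \ge q_2, \ldots, p_k \ge q_k$, and $p_{k+1} < q_{k+1}, \ldots, p_n < q_n$. Then, we have

$$\begin{aligned}
L_1(P\|Q) = \sum_{i=1}^{n} |p_i - q_i| &= \sum_{i=1}^{k} (p_i - q_i) + \sum_{i=k+1}^{n} (q_i - p_i) \\
&\le \sum_{i=1}^{k} p_i + \sum_{i=k+1}^{n} q_i \le \sum_{i=1}^{n} p_i + \sum_{i=1}^{n} q_i = 1 + 1 = 2.
\end{aligned} \tag{26}$$

Noting that $|p_i - q_i| \le 1$ for $1 \le i \le n$, we have $(p_i - q_i)^2 \le |p_i - q_i| \le 1$. Hence,

$$\begin{aligned}
L_\infty(P\|Q) &= \max_{1 \le i \le n} |p_i - q_i| \le 1, \\
L_2(P\|Q) &= \sqrt{ \sum_{i=1}^{n} (p_i - q_i)^2 } \le \sqrt{ \sum_{i=1}^{n} |p_i - q_i| } = \sqrt{L_1(P\|Q)} \le \sqrt{2}.
\end{aligned} \tag{27}$$

Therefore, we have proved the particular cases.

Lemma 9 shows that the relative entropy $D(P\|Q)$ is lower bounded, up to the factor $1/2$, by the square of $L_1(P\|Q)$. Its proof can be found in [15].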
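Definition 5 and the bounds of Theorem 8 can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names and the tolerance `TOL` are assumptions, and the triple `P`, `Q`, `R` is the one used in counterexample (e) of Proposition 4.

```python
import math

def lp_divergence(p, q, order):
    """L_p(P||Q) for order >= 1; order = math.inf gives the largest difference."""
    diffs = [abs(a - b) for a, b in zip(p, q)]
    if math.isinf(order):
        return max(diffs)
    return sum(d ** order for d in diffs) ** (1.0 / order)

def j_divergence(p, q):
    """J(P||Q) for strictly positive distributions, equation (5)."""
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))

P, Q, R = [0.2, 0.8], [0.4, 0.6], [0.3, 0.7]
TOL = 1e-12  # tolerance for floating-point rounding (an assumption)

for order in (1, 2, math.inf):
    direct = lp_divergence(P, Q, order)
    via_r = lp_divergence(P, R, order) + lp_divergence(R, Q, order)
    # Triangle inequality for L_p, up to rounding.
    print(order, direct <= via_r + TOL)

# J-divergence on the same triple: the triangle inequality fails.
print(j_divergence(P, R) + j_divergence(R, Q) < j_divergence(P, Q))

# Two distributions with disjoint support, where the bounds of Theorem 8 are attained.
extreme = [lp_divergence([0.0, 1.0], [1.0, 0.0], o) for o in (1, 2, math.inf)]
print(extreme)
```

Because `R` lies on the segment between `P` and `Q`, the triangle inequality for $L_p$ holds with equality on this triple, which is why a small tolerance is used in the comparison.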

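Lemma 9.

$$D(P\|Q) \ge \frac{1}{2} \left( L_1(P\|Q) \right)^2. \tag{28}$$

Theorem 10. The square root of $J(P\|Q)$ is lower bounded by $L_1(P\|Q)$; that is,

$$\sqrt{J(P\|Q)} \ge L_1(P\|Q). \tag{29}$$

Proof. Applying Lemma 9 to $D(P\|Q)$ and $D(Q\|P)$, we obtain

$$D(P\|Q) + D(Q\|P) \ge \frac{1}{2} \left( L_1(P\|Q) \right)^2 + \frac{1}{2} \left( L_1(Q\|P) \right)^2 = \left( L_1(P\|Q) \right)^2. \tag{30}$$

Note that the left-hand side is nothing but $J(P\|Q)$. The proof is completed by taking the square root on both sides.

Remark 11. $J(P\|Q)$ and hence $\sqrt{J(P\|Q)}$ are unbounded by any constants. This can be seen by taking $P = (s, 1-s)$ and $Q = (1-s, s)$, $0 < s < 1$, and taking the limit $s \to 0$. Since $L_p$-divergences are all bounded by the constant 2 for $p \ge 1$, $J(P\|Q)$ and hence $\sqrt{J(P\|Q)}$ are not equivalent to $L_p$-divergences.

We now establish the convexity property for the $L_p$-divergence, which is useful in optimization.

Theorem 12. $L_p(P\|Q)$ is convex in the pair $(P, Q)$; that is, if $(P_1, Q_1)$ and $(P_2, Q_2)$ are two pairs of probability distributions, then

$$L_p\left( \lambda P_1 + (1-\lambda) P_2 \,\|\, \lambda Q_1 + (1-\lambda) Q_2 \right) \le \lambda L_p(P_1\|Q_1) + (1-\lambda) L_p(P_2\|Q_2) \tag{31}$$

for all $0 \le \lambda \le 1$.

Proof. Let $P_1 = (p_{11}, p_{12}, \ldots, p_{1n})$, $P_2 = (p_{21}, p_{22}, \ldots, p_{2n})$, $Q_1 = (q_{11}, q_{12}, \ldots, q_{1n})$, and $Q_2 = (q_{21}, q_{22}, \ldots, q_{2n})$. Then

$$\begin{aligned}
L_p&\left( \lambda P_1 + (1-\lambda) P_2 \,\|\, \lambda Q_1 + (1-\lambda) Q_2 \right) \\
&= \left( \sum_{i=1}^{n} \left| \lambda p_{1i} + (1-\lambda) p_{2i} - \lambda q_{1i} - (1-\lambda) q_{2i} \right|^p \right)^{1/p} \\
&= \left( \sum_{i=1}^{n} \left| \lambda (p_{1i} - q_{1i}) + (1-\lambda)(p_{2i} - q_{2i}) \right|^p \right)^{1/p} \\
&\le \left( \sum_{i=1}^{n} \left| \lambda (p_{1i} - q_{1i}) \right|^p \right)^{1/p} + \left( \sum_{i=1}^{n} \left| (1-\lambda)(p_{2i} - q_{2i}) \right|^p \right)^{1/p} \\
&= \lambda \left( \sum_{i=1}^{n} |p_{1i} - q_{1i}|^p \right)^{1/p} + (1-\lambda) \left( \sum_{i=1}^{n} |p_{2i} - q_{2i}|^p \right)^{1/p} \\
&= \lambda L_p(P_1\|Q_1) + (1-\lambda) L_p(P_2\|Q_2).
\end{aligned} \tag{32}$$

Here, the inequality is the well-known Minkowski's inequality.

It follows from the following results that we can generate infinitely many metric divergence measures using the existing ones.

Proposition 13. If $d_1(P\|Q)$ and $d_2(P\|Q)$ are two metric divergence measures, so are the following 3 measures:

(1) $\alpha\, d_1(P\|Q) + \beta\, d_2(P\|Q)$ for all $\alpha \ge 0$, $\beta \ge 0$, and $\alpha + \beta > 0$,
(2) $\max(d_1(P\|Q), d_2(P\|Q))$,
(3) $\sqrt{ (d_1(P\|Q))^2 + (d_2(P\|Q))^2 }$.

Proof. The proof of (1) and (2) is trivial and hence will be omitted. As for (3), it is sufficient to verify the triangle inequality, since nonnegativity, identity, and symmetry are all easy to verify. To begin with, let us first prove an inequality: for any nonnegative $a, b, c, d$,

$$\sqrt{ a^2 + b^2 + c^2 + d^2 + 2ac + 2bd } \le \sqrt{a^2 + b^2} + \sqrt{c^2 + d^2}. \tag{33}$$

It is easy to see, by squaring both sides, that inequality (33) is equivalent to the following inequality:

$$ac + bd \le \sqrt{ (a^2 + b^2)(c^2 + d^2) }. \tag{34}$$

Inequality (34) is equivalent to the following inequality:

$$2abcd \le a^2 d^2 + b^2 c^2. \tag{35}$$

Inequality (35) is equivalent to the following inequality:

$$(ad - bc)^2 \ge 0. \tag{36}$$

Since inequality (36) is always true, inequality (34), and hence inequality (33), is true.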

Now, let us assume $P$, $Q$, $R$ are 3 arbitrary probability distributions. Since $d_1(P\|Q)$ and $d_2(P\|Q)$ satisfy the triangle inequality, we have

$$\begin{aligned}
&\sqrt{ (d_1(P\|Q))^2 + (d_2(P\|Q))^2 } \\
&\quad \le \left( [d_1(P\|R) + d_1(R\|Q)]^2 + [d_2(P\|R) + d_2(R\|Q)]^2 \right)^{1/2} \\
&\quad = \left( d_1^2(P\|R) + 2 d_1(P\|R)\, d_1(R\|Q) + d_1^2(R\|Q) + d_2^2(P\|R) + 2 d_2(P\|R)\, d_2(R\|Q) + d_2^2(R\|Q) \right)^{1/2} \\
&\quad = \left( [d_1^2(P\|R) + d_2^2(P\|R)] + [d_1^2(R\|Q) + d_2^2(R\|Q)] + 2 d_1(P\|R)\, d_1(R\|Q) + 2 d_2(P\|R)\, d_2(R\|Q) \right)^{1/2} \\
&\quad \le \sqrt{ d_1^2(P\|R) + d_2^2(P\|R) } + \sqrt{ d_1^2(R\|Q) + d_2^2(R\|Q) }.
\end{aligned} \tag{37}$$

The last inequality results from inequality (33) with $a = d_1(P\|R)$, $b = d_2(P\|R)$, $c = d_1(R\|Q)$, and $d = d_2(R\|Q)$.

Remark 14. da Costa and Taneja [16] show that $\sqrt{W_s(P\|Q)}$ and $\sqrt{V_s(P\|Q)}$ are metric divergence measures for all $s$. Since $\Delta(P\|Q)$, $I(P\|Q)$, $h(P\|Q)$, $d(P\|Q)$, $J(P\|Q)$, $T(P\|Q)$, and $\psi(P\|Q)$ are all constant factors of special cases of $W_s(P\|Q)$ or $V_s(P\|Q)$, their square roots are all metric divergence measures by Proposition 13. Yet, da Costa and Taneja neither disproved that the measures themselves are metrics nor discussed any applications of these divergence measures.

3. Information Value in Credit Scoring

Information value, or IV in short, is a widely used measure in credit scoring in the financial industry. It is a numerical value that quantifies the predictive power of an independent continuous variable $x$ in capturing the binary dependent variable $y$. Mathematically, it is defined as [17]

$$\mathrm{IV} = \sum_{i=1}^{n} \left( \frac{g_i}{g} - \frac{b_i}{b} \right) \ln\left( \frac{g_i / g}{b_i / b} \right), \tag{38}$$

where $n$ is the number of bins or groups of variable $x$, $g_i$ and $b_i$ are the numbers of good and bad accounts within bin $i$, and $g$ and $b$ are the total numbers of good accounts and bad accounts in the population. Hence, $g_i/g$ and $b_i/b$ are the distributions of good accounts and bad accounts. Therefore,

$$\sum_{i=1}^{n} \frac{g_i}{g} = \sum_{i=1}^{n} \frac{b_i}{b} = 1. \tag{39}$$

Usually, "good" means $y = 0$ and "bad" means $y = 1$. It could be the other way, since IV is symmetric about good and bad. If $g_i/g = b_i/b$ for all $i = 1, 2, \ldots, n$, then IV = 0; that is, $x$ has no information on $y$.

IV is mainly used to reduce the number of variables as the initial step in the logistic regression, especially in big data with many variables. IV is based on an analysis of each individual predictor in turn, without taking into account the other predictors.

3.1. IV and WOE

One advantage of IV is its close tie with weight of evidence (WOE), defined by $\ln((g_i/g)/(b_i/b))$. WOE measures the strength of each grouped attribute in separating good and bad accounts. According to [17], WOE is the log of the odds ratio, which measures the odds of being good. Moreover, WOE is monotonic and linear.

Yet, WOE is not an accurate measure in that it is not the log of the odds ratio, and hence its linearity is not guaranteed. Indeed, $g_i/g$ and $b_i/b$ are from two different probability distributions. They represent the number of good accounts in bin $i$ divided by the total number of good accounts in the population and the number of bad accounts in bin $i$ divided by the total number of bad accounts in the population, respectively. In general, $g_i/g + b_i/b \neq 1$, as can be seen from Exhibit 6.2 in [17].

To make WOE a log of odds, let us change its definition to

$$\mathrm{WOE1} = \ln\left( \frac{g_i / n_i}{b_i / n_i} \right) = \ln\left( \frac{g_i}{b_i} \right) = -\ln\left( \frac{b_i}{g_i} \right) \tag{40}$$

and denote it by WOE1. The cancelled $n_i = g_i + b_i$ is the number of accounts in bin $i$, and so $g_i/n_i + b_i/n_i = 1$.

As is well known, the logistic regression models the log odds, expressed in conditional probabilities, as a linear function of the independent variable; that is,

$$\ln\left( \frac{P(Y = 1 \mid x)}{P(Y = 0 \mid x)} \right) = \beta_0 + \beta_1 x. \tag{41}$$

When $x$ falls into bin $i$, $\ln(P(Y = 1 \mid x)/P(Y = 0 \mid x))$ becomes $\ln(b_i/g_i) = -\ln(g_i/b_i)$. Hence, the WOE1 values are either continuously increasing or continuously decreasing in a linear fashion.

IV and WOE1 can be used together to select independent variables for logistic regression. When a continuous variable $x$ has a large IV, we make it a candidate variable for logistic regression if the WOE1 values are linear. It is common to plot the WOE1 values versus the mean values of $x$ at bin $i$.

3.2. A Rule of Thumb of IV

Intuitively, the larger the IV, the more predictive the independent variable. However, if IV is too large, it should be checked for over predicting. For instance, $x$ may be a post-knowledge variable.

To quantify IV, a rule of thumb is proposed in [17, 18]:

(i) less than 0.02: unpredictive,
(ii) 0.02 to 0.1: weak,
(iii) 0.1 to 0.3: medium,
(iv) 0.3+: strong.
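Definitions (38) and (40) and the rule of thumb above translate directly into code. The sketch below is illustrative only: the function names are hypothetical, the per-bin counts are made-up numbers rather than data from [17], and every bin is assumed to contain at least one good and one bad account.

```python
import math

def iv_woe(goods, bads):
    """Return (IV, WOE per bin, WOE1 per bin) from per-bin good and bad counts.

    Assumes every bin has at least one good and one bad account.
    """
    g, b = sum(goods), sum(bads)
    iv, woe, woe1 = 0.0, [], []
    for g_i, b_i in zip(goods, bads):
        dist_good, dist_bad = g_i / g, b_i / b
        w = math.log(dist_good / dist_bad)   # WOE of the bin, as used in (38)
        woe.append(w)
        woe1.append(math.log(g_i / b_i))     # WOE1 of the bin, equation (40)
        iv += (dist_good - dist_bad) * w     # contribution of the bin to IV
    return iv, woe, woe1

def rule_of_thumb(iv):
    """Classify IV with the thresholds quoted from [17, 18]."""
    if iv < 0.02:
        return "unpredictive"
    if iv < 0.1:
        return "weak"
    if iv < 0.3:
        return "medium"
    return "strong"

# Hypothetical counts for four bins (not taken from the article).
goods = [100, 200, 300, 400]
bads = [40, 30, 20, 10]

iv, woe, woe1 = iv_woe(goods, bads)
print(round(iv, 4), rule_of_thumb(iv))
print([round(w, 4) for w in woe])
print([round(w, 4) for w in woe1])
```

With counts this well separated, the variable lands in the "strong" band, which under the rule of thumb is also the case that should be checked for over predicting.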

In addition, mathematical reasoning of the rule of thumb is given in [18]. In more detail, IV can be expressed as the average of likelihood ratio test statistics $G(P, Q)$ and $G(Q, P)$ of Chi-square distributions with $(n - 1)$ degrees of freedom:

$$\mathrm{IV} = \sum_{i=1}^{n} p_i \ln\left( \frac{p_i}{q_i} \right) + \sum_{i=1}^{n} q_i \ln\left( \frac{q_i}{p_i} \right) = G(P, Q) + G(Q, P). \tag{42}$$

The close relationship between IV and the likelihood ratio test allows using the Chi-square distribution to assign a significance level.

However, this is doubtful. On the one hand, $G(P, Q)$ and $G(Q, P)$ are not necessarily independent. On the other hand, even if they are independent, it is not enough. Let us assume that IV follows a Chi-square distribution with $(n - 1)$ degrees of freedom. Yet, the critical values of the Chi-square distribution are too large compared with the values in the rule of thumb, as can be seen from the Chi-square table in many books about probability, say [19]. We only list the first several rows in Table 1. When Table 1 grows as df increases, the values in each column will increase. One may use the Excel function CHIINV(p, df) or its newer and more accurate version CHISQ.INV.RT(p, df) to build Table 1; it returns the value whose right-tailed probability under the Chi-square distribution with df degrees of freedom is p.

The critical values are as small as the values in the rule of thumb only when the degrees of freedom are as small as 6. For instance, there is a probability of 0.995 that a Chi-square random variable with 6 degrees of freedom will be larger than or equal to 0.68, that is,

$$\mathrm{CHIINV}(0.995, 6) = 0.68. \tag{43}$$

Yet, there is a probability of 0.995 that a Chi-square random variable with 10 degrees of freedom will be larger than or equal to 2.16. There is a probability of 0.995 that a Chi-square random variable with 18 degrees of freedom will be larger than or equal to 6.26.

On the basis of the above, the rule of thumb is more or less empirical.

3.3. Calculation of IV

The calculation of IV is simple once binning is done. In this sense, IV is a subjective measure. It depends on how the binning is done and how many bins are used. Different binning methods may result in different IV values, whereas the logistic regression in the later stages will not use the information of these bins.

In practice, 10 or 20 bins are used. The more the bins, the better the chance that the good accounts will be separated from the bad accounts. Yet, we cannot divide the values of $x$ indefinitely, since we may not be able to avoid 0 good accounts or 0 bad accounts in some bins. To overcome the limitation of the logarithm function in the J-divergence, the binning should avoid 0 good accounts or 0 bad accounts in any bin.

The idea of binning is to assign values of $x$ with similar behaviors to the same group or bin. In particular, the same values of $x$ must fall into the same bin. A natural way of binning is to sort the data first and then divide them into $n$ bins with an equal number of observations (the last bin may have fewer observations). This works well if $x$ has no repeating values at all. In reality, $x$ often has repeating values (called tied values in statistics), which may cause problems when the tied values of $x$ fall into different bins.

Proc Rank in SAS serves as a good candidate for binning (as opposed to the function cut in R). When there are no tied values in $x$, it simply divides the values of $x$ into $n$ bins. When there are tied values in $x$, it treats the tied values by its option TIES.

Proc Rank begins with sorting the values of $x$ within a BY group. It then assigns each nonmissing value an ordinal number that indicates its rank or position in the sequence. In case of ties, option TIES will be used. Depending on whether TIES = LOW, HIGH, or MEAN (the default), the lowest rank, highest rank, or the average rank will be assigned to all the tied values. Next, the following formula is used to calculate the binning value of each nonmissing value of $x$:

$$\left\lfloor \frac{\mathrm{rank} \times n}{m + 1} \right\rfloor, \tag{44}$$

where $\lfloor \cdot \rfloor$ is the floor function, rank is the value's rank, $n$ is the number of bins, and $m$ is the number of nonmissing observations. Note that the range of the binning values is from 0 to $n - 1$. Finally, all the values of $x$ are binned according to their binning values. In case one bin has less than 5% of the population, we may combine this bin with its neighboring bin.

To illustrate the use of Proc Rank with $n = 10$ and TIES = MEAN, let us look at an imaginary dataset with one variable age and 100 observations. Assume this dataset has been sorted and has fifty observations with a value of 10, thirty observations with a value of 20, ten observations with a value of 30, nine observations with a value of 40, and one observation with a value of 50.

The first 50 observations have a tied value of 10. Each of them will be assigned an average rank of $\left( \sum_{k=1}^{50} k \right)/50 = 25.5$ and hence a binning value of $\lfloor (25.5 \times 10)/101 \rfloor = 2$. The next 30 observations have a tied value of 20. Each of them will be assigned an average rank of $(51 + 52 + \cdots + 80)/30 = 65.5$ and hence a binning value of $\lfloor (65.5 \times 10)/101 \rfloor = 6$. The next 10 observations have a tied value of 30. Each of them will be assigned an average rank of $(81 + 82 + \cdots + 90)/10 = 85.5$ and hence a binning value of $\lfloor (85.5 \times 10)/101 \rfloor = 8$. The next 9 observations have a tied value of 40. Each of them will be assigned an average rank of $(91 + 92 + \cdots + 99)/9 = 95$ and hence a binning value of $\lfloor (95 \times 10)/101 \rfloor = 9$. The last observation has a rank of 100 and hence will be assigned a binning value of $\lfloor (100 \times 10)/101 \rfloor = 9$. In summary, the 100 observations are divided into 4 bins: the first 50 observations, the next 30 observations, the next 10 observations, and the last 10 observations.

Remark 15. Missing values are not ranked and are left missing in Proc Rank. Yet, they may be kept in a separate bin by means of Proc Summary or Proc Means in the calculation of IV.

Remark 16. If $x$ has fewer than $n$ different values, the number of bins produced by Proc Rank will be less than $n$.
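The binning rule (44) with average ranks for ties can be reproduced outside SAS. The following Python sketch is not Proc Rank itself; it only mimics the behavior described above for TIES = MEAN on nonmissing values, and the helper names `mean_ranks` and `binning_values` are hypothetical.

```python
import math
from collections import Counter

def mean_ranks(values):
    """1-based ranks in which tied values all receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + 1 + end + 1) / 2  # mean of the ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks

def binning_values(values, n_bins):
    """Apply floor(rank * n / (m + 1)) from (44) to every nonmissing value."""
    m = len(values)
    return [math.floor(r * n_bins / (m + 1)) for r in mean_ranks(values)]

# The imaginary age dataset from the text: 50, 30, 10, 9, and 1 observations.
ages = [10] * 50 + [20] * 30 + [30] * 10 + [40] * 9 + [50]
bins = binning_values(ages, 10)
print(sorted(Counter(bins).items()))  # (binning value, number of observations) pairs
```

Run on the age example, the sketch yields four distinct binning values even though ten bins were requested, in line with Remark 16.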

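Table 1: Chi-square table (rows: degrees of freedom DF; columns: probability P).

After binning is done for $x$, a simple SAS program can be written to calculate IV. Meanwhile, WOE1 values are calculated per bin as for WOE in [17]. If IV is less than 0.02, we will discard this independent variable. If IV is larger than 0.3, over predicting will be checked. If IV is between 0.02 and 0.3 and the WOE1 values are linear, we will include this independent variable as a candidate variable in logistic regression. If IV is between 0.02 and 0.3 but the WOE1 values are not linear, we may make transformations of the independent variable to make WOE1 more linear. If a transformation preserves the rank of the original independent variable, the binning by Proc Rank will be preserved. Therefore, we have obtained the following result.

Proposition 17. IV, when binning by Proc Rank, is invariant under any strictly monotonic transformation.

3.4. Mathematical Properties of IV

IV is the information statistic for the difference between the information in the good accounts and the information in the bad accounts. Indeed, IV is the J-divergence $J(P\|Q)$ with $P$ and $Q$ the distributions of good accounts and bad accounts. Thus, IV is lower bounded by the square of $L_1(P\|Q)$ by Theorem 10.

Property 1. $\mathrm{IV} \ge (L_1(P\|Q))^2$.

Property 2. IV satisfies the inequalities (15); that is,

$$\frac{1}{4}\Delta(P\|Q) \le I(P\|Q) \le h(P\|Q) \le 4d(P\|Q) \le \frac{1}{8}\mathrm{IV} \le T(P\|Q) \le \frac{1}{16}\psi(P\|Q). \tag{45}$$

In particular, IV is upper bounded by $\psi(P\|Q)$:

$$\mathrm{IV} \le \frac{1}{2} \sum_{i=1}^{n} \frac{(p_i - q_i)^2 (p_i + q_i)}{p_i q_i}. \tag{46}$$

Note that there is a direct proof of the above inequality, which is much easier than that in [14]. Let us assume without loss of generality that $p_1 \ge q_1, p_2 \ge q_2, \ldots, p_k \ge q_k$, and $p_{k+1} < q_{k+1}, \ldots, p_n < q_n$.

Proof. Using the identity $p_i/q_i = 1 + ((p_i - q_i)/q_i)$ and making Taylor's expansion of the function $\ln(p_i/q_i)$ around 1 for $i = 1, 2, \ldots, k$, we obtain, for some $0 \le \xi \le (p_i - q_i)/q_i$,

$$\ln\left( \frac{p_i}{q_i} \right) = \ln(1) + \frac{1}{1 + \xi} \cdot \frac{p_i - q_i}{q_i} \le \frac{1}{2} \cdot \frac{p_i - q_i}{q_i} + \frac{1}{2} \cdot \frac{p_i - q_i}{p_i}. \tag{47}$$

The last step is the elementary bound $\ln t \le (t - 1/t)/2$ for $t \ge 1$, applied with $t = p_i/q_i$. Multiplying by $p_i - q_i \ge 0$ and summing from 1 to $k$, we obtain

$$\sum_{i=1}^{k} (p_i - q_i) \ln\left( \frac{p_i}{q_i} \right) \le \frac{1}{2} \sum_{i=1}^{k} \frac{(p_i - q_i)^2}{q_i} + \frac{1}{2} \sum_{i=1}^{k} \frac{(p_i - q_i)^2}{p_i}. \tag{48}$$

Similarly, $\ln(q_i/p_i) \le (1/2)((q_i - p_i)/p_i) + (1/2)((q_i - p_i)/q_i)$ for $i = k+1, \ldots, n$. Multiplying by $q_i - p_i \ge 0$ and summing from $k + 1$ to $n$, we obtain

$$\sum_{i=k+1}^{n} (q_i - p_i) \ln\left( \frac{q_i}{p_i} \right) \le \frac{1}{2} \sum_{i=k+1}^{n} \frac{(q_i - p_i)^2}{p_i} + \frac{1}{2} \sum_{i=k+1}^{n} \frac{(q_i - p_i)^2}{q_i}. \tag{49}$$

The proof is completed by noting that

$$\begin{aligned}
\sum_{i=1}^{n} (p_i - q_i) \ln\left( \frac{p_i}{q_i} \right) &= \sum_{i=1}^{k} (p_i - q_i) \ln\left( \frac{p_i}{q_i} \right) + \sum_{i=k+1}^{n} (q_i - p_i) \ln\left( \frac{q_i}{p_i} \right) \\
&\le \frac{1}{2} \sum_{i=1}^{n} \frac{(p_i - q_i)^2}{q_i} + \frac{1}{2} \sum_{i=1}^{n} \frac{(p_i - q_i)^2}{p_i} = \frac{1}{2} \sum_{i=1}^{n} \frac{(p_i - q_i)^2 (p_i + q_i)}{p_i q_i}.
\end{aligned} \tag{50}$$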
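Property 1 and the upper bound (46) together sandwich IV between $(L_1(P\|Q))^2$ and $\frac{1}{2}\psi(P\|Q)$. The sketch below spot-checks this on randomly generated, strictly positive distributions. It is a numerical sanity check under assumed names and an assumed random generator, not a substitute for the proof.

```python
import math
import random

def l1(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def j_divergence(p, q):
    """IV viewed as the J-divergence between good and bad distributions."""
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))

def symmetric_chi_square(p, q):
    return sum((a - b) ** 2 * (a + b) / (a * b) for a, b in zip(p, q))

def random_distribution(n, rng):
    """A strictly positive random distribution with n bins (illustrative helper)."""
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)  # fixed seed so the run is repeatable
TOL = 1e-12             # tolerance for floating-point rounding
violations = 0
for _ in range(1000):
    n = rng.randint(2, 10)
    p = random_distribution(n, rng)
    q = random_distribution(n, rng)
    iv = j_divergence(p, q)
    lower = l1(p, q) ** 2
    upper = 0.5 * symmetric_chi_square(p, q)
    if not (lower - TOL <= iv <= upper + TOL):
        violations += 1

print(violations)  # number of sampled pairs that break either bound
```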

Theorem 18. IV is convex in the pair $(P, Q)$.

Proof. Applying Theorem 2.7.2 from [15] to both $D(P\|Q)$ and $D(Q\|P)$, we obtain

$$\begin{aligned}
J&\left( \lambda P_1 + (1-\lambda) P_2 \,\|\, \lambda Q_1 + (1-\lambda) Q_2 \right) \\
&= D\left( \lambda P_1 + (1-\lambda) P_2 \,\|\, \lambda Q_1 + (1-\lambda) Q_2 \right) + D\left( \lambda Q_1 + (1-\lambda) Q_2 \,\|\, \lambda P_1 + (1-\lambda) P_2 \right) \\
&\le \lambda D(P_1\|Q_1) + (1-\lambda) D(P_2\|Q_2) + \lambda D(Q_1\|P_1) + (1-\lambda) D(Q_2\|P_2) \\
&= \lambda \left( D(P_1\|Q_1) + D(Q_1\|P_1) \right) + (1-\lambda) \left( D(P_2\|Q_2) + D(Q_2\|P_2) \right) \\
&= \lambda J(P_1\|Q_1) + (1-\lambda) J(P_2\|Q_2).
\end{aligned} \tag{51}$$

Remark 19. $\sqrt{J(P\|Q)}$ is not convex, albeit a metric.

Property 3. If more than 95% of the population of $x$ have the same value, then IV = 0. In particular, if $x$ has just one value, then IV = 0.

Proof. Assume more than 95% of the population of $x$ have the same value $x_0$. Then, all the population with value $x_0$ will fall into the same bin, called the majority bin. The rest of the population, whose values are different from $x_0$, makes up less than 5% of the population and will be combined into the majority bin. Thus, there will be only one bin for all the values of $x$. Therefore $g_1 = g$, $b_1 = b$, and hence IV = 0.

Remark 20. If the population whose values are not $x_0$ is not combined into the majority bin, then IV could be larger than 0.02. As shown in Table 2, $x$ has 10,000 observations, where 95.8%, or 9580 observations, have the same value and the remaining 420 observations have another value. Both bins contribute a value larger than 0.02 to IV. Statistically, 4.2% of the population are outliers and can be neglected. Hence, it is more meaningful to say that $x$ has no information on $y$.

Table 2: Outlier domination to IV (columns: Bin, Count, Goods, Distr good, Bads, Distr bad, IV contr; last row: Total).

3.5. Alternatives to IV

As we have seen above, IV has 3 shortcomings: (1) it is not a metric; (2) no groups are allowed to have 0 bad accounts or 0 good accounts; and (3) its range is too broad, from 0 to $\infty$.

Theoretically, any divergence measure of the difference or distance between good distributions and bad distributions can be an alternative to IV. In particular, $L_p(P\|Q)$ ($p \ge 1$) are good alternatives to IV. They overcome all the 3 shortcomings of IV: (1) $L_p(P\|Q)$ are all metrics; (2) they allow bins to have 0 bad accounts or 0 good accounts; and (3) they all have a much narrower range, from 0 to 2.

While $L_p(P\|Q)$ do not seem to have a tie with weight of evidence, they can be as quantifiable as IV. For instance, we may adopt a rule of thumb for $L_p(P\|Q)$:

(i) weak: at most 6% of its upper bound,
(ii) medium: 6% to 30% of its upper bound,
(iii) strong: larger than 30% of its upper bound.

In particular,

(i) weak: at most 0.12 for $L_1(P\|Q)$, 0.085 for $L_2(P\|Q)$, and 0.06 for $L_\infty(P\|Q)$,
(ii) medium: 0.12 to 0.60 for $L_1(P\|Q)$, 0.085 to 0.424 for $L_2(P\|Q)$, and 0.06 to 0.30 for $L_\infty(P\|Q)$,
(iii) strong: 0.60+ for $L_1(P\|Q)$, 0.424+ for $L_2(P\|Q)$, and 0.3+ for $L_\infty(P\|Q)$.

Remark 21. If $L_1(P\|Q) > 0.12$, then $\mathrm{IV} > 0.0144$ by Property 1.

The lower bound 0.12 of $L_1(P\|Q)$ can be adjusted as needed. It can also be combined with IV to enhance the accuracy. For instance, if the number of independent variables is large enough, we may select only those which satisfy both lower bounds of $L_1(P\|Q)$ and IV.

4. Numerical Results

To illustrate our results, we use Exhibit 6.2 in [17] but add one column for WOE1. We use the real WOE, not its more user-friendly form, 100 times WOE.

Table 3: Calculation of IV and WOE (columns: Age, Count, Tot distr, Goods, Distr good, Bads, Distr bad, WOE, WOE1; rows: Missing, the age bands, and Total). Summary values: IV = 0.6681, so that $\sqrt{\mathrm{IV}} \approx 0.8174$; $L_1 = 0.6684$, $L_2 = 0.2987$, and $L_\infty = 0.1659$.

Figure 1: Logical trend of WOE1 for variable age (horizontal axis: age; vertical axis: weight of evidence).

From Table 3, we see that $L_\infty < L_2 < L_1 < \sqrt{\mathrm{IV}}$. From Figure 1, we also see that the WOE1 for nonmissing values has a linear trend for variable age.

5. Conclusions

In this paper, we have proposed a class of metric divergence measures, namely, $L_p(P\|Q)$, $p \ge 1$, and studied their mathematical properties. We studied information value, an important divergence measure widely used in credit scoring. After exploring the mathematical reasoning of a rule of thumb and weight of evidence, we suggested an alternative to weight of evidence. Finally, we proposed using $L_p(P\|Q)$ as alternatives to information value to overcome its disadvantages.

References

[1] I. J. Taneja, "Generalized relative information and information inequalities," Journal of Inequalities in Pure and Applied Mathematics, vol. 5, no. 1, pp. 1-19, 2004.
[2] E. Hellinger, "Neue Begründung der Theorie der quadratischen Formen von unendlichen vielen Veränderlichen," Journal für die Reine und Angewandte Mathematik, vol. 136, pp. 210-271, 1909.
[3] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379-423, 1948.
[4] S. Kullback and R. A. Leibler, "On information and sufficiency," Annals of Mathematical Statistics, vol. 22, pp. 79-86, 1951.
[5] H. Jeffreys, "An invariant form for the prior probability in estimation problems," Proceedings of the Royal Society, vol. 186, 1946.
[6] F. Topsøe, "Some inequalities for information divergence and related measures of discrimination," Institute of Electrical and Electronics Engineers, vol. 46, no. 4, 2000.
[7] S. S. Dragomir, J. Šunde, and C. Buşe, "New inequalities for Jeffreys divergence measure," Tamsui Oxford Journal of Mathematical Sciences, vol. 16, no. 2, pp. 295-309, 2000.
[8] K. Pearson, "On the criterion that a given system of deviations from the probable in the case of correlated system of variables is such that it can be reasonable supposed to have arisen from random sampling," Philosophical Magazine, vol. 50, 1900.
[9] R. Sibson, "Information radius," Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, vol. 14, no. 2, 1969.
[10] J. Burbea and C. R. Rao, "Entropy differential metric, distance and divergence measures in probability spaces: a unified approach," Journal of Multivariate Analysis, vol. 12, no. 4, 1982.
[11] J. Burbea and C. R. Rao, "On the convexity of some divergence measures based on entropy functions," Institute of Electrical and Electronics Engineers, vol. 28, no. 3, 1982.
[12] I. J. Taneja, "New developments in generalized information measures," in Advances in Imaging and Electron Physics, P. W. Hawkes, Ed., vol. 91, pp. 37-136, 1995.
[13] N. Cressie and T. R. C. Read, "Multinomial goodness-of-fit tests," Journal of the Royal Statistical Society B, vol. 46, no. 3, 1984.
[14] I. J. Taneja, "Generalized symmetric divergence measures and inequalities," RGMIA Research Report Collection, vol. 7, no. 4, 2004.
[15] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, New York, NY, USA, 1991.
[16] G. A. T. F. da Costa and I. J. Taneja, "Generalized symmetric divergence measures and metric spaces," Computing Research Repository, 2011.
[17] N. Siddiqi, Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring, John Wiley & Sons, 2006.
[18] N. Siddiqi, Credit Risk Scorecards: Development and Implementation Using SAS, LULU, 2011.
[19] D. Downing and J. Clark, Barron's E-Z Statistics, Barron's Educational Series, 2009.
