Bounds for rating override rates

Dirk Tasche

First version: March 10, 2012
This version: July 13, 2012

arXiv:1203.2287v4 [q-fin.RM] 31 Aug 2012

Overrides of credit ratings are important correctives of ratings that are determined by statistical rating models. Financial institutions and banking regulators agree on this because on the one hand errors with ratings of corporates or banks can have fatal consequences for the lending institutions and on the other hand errors by statistical methods can be minimised but not completely avoided. Nonetheless, rating overrides can be misused in order to conceal the real riskiness of borrowers or even entire portfolios. That is why rating overrides usually are strictly governed and carefully recorded. It is not clear, however, which frequency of overrides is appropriate for a given rating model within a predefined time period. This paper argues that there is a natural error rate associated with a statistical rating model that may be used to inform assessment of whether or not an observed override rate is adequate. The natural error rate is closely related to the rating model's discriminatory power and can readily be calculated.

Keywords: Credit rating, rating override, discriminatory power, accuracy ratio, misclassification rate.

1 Introduction

Overrides of credit ratings generated by statistical models are a somewhat controversial topic. Financial institutions and regulators alike acknowledge that statistical models, while being useful for the acceleration of rating processes and quantification of rating results, in principle are still inferior to careful expert assessments of creditworthiness. Nonetheless, even experienced credit experts could unintentionally be biased to under- or overestimate the creditworthiness of certain borrowers or groups of borrowers.
E-mail: dirk.tasche@gmx.net. The author currently works at the UK Financial Services Authority. The opinions expressed in this paper are those of the author and do not necessarily reflect views of the Financial Services Authority.

The following comment by the Basel Committee on Banking Supervision (BCBS, 2006, extract from paragraph 417) confirms and extends these observations:

"Credit scoring models and other mechanical rating procedures generally use only a subset of available information. Although mechanical rating procedures may sometimes avoid some of the idiosyncratic errors made by rating systems in which human judgement plays a large role, mechanical use of limited information also is a source of rating errors. Credit scoring models and other mechanical procedures are permissible as the primary or partial basis of rating assignments, and may play a role in the estimation of loss characteristics. Sufficient human judgement and human oversight is necessary to ensure that all relevant and material information, including that which is outside the scope of the model, is also taken into consideration, and that the model is used appropriately."

While it is generally expected that overrides improve the quality of rating assignments [1], the Basel Committee has a clear view on the need to monitor overriding activity (BCBS, 2006, paragraph 428):

"For rating assignments based on expert judgement, banks must clearly articulate the situations in which bank officers may override the outputs of the rating process, including how and to what extent such overrides can be used and by whom. For model-based ratings, the bank must have guidelines and processes for monitoring cases where human judgement has overridden the model's rating, variables were excluded or inputs were altered. These guidelines must include identifying personnel that are responsible for approving these overrides. Banks must identify overrides and separately track their performance."

This paper is about analysing the performance of the overrides of the output of statistical rating models. How to do this is by no means obvious. Look for instance at the case where a borrower's rating is suggested to be poor by a statistical model but is overridden to a high quality rating grade. Assume that the borrower afterwards defaults within the following twelve months. At first glance, this incident might be considered clear evidence of an unjustified override. It could, however, just be an occurrence of bad luck.
Only if we observed a significant number of such outcomes of overrides would we be able to draw a conclusion on systematic bias of the overrides. From this example, we can derive a first performance criterion for rating overrides: The discriminatory power of a rating system post overrides should not be lower than the discriminatory power of the system without the overrides.

Another criterion, also based on a significant sample of overrides, relates to the tendency of the overrides: If the majority of the overrides is towards rating grades indicating poorer credit quality one has to investigate whether or not this is caused by underestimation of the probabilities of default (PDs) [2] associated with the grades of the rating system. Careful analysis should be applied to the reasons quoted as causing the overrides: They could give indications of risk drivers not captured by the statistical model or no longer being predictive of creditworthiness.

Financial institutions should also look at the frequency of overrides applied to the outcomes of rating models. Intuitively, observing a high frequency of overrides might indicate that something is wrong with the model. This is why most financial institutions monitor the override frequencies of their wholesale rating systems and, typically, trigger corrective actions when pre-defined bounds for the override rates are exceeded. What, however, does high frequency mean? Would it be 10% or rather 40% of the ratings assigned within a year?

[1] The empirical evidence of the impact of overrides on rating performance is not unambiguous. Thus, in a recent study Brown et al. (2012) found that "overall, our results suggest that the widespread use of discretion by loan officers may not result in more accurate assessments of the creditworthiness of borrowers." This finding, however, might be a consequence of the fact that most of the banks providing data for the study were not subject to the minimum requirements of the Basel II Internal Ratings Based Approach.

[2] In the following we will often use the acronym PD for probability of default.

By intuition one would say that the critical frequency of overrides to indicate problems with a statistical rating model should be related to the discriminatory power of the model. As demonstrated in figure 1, a model with low power needs more corrections in the shape of overrides than a model with high power because the overlap between the rating profiles of defaulting and solvent borrowers is larger for low power rating models.

In this paper, we suggest a method to determine bounds for override rates that should not be exceeded if all were right with the underlying statistical model. We then investigate the link between the discriminatory power of the statistical rating model and the proposed rating override bounds, finding that the bounds are indeed tighter the more powerful the model is. We also demonstrate how ex ante estimates of discriminatory power can be inferred from the PD curves associated with statistical rating models. Such estimates may then be used to derive the override rate bounds needed for effective monitoring. This is particularly useful in the case of low default portfolios where, due to the lack of default observations, monitoring and validation to a large extent must be based on the analysis of overrides.

The paper is organised as follows: In section 2 we discuss the connection between rating error rates and the statistical notion of misclassification rate. We argue that the misclassification rate related to a particular classification rule provides a suitable approximate upper bound for the rating error rate and therefore also for the rate of overrides of the outputs of a statistical rating model. In section 3 we describe how to calculate the override rate bounds in the special cases of a rating model with a finite number of grades and a rating model with normally distributed conditional score distributions.
The case of the conditional normal distributions, in particular, allows us to study the connection between override rate bounds and discriminatory power. We also present some numerical examples to illustrate this connection. In section 4 we discuss how the proposed bound for rating override rates fits into an illustrative framework for the monitoring of rating overrides. Section 5 summarises and concludes the paper.

A note on the terminology. In this paper, we study rating overrides in the context of rating systems with a small finite number $k$ of grades [3]. For the purpose of the paper, it is assumed that a statistical model is used to determine proposed ratings which may then be confirmed or overridden by experts to give the final ratings. The direct output of the statistical model is called score and may be discrete with a large (compared to the number of rating grades $k$) range of scores or on a continuous scale. The range of the scores is decomposed into a number of disjoint intervals each of which is then mapped to one of the rating grades, thus generating the proposed ratings. The decomposition of the score range for the mapping to the grades usually is based on the probabilities of default associated with the scores.

[3] In order to achieve comparability of their internal ratings to agency ratings, financial institutions sometimes choose $k$ as seven, the number of performing unmodified grades used by the major rating agencies, or as seventeen, the number of performing modified grades used by the agencies. In the latter case, grade 17 would correspond to S&P grade AAA, grade 16 to S&P grade AA+ etc., down to grade 1, which would correspond to S&P grade CCC.

Figure 1: Unconditional and conditional rating distributions. The distributions conditional on default and survival respectively have been inferred from the unconditional distribution under the assumption that the unconditional PD is 5% and the accuracy ratio associated with the rating model is either low at 25% or high at 75%. See example 3.3 for details of the calculations.

[Figure 1 has two panels, "Accuracy ratio = 0.75" and "Accuracy ratio = 0.25". Each panel plots probability against grade (1 to 17) for defaulters, the unconditional distribution, and survivors.]

Convention. Low values of the model output score indicate low creditworthiness, high values of the score indicate high creditworthiness. The mapping of the scores to the rating grades (expressed as positive integers) is increasing, i.e. high grades mean high creditworthiness.

2 Approximating rating error rate by misclassification rate

In the following, we assume that for every real-life rating model there are unavoidable rating errors because defaults are unpredictable. We want to determine an approximation for the proportion of such unavoidable rating errors compared to all rating actions. We also assume that the rating models we consider are correctly calibrated such that no overrides are needed in order to account for PD estimation bias. For the purpose of this paper, such overrides would be considered avoidable (by recalibration of the rating model). Establishing bounds for unavoidable errors will help to identify overrides that are due to miscalibration and thus in principle avoidable.

2.1 The typical process for rating overrides

Let us look at a typical wholesale (e.g. corporates or financial institutions) portfolio of borrowers that are in the scope of application of the rating system under consideration. A rating action for one of the borrowers consists of three steps:

- For the first step, a statistical model is applied to data (risk factors) related to the borrower, typically financial ratios from the borrower's balance sheet and/or qualitative assessments by credit officers marked up on a standard scale. The result of this step is a score $s$.
- For the second step, the score $s$ is mapped (e.g. by means of a look-up table) to a grade $g \in \{1, \ldots, k\}$. The grade $g$ is called proposed rating.
- In the third step, the rating proposal $g$ is reviewed by a credit expert or a committee of credit experts. The experts can decide to accept the rating proposal and assign the borrower the final rating $g^* = g$. But the experts can also decide to reject the rating proposal and assign the borrower a final rating $g^* \neq g$ (but with $g^* \in \{1, \ldots, k\}$). If a proposed rating is rejected, the experts have to record both the proposed and the final ratings and the rationale for the rejection and the choice of the final rating.

The occurrence of $g^* \neq g$ in step three is called override (of the proposed rating). In most credit institutions overrides are subject to certain restrictions which could include the following:

- Overrides are not allowed if the proposed ratings are better than or equal to some threshold grade $k^* < k$. This rule might be established when it is felt that the relative differences between the better rating grades are so small that the grades cannot be meaningfully differentiated by the human mind.
- Overrides are only allowed if the final ratings are significantly different from the proposed ratings, e.g. if $|g^* - g| \ge b$ for some threshold value $b$. Such a rule might again be based on the intuition that the human mind cannot differentiate between adjacent rating grades.
- Only downgrade overrides are allowed. This rule would express a desire for rating conservatism.
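The three steps and the example restrictions listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular institution: the cut-off values in `CUTOFFS`, the default threshold grade `k_star`, the minimum distance `b`, and all function names are hypothetical choices made for the example. The last function counts the share of rating actions whose final rating differs from the proposal.

```python
from bisect import bisect_right

# Hypothetical score cut-offs for k = 5 grades; a higher score means a better grade.
CUTOFFS = [200.0, 400.0, 600.0, 800.0]

def proposed_grade(score):
    """Step 2: map a score to a grade in {1, ..., k} via a look-up table."""
    return bisect_right(CUTOFFS, score) + 1

def override_allowed(proposed, final, k_star=4, b=2, downgrade_only=False):
    """Check a candidate override against the three example restrictions."""
    if proposed >= k_star:            # no overrides at or above the threshold grade
        return False
    if abs(final - proposed) < b:     # the change must span at least b grades
        return False
    if downgrade_only and final > proposed:
        return False                  # an upgrade, which this rule forbids
    return True

def override_rate(pairs):
    """Share of (proposed, final) rating pairs with different grades."""
    return sum(1 for g, g_final in pairs if g != g_final) / len(pairs)
```

For instance, `override_rate([(1, 1), (2, 4), (3, 3), (5, 5)])` counts one override among four rating actions.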
If $n$ is the number of rating actions [4] with the rating system under consideration within a predefined time period (e.g. one year), resulting in pairs $(g_1, g_1^*), \ldots, (g_n, g_n^*)$ of proposed and final ratings, then the override rate is defined as the ratio of the number of rating actions with different proposed and final ratings and the total number of rating actions:

\[ \text{Override rate} = \frac{\#\{i : g_i^* \neq g_i\}}{n}. \tag{2.1} \]

[4] The same borrower might be rated several times but usually each borrower needs to be rated at least once per year.

2.2 Right and wrong ratings

From (2.1) we get the obvious upper bound of 100% for override rates, which is not very satisfactory. It might then seem a natural idea to try to derive an expected override rate for a statistical rating model by analysing the decision processes of the credit experts charged with reviewing the rating proposals. One could argue that credit experts are likely to base their override decisions on comparisons to borrowers rated in the past. This would mean that the experts look for similarities between the borrower to be reviewed and the borrowers rated earlier. Such similarities might relate to financial conditions, management quality or other descriptive information about the borrowers. Hence the procedure by credit experts to decide about possible overrides might be described in mathematical terms as a methodology similar to the m nearest neighbours (m-nn) methodology (see, e.g., Hand, 1997, section 5.3), which is well-known in fields like Pattern Recognition or Statistical Classification.

However, this analogy is not a promising path to the prediction of override rates. Assuming we were able to identify the number $m$ of neighbours to be included in the m-nn analysis, we would end up developing another rating system especially for the override decision process. But there is no reason to be sure that such a competitor rating would be more predictive of creditworthiness than the statistical model under review.

A more promising approach to bounds for override rates is to start from an interpretation of overrides as corrective actions. Then all wrong (in a sense to be determined) ratings proposed by the statistical model should be overridden. Hence the proportion of wrong proposed ratings would be the natural override rate. As in practice it is impossible to identify wrong ratings with certainty, the natural override rate would be an upper bound for the number of overrides. This follows from the following considerations:

Assume the observed override rate exceeds the natural override rate. Then there are two possible explanations for this. The credit experts could have overridden too many ratings and, hence, have turned some correct ratings into wrong ratings. Or the proportion of wrong ratings is higher than expected and, hence, the discriminatory power of the statistical model is lower than predicted.
Both explanations imply a deficiency in the rating process that should be rectified.

Assume the observed override rate is less than the natural override rate. Again there are two possible explanations. The credit experts could have overridden too few ratings and, hence, have failed to amend all wrong ratings. Or the proportion of wrong ratings is lower than expected and, hence, the discriminatory power of the statistical model is higher than predicted. While the first explanation may or may not indicate a deficiency in the rating process, the second explanation actually would be good news.

With a view on the potential consequences of these two cases ("override rate exceeds the natural override rate" and "override rate is below the natural override rate"), we observe that the first case is clearly more of a concern than the second case. The first case indicates with certainty a deficiency of the rating process, while in the second case there is a possibility that the statistical model actually is better than expected. This is why in practice the natural override rate can only be regarded as an upper bound for the observed override rate whose breach should trigger corrective action.

So far, so good, but how do we know which proportion of rating proposals is wrong? Clearly, if we knew which would be the right (or maybe only the most predictive) ratings we would be done because then we could insert the right ratings as the final ratings $g_i^*$ in (2.1). Interestingly enough, if we assume that the most predictive ratings can be determined based on the analysis of a finite number of risk factors, then in theory a most predictive statistical rating model exists. This follows from the Neyman-Pearson lemma (see, e.g., Casella and Berger, 2002, Theorem 8.3.12) that identifies the most predictive statistic for the test of two simple hypotheses as the ratio of the two multi-variate joint densities of the risk factors on the defaulter population and the survivor population respectively. Unfortunately, even if all relevant risk factors could be identified, in practice it would be impossible to estimate the two risk factor densities accurately enough.

Alternative candidates for the most predictive ratings might be agency ratings as well as the results of expert ranking exercises. However, neither agency ratings nor expert rankings are always available. More importantly, even if the quality of agency ratings for corporates is good in general, there is no reason to consider them the right ratings, implying the right ranking. If for nothing else, this follows from the fact that ratings from different agencies for the same entities coincide often but not always. In exceptional cases, the assessments by different agencies may differ significantly. Similarly, for samples of borrowers ranked by experts, the resulting rankings will depend more often than not on the selection of the experts and possibly also on the way the assessments by the single experts are combined into one ranking.

So, as it proves difficult to compare the statistical rating model in question to right or most predictive ratings, why not compare it to a perfect (or "prophetic") rating system? This approach might seem strange at first glance, but it turns out to be viable even if it delivers only an approximation to the natural override rate. A perfect rating system obviously needs only two grades: Default and survival.
We have to define a mapping from the rating grades that are assigned by the statistical rating model in question to the two states of default and survival in order to be able to calculate which portion of the borrowers in the portfolio would have to be moved were the distribution of borrowers across the range of proposed grades to be transformed into a distribution according to a perfect rating system. For this purpose, we assume that the perfect rating system for the comparison is realised on the rating scale of the rating model under consideration. This implies that there is a multitude of perfect rating systems that could be used as the target for the comparison, because every rating system that assigns defaulters and survivors to disjoint sets of grades is perfect.

To devise the mapping from the proposed ratings to the perfect ratings, inspection of the concept of investment and speculative grades employed by the major rating agencies proves helpful. Moody's define Investment grade as the combination of grades Aaa, Aa, A, and Baa while they call Speculative grade the combination of grades Ba, B, and Caa-C. Similarly, S&P define Investment grade as the combination of grades AAA, AA, A, and BBB and Speculative grade as the combination of grades BB, B, and CCC-C.

Table 1 presents long-run average annual default rates for investment and speculative grade borrowers as observed by Moody's and S&P. On the one hand, these default rates show that, from a credit default perspective, Moody's and S&P's investment grade and speculative grade definitions are broadly equivalent. On the other hand, and more important for this paper, the observed default rates support the description of investment grade borrowers as very unlikely to default within a year and of speculative grade borrowers as at significant risk of default within a year. One might even be tempted to talk about investment grade borrowers as safe borrowers and about speculative grade borrowers as risky borrowers. This can be seen as a reasonable approximation of a perfect rating system with the two grades defaulter and survivor only.

Table 1: Average annual default rates as recorded by Moody's (2011, Exhibit 35) and S&P (2011, Table 24).

Agency     Investment grade   Speculative grade   Observation period
Moody's    0.095%             4.944%              1983-2010
S&P        0.13%              4.36%               1981-2010

Intuitively, a coarse classification of borrowers into safe and risky should be easier to achieve than the much finer differentiation envisaged by the rating agencies and most internal rating systems of financial institutions. Indeed, the restrictions of overrides discussed in section 2.1 might be rationalised by the consideration that it is primarily the safe vs. risky differentiation that one has to get right. We adopt this view for our approach to the identification of bounds for override rates with natural error rates. Hence, we will define the natural error rate as the misclassification rate in a two-state coarse rating system that is derived from the statistical rating model in question by combining a suitable subset of grades to a safe super-grade and the complementary subset of grades to a risky super-grade.

2.3 Natural error rate

We consider first how to determine the sets of scores or grades that define the two super-grades safe and risky. Then we discuss how to calculate the corresponding misclassification rate that will be defined as the natural error rate associated with the statistical rating model in question. Actually, these two problems can be solved in a fully general context. Hence although in principle we only need to investigate the cases of a rating variable with a finite range of values and of a continuous real-valued score variable, the major part of the following discussion applies to any random variable $S$ with values in a general measurable space; only minor modifications of the notation might be necessary.
In practice the measurable space will be a multi-dimensional Euclidean space $\mathbb{R}^n$ when we are dealing with the risk factors informing a rating model, an open real interval $I \subset \mathbb{R}$ when we are talking about the score output of a rating model, or a finite ordered set (without loss of generality $\{1, 2, \ldots, k\}$, with $k$ being the number of performing rating grades) when the discussion is about the ratings proposed by a rating model or the final ratings after overrides.

Notation. We assume that the conditional and unconditional distributions of $S$ have densities with respect to a suitable measure. To make it clear when a statement or an equation is general we adopt the likelihood function notation where a likelihood $l$ can stand both for a probability density function in a continuous or general context or for a probability mass function in a discrete context.

Speaking in technical terms, in this paper we study the joint distribution of a pair $(S, Z)$ of random variables. As mentioned before, the meaning of the variable $S$ could be a vector of risk factors, a rating model score, or a rating grade, observed for a solvent borrower at a fixed date. The variable $Z$ is the borrower's state of solvency one observation period (usually one year) after $S$ was observed. $Z$ takes on the two values 0 and 1. The meaning of $Z = 0$ is "borrower has remained solvent" (solvency or survival), $Z = 1$ means "borrower has become insolvent" (default). We write $D$ for the event $\{Z = 1\}$ and $N$ for the event $\{Z = 0\}$. Hence

\[ D \cap N = \{Z = 1\} \cap \{Z = 0\} = \emptyset, \qquad D \cup N = \text{whole space}. \tag{2.2} \]

The marginal distribution of the state variable $Z$ is characterised by the unconditional probability of default $p$ which is defined as the probability of the default event $D$:

\[ p = P[D] = P[Z = 1] \in (0, 1). \tag{2.3} \]

The joint distribution of $(S, Z)$ then can be specified by $p$ and the two distributions of $S$ conditional on the states of $Z$ (i.e. the events $D$ and $N$ respectively). Most important are the cases where the conditional distributions are given by discrete conditional probabilities $l_D(s) = P[S = s \mid D]$ and $l_N(s) = P[S = s \mid N]$, $s = 1, \ldots, k$, or by Lebesgue densities $l_D$ and $l_N$. In the latter case, the probabilities of $S$ taking a value in a set $M$ conditional on default and survival respectively can be written as

\[ P[S \in M \mid D] = \int_M l_D(s)\,ds \quad\text{and}\quad P[S \in M \mid N] = \int_M l_N(s)\,ds. \tag{2.4} \]

The joint distribution of the pair $(S, Z)$ of the score and the borrower's state one period later can also be specified by the unconditional distribution $P[S \in \cdot\,]$ of $S$ (i.e. the score or rating profile) and the PDs $P[D \mid S] = 1 - P[N \mid S]$ conditional on $S$. In the special cases we have mentioned before, by Bayes' rule we have the following representations of the conditional PDs:

- $S$ is understood as rating grade, i.e. $S \in \{1, 2, \ldots, k\}$. Then

\[ P[D \mid S = s] = \frac{p\,P[S = s \mid D]}{p\,P[S = s \mid D] + (1-p)\,P[S = s \mid N]}, \quad s \in \{1, 2, \ldots, k\}. \tag{2.5a} \]

- $S$ is a continuous score variable or a vector of continuous risk factors such that there are Lebesgue densities $f_D$ and $f_N$ of the distributions conditional on default and survival (written $l_D$ and $l_N$ in (2.4)). Then

\[ P[D \mid S = s] = \frac{p\,f_D(s)}{p\,f_D(s) + (1-p)\,f_N(s)}. \tag{2.5b} \]

With the likelihood notation, (2.5a) and (2.5b) can be expressed in one equation:

\[ P[D \mid S = s] = \frac{p\,l_D(s)}{p\,l_D(s) + (1-p)\,l_N(s)}. \tag{2.5c} \]

Thanks to (2.5c) the conditional PDs are determined by the unconditional PD and the distributions of the score conditional on default and survival. The conditional and the unconditional PDs are the information we need to determine the intended split of the score or rating range into the two super-grades safe and risky. The split suggested is based on the minimisation of the so-called expected misclassification cost (see, e.g., Hand and Till, 2001).

Proposition 2.1 Assume that the expected cost of misclassifying a defaulting borrower as solvent is $c_D > 0$ and that the expected cost of misclassifying a solvent borrower as defaulting is $c_N > 0$. Define the events $D$ and $N$ as in (2.2) and the unconditional PD $p$ as in (2.3). Assume that a borrower is classified as defaulting if and only if the borrower's score (or rating or vector of risk factors) $S$ takes on a value in a predefined set $A$. Then the expected cost $C$ of misclassification is given by

\[ C = c_D\,p\,P[S \notin A \mid D] + c_N\,(1-p)\,P[S \in A \mid N]. \tag{2.6} \]

The expected cost $C$ of misclassification is minimal if and only if the set $A$ is chosen as the following set $A^*$:

\[ A^* = \left\{ P[D \mid S] > \frac{c_N}{c_N + c_D} \right\}. \tag{2.7} \]

See, for instance, Nechval et al. (2010, Theorem 2) for a proof of proposition 2.1. Note that (2.7) defines a risky super-grade in the sense that the average PD conditional on the score or rating on the set $A^*$ is greater than the average PD conditional on the score or rating on the complement of the set $A^*$:

\[ E\big[ P[D \mid S] \,\big|\, S \in A^* \big] > \frac{c_N}{c_N + c_D} \ge E\big[ P[D \mid S] \,\big|\, S \notin A^* \big]. \tag{2.8} \]

How to choose the cost parameters $c_D$ and $c_N$ in (2.7)? Hand (2009, section 4) has commented that in general it is extremely difficult to determine values for the cost parameters. Even though it obviously suffices to specify the ratio of the cost parameters, most of the time it appears difficult to come to a meaningful assessment. Nonetheless, as noted by Hand (2009, section 4), choosing the cost parameters inversely proportional to the unconditional class probabilities is popular. In the setting of this paper, this means

\[ c_D = 1/p \quad\text{and}\quad c_N = 1/(1-p). \tag{2.9} \]

In a context of risky lending, (2.9) is actually a quite reasonable choice. Assume that the credit decision is about a loan with principal $M$ and that the expected loss rate in the case of default is $0 < \lambda \le 1$.
Then the effective (after cost of funding) interest charged for the loan should be at least a provision for the expected loss which equals $M \lambda p$. If we prudentially assume that the interest is paid in advance then the lending institution's loss in the case of lending to a defaulter will be

\[ M \lambda - M \lambda p = M \lambda (1-p). \tag{2.10a} \]

The loss in case of not lending to a customer who turns out not to be a defaulter will be (at least) $M \lambda p$. Together with (2.10a) this implies for the ratio of the misclassification cost parameters

\[ \frac{M \lambda (1-p)}{M \lambda p} = \frac{1/p}{1/(1-p)} = \frac{c_D}{c_N}, \tag{2.10b} \]

with $c_D$ and $c_N$ as suggested in (2.9). Even if the assumption on interest being paid in advance clearly is often not satisfied in practice, for not too large PD $p$ the ratio of misclassification costs will still be of a magnitude similar to the left-hand side of (2.10b). We therefore adopt (2.9) for the choice of the cost parameters in our application of proposition 2.1.

Observe that with cost parameters as defined by (2.9), by equations (2.5c) and (2.7) the optimal set $A^*$ of grades or ratings for classifying borrowers as defaulting can be rewritten as follows:

\begin{align}
A^* &= \left\{ \frac{1}{1 + \big((1-p)/p\big)\,\big(l_N(S)/l_D(S)\big)} > \frac{1}{1 + c_D/c_N} \right\} \notag \\
    &= \left\{ \frac{l_D(S)}{l_N(S)} > \frac{1-p}{p}\,\frac{c_N}{c_D} \right\} \tag{2.11a} \\
    &= \left\{ \frac{l_D(S)}{l_N(S)} > 1 \right\}. \tag{2.11b}
\end{align}

Comparison of (2.11a) and (2.11b) shows that the same optimal set $A^*$ for classifying borrowers as defaulting is found, no matter whether we start from a general unconditional PD $p$ and cost parameters given by (2.9) or from $p = 50\%$ and equal cost parameters $c_D = c_N$.

Remark 2.2 Assume that the variable $S$ from proposition 2.1 stands for a one-dimensional score or rating. If the likelihood ratio $l_D(s)/l_N(s)$ is decreasing [5] in $s$ (or, equivalently by (2.5c), the PD conditional on the score $P[D \mid S = s]$ is decreasing in $s$) then the set $A^*$ from (2.7) indicating the classification of borrowers as defaulting can be written as

\[ A^* = \{S \le s^*\}, \qquad s^* = \sup\left\{ s : \frac{l_D(s)}{l_N(s)} > 1 \right\}. \tag{2.12} \]

In this case the minimum classification cost $C^*$ according to (2.6) with cost parameters defined by (2.9) is equivalent to the Kolmogorov-Smirnov statistic applied to the distribution functions of the score or rating conditional on default and survival respectively:

\[ C^* = 1 - \max_s \big| P[S \le s \mid D] - P[S \le s \mid N] \big|. \tag{2.13} \]

This is an interesting observation since it justifies, in economic terms, the use of the Kolmogorov-Smirnov statistic and of the closely related accuracy ratio for the performance measurement of rating systems.

With (2.11b) we have identified the decision rule that should be applied to determine the two-state coarse rating system that was discussed in section 2.2.
This rating system is an overlay to the statistical rating model whose output is the variable $S$ we have considered in proposition 2.1 and its conclusions. The misclassification rate of the overlaying rating system with the two super-grades risky and safe gives us a reasonable bound for the override rate of the statistical rating model $S$.

[5] See Tasche (2008) for a discussion of the connection between monotonicity of the likelihood ratio and optimality of the model scores.
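For a rating model with finitely many grades, the chain from Bayes' rule (2.5c) to the optimal set (2.7), the expected cost (2.6) and the Kolmogorov-Smirnov relation (2.13) can be checked numerically. The following Python sketch is only an illustration: the function names and the five-grade input distributions are made up for the example and are not taken from the paper, and it assumes that every grade has positive probability and that no grade has exactly equal conditional probabilities.

```python
from itertools import accumulate

def conditional_pd(p, l_d, l_n):
    """Bayes' rule (2.5c): PD conditional on each grade 1..k."""
    return [p * d / (p * d + (1 - p) * n) for d, n in zip(l_d, l_n)]

def risky_grades(p, l_d, l_n, c_d, c_n):
    """Set A* from (2.7): grades whose conditional PD exceeds c_N / (c_N + c_D)."""
    threshold = c_n / (c_n + c_d)
    pds = conditional_pd(p, l_d, l_n)
    return {j for j, pd in enumerate(pds, start=1) if pd > threshold}

def expected_cost(p, l_d, l_n, risky, c_d, c_n):
    """Expected misclassification cost (2.6) when grades in `risky` mean 'default'."""
    defaulters_missed = sum(d for j, d in enumerate(l_d, start=1) if j not in risky)
    survivors_flagged = sum(n for j, n in enumerate(l_n, start=1) if j in risky)
    return c_d * p * defaulters_missed + c_n * (1 - p) * survivors_flagged

def ks_statistic(l_d, l_n):
    """Largest absolute gap between the two conditional distribution functions."""
    return max(abs(fd - fn) for fd, fn in zip(accumulate(l_d), accumulate(l_n)))

# Made-up inputs for k = 5 grades (grade 1 = worst); not taken from the paper.
p = 0.05
l_d = [0.40, 0.30, 0.16, 0.09, 0.05]   # P[S = j | D]
l_n = [0.05, 0.10, 0.20, 0.30, 0.35]   # P[S = j | N]

c_d, c_n = 1 / p, 1 / (1 - p)           # cost choice (2.9)
risky = risky_grades(p, l_d, l_n, c_d, c_n)
cost = expected_cost(p, l_d, l_n, risky, c_d, c_n)
```

With these inputs the likelihood ratio is decreasing in the grade, `risky` coincides with the grades where `l_d` exceeds `l_n` as in (2.11b), and `cost` agrees with `1 - ks_statistic(l_d, l_n)` up to rounding, in line with (2.13).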

Proposition 2.3 Define the natural error rate $\epsilon(S)$ of a statistical rating model with output $S$ as the misclassification rate associated with the decision rule (2.11b) for classifying borrowers as defaulting. Let $p$ denote the unconditional probability of default. Then $\epsilon(S)$ is given by

\[ \epsilon(S) = p\,P[l_D(S) \le l_N(S) \mid D] + (1-p)\,P[l_D(S) > l_N(S) \mid N]. \tag{2.14a} \]

Note that the natural error rate as defined by (2.14a) is an actual error rate in the terminology of Hand (1997, section 7.2). If the output of the statistical rating model is a rating proposal or a discrete score with values in $\{1, \ldots, k\}$ then the likelihood functions in (2.14a) are probability mass functions. Let

\[ J = \big\{ j : P[S = j \mid D] > P[S = j \mid N] \big\}. \tag{2.14b} \]

Then (2.14a) can be written as

\[ \epsilon(S) = p \sum_{j \notin J} P[S = j \mid D] + (1-p) \sum_{j \in J} P[S = j \mid N]. \tag{2.14c} \]

If the output of the statistical rating model is a continuous score with values in an open interval $I$ then the likelihood functions in (2.14a) are probability density functions. (2.14a) can then be represented as

\[ \epsilon(S) = p \int_{\{s \in I :\, l_D(s) \le l_N(s)\}} l_D(s)\,ds + (1-p) \int_{\{s \in I :\, l_D(s) > l_N(s)\}} l_N(s)\,ds. \tag{2.14d} \]

As mentioned before, the purpose of this paper is to demonstrate that the natural error rate of a statistical rating model as defined formally in proposition 2.3 is a suitable bound for the rate of overrides that should be applied to the model's rating proposals. In section 3, we show that the natural error rate has the intuitive property that its value increases when the discriminatory power of the statistical rating model declines. This observation strengthens the case we have made in section 2.2 for the adoption of the natural error rate as a bound for the rating override rate.

It is worthwhile noting, however, that a regrouping of borrowers according to the decision rule (2.11b) underlying the coarse approximate rating system would imply moving a much larger portion of solvent borrowers from the risky grade to the safe grade than of defaulting borrowers from safe to risky. This observation might seem counterintuitive at first glance.
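The discrete formula (2.14c) and the imbalance between its two terms can be illustrated with a short Python sketch. The function name and the five-grade distributions below are hypothetical inputs chosen for the example, not data from the paper.

```python
def natural_error_rate(p, l_d, l_n):
    """Natural error rate (2.14c) for grades 1..k, plus its two components."""
    # Set J from (2.14b): grades where default is the more likely of the two states.
    risky = {j for j, (d, n) in enumerate(zip(l_d, l_n), start=1) if d > n}
    # Defaulters that the coarse system leaves in the safe super-grade.
    defaulters_in_safe = p * sum(d for j, d in enumerate(l_d, start=1) if j not in risky)
    # Survivors that the coarse system places in the risky super-grade.
    survivors_in_risky = (1 - p) * sum(n for j, n in enumerate(l_n, start=1) if j in risky)
    return defaulters_in_safe + survivors_in_risky, defaulters_in_safe, survivors_in_risky

# Made-up inputs for k = 5 grades (grade 1 = worst); not taken from the paper.
p = 0.05
l_d = [0.40, 0.30, 0.16, 0.09, 0.05]   # P[S = j | D]
l_n = [0.05, 0.10, 0.20, 0.30, 0.35]   # P[S = j | N]

eps, d_part, n_part = natural_error_rate(p, l_d, l_n)
```

For these inputs the survivors placed in the risky super-grade account for roughly 14.25% of all borrowers, against about 1.5% for the defaulters left in the safe super-grade, so the solvent borrowers dominate the natural error rate of about 15.75%.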
Nonetheless, in practice the movement from risky to safe would be quite restricted as only borrowers that have been very clearly misclassified should be moved. This consideration indicates that the natural error rate indeed is an upper bound for the rating override rate since most of the misclassified borrowers cannot be identified with reasonable certainty.

3 Override rate and discriminatory power

Define the conditional distribution functions $F_D$ and $F_N$ of the score or rating variable $S$ described in section 2.3 by

$$F_D(s) = P[S \le s \mid D] \quad\text{and}\quad F_N(s) = P[S \le s \mid N], \tag{3.1}$$

and denote by $S_D$ and $S_N$ independent random variables that are distributed according to $F_D$ and $F_N$ respectively. For quantifying discriminatory power, we apply the notion of accuracy ratio (AR) as specified in Tasche (2009, eq. (3.28b)):

$$\begin{aligned}
AR &= 2\,P[S_D < S_N] + P[S_D = S_N] - 1 \\
   &= P[S_D < S_N] - P[S_D > S_N] \\
   &= \int P[S < s \mid D]\,dF_N(s) - \int P[S < s \mid N]\,dF_D(s).
\end{aligned} \tag{3.2}$$

See Hand and Till (2001, section 2) for a discussion of why this definition of accuracy ratio (or the related definition of the area under the ROC curve) is more expedient than the also common definition in geometric terms. Definition (3.2) of AR takes an ex post perspective by assuming the obligors' states $D$ or $N$ one year after having been scored are known and hence can be used for estimating the conditional (on default and survival respectively) score distributions $F_D$ and $F_N$.

3.1 The binormal case

The ex post perspective on discriminatory power is appropriate when we consider the important binormal special case where both conditional score distributions are normal distributions with different mean values but equal variances. This special case, in particular, motivates the choice of the inverse logit function for modelling PD curves (see, e.g., Cramer, 2003, section 6.1) since in practice it is often reasonable to assume that the two conditional score distributions are approximately normal.

Proposition 3.1 Assume that the conditional distributions of the score variable $S$ are given by $S_D \sim \mathcal{N}(\mu_D, \sigma)$ and $S_N \sim \mathcal{N}(\mu_N, \sigma)$ with $\mu_D \le \mu_N$. Let $p$ denote the unconditional probability of default as given by (2.3). Then the accuracy ratio AR associated with $S$ according to (3.2) can be calculated as [6]

$$AR = 2\,\Phi\!\left(\frac{\mu_N - \mu_D}{\sqrt{2}\,\sigma}\right) - 1. \tag{3.3a}$$

The natural error rate $\epsilon(s)$ of $S$ according to proposition 2.3 is given by

$$\epsilon(s) = \Phi\!\left(\frac{\mu_D - \mu_N}{2\,\sigma}\right). \tag{3.3b}$$

With the cost parameters defined by (2.9), the average conditional PDs on the set of scores indicating default and on the set of scores indicating survival respectively (as in eq. (2.8)) are given by

$$\begin{aligned}
E\big[P[D \mid S] \,\big|\, S \in A\big] &= \frac{p\,(1-\epsilon(s))}{p\,(1-\epsilon(s)) + (1-p)\,\epsilon(s)}, \\
E\big[P[D \mid S] \,\big|\, S \notin A\big] &= \frac{p\,\epsilon(s)}{p\,\epsilon(s) + (1-p)\,(1-\epsilon(s))}.
\end{aligned} \tag{3.3c}$$

[6] $\Phi$ denotes the standard normal distribution function.
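As a numerical illustration of proposition 3.1, the following minimal Python sketch evaluates (3.3a), (3.3b) and (3.3c) for given parameters. The function name `binormal_summary` and the parameter values in the usage line are our own illustrative assumptions; $\sigma$ is treated as the common standard deviation, as the formulas suggest.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution (mean 0, standard deviation 1)


def binormal_summary(mu_d, mu_n, sigma, p):
    """Evaluate (3.3a), (3.3b) and (3.3c) for the binormal case.

    mu_d, mu_n: mean scores of defaulters and survivors (mu_d <= mu_n)
    sigma:      common standard deviation of both conditional distributions
    p:          unconditional probability of default
    """
    ar = 2.0 * PHI.cdf((mu_n - mu_d) / (sqrt(2.0) * sigma)) - 1.0   # (3.3a)
    eps = PHI.cdf((mu_d - mu_n) / (2.0 * sigma))                    # (3.3b)
    # (3.3c): average PD on the "risky" and on the "safe" set of scores
    risky_pd = p * (1.0 - eps) / (p * (1.0 - eps) + (1.0 - p) * eps)
    safe_pd = p * eps / (p * eps + (1.0 - p) * (1.0 - eps))
    return ar, eps, risky_pd, safe_pd


# Hypothetical parameters, chosen only for illustration
ar, eps, risky_pd, safe_pd = binormal_summary(mu_d=0.0, mu_n=1.0, sigma=1.0, p=0.01)
print(f"AR = {ar:.3f}, natural error rate = {eps:.3f}")
print(f"risky PD = {risky_pd:.4f}, safe PD = {safe_pd:.4f}")
```

Changing `p` in the usage line leaves `ar` and `eps` unchanged and only moves the two average PDs, in line with the statement of the proposition.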

Proof. (3.3a) follows from Tasche (2009, eq. (3.14) and proposition 3.15). For (3.3b) and (3.3c), observe that in the binormal case with equal variances we have

$$S \notin A \iff l_D(S) \le l_N(S) \iff S \ge \frac{\mu_D + \mu_N}{2}.$$

From this observation, we deduce that

$$P[l_D(S) \le l_N(S) \mid D] = 1 - \Phi\!\left(\frac{\mu_N - \mu_D}{2\,\sigma}\right)$$

and

$$P[l_D(S) > l_N(S) \mid N] = \Phi\!\left(\frac{\mu_D - \mu_N}{2\,\sigma}\right).$$

These equations imply (3.3b). With regard to (3.3c), we apply the definition of conditional expectation to arrive at the following computation [7]:

$$\begin{aligned}
E\big[P[D \mid S] \,\big|\, S \in A\big]
&= \frac{E\big[P[D \mid S]\,1_{\{S \in A\}}\big]}{P[S \in A]}
 = \frac{P\big[D \cap \{S < \tfrac{\mu_D+\mu_N}{2}\}\big]}{P\big[S < \tfrac{\mu_D+\mu_N}{2}\big]} \\
&= \frac{P\big[S < \tfrac{\mu_D+\mu_N}{2} \mid D\big]\,P[D]}{P\big[S < \tfrac{\mu_D+\mu_N}{2} \mid D\big]\,P[D] + P\big[S < \tfrac{\mu_D+\mu_N}{2} \mid N\big]\,P[N]} \\
&= \frac{p\,\Phi\!\left(\frac{\mu_N-\mu_D}{2\,\sigma}\right)}{p\,\Phi\!\left(\frac{\mu_N-\mu_D}{2\,\sigma}\right) + (1-p)\,\Phi\!\left(\frac{\mu_D-\mu_N}{2\,\sigma}\right)}.
\end{aligned}$$

Since $\Phi\!\left(\frac{\mu_N-\mu_D}{2\,\sigma}\right) = 1 - \epsilon(s)$ we obtain the first equation in (3.3c). The second equation follows in the same way.

That the accuracy ratio in proposition 3.1 does not depend on the unconditional PD is a general property of the accuracy ratio. The observation that the natural error rate does not depend on the unconditional PD might be surprising at first glance. This observation, however, is primarily a consequence of the fact that both conditional score distributions are members of the same location-scale distribution family, differing only in their location parameters. In addition, symmetry of the location zero member of the family is required to imply the property that the error does not depend on the unconditional PD.

Combining (3.3a) and (3.3b) provides a simple relationship between the discriminatory power as measured by the accuracy ratio and the natural error rate in the binormal case with equal variances. Eq. (3.4) shows, in particular, that the natural rating error indeed decreases when the discriminatory power of the statistical rating model increases.

[7] We define the indicator function $1_M$ of a set $M$ by $1_M(m) = 1$ if $m \in M$ and $1_M(m) = 0$ if $m \notin M$.
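The combination of (3.3a) and (3.3b) can be sketched in a few lines of Python: (3.3a) is inverted for the standardised distance $(\mu_N - \mu_D)/\sigma$, which is then inserted into (3.3b). The function name `natural_error_rate_from_ar` is our own, and the sketch assumes an accuracy ratio in $[0, 1)$.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def natural_error_rate_from_ar(ar):
    """Binormal natural error rate implied by an accuracy ratio in [0, 1)."""
    # Invert (3.3a): (mu_N - mu_D) / sigma = sqrt(2) * Phi^{-1}((AR + 1) / 2)
    distance = sqrt(2.0) * PHI.inv_cdf((ar + 1.0) / 2.0)
    # Insert into (3.3b): eps = Phi(-(mu_N - mu_D) / (2 * sigma))
    return PHI.cdf(-distance / 2.0)


for ar in (0.0, 0.2, 0.5, 0.8, 0.9):
    print(f"AR = {ar:.1f}  natural error rate = {natural_error_rate_from_ar(ar):.3f}")
```

The printed values can be compared with the binormal column of table 2; the unconditional PD does not enter the calculation.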

Figure 2: Natural error rate as function of the discriminatory power (accuracy ratio) for the binormal case described in corollary 3.2 and for the discrete rating model described in example 3.3. [Plot: natural error rate (vertical axis, 0.0 to 0.5) against accuracy ratio (horizontal axis, 0.0 to 1.0); curves: Binormal; Discrete, PD = 0.01; Discrete, PD = 0.1.]

Corollary 3.2 Assume that the conditional distributions of the score variable $S$ are given by $S_D \sim \mathcal{N}(\mu_D, \sigma)$ and $S_N \sim \mathcal{N}(\mu_N, \sigma)$ with $\mu_D \le \mu_N$. Then the accuracy ratio AR and the natural error rate $\epsilon(s)$ associated with $S$ are related by the following equation:

$$\epsilon(s) = \Phi\!\left(-\frac{1}{\sqrt{2}}\,\Phi^{-1}\!\left(\frac{AR + 1}{2}\right)\right). \tag{3.4}$$

See figure 2 for a graphical representation of relation (3.4).

3.2 The discrete case

For practical applications, often one cannot assume that the conditional score distributions are normally distributed. Moreover, as described in section 2.1, it is actually not the score output of a statistical rating model that may be overridden but the proposed ratings derived from the scores. Typically, thresholds for override rates for a statistical rating model are established when the development of the model is being finished and the documentation of the related rating system is compiled. At this stage, it is unlikely that reliable estimates of the score distribution conditional on the borrower's solvency state or of the proposed rating distributions conditional on the borrower's solvency state are available. Therefore, we also consider the case where the unconditional distribution of the proposed ratings (rating profile) and the associated PD curve (i.e. the estimates of the PDs conditional on the proposed ratings) are all that is known.

Nonetheless, with this information it is still possible to compute both the natural error rate and the accuracy ratio for the statistical rating model in question. Key to these computations is the observation that eq. (2.5a) can be inverted such that the distributions of the proposed ratings conditional on the borrower's solvency state can be derived from the unconditional rating profile and the PD curve:

$$\begin{aligned}
p &= E\big[P[D \mid S]\big] = \sum_{s=1}^{k} P[D \mid S = s]\,P[S = s], \\
P[S = s \mid D] &= P[D \mid S = s]\,P[S = s] / p, \quad s = 1, \ldots, k, \\
P[S = s \mid N] &= \big(1 - P[D \mid S = s]\big)\,P[S = s] / (1-p), \quad s = 1, \ldots, k.
\end{aligned} \tag{3.5}$$

By (3.5) and (2.14c), the natural error rate can be calculated from the profile $s \mapsto P[S = s]$ of the proposed ratings and the PD curve $s \mapsto P[D \mid S = s]$. By combining (3.2) and (3.5), the accuracy ratio of the ratings proposed by the statistical model can also be determined from the rating profile and the PD curve:

$$AR = \frac{1}{p\,(1-p)} \Bigg( 2 \sum_{s=1}^{k} \big(1 - P[D \mid S = s]\big)\,P[S = s] \sum_{t=1}^{s-1} P[D \mid S = t]\,P[S = t] + \sum_{s=1}^{k} P[D \mid S = s]\,\big(1 - P[D \mid S = s]\big)\,P[S = s]^2 \Bigg) - 1. \tag{3.6}$$

Calculation of the accuracy ratio by means of the combination of (3.5) and (3.6) may be interpreted as predicting the accuracy ratio ex ante since it can be done on the current portfolio as soon as estimates of the PDs conditional on the rating grades (PD curve) are available.

There is no obvious general example of a discrete rating model comparable to the binormal model from corollary 3.2 that would allow the direct study of the connection between discriminatory power and natural error rate. We therefore explore this connection in the discrete case by means of a specific example.
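The ex ante computations based on (3.5), (2.14b), (2.14c) and (3.6) can be sketched as follows. The function names and the three-grade rating profile and PD curve in the usage lines are hypothetical and serve only to show the mechanics; in line with the sign convention of (3.2), the PDs are assumed to decrease with the grade number.

```python
def conditional_profiles(profile, pd_curve):
    """Invert the PD curve as in (3.5): return p, P[S=s|D] and P[S=s|N]."""
    p = sum(pd * q for pd, q in zip(pd_curve, profile))
    given_d = [pd * q / p for pd, q in zip(pd_curve, profile)]
    given_n = [(1.0 - pd) * q / (1.0 - p) for pd, q in zip(pd_curve, profile)]
    return p, given_d, given_n


def natural_error_rate(profile, pd_curve):
    """Natural error rate according to (2.14b) and (2.14c)."""
    p, given_d, given_n = conditional_profiles(profile, pd_curve)
    # (2.14b): J contains the grades that are more likely under default
    in_j = [d > n for d, n in zip(given_d, given_n)]
    # (2.14c): defaulters outside J plus survivors inside J
    missed_defaults = sum(d for d, j in zip(given_d, in_j) if not j)
    false_alarms = sum(n for n, j in zip(given_n, in_j) if j)
    return p * missed_defaults + (1.0 - p) * false_alarms


def accuracy_ratio(profile, pd_curve):
    """Accuracy ratio (3.2) evaluated on the conditional profiles from (3.5)."""
    _, given_d, given_n = conditional_profiles(profile, pd_curve)
    total = 0.0
    cum_d = 0.0  # running value of P[S_D < s]
    for d, n in zip(given_d, given_n):
        total += 2.0 * n * cum_d + n * d  # 2 P[S_D < S_N] + P[S_D = S_N]
        cum_d += d
    return total - 1.0


# Hypothetical three-grade example (grade 1 is the riskiest grade)
profile = [0.2, 0.5, 0.3]       # P[S = s]
pd_curve = [0.20, 0.05, 0.01]   # P[D | S = s]
print(natural_error_rate(profile, pd_curve), accuracy_ratio(profile, pd_curve))
```

Expanding `accuracy_ratio` in terms of the profile and the PD curve gives the closed form (3.6); the version above keeps the intermediate conditional distributions visible because they are also needed for the index set $J$.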
In order to make the example differ significantly from the normal assumption discussed before, it has been chosen so as to create a certain degree of asymmetry and overdispersion.

Example 3.3 The unconditional distribution of the ratings on a discrete scale from 1 to 17 is given by a correlated binomial distribution. That is, if $S$ stands for a borrower's rating grade, $S$ can be written as $S = X + 1$ where the distribution of $X$ is specified by [8]

$$\begin{aligned}
P[X \le x] &= \int_{-\infty}^{\infty} \varphi(y) \sum_{i=0}^{x} \binom{k}{i}\, G(\lambda, \varrho, y)^{i}\, \big(1 - G(\lambda, \varrho, y)\big)^{k-i}\, dy, \quad x = 0, \ldots, k, \\
G(\lambda, \varrho, y) &= \Phi\!\left(\frac{\Phi^{-1}(\lambda) - \sqrt{\varrho}\, y}{\sqrt{1 - \varrho}}\right).
\end{aligned} \tag{3.7a}$$

[8] $\varphi$ denotes the standard normal density function: $\varphi(t) = e^{-t^2/2} / \sqrt{2\pi}$.
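The rating profile implied by (3.7a) can be evaluated numerically. The sketch below computes the probabilities $P[X = x]$ (the summands of (3.7a)) with a plain trapezoidal rule; the integration range $[-8, 8]$, the number of steps and the function name are our own choices, and the parameter values in the usage line are those stated in (3.7b).

```python
from math import comb, sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def correlated_binomial_pmf(k, lam, rho, y_max=8.0, steps=1600):
    """P[X = x] for x = 0..k, by trapezoidal integration of (3.7a) over y."""
    c = PHI.inv_cdf(lam)
    h = 2.0 * y_max / steps
    pmf = [0.0] * (k + 1)
    for i in range(steps + 1):
        y = -y_max + i * h
        # trapezoidal weight times standard normal density
        weight = PHI.pdf(y) * h * (0.5 if i in (0, steps) else 1.0)
        g = PHI.cdf((c - sqrt(rho) * y) / sqrt(1.0 - rho))  # G(lambda, rho, y)
        for x in range(k + 1):
            pmf[x] += weight * comb(k, x) * g**x * (1.0 - g) ** (k - x)
    return pmf


# Parameters from (3.7b); entry x of the list belongs to rating grade s = x + 1
profile = correlated_binomial_pmf(k=16, lam=0.55, rho=0.1)
mean_x = sum(x * q for x, q in enumerate(profile))
var_x = sum(x * x * q for x, q in enumerate(profile)) - mean_x**2
print(f"mean = {mean_x:.3f}, variance = {var_x:.3f}")
```

The mean equals $k\lambda$, while the variance exceeds the value $k\lambda(1-\lambda)$ of the ordinary binomial distribution, which is the overdispersion the example is meant to create.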

Figure 3: PD curves for the rating model described in example 3.3. The curves have been inferred from the unconditional rating distribution shown in figure 1 under the assumption that the unconditional PD is 5% and the accuracy ratio associated with the rating model is either low at 25% or high at 75%. [Plot: conditional PD (vertical axis, 0.0 to 0.8) against grade (horizontal axis, 1 to 17); curves: AR = 0.25 and AR = 0.75.]

For the purpose of this example we choose the following values for the parameters $k$ (number of grades minus one), $\lambda$ (determines the mean of the distribution) and $\varrho$ (drives overdispersion compared to the binomial distribution) in (3.7a):

$$k = 16, \quad \lambda = 0.55, \quad \varrho = 0.1. \tag{3.7b}$$

We assume that the PD conditional on a rating grade $S = s$ is appropriately described by the inverse logit function:

$$P[D \mid S = s] = \frac{1}{1 + e^{a + b\,s}}, \quad s = 1, \ldots, 17. \tag{3.8}$$

The parameters $a$ and $b$ are determined by quasi-moment matching (Tasche, 2009, section 5.2). This approach works by equating the right-hand side of the first equation of (3.5) and the right-hand side of (3.6) to predefined values of PD and AR respectively and solving numerically for $a$ and $b$:

$$\begin{aligned}
PD &= \sum_{s=1}^{17} P[S = s]\,P[D \mid S = s] = \sum_{s=1}^{17} \frac{P[X = s-1]}{1 + e^{a + b\,s}}, \\
AR &= \frac{1}{PD\,(1 - PD)} \Bigg( 2 \sum_{s=1}^{17} \frac{e^{a + b\,s}}{1 + e^{a + b\,s}}\,P[X = s-1] \sum_{t=1}^{s-1} \frac{P[X = t-1]}{1 + e^{a + b\,t}} + \sum_{s=1}^{17} \frac{e^{a + b\,s}}{\big(1 + e^{a + b\,s}\big)^2}\,P[X = s-1]^2 \Bigg) - 1.
\end{aligned} \tag{3.9}$$

Figure 3 illustrates the results of quasi-moment matching according to (3.9). In particular, it becomes clear that the slope of the PD curve is primarily controlled by the discriminatory power of the rating model as expressed by its accuracy ratio. The two curves from figure 3 have been used together with the unconditional rating distribution as specified by (3.7a) to calculate (by means of (3.5)) the conditional rating distributions that are shown in figure 1. Once the conditional rating distributions have been determined, the natural error rates (and hence bounds for the overrides) may be computed by means of equations (2.14b) and (2.14c).

3.3 Comparing the binormal and the discrete results

The same approach that was used for the calculation of the conditional rating distributions in figure 1 has also been applied in order to calculate the discrete error rate curves for figure 2, albeit with different parameters. For figure 2, the calculations were done for a small unconditional PD (1%) and a large unconditional PD (10%). Each of the two PDs was combined with the whole range $[0, 1)$ of potential ARs to first determine conditional PD curves by means of (3.9) and then natural error rates by means of (2.14b) and (2.14c). The discontinuities in the discrete curves are due to changes in the index set $J$ from (2.14b), which cause jumps in the values of the error rates. Despite the jumps, the two discrete curves are remarkably close to the binormal curve from corollary 3.2. This observation is confirmed by table 2, which provides the numerical values for some of the points on the three curves in figure 2. It therefore appears worthwhile to consider taking the binormal values calculated with (3.4) as an indication of the link between discriminatory power and override rate even in non-normal cases.

Table 2: Natural error rate as function of the discriminatory power (accuracy ratio) for the binormal case described in corollary 3.2 and for the discrete rating model described in example 3.3. See figure 2 for a graphical representation.

        Natural error rate
AR      Binormal    Discrete, PD=1%    Discrete, PD=10%
0.0     0.500       0.452              0.461
0.1     0.465       0.451              0.448
0.2     0.429       0.450              0.435
0.3     0.393       0.448              0.422
0.4     0.355       0.320              0.306
0.5     0.317       0.319              0.291
0.6     0.276       0.317              0.276
0.7     0.232       0.206              0.260
0.8     0.182       0.204              0.154
0.9     0.122       0.118              0.131

Table 3: Average safe and risky PDs as function of the discriminatory power (AR) for the binormal case (3.3c) and example 3.3. The assumed unconditional PD is 1%.

          Binormal                       Discrete
AR (%)    Safe PD (%)   Risky PD (%)     Safe PD (%)   Risky PD (%)
0         1             1                1             1
10        0.87          1.15             0.87          1.16
20        0.75          1.33             0.74          1.32
30        0.65          1.54             0.61          1.47
40        0.55          1.8              0.57          1.9
50        0.47          2.13             0.45          2.15
60        0.38          2.58             0.34          2.39
70        0.3           3.24             0.32          3.54
80        0.22          4.33             0.19          4.02
90        0.14          6.75             0.13          7.06

In section 2.2, the approach to override rate bounds via the misclassification rate of a coarse two-state rating system was motivated by an inspection of the investment and speculative grades concept used by the major rating agencies. By table 1, we had noted that investment grade may be interpreted as safe and speculative grade may be regarded as risky. Table 3 presents risky and safe grade PDs in the sense of equation (2.8) for the normal and discrete examples considered in this section. Note the similarity between the average PDs from the 80% accuracy ratio row of table 3 and the observed investment and speculative grade default rates from table 1. Moody's report an overall average annual default rate of 1.8% (Moody's, 2011, Exhibit 35) and an average one-year accuracy ratio of c. 85% (Moody's, 2011, Exhibit 15). S&P report an overall average annual default rate of 1.6% (S&P, 2011, Table 24) and an average one-year accuracy ratio of c. 84% (S&P, 2011, Table 2).
Hence the observed similarity between average PDs on the risky and safe super-grades and the investment and speculative grade default rates recorded by the agencies might not be a coincidence. Rather, one might guess that there is an expected cost concept like the one presented in proposition 2.1 behind the agencies' investment versus speculative classification.
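The quasi-moment matching step (3.9) used throughout this section can be sketched with a nested bisection: for each candidate slope $b$ the intercept $a$ is chosen to match the target PD, and $b$ is then adjusted until the accuracy ratio (3.6) matches its target. This is only one possible numerical scheme and not necessarily the one used for the figures; the bracketing intervals, the function names and the plain binomial stand-in for the rating profile of (3.7a) are our own assumptions.

```python
from math import comb, exp


def pd_curve(a, b, k):
    """Inverse logit PD curve (3.8) on the grades 1..k."""
    return [1.0 / (1.0 + exp(a + b * s)) for s in range(1, k + 1)]


def portfolio_pd(profile, pds):
    """First equation of (3.5)."""
    return sum(q * pd for q, pd in zip(profile, pds))


def accuracy_ratio(profile, pds):
    """Right-hand side of (3.6)."""
    p = portfolio_pd(profile, pds)
    total, cum_d = 0.0, 0.0  # cum_d = sum over t < s of P[D|S=t] P[S=t]
    for q, pd in zip(profile, pds):
        total += 2.0 * (1.0 - pd) * q * cum_d + pd * (1.0 - pd) * q * q
        cum_d += pd * q
    return total / (p * (1.0 - p)) - 1.0


def bisect(f, lo, hi, iters=100):
    """Root of f on [lo, hi]; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def quasi_moment_match(profile, target_pd, target_ar):
    """Solve (3.9) for (a, b); the brackets [-100, 100] and [0, 5] are assumptions."""
    k = len(profile)

    def a_for(slope):
        return bisect(lambda a: portfolio_pd(profile, pd_curve(a, slope, k)) - target_pd,
                      -100.0, 100.0)

    b = bisect(lambda slope: accuracy_ratio(profile, pd_curve(a_for(slope), slope, k)) - target_ar,
               0.0, 5.0)
    return a_for(b), b


# Stand-in profile: ordinary binomial on 17 grades, NOT the correlated binomial of (3.7a)
profile = [comb(16, x) * 0.55**x * 0.45 ** (16 - x) for x in range(17)]
a, b = quasi_moment_match(profile, target_pd=0.05, target_ar=0.75)
print(f"a = {a:.4f}, b = {b:.4f}")
```

Replacing the stand-in by the rating profile of example 3.3 and feeding the resulting PD curve into (2.14b) and (2.14c) gives the kind of calculation described for figure 2.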

4 Monitoring rating overrides

In order to put the considerations from the previous sections into context, in this section we describe an illustrative framework for the monitoring of rating overrides. In particular, we make suggestions for the role that the natural error rate defined in proposition 2.3 could play as a bound (limit) for the override rate in such a framework.

Let us start by recalling the observations on statistical rating models and rating overrides that have led us to suggest the natural error rate as a possible limit for rating override rates.

- No statistical rating model can predict credit defaults with certainty. In particular if the financial consequences of wrong rating decisions can be serious, rating proposals by models should therefore be scrutinised by credit experts and be overridden if necessary.
- However, the credit experts can also be wrong with their decisions. Hence, there is a need to monitor the performance of the overrides, too.
- On a single case basis, it is not possible to decide whether a rating decision was right or wrong. Even an override of a CCC rating proposal to a AAA rating, followed immediately by the borrower's default, could in principle be just an occurrence of bad luck.
- Nonetheless, on the basis of the expected cost of misclassification, it is possible to identify sets of safe and risky rating grades, thus overlaying a coarse rating system onto the original system with a finer rating scale. The safe and risky super-grades represent a "will survive" and a "will default" prediction respectively. The expected error rate of the coarse rating system is called natural error rate.
- If the application of overrides is restricted to fundamental errors (i.e. safe borrowers are classified as risky or risky borrowers are classified as safe) the natural error rate is also a natural override rate.
- On portfolios with sufficient numbers of default observations, the natural error rate can be directly estimated (ex post). On portfolios with low numbers of default observations, the natural error rate can still be inferred from the rating profile and the PD curve of the rating model (ex ante).
- An observed override rate higher than the natural error rate could be a consequence of unnecessary and potentially erroneous overrides or of an overestimation of the discriminatory power of the rating model. In either case, the rating process would have been found to operate sub-optimally.
- An observed override rate lower than the natural error rate could be a consequence of overlooked rating errors or of an underestimation of the discriminatory power of the rating model. No clear immediate conclusion on the well- or malfunction of the rating process would be possible.

Based on these observations and with a view to affirming correct calibration of the rating model, an override monitoring framework could include the following elements:

At the beginning of the observation period (typically one year):

- Infer the natural error rate from the rating profile and the PD curve, by making use of (2.14c) and (3.5), or from the accuracy ratio, by (3.4).
- Discourage credit experts from minor rating overrides. Only fundamental rating errors should be amended by overrides.

At the end of the observation period:

- If there is a sufficient number of default observations, the discriminatory power of the rating system pre and post overrides should be computed. If the discriminatory power post overrides is significantly lower than the power pre overrides, an update of the override governance might be necessary.
- If the observed override rate exceeds the natural error rate, the consistency of the PD curve with the observed discriminatory power should be checked (if there is a sufficient number of defaults) by comparing it to the ex ante accuracy ratio (3.6) because the slope of the PD curve might be too large. If the number of observed defaults is not sufficient for the ex post estimation of discriminatory power, it should be confirmed that the overrides were justified (e.g. by interviews with the credit experts). If this is confirmed, the PD curve must be amended so as to reflect the lower than expected discriminatory power.
- If the observed override rate is lower than the natural error rate, it should be confirmed (e.g. by internal audit) that the due procedure is followed with regard to overrides. If this is the case, amendment of the PD curve so as to reflect the higher than expected discriminatory power could be considered.
- An imbalance of upward and downward rating overrides with more upward overrides might be an indication of PD overestimation. If further analysis finds (e.g. based on statistical tests pre and post overrides if there is a sufficient number of default observations) that there is no overestimation, an update of the override governance might be necessary.
- An imbalance of upward and downward rating overrides with more downward overrides might be an indication of PD underestimation. If further analysis finds that there is no underestimation, an update of the override governance might be necessary.
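The end-of-period checks listed above can be written down as a simple decision routine. The following sketch is purely illustrative: the function name, its arguments and the wording of the returned follow-up actions are our own, and the strict comparisons without any tolerance band are an assumption, since the framework does not prescribe thresholds.

```python
def override_review_actions(override_rate, natural_error_rate,
                            upward, downward, enough_defaults):
    """Map end-of-period observations to follow-up checks of the framework.

    override_rate:      observed share of rating proposals that were overridden
    natural_error_rate: ex ante bound, e.g. from (2.14c) and (3.5) or from (3.4)
    upward, downward:   numbers of upward and downward overrides
    enough_defaults:    True if discriminatory power can be estimated ex post
    """
    actions = []
    if enough_defaults:
        actions.append("compare discriminatory power pre and post overrides")
    if override_rate > natural_error_rate:
        if enough_defaults:
            actions.append("check PD curve against observed discriminatory power")
        else:
            actions.append("confirm with credit experts that overrides were justified")
    elif override_rate < natural_error_rate:
        actions.append("confirm that the due override procedure is followed")
    if upward > downward:
        actions.append("investigate possible PD overestimation")
    elif downward > upward:
        actions.append("investigate possible PD underestimation")
    return actions


# Hypothetical year-end figures
for action in override_review_actions(0.25, 0.18, upward=40, downward=10,
                                      enough_defaults=False):
    print("-", action)
```

In practice each returned item would trigger the more detailed analysis described in the corresponding element of the framework rather than an automatic conclusion.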
5 Conclusions

The first part of this paper presents a suggestion of how to determine bounds for the rates of overrides of rating proposals determined by statistical rating models. In practice, the guidance for such overrides often tends to restrict them to essential cases where the model's assessments of borrowers as risky or safe might not be correct. Motivated by this observation, the suggested bound for the override rate is the misclassification rate of a coarse rating system with the two grades risky and safe. This coarse rating system is fed by the output of the rating model in question and combines its rating grades into two super-grades, thereby minimising the expected cost of misclassification. The rate of misclassifications with the coarse rating system is called natural error rate. It is argued that the natural error rate is an appropriate bound for the override rate of the statistical rating model in question. In the second part of the paper, methods for determining the natural error rate and its use in