Kamakura Risk Information Services default probabilities are designed to send clients “true signal,” with no caps, floors, or smoothing of the default rates that come from Kamakura’s reduced form, Merton and hybrid credit models. Clients tell us that they can impose caps, floors or a smoothing technique themselves, and that we shouldn’t hide the information in the models with such artificial constraints on default probabilities. Because of that, we’re often asked why other analysts impose a cap on the maximum default rate in their credit models. This post discusses the problems that a cap creates and the other motivations for imposing one.
Given the current credit crisis, accuracy in default probability analysis for all types of counterparties is critical. Bankers, investors, and regulators are doing all they can to improve the accuracy of their legacy credit assessment techniques to make them more relevant to the current crisis. Analysts are re-examining the “conventional wisdom” of legacy credit assessment techniques that haven’t been as useful as hoped in the current crisis. One aspect of credit risk modeling that is getting intense scrutiny is overriding the default probabilities produced by a credit model with a cap, or maximum, at say, 20%. Why, we are asked, would an analyst do this?
In Credit Risk Models and the Basel Accords (John Wiley & Sons, 2003), Kenji Imai and I outline how the receiver operating characteristics (ROC) accuracy ratio is determined. The ROC accuracy ratio is one of the major statistics used to rank credit models by accuracy, as required by the Basel II accords from the Basel Committee on Banking Supervision. Hosmer and Lemeshow (Applied Logistic Regression, 2000) describe the ROC accuracy ratio as follows:
1. From the entire history of your default data base, form all possible pairs of companies such that each pair contains one defaulting observation (say Enron, December 2001) and one non-defaulting observation (e.g. IBM, January 1996).
2. Compare the default probabilities of the pair, and award points as follows: 1 point if the defaulter is correctly ranked as more risky, 1/2 point for a tie, and 0 points if the defaulter is incorrectly ranked as less risky.
3. Add up all of the points for all pairs and divide by the number of pairs. This is the ROC accuracy ratio.
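The three steps above can be sketched in a few lines of Python. This is only an illustration of the pairwise scoring rule: the function name `roc_accuracy_ratio` and the sample default probabilities are hypothetical, not KRIS data or code.

```python
from itertools import product


def roc_accuracy_ratio(defaulter_pds, non_defaulter_pds):
    """Average score over every (defaulter, non-defaulter) pair.

    Each pair earns 1 point if the defaulter has the higher default
    probability, 1/2 point for a tie, and 0 points otherwise.
    """
    if not defaulter_pds or not non_defaulter_pds:
        raise ValueError("need at least one observation in each group")
    points = 0.0
    for d, n in product(defaulter_pds, non_defaulter_pds):
        if d > n:
            points += 1.0
        elif d == n:
            points += 0.5
    return points / (len(defaulter_pds) * len(non_defaulter_pds))


# Hypothetical default probabilities, for illustration only.
defaulters = [0.99, 0.30, 0.05]
non_defaulters = [0.20, 0.02, 0.01, 0.10]

ratio = roc_accuracy_ratio(defaulters, non_defaulters)
print(f"{ratio:.4f}")  # 10 of 12 pairs ranked correctly -> 0.8333
```

The double loop visits every pair, so it is practical only for small samples; a production calculation on a large default data base would normally use a rank-based shortcut instead.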
Now, let’s look at the implication of the cap at 20%. What if company A, a non-defaulter, had a default probability of 20% and company B, a defaulter, had a default probability of 99%? For purposes of the ROC accuracy calculation, these raw default probabilities correctly rank the defaulter as the more risky firm in this pair. What if we now impose a cap of 20%? Both firms then have the same default probability and the pair earns 1/2 point instead of 1 point, lowering the measured accuracy of the model. Why would anyone intentionally destroy the measured accuracy of a model with such a cap?
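The arithmetic for this single pair can be checked with a small sketch. The helper names `roc_accuracy_ratio` and `apply_cap` are hypothetical; the 20% and 99% figures are the ones in the example above.

```python
def roc_accuracy_ratio(defaulter_pds, non_defaulter_pds):
    """Pairwise score: 1 for a correct ranking, 1/2 for a tie, 0 otherwise."""
    points = sum(
        1.0 if d > n else 0.5 if d == n else 0.0
        for d in defaulter_pds
        for n in non_defaulter_pds
    )
    return points / (len(defaulter_pds) * len(non_defaulter_pds))


def apply_cap(pds, cap):
    """Replace every default probability above the cap with the cap itself."""
    return [min(p, cap) for p in pds]


CAP = 0.20
company_a = [0.20]  # non-defaulter
company_b = [0.99]  # defaulter

raw_score = roc_accuracy_ratio(company_b, company_a)
capped_score = roc_accuracy_ratio(apply_cap(company_b, CAP), apply_cap(company_a, CAP))

print(raw_score)     # 1.0: the defaulter is ranked as more risky
print(capped_score)  # 0.5: the cap turns the pair into a tie
```

A cap can only create ties among observations at or above the cap level; it never reverses a ranking, so the measured ratio falls whenever such a pair had been ranked correctly before the cap.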
From a modeling accuracy point of view, everyone’s primary concern, the destruction of accuracy by the imposition of the cap is an unforgivable sin and there’s no excuse for it. The modeler should be sentenced to spend the rest of his career at McDonald’s.
Why does it happen? After thinking about this question for years, my colleagues and I finally discovered a rational commercial reason for doing so, thanks to a number of (as always) insightful suggestions by Eric Falkenstein, which we first came across in his write-up (with Andrew Boral) of the Moody’s Private Firm Model before both authors left the firm. One of the many tests of a credit model that Eric suggested was to put all of the observations in a historical data base of default probabilities in order from lowest to highest. Having done this, Eric suggested dividing the ordered data base into 100 groups that each represent a different percentile rank of the default probabilities, from lowest (0th percentile) to highest (99th percentile). Eric then advised analysts to plot the percentage distribution of actual defaulting observations across these 100 percentiles.
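A minimal sketch of this percentile test follows, assuming equal-sized groups and ties broken by input order (the description above does not specify either detail). The function name `defaulter_share_by_percentile` and the synthetic data are hypothetical.

```python
def defaulter_share_by_percentile(pds, defaulted, n_groups=100):
    """Share of all defaulting observations that falls in each percentile group.

    Observations are sorted from lowest to highest default probability and
    split into n_groups groups of (nearly) equal size; group 0 holds the
    lowest default probabilities and group n_groups - 1 the highest.
    """
    if len(pds) != len(defaulted):
        raise ValueError("pds and defaulted must have the same length")
    n = len(pds)
    order = sorted(range(n), key=lambda i: pds[i])
    counts = [0] * n_groups
    for rank, i in enumerate(order):
        group = rank * n_groups // n  # percentile group of this observation
        if defaulted[i]:
            counts[group] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no defaulting observations in the sample")
    return [c / total for c in counts]


# Synthetic sample: 1,000 observations with distinct default probabilities,
# five of which are flagged as defaulters. Illustration only.
pds = [i / 1000 for i in range(1000)]
defaulter_ids = {999, 998, 997, 985, 950}
defaulted = [i in defaulter_ids for i in range(1000)]

shares = defaulter_share_by_percentile(pds, defaulted)
print(shares[99], shares[98], shares[95])  # 0.6 0.2 0.2
```

Plotting `shares` against the group index gives the chart described above: the more of the mass that sits in the highest groups, the better the model isolates the defaulters.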
With colleagues Li Li and Xiaoming Wang, I did exactly this for the KRIS reduced form and Merton models in a recent paper (“Another Look at Advanced Credit Model Performance Testing to Meet Basel Requirements: How Things Have Changed,” The Basel Handbook, Second Edition, RISK Publications, Michael Ong, editor, 2007). What the analysis showed was that the reduced form models were very, very successful in capturing the defaulting observations in the highest percentiles of the default probability data base. 42% of the defaulters were found in the 99th percentile, another 14% in the 98th percentile, and another 8% in the 97th percentile. The Merton model, by contrast, was much less successful, with only 2% of the defaulters captured in the 99th percentile and a peak of about 6% of defaulters in the 95th percentile. A simple visual inspection of the results made it very obvious that the reduced form models were much more successful in isolating the defaulters and assigning them high default probabilities.
What would happen if one imposed a cap on both models so that the top 15% of the sample all had the same default probability? It becomes impossible to distinguish between the two models, because almost any credit model can get most of the defaulting observations in the top 15% of default probabilities. Imposing the cap leads you to spread the actual defaults equally over those top 15 percentiles, making the Merton model look relatively better and the reduced form models look relatively worse. Again, from an accuracy point of view and a model selection point of view, this is very unfortunate. The only reason to impose such a cap is to impair a model user’s ability to measure the model’s accuracy.
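The effect can be illustrated with a toy comparison. The counts below are hypothetical and are not the results from the paper; they simply describe one model that concentrates defaulters in the very top percentiles and another that scatters them across the top 15. The helper `spread_tied_top` is likewise a hypothetical name.

```python
def spread_tied_top(counts, n_tied):
    """Spread defaulter counts evenly over the top n_tied percentile groups.

    When a cap gives the top n_tied groups an identical default probability,
    the ordering inside them is arbitrary, so on average their defaulters
    are distributed equally across those groups.
    """
    top_total = sum(counts[-n_tied:])
    return counts[:-n_tied] + [top_total / n_tied] * n_tied


def counts_with(entries, n_groups=100):
    """Build a per-group defaulter count list from {group: count} entries."""
    counts = [0] * n_groups
    for group, count in entries.items():
        counts[group] = count
    return counts


# Hypothetical models, 100 defaulters each, 90 of them in the top 15 groups.
concentrated = counts_with({99: 60, 98: 30, 50: 10})
scattered = counts_with({**{g: 6 for g in range(85, 100)}, 50: 10})

capped_concentrated = spread_tied_top(concentrated, 15)
capped_scattered = spread_tied_top(scattered, 15)

print(concentrated[99], scattered[99])                # 60 6
print(capped_concentrated[99], capped_scattered[99])  # 6.0 6.0
print(capped_concentrated == capped_scattered)        # True
```

Before the cap the two distributions are easy to tell apart; after it they are identical, which is the loss of discriminating power described above.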
How should one react when presented with a model that has caps, floors, or smoothing? One must assume that this has been done because the model otherwise would appear unreasonable or unrealistic. There is no other reason to override a model’s default probabilities. For testing purposes, only the original default probabilities BEFORE the caps, floors and smoothing are suitable for model performance testing. Such tests, like those Li, Wang and I discuss in the paper above, should be extremely revealing.
Donald R. van Deventer
Honolulu, April 22, 2009