One of the more curious human traits that affects credit modeling is the tendency of users of default models to look only at how a model performs on the firms or individuals that actually defaulted, ignoring how it performs on non-defaulters. This tendency can lead to very serious errors, both in model selection and in day-to-day business. This post explains why and gives some examples.
Several years ago, I was at a credit seminar in Taipei when an eager credit analyst approached me with this comment: “Just last week a vendor from Japan came to Taipei and told us that their new small business default model had successfully predicted 9,000 of the last 10,000 small business defaults in Japan. Can your model perform that well?” I’m not usually an evil person, but I couldn’t resist making this offer: “Tell you what. I’d like to provide you with a model that correctly predicted all 10,000 of the last 10,000 small business defaults in Japan, and I am going to give it to you for free. Here is the model: the default probabilities for all firms at all points in time are 100%.” The analyst then conceded that there is more to measuring the performance of a credit model than its performance on the defaulting observations alone.
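To make the point concrete, here is a minimal sketch of how that “free” model scores. The portfolio is entirely simulated; the 2% default rate and all variable names are assumptions for illustration, not data from the seminar.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
defaulted = rng.random(n) < 0.02          # simulated outcomes: roughly 2% of firms default

constant_pd = np.ones(n)                  # the "free" model: a 100% default probability for everyone
flagged = constant_pd >= 0.5              # call any firm with PD above 50% a predicted default

hit_rate = flagged[defaulted].mean()                  # share of actual defaulters flagged
false_positive_rate = flagged[~defaulted].mean()      # share of survivors flagged

print(f"Defaulters correctly flagged:   {hit_rate:.0%}")              # 100%
print(f"Non-defaulters wrongly flagged: {false_positive_rate:.0%}")   # also 100%
```

Judged on the defaulters alone, the model is perfect; judged on the whole portfolio, it is worthless.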
Nonetheless, the tendency on the part of most credit model users and potential subscribers to credit models is overwhelming—they want to look at default probability histories of defaulters and no one else. When someone is advocating the use of their model, they often show a graph that looks like this:
In the graph above, we’ve plotted default probabilities for “Model A” and “Model B” for the Federal National Mortgage Association (“FNMA”) prior to its failure and placement in conservatorship by the U.S. government on September 6, 2008. Although some argue that FNMA didn’t “fail,” most remember that ISDA declared the conservatorship an event of default on credit default swap contracts and that common and preferred shareholders were wiped out in spite of the government’s rescue of senior bondholders.
Back to the graph. A vendor or model builder advocating Model B would use a pitch like this: “Look how much better Model B was at predicting the FNMA default than Model A. Model B was much higher at the point of failure, and it shows much greater early warning than Model A.” For numerous examples of this pitch, one need only visit the default probability section of a legacy rating agency’s website. I’ve explained this pitch technique to thousands of people, and only once has someone interrupted me: “You don’t know enough to make that statement,” he said. That comment is 100% correct.

To prove it, I can reveal that Model A is the actual 1-year default probability from the Kamakura Risk Information Services version 4.1 Jarrow-Chava reduced form model. This model is carefully benchmarked for the best possible consistency between estimated default probability levels and the true incidence of actual defaults. What is Model B? I simply took Model A and multiplied it by 3. Model B, then, is NOT more accurate than Model A; it is less accurate, because it is biased high by a factor of three. Nor does Model B provide any more early warning than Model A.

What are the consequences of using a biased Model B? There are many. Using Model B would overestimate credit losses. It would result in too much capital being allocated to the relevant business unit. It would result in perceived underperformance on that inflated capital base. Bonuses to the head of the business unit would be so low that his or her children wouldn’t be able to afford university and would have to spend their entire careers working at McDonald’s.
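The arithmetic behind that claim is easy to check. The sketch below uses a simulated portfolio; the portfolio size, default rate, and distributional choices are assumptions made purely for illustration. Scaling a set of default probabilities by three leaves the rank ordering of firms, and hence any ROC-type accuracy measure, essentially unchanged, while the expected number of defaults is overstated by roughly a factor of three.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical portfolio of 10,000 obligors; roughly 2% default.
n = 10_000
defaulted = rng.random(n) < 0.02

# "Model A": illustrative probabilities that run higher, on average, for defaulters.
pd_a = np.clip(rng.beta(1, 60, n) + defaulted * rng.beta(2, 30, n), 0.0, 1.0)

# "Model B": the same probabilities multiplied by three (capped at 100%).
pd_b = np.clip(3.0 * pd_a, 0.0, 1.0)

def auc(scores, outcomes):
    """Probability that a randomly chosen defaulter scores above a randomly chosen survivor."""
    d, s = scores[outcomes], scores[~outcomes]
    greater = (d[:, None] > s[None, :]).mean()
    ties = (d[:, None] == s[None, :]).mean()
    return greater + 0.5 * ties

print("Rank-ordering power (AUC), Model A:", round(auc(pd_a, defaulted), 4))
print("Rank-ordering power (AUC), Model B:", round(auc(pd_b, defaulted), 4))
print("Expected defaults, Model A:", round(pd_a.sum(), 1))
print("Expected defaults, Model B:", round(pd_b.sum(), 1))   # about three times too many
print("Actual defaults:", int(defaulted.sum()))
```

Any test that looks only at the defaulters’ probability paths cannot tell these two models apart; only a comparison of predicted levels against realized default rates can.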
Why would a model builder or model vendor allow such a bias? That’s the moral hazard: it makes the model easier to sell. If potential buyers are looking only at the defaulters, “higher is better.” The vendor will never be “caught” at this game of “grade inflation” until the model has been purchased and implemented, and even then only if the user rigorously applies the model testing regimes outlined in these three articles:
- Jarrow, Robert A. and Donald R. van Deventer, “Practical Usage of Credit Risk Models in Loan Portfolio and Counterparty Exposure Management: An Update,” Credit Risk Models and Management, second edition, David Shimko, editor, Risk Publications, 2004.
- van Deventer, Donald R., Li Li and Xiaoming Wang, “Another Look at Advanced Credit Model Performance Testing to Meet Basel Requirements: How Things Have Changed,” The Basel Handbook: A Guide for Financial Practitioners, second edition, Michael K. Ong, editor, Risk Publications, 2006.
- van Deventer, Donald R. “Why Would An Analyst Cap Default Rates in a Credit Model?” Kamakura blog, www.kamakuraco.com, April 22, 2009.
This pattern of bias for commercial reasons was first pointed out in this piece by Eric Falkenstein and Andrew Boral before they left one of the legacy rating agencies:
- E. Falkenstein and A. Boral, 2000. “RiskCalc™ for Private Companies: Moody’s Default Model,” Moody’s Investors Service memorandum.
The “Falkenstein and Boral” test for detecting this bias is described in their 2000 article and implemented on the Kamakura Risk Information Services database in van Deventer, Li, and Wang (2006).
Often, this concern about a bias in default probabilities is dismissed with a statement like this: “I don’t care about false positives as long as the model is giving me enough early warning on actual defaulters.” Like the young credit analyst in Taipei, those who make comments like this are oblivious to the harm caused by this commercially driven model bias. They are in denial that the problem is real.
Kamakura has often acted as an advisor to large financial institutions that need help in meeting the Basel II credit model testing requirements. Our April 22, 2009 blog post reports on one of those experiences, in which the model testing regime we outlined detected a huge bias in a legacy default probability service. At the time, that legacy service imposed a cap of 20% on the default probabilities of the firms labeled “most risky.” We asked our client to calculate the actual annualized default rate on the universe of observations at that 20% cap, which was about 15% of all observations in the 15-year history available to the client. The bank found that the actual default rate was only 6-7%, not 20%, and senior management was outraged at the bias. They were outraged that the vendor had allowed this bias to persist for 15 years, and they were outraged that the staff of the bank had not detected it years earlier. Finally, they were outraged at the years of incorrect business decisions that had been made on the assumption that the legacy default probability service was unbiased.
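The check the client ran is straightforward to reproduce in principle. The sketch below uses simulated data; the number of capped observations and the 6.5% realized rate are assumptions chosen only to mirror the orders of magnitude described above, not the client’s actual figures. It gathers every observation assigned the capped 20% default probability and compares the realized one-year default rate against it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-in for the observations sitting at the vendor's 20% cap.
n_capped = 150_000
vendor_pd = np.full(n_capped, 0.20)                       # the vendor's capped probability
defaulted_within_year = rng.random(n_capped) < 0.065      # realized outcomes closer to 6-7%

realized_rate = defaulted_within_year.mean()
print(f"Vendor-assigned PD at the cap:    {vendor_pd.mean():.1%}")
print(f"Realized one-year default rate:   {realized_rate:.1%}")
print(f"Bias ratio (assigned / realized): {vendor_pd.mean() / realized_rate:.1f}x")
```

Run bucket by bucket across the full range of assigned probabilities, this kind of comparison reveals whether a model’s levels, and not just its rank ordering, can be trusted.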
Later this week, we will spend some time on the adverse impacts of “false positives” in credit modeling on day-to-day business. For now, as with anything in finance, we strongly recommend a “buyer beware” approach to credit risk assessment tools, and we strongly urge institutions to conduct the tests outlined in van Deventer, Li and Wang (2006). If the vendor itself does not disclose such test results in a way that can be replicated, there is probably a commercial reason why it chooses not to do so. As the old saying goes, “Fool me once, shame on you; fool me twice, shame on me.”
Donald R. van Deventer
Kamakura Corporation
May 16, 2010