Just today, bankers from Indiana to Johannesburg logged in with thoughts on the real problem with common-practice default modeling for retail, business and sovereign credits. As Robert Jarrow and Sudheer Chava showed in the Journal of Banking and Finance in 2004, the logistic regression approach to default modeling is the “maximum likelihood estimator,” that is, the approach that produces default probabilities that are most likely to be consistent with the data. For recent state-of-the-art studies of corporate default in a logistic regression/reduced form framework, we recommend Bharath and Shumway (Review of Financial Studies, May 2008) and Campbell et al. (Journal of Finance, December 2008).
What these studies do enormously well is estimate the probability of default over a time interval that matches the periodicity of the data. In the Bharath and Shumway case, that data frequency was quarterly. In the Campbell et al. case, it was monthly. We believe that the monthly interval is the one that best captures the impact of macroeconomic factors on the probability of default. So far, so good.
What went wrong in the current crisis? Lenders who were truly at the state of the art had a recently built default model that could predict the probability of default, as of time zero, over the next 1, 3, 6 or 12 months. Most of these models used powerful explanatory variables like the number of times a loan had been 30, 60, and 90 days past due. There’s only one problem. This approach is woefully unhelpful in forecasting the month-by-month default probabilities on a new loan structure, like option ARMs or Alt-A loans, that is going to be originated next month. Without the ability to predict default on loans for which one has no history, one’s lending is based on faith, not science. The combination of both faith and science produces much better results.
Since at least 2002, another approach has become increasingly popular. This approach, documented in Kamakura Risk Information Services Technical Guides Version 2, 3, and 4.1 (February 2006), shows how a family of logistic regressions can be used with time-zero inputs only to produce default probability estimates not only for the next 30 days, but also for month N, conditional on surviving the first N-1 months. A nested family of these regressions out to sixty months is used to form the term structure of default probabilities for both corporates and sovereigns on the Kamakura Risk Information Services default probability service.
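To make the mechanics concrete, here is a minimal Python sketch of how conditional monthly default probabilities from a family of logistic regressions could be chained into a term structure of cumulative default probabilities. The coefficients, the two time-zero inputs, and the function names are hypothetical illustrations only, not the Kamakura Risk Information Services specification.

```python
import math

def logistic(z):
    """Standard logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def conditional_pds(coefs_by_month, x0):
    """Default probability in month N, conditional on surviving N-1 months.

    coefs_by_month[N-1] holds (intercept, betas) for the month-N regression.
    x0 holds the explanatory variables observed at time zero only.
    """
    pds = []
    for intercept, betas in coefs_by_month:
        score = intercept + sum(b * x for b, x in zip(betas, x0))
        pds.append(logistic(score))
    return pds

def cumulative_pds(cond):
    """Chain conditional probabilities: cumulative PD = 1 - product of (1 - p_k)."""
    survival = 1.0
    out = []
    for p in cond:
        survival *= 1.0 - p
        out.append(1.0 - survival)
    return out

# Hypothetical coefficients for months 1, 2 and 3 (made up for illustration).
example_coefs = [
    (-6.0, [1.2, 0.4]),
    (-5.8, [0.9, 0.6]),
    (-5.5, [0.5, 0.9]),
]
x0 = [0.3, 1.1]  # hypothetical time-zero inputs

cond = conditional_pds(example_coefs, x0)
term_structure = cumulative_pds(cond)
for month, (p, c) in enumerate(zip(cond, term_structure), start=1):
    print(f"month {month}: conditional {p:.5f}, cumulative {c:.5f}")
```

Note that each month has its own coefficient vector while the inputs stay fixed at their time-zero values, which is what lets the same borrower data produce a full term structure.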
The accuracy of this approach is significantly better than that of a simpler approach where one predicts the CUMULATIVE probability of default by using, say, a one-year default flag and data with one-year periodicity. Behind that one-year default flag are firms that have defaulted in 1 month, 2 months, 3 months, and so on out to 12 months. Unfortunately, the variables that predict default in one month are different, and have different weightings, than the variables that predict default in month 12, conditional on surviving the first 11 months. Mixing them together in one “data bucket” simply lowers accuracy. The decline in accuracy can be measured explicitly by modelers careful enough to do the modeling both ways, rather than engaging in a debate about which is better. That debate only ends when one does the work and sees the proof.
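The difference between the two data designs can be sketched in a few lines of Python. Assume, purely for illustration, that each observation records the number of months from the observation date to default (None if no default is observed). The function names and sample values are hypothetical, and censoring is ignored for simplicity.

```python
def one_year_flag(months_to_default):
    """Single pooled flag: 1 if default occurs anywhere in months 1 to 12."""
    if months_to_default is not None and months_to_default <= 12:
        return 1
    return 0

def forward_flag(months_to_default, n):
    """Flag for the month-n regression, conditional on surviving n-1 months.

    Returns None when the observation defaulted before month n and so
    does not belong in the month-n estimation sample at all.
    """
    if months_to_default is not None and months_to_default < n:
        return None
    return 1 if months_to_default == n else 0

# Hypothetical observations: months from time zero to default.
sample = [1, 7, 12, None, 30]

pooled = [one_year_flag(m) for m in sample]
month_1 = [forward_flag(m, 1) for m in sample]
month_12 = [forward_flag(m, 12) for m in sample]

print("one-year flag:", pooled)
print("month 1 flag: ", month_1)
print("month 12 flag:", month_12)
```

In this toy sample the pooled flag treats the month-1, month-7 and month-12 defaults identically, while the forward flags keep them in separate regressions with separate estimation samples.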
What was very important in the South African insights today was the observation that the variables that best predict default in month N, conditional on survival through month N-1, could be very different from the variables that best predict default in month 1. Consider 40 candidate variables X(1), X(2),…X(40). It may be that variables X(1) to X(15) are statistically significant at the one-month time frame, while variables X(5) to X(20) are statistically significant in predicting conditional default in month N.
Most modelers, if they are careful enough to model default on a multiperiod forward basis, tend to restrict the variables they use to those found to be statistically significant for the first month. My South African friend wisely pointed out that starting with month N and moving forward may well lead to the discovery of new variables of great importance in the long run, even though they don’t have statistical significance in the short run.
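A small Python sketch shows what is lost by restricting every horizon to the month 1 variable list. It mirrors the 40-variable example above; the p-values are invented placeholders (0.01 for "significant," 0.50 otherwise), the 5% threshold is an assumption, and the function names are hypothetical.

```python
def select_per_horizon(pvalues_by_horizon, alpha=0.05):
    """Pick significant variables separately for each forward month."""
    return {
        horizon: sorted(v for v, p in pvalues.items() if p < alpha)
        for horizon, pvalues in pvalues_by_horizon.items()
    }

def missed_by_month1_restriction(selected, first_month=1):
    """Variables significant at a later horizon but absent from the month 1 list."""
    base = set(selected[first_month])
    return {
        horizon: sorted(set(variables) - base)
        for horizon, variables in selected.items()
        if horizon != first_month
    }

N = 36  # hypothetical forward month
candidates = range(1, 41)  # indices of X(1) ... X(40)

# Placeholder p-values reproducing the example: X(1)-X(15) matter in
# month 1, X(5)-X(20) matter in month N.
pvalues_by_horizon = {
    1: {i: (0.01 if 1 <= i <= 15 else 0.50) for i in candidates},
    N: {i: (0.01 if 5 <= i <= 20 else 0.50) for i in candidates},
}

selected = select_per_horizon(pvalues_by_horizon)
missed = missed_by_month1_restriction(selected)
print("significant in month N only:", missed[N])
```

Under these placeholder inputs, X(16) through X(20) would never be tested for month N if the modeler carried forward only the month 1 list.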
Had care been taken to use this multiperiod approach, there would have been no reason for New Century Financial Corporation to make Option ARMs on faith. If one is going to make billions of dollars of these loans, why doesn’t one spend money in advance to figure out whether one gets the principal back? Z-man said today, “I believe it was Benjamin Graham who indicated that there are two rules of investing. The first rule is to get back your principal. The second rule is not to forget the first.” How true, how true. Whether it’s for measuring impairment or for risk management purposes, anyone who makes a loan today without a high quality estimate of the probability of default over the life of the loan is likely to fail Benjamin Graham’s standard. There’s no excuse for that.
Thanks to Z-man and to everyone in SA for helpful comments!
Donald R. van Deventer
July 28, 2009