Detective Lord Peter Wimsey said in 1927, “I have a trivial mind. Detail delights me.” (Dorothy L. Sayers, Unnatural Death, HarperCollins Publishers, page 5). This is the right attitude for an analyst seeking either to build a credit model or to validate someone else’s. How much data is needed for this exercise? This is a frequently debated question, but the answers are very simple. First, no one in the world has ever had more data than they needed for a default model; everyone wants more data than they have. Second, simple rules of thumb about how much data is necessary are just that: simple. The real world is more complex than those rules allow. This post summarizes some of the lessons learned in 15 years of credit risk modeling at Kamakura Corporation on counterparties that span the full spectrum, from retail borrowers to small businesses, public firms, and sovereigns.
The amount of data needed to build a successful credit model can be determined both by the need for statistical significance in the results and by the quality of the statistical significance that is achieved. The latter attribute of the model is, by necessity, a qualitative one that involves an assessment of the integrity of the economic insights of the model and of potential pitfalls embedded in the data set that may make it less than perfect in assessing the probability of default looking forward. In this post, we focus on the impact of data quantity on the quality of statistical significance that a given data set can produce. Specifically, we’re trying to answer questions like these:
1. Working from a Mumbai or Beijing perspective, do I have a data set that is good enough to build a high quality default model for public firms in India or China?
2. Working from a North American perspective, should I build a separate model for financial institutions or analyze them as part of an “all public firm model”?
3. Should I be working with individual borrower level default data or with aggregated default data?
4. Should I be working with a monthly data base, a quarterly data base, a semi-annual data base, or an annual data base?
We illustrate the answers to these questions with a series of lessons or stories that have come out of practical experience. For complete details of the model building and testing process, clients of Kamakura Risk Information Services can refer to the Kamakura Technical Guides for mortgage defaults, public firm defaults and sovereign defaults. Kamakura Risk Information Services Technical Guide Version 4.1, authored by Robert A. Jarrow, Li Li, Mark Mesler, and Donald R. van Deventer, gives a complete overview of public firm model coefficients, explanatory variables, and in-sample and out-of-sample test results. In this post, we focus on the “big picture” from 30,000 feet, because many modeling and data pitfalls are mistakes of logic, not statistical technology.
We start by answering questions 3 and 4, not just because they are the easiest to answer, but because they can often be the most important credit model data decisions that a modeler makes.
Lesson 1: A Monthly Data Base is Always a More Accurate Basis for Default Modeling Than Quarterly or Annual Data
Given that most of the public firm models early in this century were constructed using annual data, this conclusion will surprise some readers, but it’s an obvious and powerful one. Whether it’s retail mortgage default modeling or public firm modeling, the current crisis has most of the world fixated on U.S. home price levels. If one were using an annual data base, what value would one enter for the one-year change in home prices in the model? The change from December 31 to the next December 31? But what if home prices were up 15% to June 30 and down 15% in the following six months? Doesn’t that contain useful information over and above the one-year change from 12/31 to 12/31? This is particularly true when the level of a macro factor (say the yen-dollar exchange rate) is an input: with an annual model, the decision of whether to input the macro factor’s average level, starting level or ending level can have a dramatic impact on the answer. With monthly data, this issue diminishes considerably, which makes it much easier to measure the impact of macro factors on default, as the U.S. Treasury has recently required of the 19 largest banks in the United States. What if I am building a small business model and I only have financial statements once a year? The answer is that you rule the data; it doesn’t rule you. One would input the most recently available financial statements, often with a dummy variable indicating how old they are or whether they are later than usual. Professor John Y. Campbell of Harvard, for example, has found that lateness in the delivery of financial statements is a statistically significant predictor of default. Another issue is seasonality, which is enormously important for companies in retailing, agriculture, tourism, and so on. For this reason, a monthly time interval should always be one’s preferred solution.
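Both halves of the argument in Lesson 1 can be made concrete with a small sketch. The first part works through the home price example: a 15% rise followed by a 15% fall shows up in annual data as a modest decline, while a monthly series keeps both moves. The second part is one possible way, under stated assumptions, to carry annual financial statements into a monthly panel together with the age of the statement and a "later than usual" flag. The statement dates, the ratio name `debt_to_assets`, the helper names and the `usual_lag_months` threshold are all hypothetical illustrations, not KRIS conventions.

```python
from datetime import date

# Part 1: a December-to-December change hides the path within the year.
def annual_change(start_level: float, end_level: float) -> float:
    """Fractional change between two year-end index levels."""
    return end_level / start_level - 1.0

start_level = 100.0                 # hypothetical home price index on 12/31
june_level = start_level * 1.15     # up 15% by June 30
end_level = june_level * 0.85       # down 15% over the following six months
print(f"12/31 to 12/31 change: {annual_change(start_level, end_level):.2%}")
# A monthly series would record both the 15% rise and the 15% fall.

# Part 2: carry annual statements into a monthly panel with an age variable.
# Each record: (fiscal_period_end, date_received, ratios). Hypothetical data.
STATEMENTS = [
    (date(2007, 12, 31), date(2008, 3, 15), {"debt_to_assets": 0.55}),
    (date(2008, 12, 31), date(2009, 4, 20), {"debt_to_assets": 0.61}),
]

def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from earlier to later (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def monthly_features(statements, as_of: date, usual_lag_months: int = 2):
    """Return the most recent statement already received by as_of, its age in
    months since fiscal period end, and a flag that is True when the next
    annual statement is overdue. usual_lag_months is an assumed threshold."""
    received = [s for s in statements if s[1] <= as_of]
    if not received:
        return None
    period_end, _, ratios = max(received, key=lambda s: s[0])
    age = months_between(period_end, as_of)
    is_late = age > 12 + usual_lag_months
    return ratios, age, is_late

for as_of in (date(2009, 2, 28), date(2009, 3, 31), date(2009, 4, 30)):
    print(as_of, monthly_features(STATEMENTS, as_of))
```

In this toy panel the March 2009 row still carries the 2007 statement, now 15 months old and flagged as late; the April row picks up the 2008 statement once it arrives. The age and lateness columns are what a monthly model could use in place of pretending the financials are current.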
Lesson 2: Individual Borrower-Level Models are Always More Accurate Than Models Based on Aggregated Default Rates
As part of Kamakura’s KRIS default probability service, we offer mortgage default models that are based on aggregate default rates of national data on prime mortgages and subprime mortgages, both fixed rate and adjustable rate. We emphasize to clients, however, that individual borrower default data is always a superior modeling choice and that the aggregated data models should be used only when time or budget constraints make individual borrower-related modeling impossible. Why is this conclusion such an obvious one? Consider the national U.S. mortgage data. One can’t easily ascertain the average age of the mortgages in the national aggregate, but mortgage age is a very important driver of default. One can’t easily ascertain the original or current loan to value ratio on the mortgage loans for the aggregate, even though at an individual loan level they are critical drivers of default. Finally, one can’t determine the average consumer credit score of the national mortgage aggregates very easily, even though at the individual loan level it’s critical to understanding the probability of default. Similar examples would make the same points for small business credits and public firm modeling. Without Lord Peter Wimsey’s love of detail, one is doomed to a “second best” modeling effort.
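One reason aggregated data loses so much is that default risk is nonlinear in loan-level drivers such as loan to value. The sketch below assumes a hypothetical logistic relationship between LTV and default probability (the coefficients are illustrative, not estimates from any data set) and compares two pools with the same average LTV. An aggregate model sees identical inputs for both pools, while the loan-level view shows that the pool with some loans under water carries a much higher expected default rate.

```python
import math
from statistics import mean

def loan_pd(ltv: float, intercept: float = -9.0, slope: float = 7.0) -> float:
    """Hypothetical loan-level default probability, logistic in LTV.
    The intercept and slope are illustrative assumptions only."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * ltv)))

pool_a = [0.80, 0.80, 0.80, 0.80]   # every loan at 80% LTV
pool_b = [0.50, 0.60, 1.00, 1.10]   # same average LTV, two loans under water

for name, pool in (("A", pool_a), ("B", pool_b)):
    print(f"pool {name}: average LTV {mean(pool):.2f}, "
          f"PD at average LTV {loan_pd(mean(pool)):.2%}, "
          f"average loan-level PD {mean(loan_pd(x) for x in pool):.2%}")
```

Because the curve is convex over this range, plugging the average LTV into the PD function understates pool B's risk; only loan-level data reveals the difference.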
Lesson 3: The Rolling Stones were Right: You Can’t Always Get What You Want
A beloved client once said to me, “Don, we’re Canadian and you’re not. We want a Canada only model, not a North American model for public firm defaults.” The right reply to that, of course, was “Yes, sir!” followed by a hearty but off-tune rendition of “O Canada.” At first glance, the modeling looked promising. We had more than 300,000 observations on Canadian public firms, and we had hundreds of defaults, many multiples of what serious researchers like the Deutsche Bundesbank said were the minimum necessary for a successful modeling exercise. From a statistical perspective, everything looked fine. We had a high degree of accuracy, and we had excellent statistical significance on a mix of financial ratios, macro factors, and equity market inputs. But the model was nonsense. It implied that if stock prices on the Toronto Stock Exchange went up, more corporates would default in Canada. With the client, we drilled down into the data to determine why the data indicated that nonsensical conclusion. The nonsense was due to an accident of history. Just at the height of the high tech boom in 2001-2002, the Toronto Stock Exchange launched a new venture exchange, and a wide variety of small companies came into view as public firms and started to default in large numbers, even though the economy was booming. Given the amount of data we had, this group of small high tech firms was very influential in determining the coefficients of the model. The conclusion was simple: we needed to wait for this historical event to fade (not disappear) into history, so that we had a longer history on the high tech firms. Once this happens, we are confident that economic normality will be restored, and we’ll indeed find that lower stock index prices mean a bad economy and more defaults. Until then, our clients have concluded that the North American model is more accurate.
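Statistical significance did not expose the Canada-only problem; economic reasoning did. A cheap screen that captures part of that reasoning is to compare the sign of every estimated coefficient with the sign economic theory expects. The sketch below is a hypothetical illustration: the variable names, the expected signs and the coefficient values are made up to mirror the story above, and a check like this is only a first filter, not a substitute for drilling into the data as described.

```python
# Hypothetical economic priors: expected sign of each coefficient in a model
# where a positive coefficient raises the default probability.
EXPECTED_SIGNS = {
    "stock_index_return": -1,           # rising equity markets, fewer defaults
    "unemployment_rate": +1,            # weaker economy, more defaults
    "net_income_to_assets": -1,         # more profitable firms default less
    "total_liabilities_to_assets": +1,  # more leverage, more defaults
}

def sign_violations(coefficients: dict, expected: dict) -> list:
    """Names of variables whose estimated sign contradicts the prior.
    Variables without a stated prior are ignored."""
    return [name for name, beta in coefficients.items()
            if name in expected and beta * expected[name] < 0]

# Hypothetical estimates shaped like the Canada-only fit described above.
canada_only = {
    "stock_index_return": 0.85,
    "unemployment_rate": 0.12,
    "net_income_to_assets": -3.1,
    "total_liabilities_to_assets": 2.4,
}
print(sign_violations(canada_only, EXPECTED_SIGNS))
```

Here the screen flags the stock index coefficient, which is the point at which the modeler should stop and investigate which observations are driving it.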
Lesson 4: Too Good to Be True is Too Good to Be True
As a credit modeler, it’s not good to be from a lucky country. Consider these modeling environments:
a. Small businesses in China have never had to deal with a floating exchange rate, so even though they are big exporters, a China only model would show there is no foreign exchange market influence on small business defaults. This statistically valid conclusion is another example of a nonsensical model.
b. As of 1996, no Japanese bank had defaulted for more than 50 years. Use of this data set would lead to nonsensically low default predictions as well, given that 6 of the 21 largest banks in Japan had failed within a few years of this modeling data set’s construction.
c. In mid-2005, South Africa was enjoying a broadly based economic boom that stemmed in large part from the end of apartheid and the reintegration of South Africa into the world economy. The result was extraordinarily low default rates, and bankers knew that their own data would lead to nonsense. “It can’t go on like this, and we know it–we can’t base our default models on this data set,” one Johannesburg banker said to me.
In each of these examples, a very good modeler would have recognized the problem even before starting statistical estimation of a default model. It should have been obvious that grouping data from other regions would dramatically improve the quality of the default forecasts, even if one does go through with the “lucky data set” as a benchmark. In case b, for example, an international financial institutions model or international public firm model would have made it clear that a fall in the Nikkei stock index from almost 39,000 at the end of 1989 to 7,000 some years later was going to cause a lot of bank defaults. The “lucky Japan” data set would not have contained this insight. This is a subtle but very important issue that a modeler has to be aware of.
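One simple way to see why a "lucky" data set cannot be taken at face value is to ask what default rates it is statistically consistent with. With zero defaults in n observations, the maximum-likelihood default probability is exactly zero, yet an exact one-sided binomial bound, which solves (1 - p)^n = 1 - confidence, shows that the data cannot rule out a materially positive rate. The sketch below assumes independent defaults and a hypothetical sample of 21 banks observed for 50 years; correlated defaults, as in a banking crisis, would make the true uncertainty larger still. Note that even this bound says nothing about how defaults respond to a stock market collapse the sample never saw, which is why pooling data from other regions is the more important fix.

```python
def zero_default_upper_bound(n_obs: int, confidence: float = 0.95) -> float:
    """One-sided upper confidence bound on an annual default probability when
    n_obs independent firm-years show zero defaults.
    Solves (1 - p) ** n_obs = 1 - confidence for p."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_obs)

# Hypothetical sample: 21 banks observed for 50 years, no defaults.
n = 21 * 50
print(f"maximum-likelihood PD: {0 / n:.2%}")
print(f"95% upper bound:       {zero_default_upper_bound(n):.3%}")
print(f"rule-of-three check:   {3 / n:.3%}")
```

The bound shrinks only slowly as the sample grows, so a long run of good luck narrows it but never makes a zero default probability credible.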
Lesson 5: Listen to Cyndi Lauper, Time After Time
Often, modelers choose to use less than all of the data they have, and this can lead to disaster. Consider the following true story. I was introduced socially to the head of the mortgage business at a sophisticated bank that was one of the four largest in its country. He chatted about the problems he had running the business. “We have 300,000 mortgage loans,” he said, “and if I build a monthly default data base that’s 3.6 million observations a year and 36 million observations over a decade. Throw in 10 explanatory variables for each observation and that gives us 396 million numbers in the data base. That’s too much hassle so we just use the last 12 months of data and rebenchmark the model each year. It’s incredible, you know, how much our consumer default behavior changes every year.” My friend was buying the drinks, so I didn’t have the heart to tell him that it was impossible for consumer behavior to be changing that much: the bank had 25% market share in their country. By wasting 90% of the data and using only the most recent year’s data, the bank lost the ability to perceive the impact of changes in the economy on defaults. What he thought was a change in the nature of the borrowers was in fact a change in their financial circumstances as the business cycle rose and fell. “Time after time” is important, because only over a long period of time can one accurately perceive the impact of macro factors on individual loan level defaults.
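The banker's arithmetic is worth spelling out, because it shows how small the "hassle" really is relative to the information thrown away. The sketch below reproduces his figures; the 396 million total works out if each observation carries the 10 explanatory variables plus a default indicator, which is our assumption about how he counted.

```python
loans = 300_000
months_per_year = 12
years = 10
explanatory_vars = 10
numbers_per_obs = explanatory_vars + 1   # assumption: 10 inputs plus a default flag

obs_per_year = loans * months_per_year               # monthly loan observations per year
obs_per_decade = obs_per_year * years                # observations over ten years
numbers_in_db = obs_per_decade * numbers_per_obs     # total numbers stored
share_discarded = 1 - obs_per_year / obs_per_decade  # share dropped by a 12-month window

print(f"{obs_per_year:,} observations a year, {obs_per_decade:,} per decade")
print(f"{numbers_in_db:,} numbers in the data base")
print(f"share of data discarded by a 12-month window: {share_discarded:.0%}")
```

The 90% figure is the point of the lesson: the discarded nine years are exactly the ones that contain different stages of the business cycle.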
Lesson 6: Brother Can You Spare a Dime?
My colleagues and I were called in to review the in-house models of a $500 billion asset bank which had constructed a series of 19 industry-country models. There was a model, for example, for U.S. investment banks. The 19 models spanned the full range of the bank’s counterparties, and they produced both a rating and a default probability for a specific time interval. At first glance, we suspected that a million-dollar data base (in terms of data quality) had been sliced into pieces that were only worth 10 cents, but how to prove it? We’ve spent a lot of time on “how good is good enough” in credit modeling, so we chose our favorite technique. We grabbed a single model (in our case our KRIS version 4.1 public firm model) and compared the accuracy of that single model with the aggregated ratings and default probabilities produced over time by the bank’s 19 models. What we were able to show them is that this concrete third-party benchmark had an ROC accuracy ratio of 95% on their historical counterparties, but the bank’s 19 models had an equivalent accuracy ratio of only 74%. By building 19 models instead of one, the bank had destroyed 21 percentage points, more than a fifth, of the potential accuracy of the data set. They took a million-dollar data set and made it worth 10 cents. This is a difficult dilemma for those of us who, like Lord Peter, have trivial minds and a passion for detail. The cold hard truth, however, is that narrowing a data set to make it more specific by industry and/or geography can destroy accuracy. Since we modelers are paid “by the model,” analysts always have an incentive to build more models than they have now. Management and the modelers themselves, however, have the obligation to prove at each step of the modeling process that making the data set smaller is in fact making the model more accurate. Our experience has been that returns to the miniaturization of models turn negative very quickly.
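The benchmarking technique in Lesson 6 comes down to scoring the same historical counterparties two ways and comparing ROC accuracy. The sketch below computes the area under the ROC curve as the probability that a randomly chosen defaulter receives a higher risk score than a randomly chosen non-defaulter, with ties counting one half. The eight counterparties, their default flags and both sets of scores are hypothetical; the "segmented" column simply stands in for default probabilities stitched together from several narrower models. Whether the 95% and 74% figures quoted above are on exactly this scale (rather than, say, 2 × area − 1) is an assumption we have not verified.

```python
def roc_area(scores: list, defaulted: list) -> float:
    """Area under the ROC curve: the share of (defaulter, non-defaulter) pairs
    in which the defaulter has the higher risk score; ties count as 1/2.
    This pairwise loop is O(n^2); a rank-based version scales better."""
    bad = [s for s, d in zip(scores, defaulted) if d]
    good = [s for s, d in zip(scores, defaulted) if not d]
    if not bad or not good:
        raise ValueError("need at least one defaulter and one non-defaulter")
    wins = 0.0
    for b in bad:
        for g in good:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bad) * len(good))

# Hypothetical history: the same eight counterparties scored two ways.
defaulted = [False, False, False, False, False, True, False, True]
single_model = [0.001, 0.002, 0.004, 0.003, 0.010, 0.050, 0.008, 0.030]
segmented = [0.002, 0.015, 0.004, 0.020, 0.010, 0.012, 0.008, 0.030]

print(f"single model ROC area:    {roc_area(single_model, defaulted):.3f}")
print(f"segmented models ROC area: {roc_area(segmented, defaulted):.3f}")
```

The essential design choice is that both sets of scores are evaluated on the identical set of counterparties and outcomes, so any gap in ROC area reflects the models rather than differences in the sample.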
We’d like to hear your modeling experiences. Please contact us at firstname.lastname@example.org with comments, suggestions and questions.
Donald R. van Deventer
Honolulu, April 14, 2009