In the past week, I have spoken with many regulators and bankers on the proper role of intuition in the econometric estimation of credit models for the Federal Reserve’s Comprehensive Capital Analysis and Review 2015. In our review of best practices for stress testing, value at risk, and credit value at risk on October 20, 2014, there was no role for “intuition,” just for science. The same is true for our November 13, 2014 update of model validation procedures for CCAR 2015.
Why? In quotes from Kathryn Schulz, Nobel Prize Winner Daniel Kahneman, and Professors King and Soneji below, we explain that the very DNA of human beings leads us to be overconfident in our own intellectual powers. Rather than relying on modern econometric methods, most humans would rather guess an answer and would normally be supremely confident in its accuracy.
In this note, we start by reviewing the modern literature on whether or not human beings can outperform modern econometric methods. Obviously, the thoughtful answer is no. In light of that, we explain the best practice procedures for deriving a link between the Federal Reserve’s CCAR macro factors and the 3 month default probabilities of Citigroup Inc. (C), a financial institution so big that the firm is a counterparty to the employers of most of the readers of this note. We rely on a careful explanation from Professor Robert Jarrow in implementing this best practice approach. We close by showing an example of variable selection by “intuition” for Citigroup as well. The result? A loss of more than 24 percentage points of explanatory power and the conclusion that only 2 variables have signs that are sufficiently “intuitive” to be included in a stress test model for Citigroup Inc.
Clearly, such a result is nonsense. A senior regulator told me earlier this week “I have been telling my colleagues for years that requiring the signs of correlated variables in econometric relationships to be ‘intuitive’ is bad practice. The same message has to be delivered over and over because many of our staff just don’t get it.” The persistence of “intuition” in model building happens because many of those who impose such requirements on the modeler have the power to end the modeler’s career, either because they are members of senior management or they are regulators with “yes” or “no” power of approval of the banks’ CCAR submissions. Requiring “intuitive signs” is voodoo econometrics, and this note explains why.
How Accurate is Intuition?
We start with a summary of what some prominent authors and scientists have found about the way human beings think. The president of Harvard University, Prof. Drew Faust, was interviewed in the New York Times on May 24, 2012. One question she was asked sheds light on the issue at hand:
New York Times: Is there any book you wish all incoming freshmen at Harvard would read?
Drew Faust: Kathryn Schulz’s “Being Wrong” advocates doubt as a skill and praises error as the foundation of wisdom. Her book would reinforce my encouragement of Harvard’s accomplished and successful freshmen to embrace risk and even failure.
The publisher’s description of Being Wrong: Adventures in the Margin of Error (2011) is consistent with the issues we deal with in this note:
In the tradition of The Wisdom of Crowds and Predictably Irrational comes Being Wrong, an illuminating exploration of what it means to be in error, and why homo sapiens tend to tacitly assume (or loudly insist) that they are right about most everything. Kathryn Schulz, editor of Grist magazine, argues that error is the fundamental human condition and should be celebrated as such. Guiding the reader through the history and psychology of error, from Socrates to Alan Greenspan, Being Wrong will change the way you perceive screw-ups, both of the mammoth and daily variety, forever.
Daniel Kahneman was the winner of the Nobel Prize in Economic Science in 2002. In his book, Thinking, Fast and Slow (2013), Prof. Kahneman explains that it is the basic DNA of humans and the way our brains are wired that make us impulsively offer an “intuitive” answer to a complex question, rather than delving into a more intensive analysis before answering. Here are some important quotes from his book:
- Questioning what we believe and want is difficult at the best of times, and especially difficult when we most need to do it, but we can benefit from the informed opinions of others. (Page 3)
- Systematic errors are known as biases, and they recur predictably in particular circumstances. When the handsome and confident speaker bounds onto the stage, for example, you can anticipate that the audience will judge his comments more favorably than he deserves. The availability of a diagnostic label for this bias-the halo effect-makes it easier to anticipate, recognize, and understand. (Pages 3-4)
- We are often confident even when we are wrong, and an objective observer is more likely to detect our errors than we are. (Page 4)
- As expected, we found that our expert colleagues, like us, greatly exaggerated the likelihood that the original result of an experiment would be successfully replicated even with a small sample. They also gave very poor advice to a fictitious graduate student about the number of observations she needed to collect. Even statisticians were not good intuitive statisticians. (Page 5)
- Our subjective judgments were biased; we were far too willing to believe research findings based on inadequate evidence and prone to collect too few observations in our own research. (Page 5)
- We documented systematic errors in the thinking of normal people, and we traced these errors to the design of the machinery of cognition rather than to the corruption of thought by emotion. (Page 8)
- By and large, though, the idea that our minds are susceptible to systematic errors is now generally accepted. (Page 10)
- The question the executive faced (should I invest in Ford stock?) was difficult, but the answer to an easier and related question (do I like Ford cars?) came readily to his mind and determined his choice. This is the essence of intuitive heuristics: when faced with a difficult question, we often answer an easier one, instead, usually without noticing the substitution. (Page 12)
- The difficulties of statistical thinking contribute to the main theme of Part 3, which describes a puzzling limitation of our mind: our excessive confidence in what we believe we know; and our apparent inability to acknowledge the full extent of our ignorance and the uncertainty of the world we live in. (Pages 13-14)
- We are prone to overestimate how much we understand about the world and to underestimate the role of chance in events. (Page 14)
- Overconfidence is fed by the illusory certainty of hindsight. (Page 14)
- The underestimation of the impact of evidence has been observed repeatedly in problems of this type. It has been labeled “conservatism.” (Page 422)
- Misconceptions of chance are not limited to naïve subjects. A study of the statistical intuition of experienced research psychologists revealed a lingering belief in what may be called the “law of small numbers,” according to which even small samples are highly representative of the populations from which they are drawn… As a consequence, the researchers put too much faith in the results of small samples and grossly overestimated the replicability of such results. (Page 422)
These considerations can have a significant impact on financial decisions and analysis. Professor Gary King (Harvard University) and Professor Samir Soneji (Dartmouth) summarize their conclusions in “Statistical Security for Social Security,” Demography, 2012. We highlight them here because they are directly relevant to any decision to rely on “expert” intuition or modern econometric methods:
“This is especially advantageous because informal forecasts may be intuitively appealing (Morera and Dawes 2006), but they suffer from humans’ well-known poor abilities to judge and weight information informally (Dawes et al. 1989). Indeed, a large literature covering diverse fields extending over 50 years has shown that formal statistical procedures regularly outperform informal intuition-based approaches of even the wisest and most well-trained experts (Grove 2005; Meehl 1954). (There are now even popular books on the subject, such as Ayres (2008).)”
In light of these observations, we now turn to the proper econometric procedures for fitting the kinds of models used in stress testing.
Proper Econometric Procedures and Stress Testing
Since 2002, the following explanation has been an important part of the documentation of Kamakura Corporation’s default probability models for public firms, non-public firms, sovereigns, U.S. banks, commercial real estate loans, small business loans, and retail loans. We include it here because it is the correlation among explanatory variables in a default model which causes the signs of some variables to be “non-intuitive.” Prof. Jarrow explains why this is normal and explains why retaining these variables is essential for maximum accuracy in the model.
Prof. Robert Jarrow’s Comments on the Impact of Correlated Input Variables on Default Estimation
“In order to achieve the highest levels of accuracy a model must incorporate a large number of explanatory variables. In such a setting it is possible that some of the input variables will be correlated; extreme correlation is often referred to as multicollinearity. Nevertheless, even if variables are highly correlated, it is very often still the case that each of them adds information and explanatory power.
“Multicollinearity is a well-studied issue in econometrics (see Johnston and Maddala for background). The term “multicollinearity” refers to the condition in a regression analysis when a set of independent variables is highly correlated among themselves. This condition only implies that it will be hard to distinguish the individual effect of any of these variables, but their joint inclusion is not a concern. This is not an econometrics “problem” with the regression analysis. Indeed, as long as the independent variables are not perfectly correlated (so that the X’X matrix is still invertible), the estimated coefficients are still BLUE (best linear unbiased estimates).
“The only concern with multicollinearity in a regression is that the standard errors of the independent variables in the set of correlated variables will be large, so that the independent variables may not appear to be significant, when, in fact, they are. This can lead to the incorrect exclusion of some variables based on t-tests. However, if the set of variables help the fit of the regression (have a significant F-statistic for their inclusion), they should be included.
“The idea is best explained by considering the simple regression y = a+bx+cz+e. Suppose x and z are highly correlated, but not perfectly correlated. Then, x and z span a 2-dimensional space. Both dimensions are important in explaining y. The inclusion of both x and z in the regression is important. Excluding one will only give a 1-dimensional space, and the explanatory power of the regression will be significantly less. The inclusion of x and z is necessary for the best model, for forecasting purposes. If x and z are highly correlated, the only issue is that both coefficients b and c are not accurately estimated. However, bx+cz, the linear combination’s influence on y, is not adversely influenced. In our context, the final default probabilities are simply unaffected by correlated explanatory variables (unlike the individual coefficients).
“For these reasons, Kamakura has not taken and does not need to take any special steps to deal with multicollinearity in default probability estimation. As described above, we take care not to exclude a variable that we believe has an economic rationale in explaining default if it appears to be statistically insignificant due solely to correlation with other input variables.”
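Prof. Jarrow’s point about the regression y = a+bx+cz+e can be checked with a small simulation. The sketch below is ours and is not part of the Kamakura documentation: the true coefficients (a=1, b=2, c=3), the sample size, the two correlation levels, and the helper names `ols` and `simulate` are all assumptions chosen purely for illustration. It compares the standard errors of b and c, and the accuracy of the fitted combination, when x and z are uncorrelated and when they are highly correlated.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and standard errors; X must already include the intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])   # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

def simulate(rho, n=200, seed=0):
    """Fit y = a + b*x + c*z + e when corr(x, z) is approximately rho (hypothetical data)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    z = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    signal = 1.0 + 2.0 * x + 3.0 * z            # assumed true values a=1, b=2, c=3
    y = signal + rng.standard_normal(n)
    X = np.column_stack([np.ones(n), x, z])
    beta, se = ols(X, y)
    # Error of the fitted combination a + b*x + c*z relative to the true signal
    fit_rmse = np.sqrt(np.mean((X @ beta - signal) ** 2))
    return beta, se, fit_rmse

beta_lo, se_lo, rmse_lo = simulate(rho=0.0)
beta_hi, se_hi, rmse_hi = simulate(rho=0.99)
print("std. errors (b, c), rho=0.00:", se_lo[1:], " fit RMSE:", rmse_lo)
print("std. errors (b, c), rho=0.99:", se_hi[1:], " fit RMSE:", rmse_hi)
```

In runs of this kind the standard errors of b and c are several times larger in the correlated case, while the error of the fitted combination stays about the same size. That is the distinction Prof. Jarrow draws between the individual coefficients and the final default probabilities.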
For interested readers, we have incorporated a series of quotations from 11 leading econometrics books on this topic in the Appendix.
Best Practice: The Example of Citigroup Inc. Default Probabilities
We now employ both best practice econometrics and the “that’s not intuitive” variable selection procedures in a case study of the 3 month default probabilities of Citigroup Inc. (C). We select Citigroup Inc. as the case study for these reasons:
- The firm is a credit counterparty to almost all of the employers of the readers of this note.
- The firm played a prominent role in the recent credit crisis, so the models are fitted to data that is definitely real signal, not noise.
- The default probabilities are publicly available by subscription to Kamakura Risk Information Services.
We selected the annualized 3 month default probabilities to model because that maturity fits the periodicity of the Fed’s CCAR process, which projects 3 scenarios out for 13 quarters. We review the default probabilities in three distinct periods for ease of viewing. The first is the period from 1991 to 2007:
Default probabilities rose sharply in 2008 to 2009 in the heart of the credit crisis:
Default probabilities then fell over the 2010 to 2014 period.
For stress testing purposes, if Citigroup is our counterparty, we need a model that links the Federal Reserve’s 28 macro factors summarized in this chart to the historical evolution of Citigroup’s default probabilities:
To these 28 macro factors, we add the 1, 2, and 3 year percentage changes (expressed as a decimal) in the commercial real estate index, the home price index, and the unemployment rate. We call the latter 9 variables the “transformed macro factors.” We could also specify the initial financial ratios and stock returns of Citigroup as time zero candidate explanatory variables, but we omit those variables here to focus on the “intuition” regarding macro variables. Before modeling, we must specify how to transform the default probability. It is not appropriate to do a linear regression on the default probability level, because normally distributed error terms would occasionally lead to predictions of default probabilities that are below 0% or more than 100%. Similarly, it is just as incorrect to use the log of the default probabilities, because that still allows for predicted default probabilities of more than 100%.
Instead, we choose one of two common cumulative probability distributions and assume that the default probability is either a normal or logistic function of our macro factors, which become linear arguments to the normal or logistic functions. We invert the probability distribution to get a transformed probability that we then model using a linear combination of the CCAR macro variables and the transformed macro factors. We can employ either generalized linear methods or linear regression.
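The transformation described above can be sketched in a few lines. This is a minimal illustration, assuming default probabilities are expressed as decimals strictly between 0 and 1; the function names are ours, and the note does not say which of the two distributions was used for Citigroup. The key property is that any value of the linear combination of macro factors, however extreme, maps back to a probability between 0% and 100%.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1

def logit(p):
    """Inverse of the logistic CDF: maps a probability in (0, 1) to the real line."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inv_logit(t):
    """Logistic CDF, written to avoid overflow for large negative t."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def probit(p):
    """Inverse of the standard normal CDF."""
    return _STD_NORMAL.inv_cdf(p)

def inv_probit(t):
    """Standard normal CDF."""
    return _STD_NORMAL.cdf(t)

pd_annualized = 0.0123                      # hypothetical 1.23% default probability
t = logit(pd_annualized)                    # the value that would be regressed on macro factors
print(t, inv_logit(t))                      # the round trip recovers the original probability
print(inv_logit(-40.0), inv_logit(40.0))    # extreme predictions still lie in [0, 1]
```

A regression fitted to `logit(p)` or `probit(p)` produces a prediction on the transformed scale; applying `inv_logit` or `inv_probit` converts it back to a default probability.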
We follow this procedure:
- We do a stepwise regression on the full set of macro factors and transformed macro factors to identify the initial “best fit.”
- Given the resulting explanatory variables, we re-estimate the model using all observations on those variables. Stepwise regression uses only those observations that are available for the shortest time series of the candidate explanatory variables. We take this step to avoid losing the data omitted in the stepwise process.
- We then confirm that the “best model” is robust to the order in which the candidate variables are considered.
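The first two steps of this procedure can be sketched as follows. This is an illustration under stated assumptions, not the production procedure: it uses a simple forward selection that enters the candidate with the largest absolute t-score, a cutoff of 1.96 as a large-sample stand-in for the 5% level, and synthetic data in which one candidate has a shorter history. The function names `fit`, `forward_stepwise`, and `select_then_refit` are hypothetical.

```python
import numpy as np

def fit(X, y):
    """OLS with an intercept; returns coefficients, t-scores, and adjusted R-squared."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = len(y) - A.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    tss = np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - sigma2 / (tss / (len(y) - 1))
    return beta, beta / se, adj_r2

def forward_stepwise(X, y, t_enter=1.96):
    """Repeatedly add the candidate with the largest |t|; stop when none exceeds t_enter."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        best_j, best_t = None, t_enter
        for j in remaining:
            _, t, _ = fit(X[:, selected + [j]], y)
            if abs(t[-1]) > best_t:          # t-score of the newly added candidate
                best_j, best_t = j, abs(t[-1])
        if best_j is None:
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

def select_then_refit(X, y):
    """Step 1 on rows where every candidate is observed; step 2 on all rows usable for the winners."""
    complete = ~np.isnan(X).any(axis=1)
    selected = forward_stepwise(X[complete], y[complete])
    usable = ~np.isnan(X[:, selected]).any(axis=1)
    beta, t, adj_r2 = fit(X[usable][:, selected], y[usable])
    return selected, beta, t, adj_r2, int(complete.sum()), int(usable.sum())

# Synthetic example: 96 quarters, 6 candidates, candidate 5 missing for the first 30 quarters
rng = np.random.default_rng(42)
X = rng.standard_normal((96, 6))
y = 0.5 + 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.standard_normal(96)
X[:30, 5] = np.nan
selected, beta, t, adj_r2, n_complete, n_usable = select_then_refit(X, y)
print("selected:", selected, "rows in step 1:", n_complete, "rows in step 2:", n_usable)
```

The third step, checking that the result does not depend on the order in which candidates are considered, is not shown; one simple approach would be to repeat the selection with a different search order (for example, backward elimination) and compare the surviving variables.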
The result is a model that fits the transformed 3 month Citigroup default probability over 96 quarters with an adjusted r-squared of 78.54% and a root mean squared error of 1.1163 on the transformed default probability. Ten of the candidate variables (28 CCAR variables and 9 transformations of them) are statistically significant at the 5% level.
We also ran individual regression equations on all 37 CCAR-related variables against the transformed 3 month default probability for Citigroup Inc. one at a time as naïve models. The graph below shows the single variable regression that fits the time series of default probabilities for Citigroup with the 5 year U.S. Treasury yield:
The link between the 5 year U.S. Treasury yield and the transformed 3 month Citigroup default probability is easier to see in this scatter diagram:
The graph shows that, on average, higher interest rates mean lower default probabilities for Citigroup. This is confirmed by the single variable regression results, shown here, where we predict the 3 month transformed Citigroup default probability as a function of the 5 year U.S. Treasury yield alone:
The coefficient of the U.S. Treasury yield is negative, which means default probabilities on average will fall when Treasury yields rise. The t-score is -5.40, a very statistically significant level. Overall, the regression of this single variable explains 22.84% of the variation in the default probabilities of Citigroup Inc.
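For readers who want to reproduce this kind of naïve single variable regression, the sketch below computes the slope, its t-score, and the r-squared, and then uses the standard identity for a one-regressor model, r-squared = t²/(t² + n - 2). The data are synthetic and the function names are ours. The closing arithmetic assumes the single variable regression used all 96 quarters, which the note does not state.

```python
import numpy as np

def simple_regression(x, y):
    """Regress y on a constant and one variable; return slope, t-score, and R-squared."""
    n = len(y)
    xc = x - x.mean()
    slope = (xc @ (y - y.mean())) / (xc @ xc)
    intercept = y.mean() - slope * x.mean()
    resid = y - intercept - slope * x
    rss = resid @ resid
    tss = np.sum((y - y.mean()) ** 2)
    se_slope = np.sqrt(rss / (n - 2) / (xc @ xc))
    return slope, slope / se_slope, 1.0 - rss / tss

def r2_from_t(t, n):
    """R-squared implied by the slope's t-score in a one-regressor model with n observations."""
    return t * t / (t * t + n - 2)

def adjusted(r2, n, k=1):
    """Adjusted R-squared for k regressors plus an intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Synthetic stand-ins for a Treasury yield and a transformed default probability
rng = np.random.default_rng(7)
yield_5y = rng.uniform(0.5, 8.0, size=96)
transformed_pd = -4.0 - 0.3 * yield_5y + rng.standard_normal(96)
slope, t_score, r2 = simple_regression(yield_5y, transformed_pd)
print(slope, t_score, r2, r2_from_t(t_score, 96))   # the last two values agree

# Applying the identity to the reported t-score of -5.40 with an assumed n of 96
print(r2_from_t(-5.40, 96), adjusted(r2_from_t(-5.40, 96), 96))
```

Under that assumption the identity gives an r-squared of about 23.7% and an adjusted r-squared of about 22.9%, which is close to the 22.84% reported above.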
With this best practice as background, we now turn to the procedures of the naïve “intuitive” analyst.
Improper Procedures Based on “Intuition”
One of the challenges facing the naïve intuitive analyst is that the analyst must have an intuitive sense of the proper sign of each of the 37 CCAR variables and their transformations for the 24 year history of the Citigroup 3 month default probabilities. This is not an easy task, yet the intuitive analyst always has an opinion. A typical dialogue goes like this:
Intuitive Analyst: “I will not approve any default model that does not have intuitive signs for each macro factor.”
Modeler: “Well, we have 37 variables and 1,237 counterparties. Can you tell me which signs, positive or negative, you are expecting for each of those 37 variables for each of the 1,237 counterparties?”
Intuitive Analyst: “We don’t have time for that.”
Modeler: “Then I need a procedure to automate your intuition because we don’t have much time for the CCAR modeling. What if I run single variable regressions for each of the CCAR variables and their transformations for each counterparty, a total of 37 x 1,237 = 45,769 regressions? Is that all right with you?”
Intuitive Analyst: “That will do for a start. We will override any of the results that I don’t find intuitive.”
The poor modeler must start to work with no formal procedures on variable selection. He turns to Citigroup’s results using the full set of 37 candidate CCAR variables and their transformations that we displayed above:
The modeler notes that the signs of the 10 statistically significant variables are sometimes different from the “intuitive signs” of the single variable regressions for Citigroup. He summarizes the signs of the coefficients from the single variable regressions and notes where the naïve analyst will claim the signs are “wrong” in the best practice results above:
The naïve analyst has power over the modeler. He may be a member of senior management who can fire the modeler. He may be a bank regulator who can “criticize” the bank on its stress test. The modeler may lack the confidence or knowledge to explain the result. Most likely, the modeler fully understands Professor Jarrow’s comments above and knows that the best practice results are correct, but he has no choice. He throws them out and drops three variables “because they have the wrong sign” in the all-knowing mind of the naïve analyst. He then re-runs the regression without these 3 variables and finds that 3 other variables are dropped for a lack of statistical significance. He gets the following results:
The modeler looks at the results. The signs are consistent with the naïve analyst’s demand that the signs be the same as the single variable regressions, but he’s worried. The explanatory power of the model has dropped more than five full percentage points. The interest rate variables have all dropped from the model, so there will be no stress test impact from interest rates in this model. This is true in spite of the clear evidence we showed above that the 5 year U.S. Treasury yield has a statistically significant link to Citigroup’s default probabilities. Also, the naïve analyst didn’t predict that the 2 year percentage change in the commercial real estate index would have the opposite sign from the commercial real estate index level. He delivers the results:
Modeler: “I followed your instructions to the letter. Here is the final model.”
Naïve Analyst: “This model is totally unacceptable. The two year change in the commercial real estate index and the index itself have opposite signs. That’s nonsense. That can’t be right. I refuse to approve this model unless you drop one of the commercial real estate variables.”
The modeler knew it was fruitless to argue with the all-knowing intuition of the naïve analyst if he wanted to stay employed. He looked at the t-scores of the two commercial real estate variables and dropped the 2 year change because it had the lower t-score. He reran the regression and this is what he got:
The modeler shuddered and predicted what would happen when he reviewed it with the naïve analyst.
Modeler: “Here are the results I got when I followed your instructions. What do you think?”
Naïve Analyst: “This model is garbage. Look at the sign on the commercial real estate index. It’s positive—that means that the probability of default on Citigroup rises when the commercial real estate index is high. I’m flunking this model. The only chance you have for approval is if you drop the commercial real estate index because of its non-intuitive sign.”
The modeler had three children and needed the job, so he followed instructions. This is what emerged from the next version of the model:
The adjusted r-squared of the model without the 3 interest rate variables and commercial real estate variables was more than 24 full percentage points below the best practice model. Home prices and volatility were the only variables that survived the naïve analyst’s intuition. 27 of the 28 CCAR variables had been eliminated and would show no direct impact in the stress testing process. Only 1 of the 9 transformations of the CCAR variables, the 2 year percentage change in home prices, survived as well.
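The mechanics behind this outcome can be reproduced with two correlated variables. The sketch below is a hypothetical example with synthetic data, not the Citigroup regression: the coefficients (2 and -1), the correlation of 0.9, and the noise level are assumptions. It shows a variable whose single variable sign is positive while its sign in the joint regression is negative, and it shows the loss of explanatory power when that variable is dropped for having the “wrong” sign.

```python
import numpy as np

def ols(X, y):
    """OLS with an intercept; returns coefficients and adjusted R-squared."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    n, k = A.shape
    tss = np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (resid @ resid / (n - k)) / (tss / (n - 1))
    return beta, adj_r2

rng = np.random.default_rng(1)
n = 500
x = rng.standard_normal(n)
z = 0.9 * x + np.sqrt(1.0 - 0.81) * rng.standard_normal(n)   # corr(x, z) is about 0.9
y = 2.0 * x - 1.0 * z + 0.3 * rng.standard_normal(n)         # z enters with a negative sign

beta_full, adj_full = ols(np.column_stack([x, z]), y)   # joint regression on x and z
beta_z_only, _ = ols(z, y)                              # the "intuitive" single variable check
beta_x_only, adj_x_only = ols(x, y)                     # model after dropping z

print("joint coefficient on z:          ", beta_full[2])
print("single variable coefficient on z:", beta_z_only[1])
print("adjusted R-squared, x and z:     ", adj_full)
print("adjusted R-squared, x only:      ", adj_x_only)
```

Because z moves with x, the single variable regression on z mostly picks up the effect of x and reports a positive slope. Both signs are correct answers to different questions, and dropping z because the two disagree throws away information that the joint regression was using.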
Intuition is sometimes the only alternative to science when there is very little data. “Very little data” means the data environment for modeling the default risk of project financing on a new highway in Bangkok, where we literally have no historical data. That is not the case with the default risk of the typical borrowers from U.S. commercial banks. There are nearly 25 years of default probability history on most public firms. There are more than 10 million observations for modeling non-public firm and small business loan defaults. Commercially available databases on home mortgages have more than 70 million mortgages for analysis. There are more than 2 million observations available for modeling commercial real estate loan defaults.
Returning to our example, the extremely bad consequences of using “intuition” to eliminate variables from consideration are typical. There is a ton of evidence, as outlined by King and Soneji above, that modern statistical methods outperform the informal analysis of even the most qualified “naïve analysts.”
We close with a quote from Daniel Kahneman: “The difficulties of statistical thinking contribute to the main theme of Part 3, which describes a puzzling limitation of our mind: our excessive confidence in what we believe we know; and our apparent inability to acknowledge the full extent of our ignorance and the uncertainty of the world we live in.”
Appendix: Econometricians on Multicollinearity
What about correlated explanatory variables in a default model?
May 25, 2009
Donald R. van Deventer
One of the most frequently asked questions when people review predictive models of default is this: “Aren’t those explanatory variables correlated, and doesn’t this create problems with multi-collinearity?” Since almost every default model has correlated explanatory variables, this is a question that comes up often. This post collects quotes on this issue from 11 popular econometrics texts to answer this question.
We consulted the following highly respected econometrics texts:
Angrist, Joshua D. and Jörn-Steffen Pischke, Mostly Harmless Econometrics, Princeton University Press, Princeton, New Jersey, 2009.
Campbell, John Y., Andrew W. Lo, and A. Craig MacKinlay, The Econometrics of Financial Markets, Princeton University Press, 1997.
Goldberger, Arthur S. A Course in Econometrics, Harvard University Press, 1991.
Hamilton, James D. Time Series Analysis, Princeton University Press, 1994.
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd edition, Springer, 2009.
Johnston, J. Econometric Methods, McGraw-Hill, 1972.
Maddala, G. S. Introduction to Econometrics, third edition, John Wiley & Sons, 2005.
Stock, James H. and Mark W. Watson, Introduction to Econometrics, second edition, Pearson/Addison Wesley, 2007.
Studenmund, A. H. Using Econometrics: A Practical Guide, Addison-Wesley Educational Publishers, 1997.
Theil, Henri. Principles of Econometrics, John Wiley & Sons, 1971.
Wooldridge, Jeffrey M. Econometric Analysis of Cross Section and Panel Data, The MIT Press, 2002.
We’ve selected the following quotes on multi-collinearity from the texts above:
From Goldberger, page 246:
“The least squares estimate is still the minimum variance linear unbiased estimator, its standard error is still correct and the conventional confidence interval and hypothesis tests are still valid.”
“So the problem of multicollinearity when estimating a conditional expectation function in a multivariate population is quite parallel to the problem of small sample size when estimating the expectation of a univariate population. But researchers faced with the latter problem do not usually dramatize the situation, as some appear to do when faced with multi-collinearity”
From Johnston, page 164:
“If multicollinearity proves serious in the sense that estimated parameters have an unsatisfactorily low degree of precision, we are in the statistical position of not being able to make bricks without straw. The remedy lies essentially in the acquisition, if possible, of new data or information, which will break the multicollinearity deadlock.”
From Maddala, page 267:
“…Multicollinearity is one of the most misunderstood problems in multiple regression…there have been several measures for multicollinearity suggested in the literature (variance-inflation factors VIF, condition numbers, etc.). This chapter argues that all these are useless and misleading. They all depend on the correlation structure of the explanatory variables only…high inter-correlations among the explanatory variables are neither necessary nor sufficient to cause the multicollinearity problem. The best indicators of the problem are the t-ratios of the individual coefficients. This chapter also discusses the solution offered for the multicollinearity problem, such as ridge regression, principal component regression, dropping of variables, and so on, and shows they are ad hoc and do not help. The only solutions are to get more data or to seek prior information.”
From Stock and Watson, page 249:
“Imperfect multicollinearity means that two or more of the regressors are highly correlated, in the sense that there is a linear function of the regressors that is highly correlated with another regressor. Imperfect multicollinearity does not pose any problems for the theory of the OLS estimators; indeed, a purpose of OLS is to sort out the independent influences of the various regressors when these regressors are potentially correlated.”
From Studenmund, page 264:
“The major consequences of multicollinearity are
- Estimates will remain unbiased…
- The variances and standard errors of the estimates will increase…
- The computed t-scores will fall…
- Estimates will become very sensitive to changes in specification…
- The overall fit of the equation and the estimation of non-multicollinear variables will be largely unaffected…”
From Theil, page 154:
“The situation of multi-collinearity (both extreme and near-extreme) implies for the analyst that he is asking more than his data are able to answer.”
From Angrist and Pischke:
The terms multi-collinearity and correlation literally do not appear in the index, as the authors evidently feel these issues are well understood.
Copyright ©2014 Donald van Deventer