The Federal Reserve documents released on Friday include the Comprehensive Capital Analysis and Review 2015 Summary Instructions and Guidance and Amendments to the Capital Plan and Stress Test Rules.
While the Fed’s announcements are critical to large bank holding companies like Bank of America (BAC), Citigroup (C), JPMorgan Chase & Co. (JPM), and Wells Fargo & Co. (WFC), concern over the accuracy of bank risk models and the need for model validation are worldwide, consistent with the stress testing regimes of the European Central Bank and the United Kingdom. There is a similar need for high quality model validation for value at risk and credit-adjusted value at risk. The same principles apply to all three calculation types. The purpose of this note is to outline the kind of modeling assumptions that constitute best practice. In the course of that exercise, we will also identify common errors in modeling assumptions that should render a stress-testing calculation unacceptable.
An Overview of the Key Modeling Issues
Why have major regulators set forth stress tests that are a function of a small number of scenarios instead of a best practice Monte Carlo simulation from which the full probability distribution of outcomes can be drawn? Even though the capability for a full Monte Carlo simulation has been available for more than two decades via systems like Kamakura Risk Manager, one guesses that regulators feared that many banks and, perhaps, the regulators themselves did not have the systems capability to undertake a Monte Carlo simulation. Unfortunately, the only way to assure that the models used to do three-scenario stress tests at 13 points in time, as the Fed’s CCAR regime requires, are accurate is to do a Monte Carlo simulation.
We illustrate the reasons with a simple example. Consider a mortgage-backed security held by the bank with a time zero observable market value of 101. The bank calculates stress test values of 100, 97 and 95, and submits value changes of -1, -4, and -6 for the three scenarios. What is wrong with this? Perhaps nothing, but the best way to determine whether the submissions for this single asset are accurate or not is to do a Monte Carlo simulation of the value of the mortgage-backed security at time zero. If one does that and finds that the simulated time zero value is 99, the fundamental assumptions behind the model are invalid and the stress test submissions are wrong. If the model undervalues the security by 2 in all three scenarios, then the corrected scenario values are 102, 99 and 97, and the proper stress test submissions were +1, -2, and -4. This form of model validation is extremely simple to do, and yet we find that very few efforts at model validation include this fundamental credibility check.
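The correction in this example is simple arithmetic, and a minimal Python sketch makes it explicit. The function name `corrected_submissions` is hypothetical, and the sketch assumes, as the example does, that the time zero pricing error carries over unchanged to every scenario.

```python
def corrected_submissions(observed_value, simulated_value, stressed_values):
    """Shift each stressed value by the time zero pricing error, then
    report the change relative to the observable market value."""
    # Time zero model error: observable price minus Monte Carlo price.
    model_error = observed_value - simulated_value
    # Assumption from the example: the same error applies in every scenario.
    return [v + model_error - observed_value for v in stressed_values]


if __name__ == "__main__":
    uncorrected = [v - 101 for v in (100, 97, 95)]
    corrected = corrected_submissions(101, 99, [100, 97, 95])
    print(uncorrected)  # [-1, -4, -6]
    print(corrected)    # [1, -2, -4]
```

In practice the error need not be constant across scenarios, which is one more reason to repeat the time zero check whenever the model assumptions change.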
In more formal academic terms, this is a check of the “no arbitrage” properties of the model. Please keep in mind that some Monte Carlo simulation exercises can be done so that sampling error is literally zero (an important feature of the Heath, Jarrow and Morton interest rate modeling approach, as summarized in Jarrow’s Modeling Fixed Income Securities and Interest Rate Derivatives and Advanced Financial Risk Management). More typically, there will be sampling error in a Monte Carlo simulation and a model error is defined as a simulated price that is outside of a two or three sigma range given the number of simulations done.
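The two or three sigma criterion can be sketched as follows. This is an illustration only: the function name and the default of three sigmas are assumptions, and the sampling error is taken to be the usual standard error of a Monte Carlo mean (sample standard deviation divided by the square root of the number of simulations).

```python
import math
import statistics


def price_within_sampling_error(simulated_values, observed_price, n_sigma=3.0):
    """Return (mean, standard_error, ok) for a time zero pricing check.

    simulated_values: one discounted value per Monte Carlo path.
    ok is False when the gap to the observed price exceeds n_sigma
    standard errors, which the text treats as a model error.
    """
    n = len(simulated_values)
    mean = statistics.fmean(simulated_values)
    # Standard error of the Monte Carlo mean.
    std_err = statistics.stdev(simulated_values) / math.sqrt(n)
    ok = abs(mean - observed_price) <= n_sigma * std_err
    return mean, std_err, ok


if __name__ == "__main__":
    # Hypothetical path values, not output from any real model.
    paths = [99.0, 101.0] * 50
    print(price_within_sampling_error(paths, 100.2))
    print(price_within_sampling_error(paths, 101.0))
```

Because the standard error shrinks with the square root of the number of paths, a fixed pricing gap that passes with a few hundred paths can fail with a few hundred thousand, which is the desired behavior for a validation test.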
We now turn to a series of tests that can detect common errors in model assumptions and simulations with a minimum of incremental effort.
An Overview of the Model Validation Regime
As noted by Prof. Robert Jarrow in a recent article, there are no perfect models. The question for model validation is two-fold: is the model “good enough” to provide value, and is there an alternative modeling approach that is better?
There are four steps in the model validation process:
- A check of the accuracy of the assumptions of the model. Since there are no perfect models, we do not insist on perfection. Nonetheless, we have no tolerance for fatal inaccuracy either.
- A check of the implications of the model given its assumptions.
- A check of econometric procedures used to parameterize the model.
- A check of the no arbitrage properties of the model. Does it accurately value traded securities with observable prices at time zero?
Many of the common errors in stress-testing, VAR and credit-adjusted VAR (“CVAR”) stem from very predictable human tendencies that can be considered issues of behavioral economics. We list a few tendencies here that raise red flags that call for intense scrutiny:
- To use the historical data set provided by regulators for modeling with no additions. For example, the Fed data set includes foreign exchange rates but no interest rate variables for the foreign countries. No arbitrage FX rates cannot be modeled without knowing local interest rates, as we discuss below. We need to supplement the Fed data set to ensure no arbitrage.
- To use the data field (say the 30 year fixed rate mortgage yield) provided by regulators without consideration for the proper no arbitrage specification for linking mortgage yields to macro factor movements. We discuss that below.
- To assume the historical variables are normally or lognormally distributed with a constant mean and variance.
- To assume that the historical variables are independently and identically distributed over time.
- To use the phrase “that’s not intuitive to me” to override a valid model.
We give some concrete examples from past blogs and then move on to some Fed CCAR examples.
Model Validation Issues Previously Discussed
We’ve discussed a number of model validation issues in past blogs that we won’t repeat here, except to call the reader’s attention to these resources:
- It is grossly inaccurate to assume the credit spread on an asset equals (1 minus the recovery rate) times the default probability.
- It is grossly inaccurate to assume that only 1 or 2 factors drive any yield curve, particularly the risk free curve like the U.S. Treasury curve. Typically 6 to 10 factors are needed.
- It is grossly inaccurate to use the Nelson-Siegel or Svensson yield curve smoothing methods in stress-testing, VAR and CVAR, because these methods are inconsistent with observable data. In other words, neither model is consistent with no arbitrage, even though the Fed has used the Svensson method in some of its calculations.
- The Merton model of risky debt is much less accurate than the reduced form credit models in forecasting corporate defaults, as shown by Campbell, Hilscher and Szilagyi (2008, 2011) and Bharath and Shumway (2008).
- An increase in default risk substantially reduces the ability to sell a fixed income security, and the valuation methods used should take this into account.
With these issues out of the way, we cover some other model validation checks that allow us to pick the “low hanging fruit” of common errors.
A Check on the Accuracy of Model Assumptions
In this and subsequent sections, we use historical data provided by the Federal Reserve as part of the 2014 CCAR stress testing exercise for illustration purposes. We start with a simple question. Are any of the variables normally distributed, either in terms of absolute level, quarterly changes or quarterly percentage changes? We know that normality can be an elusive goal, especially for small data sets. How far from this goal are we? We use four statistical tests to answer this question: the Shapiro-Wilk test, the Shapiro-Francia test, the skewness and kurtosis test, and the Kolmogorov-Smirnov test. When we apply these four tests to the absolute level of selected CCAR variables, we find the following:
The results show that, via the Shapiro Wilk test, the hypothesis of a normal distribution in the absolute levels of the 28 CCAR 2014 variables is rejected in 24 out of 28 cases. The variable descriptions are available in the documentation of the Federal Reserve historical data at the link given above.
The probability values (“p-values”) for each test are given here:
Next, we check to see whether it is correct to assume that the changes in the values of variables are normally distributed. We apply the same four tests with these results:
In this case, the hypothesis of normally distributed quarterly changes is rejected by the Shapiro-Wilk test in 26 out of 28 cases.
Next, we examine the hypothesis that the quarterly percentage changes are normally distributed, that is, that the variable itself has a lognormal distribution. We present the results in this table:
In the case of quarterly returns, the hypothesis of normality is rejected for 21 of the 28 variables supplied by the Federal Reserve.
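For readers who want to repeat this kind of check on their own series, here is a minimal sketch. It does not reproduce the four tests named above, which are available in standard statistical packages. It substitutes the Jarque-Bera test, a different and simpler test based on skewness and kurtosis, because it can be written with the Python standard library alone. Its p-value is asymptotic and only a rough guide for short quarterly samples. The function names and the sample series are hypothetical.

```python
import math


def jarque_bera(sample):
    """Return (statistic, p_value) for the Jarque-Bera normality test."""
    n = len(sample)
    mean = sum(sample) / n
    # Central moments with divisor n, as in the usual JB definition.
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    stat = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Asymptotically chi-square with 2 degrees of freedom, whose
    # survival function is exp(-x / 2).
    return stat, math.exp(-stat / 2.0)


def three_views(levels):
    """Levels, quarterly changes, and quarterly percentage changes."""
    changes = [b - a for a, b in zip(levels, levels[1:])]
    pct_changes = [b / a - 1.0 for a, b in zip(levels, levels[1:])]
    return {"levels": levels, "changes": changes, "pct_changes": pct_changes}


if __name__ == "__main__":
    # Hypothetical quarterly series, NOT Federal Reserve data.
    series = [100.0 * math.exp(0.02 * t + 0.1 * math.sin(t)) for t in range(60)]
    for name, values in three_views(series).items():
        stat, p = jarque_bera(values)
        print(f"{name}: JB = {stat:.2f}, p-value = {p:.4f}")
```

A small p-value rejects normality for that view of the data, mirroring the three passes (levels, changes, percentage changes) described in this section.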
Along with the hypothesis of some form of normality, there is often an overpowering tendency to assume that the variables are independently and identically distributed. This assumption implies that there should be no autocorrelation in the residuals from period to period and that the assumption of a constant variance (i.e. no heteroskedasticity) is valid. We can test whether these features are true or not using standard tests for autocorrelation and heteroskedasticity. We present the results only for the first assumption, normality in the levels of the variable, in the interests of brevity.
The results show that the hypothesis of constant variance (no heteroskedasticity) cannot be rejected in only 6 of the 28 cases. The hypothesis of no autocorrelation is rejected 27 of 28 times.
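The text does not say which autocorrelation and heteroskedasticity tests were used, so the following is only a sketch with substitutes chosen for simplicity: a one-lag Ljung-Box test on deviations from the mean for autocorrelation, and the same statistic applied to squared deviations (the McLeod-Li idea) as a rough check for time-varying variance. All function names are hypothetical.

```python
import math


def ljung_box_lag1(series):
    """Return (rho1, q_stat, p_value) for a one-lag Ljung-Box test."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    # Lag-1 sample autocorrelation of the deviations from the mean.
    rho1 = sum(dev[t] * dev[t - 1] for t in range(1, n)) / denom
    q_stat = n * (n + 2) * rho1 ** 2 / (n - 1)
    # Chi-square with 1 degree of freedom: P(chi2 > q) = erfc(sqrt(q / 2)).
    return rho1, q_stat, math.erfc(math.sqrt(q_stat / 2.0))


def squared_deviation_test(series):
    """Ljung-Box on squared deviations: a rough constant-variance check."""
    mean = sum(series) / len(series)
    return ljung_box_lag1([(x - mean) ** 2 for x in series])


if __name__ == "__main__":
    # Hypothetical trending series, NOT Federal Reserve data.
    trending = [float(t) for t in range(40)]
    print(ljung_box_lag1(trending))
    print(squared_deviation_test(trending))
```

A trending macro variable modeled as independent draws around a constant mean fails the first test almost by construction, which is consistent with the near-universal rejection reported above.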
The model validation exercises that have been done so far require no software other than a good stat package, and yet they have provided a long check list of false assumptions. We now turn to another area that yields a rich check list of model validation: the implications of model assumptions.
A Check on Model Implications
It is often the case that, when a user chooses a set of assumptions that produce unreasonable outcomes, the user blames the software that reveals the bad outcomes instead of himself. A proper model validation effort will reveal bad outcomes before bad assumptions are fed into a sophisticated risk management system. One important check on model implications is to alert the analyst when a set of assumptions produces values that are “unreasonable” by an objective criterion. The easiest such cases are outcomes that are literally impossible (like a negative unemployment rate, which could happen if the unemployment rate is assumed to be normally distributed). Another useful criterion treats as “unreasonable” any set of assumptions that, with a large probability, produces outcomes above the highest or below the lowest values ever recorded.
To illustrate this procedure, we test the probability that the user-selected probability distributions fall outside of historical minimums and maximums for time horizons from 1 to 30 years using quarterly time periods. We take advantage of the formulas in the appendix to calculate the means and standard deviations, given the user’s assumptions, of the macro-economic variable at each time horizon. Here are the results:
Consider the U.S. Dollar/Euro exchange rate, variable 4. The user has chosen to model this as a lognormally distributed variable. Unfortunately, given the parameters selected, a Monte Carlo simulation of this variable will produce simulations that are outside of the historical maximum and minimum 20.62% of the time in a one year simulation and 76.77% in a 30 year simulation. These probabilities are career-threatening. They are the fault of the user, not the software which delivered the bad news about the user’s choices.
Consider another variable, variable 11, which is the annualized change in real gross domestic product. The assumption of lognormality produces what we label a “span error,” the inability of the selected probability distribution to span the range between the historical minimum and maximum. In this case, the real GDP growth rate can be negative but the lognormal assumption will not allow that outcome.
The formulas in the appendix make this model validation check nearly painless.
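A minimal sketch of this check, assuming independent periodic changes (or log returns) with mean m and standard deviation σ, so that after k periods the variable (or its log) is normally distributed with mean equal to the starting value plus km and standard deviation σ√k. The function name, the `None` return used to flag a span error, and all input values are hypothetical; none of the figures in the text are reproduced here.

```python
import math
from statistics import NormalDist


def prob_outside_history(x0, m, sigma, k, hist_min, hist_max, lognormal):
    """Probability that the simulated value after k periods falls outside
    [hist_min, hist_max]. Returns None for a span error, i.e. when a
    lognormal assumption cannot reach a non-positive historical minimum."""
    if lognormal:
        if hist_min <= 0 or x0 <= 0:
            return None  # span error: lognormal values are strictly positive
        start, lo, hi = math.log(x0), math.log(hist_min), math.log(hist_max)
    else:
        start, lo, hi = x0, hist_min, hist_max
    # Mean grows with k * m; standard deviation grows with sqrt(k).
    dist = NormalDist(mu=start + k * m, sigma=sigma * math.sqrt(k))
    return dist.cdf(lo) + (1.0 - dist.cdf(hi))


if __name__ == "__main__":
    # Hypothetical FX-like inputs: quarterly log-return sd of 5%.
    for quarters in (4, 40, 120):
        p = prob_outside_history(1.30, 0.0, 0.05, quarters, 0.85, 1.60, True)
        print(quarters, round(p, 4))
    # Hypothetical growth-rate variable with a negative historical minimum.
    print(prob_outside_history(2.0, 0.0, 0.1, 4, -1.0, 5.0, True))  # None
```

Running the function across horizons of 1 to 30 years in quarterly steps gives the kind of horizon-by-horizon table this section describes.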
A Check of Econometric Procedures
Users of reduced form default probability models are very familiar with the logistic regression formula shown here:
$$P[t] = \frac{1}{1 + \exp\left(-\alpha - \sum_{i=1}^{n} B_i X_i\right)}$$
The default probability P[t] is a function of the n explanatory variables X1 to Xn. The alpha and B coefficients are derived using a historical data base of defaults and a good statistical package. What happens when the statistics are done on a quarterly data base but the stress tests require, like CCAR, forward default probabilities 1, 2, 3, …, and 13 quarters ahead? How are the input variables Xi determined in that case? That is a question with an answer that Kamakura Corporation has embedded in its Kamakura Risk Information Services and KRM software system. Our focus here is more about what should NOT be done. It is very common for an analyst to forecast Xi with a linear regression and then drop the result into the default probability formula above.
This problem was discussed by Joshua Angrist and Jorn-Steffen Pischke in their classic book Mostly Harmless Econometrics: An Empiricist’s Companion on pages 190-192:
“Forbidden regressions were forbidden by MIT professor Jerry Hausman in 1975, and while they occasionally resurface in an under-supervised thesis, they are still technically off limits. A forbidden regression crops up when researchers apply 2SLS (two stage least squares) reasoning directly to non-linear models…As a rule, naively plugging in first-stage fitted values in non-linear models is a bad idea.”
The authors go on to explain why this is unacceptable at great length. We leave that background to Angrist and Pischke. Suffice it to say that a well-trained regulator should reject this approach as soon as they see it in a stress testing, VAR or CVAR exercise.
There are many other examples one could give but this is one of the most common errors we see among model builders.
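One way to see why the plug-in approach fails is that a logistic function of an average input is not the average of the logistic function. The sketch below is an illustration of that point, not Angrist and Pischke’s derivation. It assumes a single explanatory variable whose forecast is normally distributed around its fitted value. The coefficient values and function names are hypothetical.

```python
import math
from statistics import NormalDist


def logistic_pd(alpha, betas, xs):
    """Logistic default probability 1 / (1 + exp(-alpha - sum(B_i * X_i)))."""
    z = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-z))


def plug_in_vs_expected(alpha, beta, x_mean, x_sd, n_grid=10_000):
    """Compare the logistic at the forecast mean with its average over
    the forecast distribution of X (assumed normal here)."""
    plug_in = logistic_pd(alpha, [beta], [x_mean])
    dist = NormalDist(x_mean, x_sd)
    # Midpoint quantile grid: a deterministic stand-in for Monte Carlo draws.
    draws = (dist.inv_cdf((i + 0.5) / n_grid) for i in range(n_grid))
    expected = sum(logistic_pd(alpha, [beta], [x]) for x in draws) / n_grid
    return plug_in, expected


if __name__ == "__main__":
    plug_in, expected = plug_in_vs_expected(-5.0, 1.0, 0.0, 2.0)
    print(f"plug-in: {plug_in:.4%}, expected: {expected:.4%}")
```

For low default probabilities the logistic curve is convex, so the plug-in value understates the expected default probability, and the gap widens as the forecast uncertainty in the input grows.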
A Check of No Arbitrage Integrity
A final but extremely important area for model validation is to ensure that the assumptions chosen are accurate enough that a Monte Carlo simulation correctly prices all transactions with an observable price. This includes non-traded transactions with an active primary market like mortgage loans, auto loans, and charge card transactions. Obviously the bank would not be originating these transactions unless the perceived value of the newly made loan were higher than par value plus origination expenses. For securities with a traded price, a Monte Carlo simulation should either produce an exact match of observable market values or a calculated price within two or three sigmas (sampling error sigmas) of the observable price.
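An invalid approach to no arbitrage model validation is to confirm the observable price of a bond using the well-known formula that depends on zero coupon bond prices times cash flows, as in this formula:
$$\text{Value} = \sum_{i=1}^{n} P(t_i)\, C(t_i)$$
Here C(t_i) denotes the cash flow paid at time t_i.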
There is nothing wrong with this formula except the choice to use it for model validation. The zero coupon bond prices P(t) are a function of the time zero yield curve, not the Monte Carlo simulation parameters whose validity we are trying to test. In Kamakura Risk Manager, for example, the user simply selects “Monte Carlo” as the valuation technique, avoiding the default selection of the formula above.
A simpler test of violation of no arbitrage pricing restrictions is obvious from the user’s model selection.
- If the user’s choice of the distribution of an interest rate has a constant mean, this is a violation of no arbitrage restrictions that have been well understood for nearly 30 years. The forward rate curve features in the single factor interest rate models of Ho and Lee and Hull and White. More importantly and more accurately the forward rate curve is critical in the general n-factor Heath Jarrow and Morton no arbitrage constraints. This means that the mean of the interest rate variable has to be updated with each time step after the new risk free curve is simulated. If the user has not done this, the Monte Carlo pricing of bonds will never be correct and this is known without doing any calculations in any software.
- If the user’s choice of the mean in a foreign exchange rate distribution is constant, this also violates no arbitrage restrictions because the FX rate depends on forward rates in both countries. This was shown in the random interest rate case by Amin and Jarrow.
- For traded assets in general, like the mortgage yield case referred to above, the proper econometric specification that ensures no arbitrage models the total return on the mortgage (not the time series on new mortgage loan yields) less the risk free rate, i.e. the excess return on the mortgage. This approach has been well understood for more than 50 years, dating back to the capital asset pricing model of Sharpe, Lintner and Mossin. Amin and Jarrow generalized the no arbitrage restrictions for the case of a risk free curve with n factors, for any traded asset and any set of non-interest rate macro factors.
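The first point in the list above can be illustrated with a toy example. The sketch assumes a flat time zero curve, a normally distributed short rate that follows a random walk with no drift (so its mean is constant), and quarterly steps. It is a deliberately simplified setting, not the Ho and Lee, Hull and White, or Heath, Jarrow and Morton specification, and all names and parameter values are hypothetical. In this setting the Monte Carlo price of a zero coupon bond has a closed form, which the simulation should match and which differs from the observable price.

```python
import math
import random


def observable_zero_price(r0, n_periods, dt):
    """Time zero price of a zero coupon bond on a flat curve at rate r0."""
    return math.exp(-r0 * n_periods * dt)


def constant_mean_mc_price(r0, sigma, n_periods, dt, n_paths, seed=0):
    """Monte Carlo price when the short rate is a driftless random walk."""
    rng = random.Random(seed)
    step_sd = sigma * math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        r, integral = r0, 0.0
        for _ in range(n_periods):
            integral += r * dt            # current rate applies over this period
            r += rng.gauss(0.0, step_sd)  # zero drift: the mean never changes
        total += math.exp(-integral)
    return total / n_paths


def constant_mean_exact_price(r0, sigma, n_periods, dt):
    """Closed form for the same driftless model (lognormal discount factor)."""
    n = n_periods
    var = sigma ** 2 * dt ** 3 * (n - 1) * n * (2 * n - 1) / 6.0
    return math.exp(-r0 * n * dt + 0.5 * var)


if __name__ == "__main__":
    r0, sigma, n, dt = 0.03, 0.01, 40, 0.25   # hypothetical 10-year zero
    print("observable :", observable_zero_price(r0, n, dt))
    print("MC, no drift:", constant_mean_mc_price(r0, sigma, n, dt, 20_000))
    print("exact, no drift:", constant_mean_exact_price(r0, sigma, n, dt))
```

The constant mean model overprices the bond regardless of how many paths are run, so the gap is a model error rather than sampling error. Removing it requires a drift term tied to the time zero forward curve, which is the adjustment the no arbitrage models cited above supply.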
The chart below applies this set of tests for no arbitrage consistency on an arbitrary set of assumptions about the Fed’s CCAR 2014 macro factors.
This note has covered a check list of common errors to be avoided when conducting calculations for a stress test, value at risk, or credit-adjusted value at risk. In all three cases, even if a Monte Carlo simulation is not the objective of the analyst, a Monte Carlo simulation of time zero values is an essential model validation test. We have shown in the previous pages that there is a long list of well-intended but erroneous modeling choices that are made frequently. By avoiding these common errors, institutions subject to both regulatory and managerial stress tests can avoid the ultimate risk management nightmare: running a perfect hedge on the wrong number.
Useful Formulas for Normal and Lognormal Distributions in Simulations
Lognormal distribution of returns
Consider a security with value S(0) at time zero and value S(k) after a forward-looking simulation of k periods. One common assumption about the distribution of securities returns is that they are lognormal with a mean, volatility, and correlation derived from historical data. If the mean return assumed for the period is a constant m and the standard deviation of returns during the period is σ, the simulated value of the security after k periods is
$$S(k) = S(0)\exp\left[\sum_{i=1}^{k} r(m,\sigma,i)\right]$$
The variable r(m,σ,i) is the random, simulated return on the security in the ith period. The mean return is of course m and the periodic variance is σ². We can solve for the probability distribution of S(k) by taking the log of both sides of the expression above and simplifying using the properties of the normally distributed returns, which we assume are independent from period to period.
$$\ln S(k) = \ln S(0) + \sum_{i=1}^{k} r(m,\sigma,i)$$
Rearranging and taking expectations gives us the mean continuously compounded return over the k periods:
$$E\left[\ln\frac{S(k)}{S(0)}\right] = km$$
The variance of the continuously compounded return over the k periods is
$$\mathrm{Var}\left[\ln\frac{S(k)}{S(0)}\right] = k\sigma^{2}$$
And the standard deviation of the continuously compounded return over k periods is
$$\mathrm{SD}\left[\ln\frac{S(k)}{S(0)}\right] = \sigma\sqrt{k}$$
Normally distributed variables in a simulation
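What if another variable X is normally distributed from period to period with changes c(m,σ,i) in period i with a mean of m again and a standard deviation of σ? Then the value of X(k) after k periods is
$$X(k) = X(0) + \sum_{i=1}^{k} c(m,\sigma,i)$$
Since the changes are again assumed to be independent, the mean, variance and standard deviations of the changes in X over k periods can be written as follows:
$$E\left[X(k) - X(0)\right] = km, \qquad \mathrm{Var}\left[X(k) - X(0)\right] = k\sigma^{2}, \qquad \mathrm{SD}\left[X(k) - X(0)\right] = \sigma\sqrt{k}$$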
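These scaling results (a mean of km and a standard deviation of σ√k for the sum of k independent normal draws) are easy to confirm by simulation. The sketch below does so for hypothetical parameter values. The function names are illustrative, and the same check applies to either the cumulative change in X or the cumulative log return on S.

```python
import math
import random
import statistics


def theoretical_moments(m, sigma, k):
    """Mean, variance and standard deviation of a sum of k independent
    normal draws, each with mean m and standard deviation sigma."""
    return k * m, k * sigma ** 2, sigma * math.sqrt(k)


def simulated_cumulative(m, sigma, k, n_paths, seed=0):
    """Simulate the k-period cumulative change (or log return) per path."""
    rng = random.Random(seed)
    return [sum(rng.gauss(m, sigma) for _ in range(k)) for _ in range(n_paths)]


if __name__ == "__main__":
    m, sigma, k = 0.01, 0.05, 20   # hypothetical quarterly parameters
    sims = simulated_cumulative(m, sigma, k, 20_000)
    print("theory   :", theoretical_moments(m, sigma, k))
    print("simulated:", statistics.fmean(sims), statistics.stdev(sims))
```

If the simulated moments drift away from the theoretical ones as k grows, the independence assumption behind the formulas is the first thing to question.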