As we discussed in Part 2 of this series, we don’t arbitrarily define a specific function as the “best yield curve”. Instead, we define the economic and mathematical criterion for “best” subject to a set of constraints whose alternatives we listed in Part 2 of this series. We then derive the functional form that is consistent with our criterion for best and apply it to the same sample set of data through the rest of this blog series.
Sample Data for the Basic Building Blocks of Yield Curve Smoothing
For each of our smoothing techniques to be comparable, we will use the same set of data throughout the series. We will limit the number of data points to six points and smooth the five intervals between them. In order to best demonstrate how the definition of best affects the quality of the answer, we choose a simple set of sample data with two “humps” in the yield data.
Choosing a pattern of yields that falls on a perfectly straight line, for example, would lead a naïve analyst to the logical but mistaken conclusion that a straight line is always the most accurate and “best” way to fit a yield curve. Such a conclusion would be nonsense. Similarly, restricting oneself to a yield curve functional form that allows only one “hump” would also lead one to simplistic conclusions about what is “best.”
We recently reviewed 4,435 days of “on the run” U.S. Treasury yields reported by the Federal Reserve in its H15 statistical release at maturities (in years) of 0.25, 0.5, 1, 2, 3, 5, 7, 10 and 30 years. In comparing same-day yield differences between adjacent maturities (i.e. the difference between yields at 0.25 and 0.5 years, 0.5 and 1 year, etc.), there were 813 days where there were at least two decreases in yields as maturities lengthened. For example, using the smoothing embedded in common spreadsheet software, here is the yield curve for April 15, 1982:
A more complex pattern can be seen on December 18, 1989:
October 15, 1998 shows an equally complex interaction between yields at various maturities:
Finally, on December 21, 1998, another complex set of curves emerges from the data:
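The screen described above, counting the days on which yields fall at least twice as maturities lengthen, can be sketched in a few lines. This is a minimal illustration only: the function names and the two sample curves are hypothetical, and the sample yields are not actual H15 data.

```python
from typing import Dict, List, Sequence

# Maturities in years, as listed in the text for the H15 release.
H15_MATURITIES = [0.25, 0.5, 1, 2, 3, 5, 7, 10, 30]


def count_decreases(yields: Sequence[float]) -> int:
    """Count adjacent-maturity pairs where the yield falls as maturity lengthens."""
    return sum(1 for shorter, longer in zip(yields, yields[1:]) if longer < shorter)


def days_with_multiple_decreases(
    daily_curves: Dict[str, Sequence[float]], minimum: int = 2
) -> List[str]:
    """Return the days whose curve has at least `minimum` decreases."""
    return [day for day, ys in daily_curves.items() if count_decreases(ys) >= minimum]


# Hypothetical curves for illustration (one yield per H15 maturity, in percent).
sample = {
    "day_a": [5.0, 5.2, 5.1, 5.3, 5.2, 5.4, 5.5, 5.6, 5.7],  # falls twice
    "day_b": [3.0, 3.1, 3.3, 3.6, 3.8, 4.1, 4.3, 4.5, 4.9],  # never falls
}
print(days_with_multiple_decreases(sample))
```

Applied to a full history of daily curves, the length of the returned list would correspond to the count of days quoted above.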
Therefore, it is very important that our sample data have a similarly realistic complexity, or we will fail to challenge the various yield curve smoothing techniques sufficiently to see the difference in accuracy among them. It is important to note that the “yields” in these pictures are the simple “yields to maturity” on instruments which are predominantly coupon-bearing instruments. These yields should NEVER be smoothed directly as part of a risk management process. The graphs above are included only as an introduction to how complex yield curve shapes can be. When we want to do accurate smoothing, we are very careful to do it in the following way:
a. We work with raw bond net present values (price plus accrued interest), not “yields,” because the standard yield to maturity calculation is riddled with inaccurate and inconsistent assumptions about interest rates over the life of the underlying bond.
b. We only apply smoothing to prices in the risk free bond and money markets. If the underlying issuer of fixed income securities has a non-zero default probability, we apply our smoothing to credit spreads or forward credit spreads in such a way that bond mispricing is minimized. We never smooth yields or prices of a risky issuer directly, because this ignores the underlying risk-free yield curve.
We will illustrate these points later in our series on the basic building blocks of yield curve smoothing. We now start with Example A, the simplest approach to yield curve smoothing.
Example A: Stepwise-Constant Yields and Forward Rates
From this point on in this series, unless otherwise noted, “yields” are always meant to be continuously compounded zero coupon bond yields and “forwards” are the continuous forward rates that are consistent with the yield curve. The first step in exploring a yield curve smoothing technique is to define our criterion for best and to specify what constraints we impose on the “best” technique to fit our desired trade-off between simplicity and realism. We answer the nine questions posed in Part 2 of this series.
Step 1: Should the smoothed curves fit the observable data exactly?
1a. Yes
1b. No
1a. Yes. With only six data points at six different maturities, it would be a poor exercise in smoothing if we could not fit this data exactly. We note later that the flawed Nelson-Siegel function is unable to fit this data and fails our first test.
Step 2: Select the element of the yield curve and related curves for analysis
2a. Zero coupon yields
2b. Forward rates
2c. Continuous credit spreads
2d. Forward continuous credit spreads
2a. Zero coupon yields is our choice. We find in the end that 2a and 2b are equivalent given our other answers to the following questions. If we were dealing with a credit-risky issuer of securities, we would have chosen 2c or 2d, but we have assumed our sample data is free of credit risk.
Step 3: Define “best curve” in explicit mathematical terms
3a. Maximum smoothness
3b. Minimum length of curve
3c. Hybrid approach
3b. Minimum length of curve. This is the easiest definition of “best” to start with. We’ll try it and show its implications. The following Wikipedia article explains how to calculate the length of a curve given the mathematical function that produced the curve:
http://en.wikipedia.org/wiki/Arc_length
The length s of a yield curve or forward rate curve between maturities a and b is

$$ s = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx $$

where f’(x) is the first derivative of the yield curve or forward rate curve. We want to minimize s over the full length of the yield curve. By doing so, we generate a series of line segments with maximum “tension.” Some analysts have suggested using “tension splines” that take a hybrid approach to balancing maximum smoothness and tension. Leif Andersen proposes such an approach in this paper in the Review of Derivatives Research in 2007:
We now move on to specifications on curve fitting that represent our desired trade-off between realism and ease of calculation.
Step 4: Is the curve constrained to be continuous?
4a. Yes
4b. No
4b. No. By choosing no, we are allowing discontinuities in the yield curve.
For an example of a high quality academic paper which makes a similar assumption, see this paper by Robert Jarrow and Yildiray Yildirim, where the authors use a four-step piece-wise constant function for forward rates:
Step 5: Is the curve differentiable?
5a. Yes
5b. No
5b. No. Since the answer we have chosen above, 4b, does not require the curve to be continuous, it will not be differentiable at every point along its length.
Step 6: Is the curve twice differentiable?
6a. Yes
6b. No
6b. No. For the same reason, the curve will not be twice differentiable at some points on the full length of the curve.
Step 7: Is the curve thrice differentiable?
7a. Yes
7b. No
7b. No. Again, the reason is due to our choice of 4b.
Step 8: At the spot date, time 0, is the curve constrained?
8a. Yes, the first derivative of the curve is set to zero or a non-zero value x.
8b. Yes, the second derivative of the curve is set to zero or a non-zero value y.
8c. No
8c. No. For simplicity, we answer No to this question. We relax this assumption in later posts in this series.
Step 9: At the longest maturity for which the curve is derived, time T, is the curve constrained?
9a. Yes, the first derivative of the curve is set to zero or a non-zero value j at time T.
9b. Yes, the second derivative of the curve is set to zero or a non-zero value k at time T.
9c. No
9c. No. Again, we choose No for simplicity and relax this assumption later in the blog.
Now that all of these choices have been made, both the functional form of the line segments and the parameters that are consistent with the data can be explicitly derived from our sample data. The resulting forward rate curve or yield curve that is produced by this method has these attributes:
- Given the constraints imposed on the curve and the raw data, the curve is the “best” that can be drawn consistent with the analyst’s definition of “best”
- The data will be fit perfectly
- All constraints will be adhered to exactly
We always follow these steps in this series on the basic building blocks of yield curve smoothing:
- We select the criterion for best
- We impose constraints we think are realistic given the problems at hand
- We derive the functional form and parameters for the curve
- We prove the curve is “better” than alternatives by our definition
We now derive the functional form and parameters for Example A.
Deriving the Form of the Yield Curve Implied by Example A
The key question in the list of nine questions above is question 4. By our choice of 4b, we allow the “pieces” of the yield curve to be discontinuous. By virtue of our choices in questions 5-9, these yield curve pieces are also not subject to any constraints. All we have to do to get the “best” yield curve is to apply our criterion for “best”: the yield curve with the shortest length.
The length of a straight line between two points on the yield curve is known, thanks to Pythagoras, who rarely gets the credit he deserves in the finance literature:

$$ \text{length} = \sqrt{(t_2 - t_1)^2 + (y_2 - y_1)^2} $$

The length of a straight segment of the yield curve that goes from maturity t1 and yield y1 to maturity t2 and yield y2 is the square root of the sum of the square of [t2-t1] and the square of [y2-y1]. As we noted above, the general formula for the length of any segment of the yield curve between maturity a and maturity b is given by this formula, which is the function one derives as the difference between t2 and t1 becomes infinitely small:

$$ s = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx $$

where f’(x) is the first derivative of the line segment at each maturity point, say x. If the line segment happens to be straight, the segment can be described as
y=f(t)=mt+k
and the first derivative f’(t) is of course m. How can we make this line segment as short as possible? By making f’(t)=m as small as possible in absolute value, that is, m=0. Very simply, we have DERIVED the fact that the yield curve segments which are “best” (which have the shortest length) are flat line segments. We are allowed to use flat line segments to fit the yield curve because our answer 4b does not require the segments to join each other in continuous fashion. The functional form of the “best” yield curve given our definition and constraints can be derived more elegantly using the calculus of variations, as Oldrich Vasicek did in the proof of the maximum smoothness forward rate approach in Adams and van Deventer (1994), reproduced in Chapter 8 of van Deventer, Imai and Mesler’s Advanced Financial Risk Management (John Wiley & Sons, 2004).
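As a quick check of the argument above, the length of a straight segment can be written out directly from the arc length integral. The symbols t_1 and t_2 for the end maturities of the segment are borrowed from the preceding paragraphs. This is a sketch of the straight-line case only, not the calculus of variations proof cited above.

$$
s = \int_{t_1}^{t_2} \sqrt{1 + m^2}\,dt = (t_2 - t_1)\sqrt{1 + m^2} \;\ge\; t_2 - t_1, \quad \text{with equality only when } m = 0
$$

The shortest possible segment over a given maturity interval therefore has a length equal to the width of the interval, and it is flat.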
Given that we have 6 data points, there are five intervals between points. We have taken advantage of our answer in 4b to treat the “sixth interval” from 0 years to maturity to 0 years to maturity as a separate segment. Given our original data, we know by inspection what the continuously compounded flat line segment has to be: the actual value of the input data for y at the right hand side of the line segment, and our “sixth segment” has as a value the given value 4.000% for a maturity of zero:
Note that the values of the continuous forward rates are identical to the values of the zero coupon yields because of the fourth relationship between zero yields, forwards and zero coupon bond prices that were summarized in our August 14, 2009 blog on Nelson-Siegel versus spline technologies:
Since the first derivative of the yield “curve” in each case is zero, the forward rates are identical to the zero yields.
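The identity behind this statement can be sketched as follows, assuming the standard definitions of the continuously compounded zero coupon yield y(t), the zero coupon bond price P(t) and the continuous forward rate f(t). The relationship referenced above is not reproduced here, so reading it as the identity below is an assumption.

$$
P(t) = e^{-y(t)\,t}, \qquad f(t) = -\frac{d \ln P(t)}{dt} = y(t) + t\,y'(t)
$$

Within any flat segment y'(t)=0, so the second term vanishes and f(t)=y(t).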
While the stepwise constant approach is extremely simple, it is commonly used in academic work of the highest quality. For an example, see this paper by Robert A. Jarrow and Yildiray Yildirim:
How did we do in terms of minimizing the length of the yield “curve” over its 10 year span? We know the length of a flat line segment from t1 to t2 is just t2-t1, so the total length of our discontinuous yield curve is
Length=(0-0)+(0.25-0)+(0.5-0.25)+(1-0.5)+(3-1)+(5-3)+(7-5)+(10-7)=10.000
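The step function and its length can be sketched as below. This is a minimal illustration under stated assumptions: the knots are assumed to sit at maturities of 0, 0.25, 1, 3, 5 and 10 years, only the 4.000% yield at maturity zero comes from the text, and the other five yields are illustrative placeholders rather than the values in the sample data table. The names `step_yield` and `step_curve_length` are hypothetical.

```python
from bisect import bisect_left

# Assumed knot maturities in years (six data points, five intervals).
KNOTS = [0.0, 0.25, 1.0, 3.0, 5.0, 10.0]
# Yields in percent. Only the 4.000 at maturity zero is from the text;
# the remaining values are illustrative placeholders.
YIELDS = [4.000, 4.750, 4.500, 5.500, 5.250, 6.500]


def step_yield(t, knots=KNOTS, yields=YIELDS):
    """Step-wise constant yield (equal to the forward rate) at maturity t.

    Each segment takes the observed value at its right-hand end, and
    maturity zero takes the observed value at zero.
    """
    if not knots[0] <= t <= knots[-1]:
        raise ValueError("maturity outside the fitted range")
    return yields[bisect_left(knots, t)]


def step_curve_length(knots=KNOTS):
    """Total length of the flat segments; a flat segment from t1 to t2 has length t2 - t1."""
    zero_point = knots[0] - knots[0]  # the zero-width segment at maturity zero
    return zero_point + sum(t2 - t1 for t1, t2 in zip(knots, knots[1:]))


print(step_yield(0.0), step_yield(0.5), step_yield(10.0))
print(step_curve_length())
```

Because every segment is flat, the total length depends only on the first and last knots, which is why splitting the 10 year span at additional maturities leaves the sum unchanged.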
We now compare our results for our “best” Example A yield curve and constraints to the popular but flawed Nelson-Siegel approach.
Fitting the Nelson-Siegel Approach to Sample Data
We now want to compare Example A, the model we derived from our definition of “best” and related constraints, to the Nelson-Siegel approach. The following table emphasizes the stark differences between even the basic Example A model and Nelson-Siegel:
As the table notes, we know even before we start this exercise that the Nelson-Siegel function will not fit the actual market data we’ve assumed because there are more data points than there are Nelson-Siegel parameters AND the functional form of Nelson-Siegel is not flexible enough to handle the kind of actual U.S. Treasury data we reviewed earlier in this blog. The other thing that is important to note is that Nelson-Siegel will NEVER be superior to a function that has the same constraints and is derived from either (a) the minimum length criterion for best or (b) the maximum smoothness criterion.
We have explicitly chosen to answer “no” to questions 4-9 in Example A, while the answers for Nelson-Siegel are “yes.” This is not a virtue of Nelson-Siegel; it is just a difference in modeling assumptions. In subsequent installments of this blog, we will impose the constraints in questions 4-9 and we will find the results are still superior in every case to Nelson-Siegel.
In fitting Nelson-Siegel to our actual data, we have to make a choice of the function we are optimizing:
a. Minimize the sum of squared errors versus the actual zero coupon yields
b. Minimize the sum of squared errors versus the actual zero coupon bond prices
If we were using coupon bond price data, we would always optimize on the sum of squared pricing errors versus true net present value (price plus accrued interest) because legacy yield quotations are distorted by inaccurate embedded forward rate assumptions. In this case, however, all of the assumed inputs are on a zero coupon basis, and we have another issue to deal with. The zero coupon bond price at a maturity=0 is 1 for all yield values, so using the zero price at a maturity of zero for optimization is problematic. This means that we need to optimize in such a way that we minimize the sum of squared errors versus the zero coupon yields at all of the input maturities, including the zero point. We need to make 2 other adjustments before we can do this optimization using common spreadsheet software:
- We optimize versus the sum of squared errors in yields times 1 million in order to minimize the effect of rounding error and tolerance settings embedded in the optimization routine
- We note that, at maturity zero, the Nelson-Siegel function “blows up” because of a division by zero. Since y(0) is equal to the forward rate function at time zero f(0), we make that substitution to avoid dividing by zero.
That is, at the zero maturity point, instead of using the Nelson-Siegel yield function
we use the forward rate function:
We now choose the values of alpha, beta, delta and gamma that minimize the sum of squared errors (times 1 million) in the actual and fitted zero yields. The results of that optimization are summarized in this table:
After two successive iterations, we reach the best fitting parameter values for alpha, beta, delta and gamma. To those who have not used the Nelson-Siegel function before, the appallingly bad fit might come as a surprise. Even after the optimization, the errors in fitting zero yields are 32 basis points at the zero maturity, 36 basis points at 0.25, almost 12 basis points at 1 year, 36 basis points at 3 years, 33 basis points at five years, and 6 basis points at 10 years. The Nelson-Siegel formulation fails to meet the necessary condition for the consideration of a yield curve technique in this series: the technique MUST fit the observable data. Given this, why do people use the Nelson-Siegel technique? The only people who would use Nelson-Siegel are people who are willing to assume that the model is true and the market data is false. That’s not a group likely to have a long career in risk management.
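The fitting setup described above (yield function, forward rate substitution at maturity zero, and scaled objective) can be sketched as follows. The formulas used are the standard textbook Nelson-Siegel parameterization. Mapping alpha to the long-run level, beta to the slope term, gamma to the hump term and delta to the decay time is an assumption, since the formulas used in this post are not reproduced here. The function names are hypothetical, and no optimizer is included.

```python
import math


def ns_forward(t, alpha, beta, gamma, delta):
    """Nelson-Siegel forward rate (standard form); well defined at t = 0."""
    decay = math.exp(-t / delta)
    return alpha + beta * decay + gamma * (t / delta) * decay


def ns_yield(t, alpha, beta, gamma, delta):
    """Nelson-Siegel zero yield (standard form).

    The yield formula divides by t, so at t = 0 we substitute the
    forward rate, using y(0) = f(0).
    """
    if t == 0.0:
        return ns_forward(0.0, alpha, beta, gamma, delta)
    x = t / delta
    loading = (1.0 - math.exp(-x)) / x
    return alpha + beta * loading + gamma * (loading - math.exp(-x))


def scaled_sse(params, maturities, observed_yields):
    """Sum of squared yield errors times 1 million, the quantity to be minimized."""
    alpha, beta, gamma, delta = params
    errors = (
        ns_yield(t, alpha, beta, gamma, delta) - y
        for t, y in zip(maturities, observed_yields)
    )
    return 1_000_000 * sum(e * e for e in errors)
```

Any general-purpose minimizer could be applied to `scaled_sse`. With four free parameters and six observed yields, the minimized value will generally be greater than zero, which is the point made above about the fit.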
The graph below shows that our naïve model Example A, which has step-wise constant (and identical) forward rates and yields, fits the observable yields perfectly. The observable yields are plotted as black dots. The red dots represent the step-wise constant forwards and yields of Example A. The green dots are the Nelson-Siegel yields and the purple dots are Nelson-Siegel forward rates.
Now we pose a different question: given that we have defined the “best” yield curve as the one with the shortest length, how does the length of the Nelson-Siegel curve compare with Example A’s length of 10 units?
There are two ways to answer this question. First, we could evaluate the length of the curve analytically by using the first derivative of the Nelson-Siegel yield formula in the arc length integral, substituting y’ for f’ below:

$$ s = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx $$

The second alternative is suggested by the link on length of an arc (given above): we can approximate the Nelson-Siegel length calculation by using a series of straight line segments and the insights of Pythagoras to evaluate length numerically. When we do this at monthly time intervals, the first 12 months’ contributions to length are as follows:
We sum up the lengths of each segment, 120 months in total, to get a total line length of 10.2314, compared to a length of 10.0000 for Example A’s derived “best” curve, a step function of forward rates and yields. Note that the length of the curve, when the segments are not flat, depends on whether yields are displayed in percent (4.00) or decimal (0.04). To make the differences in length more obvious, our length calculations are based on yields in percent.
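The numerical length calculation described above can be sketched as follows. This is a minimal illustration: `polyline_length` is a hypothetical helper, and the two sloped curves are made-up straight lines used only to show the effect of the choice of units, not fitted Nelson-Siegel curves.

```python
import math


def polyline_length(curve, start=0.0, end=10.0, steps=120):
    """Approximate the length of curve(t) with straight segments (Pythagoras).

    The defaults split 10 years into 120 monthly intervals.
    """
    width = (end - start) / steps
    total = 0.0
    for i in range(steps):
        t1 = start + i * width
        t2 = start + (i + 1) * width
        total += math.hypot(t2 - t1, curve(t2) - curve(t1))
    return total


# A flat curve has length equal to the maturity span.
flat_length = polyline_length(lambda t: 4.0)

# The same sloped line, quoted in percent and then in decimal.
percent_length = polyline_length(lambda t: 4.0 + 0.1 * t)
decimal_length = polyline_length(lambda t: 0.04 + 0.001 * t)

print(flat_length, percent_length, decimal_length)
```

The sloped line is longer when quoted in percent than when quoted in decimal, which illustrates why the units must be held fixed when comparing lengths.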
In this blog, we have accomplished three things:
a. We have shown that the functional form used for yield curve fitting can and should be derived from a mathematical definition of “best” rather than being asserted for qualitative reasons
b. We have shown that the Nelson-Siegel yield curve fitting technique fails to fit yield data with characteristics often found in the U.S. Treasury bond market
c. We have shown that the Nelson-Siegel technique is inferior to the step-wise constant forward rates and yields that were derived from Example A’s specifications: that “best” means the shortest length and that a continuous yield function is not required, consistent with the Jarrow and Yildirim paper above
In part 4 of this series, we impose the continuity constraint and again derive the “best yield curve.”
Donald R. van Deventer
Kamakura Corporation
November 19, 2009