
Chapter 21
Time Series Models of Heteroskedasticity

There are no worked examples in the text, so we will work with the Federal Funds rate as shown on page 658 and below in Figure 21.1. It will turn out that this is not an ideal example for demonstrating GARCH models (a simple ARCH or GARCH doesn't fit it very well), but it will be a good example of reading the diagnostics.

[Figure 21.1: U.S. Federal funds rate, monthly, 1955-2000]

Choosing the Mean Model

The assumptions for the ARCH model in Hamilton are:

(a) y_t = x_t'\beta + u_t
(b) u_t = \sqrt{h_t}\, v_t
(c) \{v_t\} i.i.d., E v_t = 0, E v_t^2 = \sigma^2
(d) h_t = \zeta + \alpha_1 u_{t-1}^2 + \cdots + \alpha_m u_{t-m}^2

The first thing to note is that, because of (b) and (c), the ARCH process is defined as a white-noise process. Thus we need a model of the series which generates residuals that are serially uncorrelated in their mean. In most ARCH and GARCH models, there are separate mean and variance models.
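
Since nothing in the RATS examples depends on it, here is a small illustrative simulation (in Python, with made-up parameter values) of the point just made: under (a)-(d) the levels u_t behave like white noise, while the squares are autocorrelated, which is exactly why the mean and variance are modeled separately.

  import numpy as np

  rng = np.random.default_rng(42)
  T, zeta, a1, a2 = 5000, 0.1, 0.3, 0.2       # illustrative ARCH(2) parameters
  sig2 = zeta / (1 - a1 - a2)                 # unconditional variance (a1 + a2 < 1)

  u, h = np.zeros(T), np.zeros(T)
  for t in range(T):
      # h_t = zeta + a1*u_{t-1}^2 + a2*u_{t-2}^2, pre-sample squares set to sig2
      lag1 = u[t-1]**2 if t >= 1 else sig2
      lag2 = u[t-2]**2 if t >= 2 else sig2
      h[t] = zeta + a1*lag1 + a2*lag2
      u[t] = np.sqrt(h[t]) * rng.standard_normal()

  def acf1(x):
      x = x - x.mean()
      return np.dot(x[1:], x[:-1]) / np.dot(x, x)

  print("lag-1 ACF of u  :", round(acf1(u), 3))     # close to 0
  print("lag-1 ACF of u^2:", round(acf1(u**2), 3))  # clearly positive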

One problem with choosing a model for the mean is that most relatively simple methods for choosing a model for the serial correlation of a series assume that the residuals are homoscedastic. However, if we're expecting ARCH or GARCH residuals, that won't be the case. Despite that, the mean model is generally chosen using standard methods; we then test the residuals to see if any adjustments need to be made once we've estimated a complete model.

Just a quick look at the graph shows that the funds rate is strongly serially correlated. We will choose an autoregressive model for the mean using the @ARAutoLags procedure, allowing for a maximum of 12 lags. The selection of the mean model and tests for ARCH effects is in Example 21.1.

  cal(m) 1955
  open data fedfund.rat
  data(format=rats) 1955:1 2000:12
  graph(footer=$
   "Figure 21.1 U.S. federal funds rate (monthly averages)")
  # ffed
  @ARAutoLags(crit=bic,maxlags=12,table) ffed

The chosen number of lags using BIC is 3 (Table 21.1). Different criteria give different results here, as both Hannan-Quinn (CRIT=HQ) and Akaike (CRIT=AIC) choose 12. For various reasons, though, it probably makes sense to start with a smaller model and expand it if that seems to be needed. Remember, again, that all of these are derived under the assumption of homoscedasticity. When there may be (and, in this case, are) some overly large residuals, it's much easier to get (apparently) large autocorrelations by chance.

Table 21.1: Bayesian IC Lag Analysis of AR Models

  Lags     IC
    0    2.400
    1    0.950
    2    1.066
    3    1.068
    4    1.057
    5    1.051
    6    1.040
    7    1.030
    8    1.026
    9    1.041
   10    1.041
   11    1.035
   12    1.033

Based upon this, we will use an AR(3) model for the mean throughout.
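
For readers working outside RATS, the search that @ARAutoLags performs can be sketched as follows. This is a Python sketch under simple assumptions (AR(p) fit by least squares over a common estimation sample; BIC computed as log \hat\sigma^2 + (p+1) log T / T); @ARAutoLags's exact conventions may differ, so the values won't match Table 21.1 digit for digit.

  import numpy as np

  def ar_bic(y, max_lags=12):
      """BIC for AR(p), p = 0..max_lags, over a common estimation sample."""
      y = np.asarray(y, dtype=float)
      T = len(y) - max_lags                  # common sample size
      results = {}
      for p in range(max_lags + 1):
          # regressors: constant plus lags 1..p, aligned on the common sample
          X = np.column_stack([np.ones(T)] +
                              [y[max_lags - j : len(y) - j] for j in range(1, p + 1)])
          yy = y[max_lags:]
          beta, *_ = np.linalg.lstsq(X, yy, rcond=None)
          resid = yy - X @ beta
          sigma2 = resid @ resid / T
          results[p] = np.log(sigma2) + (p + 1) * np.log(T) / T
      return results

  # usage: bics = ar_bic(ffed); best = min(bics, key=bics.get)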

The first step is to estimate the base mean model:

  linreg ffed
  # constant ffed{1 to 3}

If we do a standard Ljung-Box Q test on the residuals from this, we would (strongly) reject that they are white noise:

  @regcorrs(number=36,qstat,footer="Residuals from AR(3)")

produces Figure 21.2. However, despite the strong rejection of white noise, it doesn't suggest any obvious changes to the model to fix this. The large autocorrelations at 7, 8 and 9 would not easily be corrected by any simple adjustment.

[Figure 21.2: Residual Analysis from AR(3) mean model. AIC = 1.587, SBC = 1.618, Q = 94.61, P-value 0.00000]

However, the standard Q that is used in @REGCORRS is again based upon the assumption that the process is homoscedastic. A more general test is provided by West and Cho (1995), which is implemented in the RATS procedure @WestChoTest. The West-Cho test with the same number of correlations yields quite a different result, as

  @westchotest(number=36) %resids

produces

  Series      Q(36)    Signif.
  %RESIDS     37.62     0.3951

which is near the middle of the distribution rather than being far out in the tail. Thus, we are led to conclude that the apparent significant correlations in the residuals from the basic model appear to be artifacts of the heteroscedastic process.
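
The logic of the adjustment is worth seeing in code. The following Python fragment is a sketch of a heteroscedasticity-robust Q in the spirit of West-Cho, not a line-by-line port of @WestChoTest: each autocovariance is studentized with a heteroscedasticity-consistent estimate of its variance before the squares are summed, so a handful of huge residuals can no longer inflate the statistic.

  import numpy as np
  from scipy import stats

  def robust_q(u, nlags=36):
      """Heteroscedasticity-robust Q in the spirit of West-Cho (1995).

      For each lag j, the autocovariance sum(u_t * u_{t-j}) is divided by
      sqrt(sum(u_t^2 * u_{t-j}^2)), its variance estimate under a
      heteroscedastic martingale-difference null; the squared ratios are summed.
      """
      u = np.asarray(u, dtype=float)
      u = u - u.mean()
      q = 0.0
      for j in range(1, nlags + 1):
          num = np.sum(u[j:] * u[:-j])
          den = np.sum(u[j:]**2 * u[:-j]**2)
          q += num**2 / den
      return q, stats.chi2.sf(q, df=nlags)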

Testing for ARCH Effects

There are two principal methods to test for ARCH effects in the residuals of a regression (or ARMA model). The first is the LM test from Engle (1982), which takes the squared residuals and regresses them on a constant and lagged squared residuals. This is discussed in the text on pp 664-665. The test is similar to a Breusch-Godfrey test for serial correlation applied to the squares instead of the residuals themselves, but with one critical difference: in the BG test, you include the original regressors as well as the lagged residuals. In the ARCH test, you can ignore the fact that the residuals came from a regression, because there is a standard result that, in the case of a regression with heteroscedastic errors, the parameter estimates for the regression itself and for the scedastic function are asymptotically uncorrelated (assuming no direct connection like a shared parameter). The other test is called McLeod-Li, from McLeod and Li (1983): it's just a Ljung-Box Q test applied to the squared residuals.

The first type of test can be implemented fairly easily using a LINREG instruction, though there is also an @ARCHTEST procedure, which we will use as well. You need to square the residuals from the AR(3) model (after we copy them to the separate series U for use later), and regress the square on the CONSTANT and a chosen number of lags (here 4) of the square. We're also including here a test of just the last two lags.

  set u = %resids
  set u2 = u^2
  linreg u2
  # constant u2{1 to 4}
  exclude(title="ARCH Test: F Variant")
  # u2{1 to 4}
  exclude
  # u2{3 4}
  cdf(title="ARCH Test: Chi-Squared Variant") chisqr %trsquared 4

There are several ways to convert the information from this auxiliary regression into a test statistic. This shows the F-test variant, which just tests the lagged squares using a standard F-test. Note that we don't even have to do the separate EXCLUDE instruction, since it is just reproducing the Regression F in the standard LINREG output. There's also a chi-squared variant which uses TR^2; this has asymptotically a \chi^2 distribution with degrees of freedom equal to the number of tested lags (in RATS, TR^2 is available as the variable %TRSQUARED). The two should give very similar significance levels, particularly with the high number of denominator degrees of freedom in this case (540). The coefficients in the auxiliary regression are typically not of independent interest; we just need the overall significance of the statistic. However, here the test on the final two lags suggests that, for an ARCH model, two lags will be adequate:

  ARCH Test: F Variant
  Null Hypothesis : The Following Coefficients Are Zero
  U2 Lag(s) 1 to 4
  F(4,540)=  10.56140 with Significance Level 0.00000003

  Null Hypothesis : The Following Coefficients Are Zero
  U2 Lag(s) 3 to 4
  F(2,540)=   0.55418 with Significance Level 0.57486950

The @ARCHTEST procedure can also be used to do the LM test. This does the test for up to 4 lags, showing the results for every number of lags from 1 to 4. You pass the residuals themselves (or whatever series you want to test) to the procedure, not their squares; @ARCHTEST handles the squaring of the series itself.

  @ARCHtest(lags=4,span=1) u

produces

  Test for ARCH in U
  Using data from 1955:04 to 2000:12
  Lags   Statistic   Signif. Level
    1      5.528        0.01907
    2     20.691        0.00000
    3     14.016        0.00000
    4     10.561        0.00000

The LAGS=4 value matches what we had above. Given the results on the separate exclusion on lags 3 and 4, it's not a surprise that the test statistic (while still very significant) is not as large for 4 lags as for 2. As with any test like this, the power is reduced if you put in more lags than actually needed. The advantage of doing @ARCHTEST with the SPAN option is that you can see whether the effect is there at the short lags even if it has been diluted at the longer lags. (SPAN indicates the distance between lag lengths used, so LAGS=24,SPAN=4 will do 4, 8, 12, ..., 24.)

The McLeod-Li test is done using the procedure @MCLEODLI. As with @ARCHTEST, this takes the residuals rather than their squares as the argument:

  @McLeodLi(number=36) u

This strongly agrees with the conclusion from above; the squared residuals are clearly not serially independent:

  McLeod-Li Test for Series U
  Using 549 Observations from 1955:04 to 2000:12
  McLeod-Li(36-0)   169.494718   0.00000

Though not typically necessary, you can also use @REGCORRS applied to the squared residuals to get a graphical look at the correlations of the squares. The following creates Figure 21.3:

  @regcorrs(number=36,nocrits,$
    title="Correlations of Squared Residuals") u2

[Figure 21.3: Correlations of Squared Residuals]
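
Both tests are also easy to reproduce outside RATS. Here is a compact Python sketch of the two: Engle's LM test as the auxiliary regression with the TR^2 statistic, and McLeod-Li as a Ljung-Box Q on the squares. It assumes the residuals are in a numpy array u; small differences from the @ARCHTEST and @MCLEODLI output could come from sample-alignment conventions.

  import numpy as np
  from scipy import stats

  def engle_lm(u, nlags=4):
      """Engle (1982) LM test: regress u_t^2 on a constant and its own lags;
      TR^2 is asymptotically chi^2(nlags)."""
      u2 = np.asarray(u, dtype=float)**2
      y = u2[nlags:]
      X = np.column_stack([np.ones(len(y))] +
                          [u2[nlags - j : len(u2) - j] for j in range(1, nlags + 1)])
      beta, *_ = np.linalg.lstsq(X, y, rcond=None)
      e = y - X @ beta
      r2 = 1 - e @ e / np.sum((y - y.mean())**2)
      tr2 = len(y) * r2
      return tr2, stats.chi2.sf(tr2, df=nlags)

  def mcleod_li(u, nlags=36):
      """McLeod-Li (1983): Ljung-Box Q applied to the squared residuals."""
      x = np.asarray(u, dtype=float)**2
      x = x - x.mean()
      T = len(x)
      q = 0.0
      for j in range(1, nlags + 1):
          r = np.dot(x[j:], x[:-j]) / np.dot(x, x)   # lag-j autocorrelation
          q += r**2 / (T - j)
      q *= T * (T + 2)
      return q, stats.chi2.sf(q, df=nlags)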

Maximum Likelihood Estimation with Gaussian v_t

The estimation of the ARCH model is in Example 21.2. The estimation of an ARCH(m) model is very similar to what is described on pp 660-661, except that RATS doesn't (by default) condition on the first m data points. (There is a CONDITION option which allows you to compute the likelihood for an ARCH conditioning on the early observations; here you would use CONDITION=2 to get the likelihood as described in Hamilton.) Instead, RATS uses

  h_t = \zeta + \alpha_1 u_{t-1}^2 + \cdots + \alpha_m u_{t-m}^2    (21.1)

where

  u_s^2 = \begin{cases} (y_s - x_s'\beta)^2 & \text{if } s \ge 1 \\ \hat\sigma^2 & \text{if } s \le 0 \end{cases}    (21.2)

and \hat\sigma^2 is a sample estimate of the variance of the residuals. Thus pre-sample squared residuals are replaced by a sample estimate of their mean. In this notation, t = 1 is the first data point for which y_t - x_t'\beta is computable; with x requiring three lags, that will actually be entry 4 in the model in use. The difference between the two estimates is likely to be slight. Unlike the case of an ARMA model, there is no unconditional density for the pre-sample of an ARCH process. Different software will handle the pre-sample differently and so will give somewhat different results.
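
To make the objective function explicit, here is a Python sketch of the Gaussian ARCH(m) log likelihood with this pre-sample treatment (pre-sample u_s^2 replaced by a sample variance). It is only a sketch of the likelihood evaluation, taking the mean-model residuals u as given; the GARCH instruction of course estimates the mean and variance parameters jointly.

  import numpy as np

  def arch_gaussian_loglik(u, zeta, alpha):
      """Gaussian log likelihood for an ARCH(m), with pre-sample squared
      residuals replaced by the sample variance of u, as in (21.2)."""
      u = np.asarray(u, dtype=float)
      m = len(alpha)
      u2 = np.concatenate([np.full(m, u.var()), u**2])  # m pre-sample values
      ll = 0.0
      for t in range(len(u)):
          # h_t = zeta + alpha_1*u_{t-1}^2 + ... + alpha_m*u_{t-m}^2
          h = zeta + sum(a * u2[m + t - j] for j, a in enumerate(alpha, start=1))
          ll += -0.5 * (np.log(2.0 * np.pi) + np.log(h) + u2[m + t] / h)
      return ll

  # A maximum likelihood estimate would maximize this over (zeta, alpha),
  # e.g. with scipy.optimize.minimize on the negative log likelihood.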

Based upon the results of the test for ARCH, we will use 2 lags in the variance process. The RATS instruction for estimating the model is

  garch(q=2,regressors) / ffed

The Q=2 selects an ARCH(2) model. The REGRESSORS option is used to indicate that there is a regression model for the mean; the default is to just use a simple process mean, the equivalent of using just the CONSTANT. As with a LINREG, the explanatory variables follow on a supplementary card using regression format. You'll note that the sample range (/ to mean the default) comes before the name of the dependent variable. This is because the GARCH instruction can be used for both univariate and multivariate models, so the list of dependent variables needs to be open-ended. The output is in Table 21.3. By contrast, the OLS results for the mean model are in Table 21.2.

Table 21.2: OLS Estimates of Mean Model

  Linear Regression - Estimation by Least Squares
  Dependent Variable FFED
  Monthly Data From 1955:04 To 2000:12
  Usable Observations              549
  Degrees of Freedom               545
  Centered R^2                  0.9740
  R-Bar^2                       0.9738
  Uncentered R^2                0.9942
  Mean of Dependent Variable    6.1715
  Std Error of Dependent Variable 3.2939
  Standard Error of Estimate    0.5331
  Sum of Squared Residuals    154.8716
  Regression F(3,545)        6792.5340
  Significance Level of F       0.0000
  Log Likelihood             -431.6169
  Durbin-Watson Statistic       2.0073

  Variable      Coeff    Std Error   T-Stat   Signif
  1. Constant   0.1284    0.0486     2.6436   0.0084
  2. FFED{1}    1.4165    0.0423    33.4520   0.0000
  3. FFED{2}   -0.5872    0.0696    -8.4394   0.0000
  4. FFED{3}    0.1509    0.0423     3.5704   0.0004

One thing you will note right away is that there are many fewer summary statistics in the output for the ARCH model. There are a number of reasons for that, the most important of which is that almost all the summary statistics for least squares are interesting only if you are minimizing the sum of squared residuals, which ARCH most definitely is not (R^2 is based upon a ratio of sums of squares, the standard errors are estimates of a single variance using sums of squares, etc.). The one summary statistic that they have in common is the log likelihood, which is comparable here since the two are estimated over the same sample range. (This is possible because of how RATS handles the pre-sample squared residuals; if the likelihood were conditional on the lagged residuals, the ARCH estimates would have two fewer observations.) And the log likelihood for the ARCH estimates is much higher (-173.6 vs -431.6).

In the GARCH output, the mean model parameters are listed first, followed by the variance parameters. The C in the output is the constant in the variance model (\zeta in Hamilton's notation) and the A(1) and A(2) are the coefficients on the lagged squared residuals (\alpha_1 and \alpha_2).

Table 21.3: ARCH(2) with Gaussian Errors

  GARCH Model - Estimation by BFGS
  Convergence in 54 Iterations. Final criterion was 0.0000034 <= 0.0000100
  Dependent Variable FFED
  Monthly Data From 1955:04 To 2000:12
  Usable Observations 549
  Log Likelihood -173.5741

  Variable      Coeff    Std Error   T-Stat   Signif
  1. Constant   0.2085    0.0228     9.1417   0.0000
  2. FFED{1}    1.2601    0.0553    22.8025   0.0000
  3. FFED{2}   -0.2503    0.0839    -2.9847   0.0028
  4. FFED{3}   -0.0516    0.0381    -1.3518   0.1765
  5. C          0.0299    0.0040     7.5645   0.0000
  6. A{1}       0.8646    0.1271     6.8033   0.0000
  7. A{2}       0.3895    0.0835     4.6673   0.0000

There is one obvious problem with this: \alpha_1 + \alpha_2 is bigger than one; much bigger than one. The process has no unconditional variance, and out-of-sample predictions of the variance will rather rapidly increase without limit. (Despite the explosive coefficients, the estimated ARCH process is stationary with infinite variance; this is a peculiarity of ARCH and GARCH models.) If the sum of the two coefficients were only slightly above 1 (such as 1.02), it wouldn't be a serious problem; if we did a variance forecast for 1000 data points, the fact that the process was mildly explosive would become obvious, but forecasting 1000 data points given an actual data set with only around 500 is a bad idea. Thus the model we've just estimated would have to be rejected as a poor description of the data.
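
The explosion is easy to quantify: for j >= 1, E_t[u_{t+j}^2] = E_t[h_{t+j}], so variance forecasts follow the recursion \hat{h}_{t+k} = \zeta + \alpha_1 \hat{h}_{t+k-1} + \alpha_2 \hat{h}_{t+k-2}. A short Python sketch with the Table 21.3 coefficients (the starting values are illustrative):

  # Variance forecasts from the estimated ARCH(2)
  zeta, a1, a2 = 0.0299, 0.8646, 0.3895   # a1 + a2 = 1.2541 > 1

  h = [0.1, 0.1]                          # two illustrative starting variances
  for k in range(60):
      h.append(zeta + a1 * h[-1] + a2 * h[-2])

  print(h[10], h[30], h[60])  # grows without bound; the dominant root of the
                              # lag polynomial is about 1.19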

We will do some standard diagnostics to see if we can figure out how to adjust it.

Diagnostics for univariate ARCH models

The GARCH instruction defines the series %RESIDS, which are the y_t - x_t'\beta. We can graph those with

  graph(footer="ARCH(2) Residuals")
  # %resids

to produce Figure 21.4. These are only very slightly different from the least squares residuals, and aren't directly very useful for diagnostics, for the same reasons the least squares residuals weren't: the heteroscedasticity invalidates most diagnostic procedures.

[Figure 21.4: Residuals from ARCH(2)]

Instead, diagnostics for ARCH and GARCH models are generally based upon standardized residuals. The ARCH model provides an estimate of the time-varying variance (h_t) of the residual process. Dividing the model residuals by \sqrt{h_t} gives a set of residuals which should be:

1. Serially uncorrelated, if our mean model is correct
2. Homoscedastic (actually, variance 1.0), if our variance model is correct

You can save the h_t series on the GARCH instruction using the HSERIES option. In practice, you would just include that right off, but for illustration we left it off the original estimation. However, now we need it, so we change the instruction to:

  garch(q=2,regressors,hseries=h) / ffed

We can compute and graph the standardized residuals with

  set ustd = %resids/sqrt(h)
  graph(footer="ARCH(2) Standardized Residuals")
  # ustd

which produces Figure 21.5.

[Figure 21.5: Standardized Residuals from ARCH(2)]

The first indication of a problem with these is the value close to 6.0 in early 1980. If the residual process were Gaussian (which is what is assumed with the options used on GARCH), the probability of a value that large is less than one in a million. There are also quite a few others that are 3 or larger in absolute value, which have probabilities of roughly .001. So the data are much fatter-tailed than the assumption of Gaussianity would suggest.
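
Those tail probabilities are straightforward to verify; a two-line Python check (not part of the RATS examples):

  from scipy.stats import norm

  # two-sided tail probabilities of a standard normal
  print(2 * norm.sf(6.0))   # ~2.0e-09: "less than one in a million"
  print(2 * norm.sf(3.0))   # ~0.0027: roughly .001 per observation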

Three diagnostics to apply to the standardized residuals are:

1. Test for Gaussianity (normality)
2. Test for serial correlation
3. Test for remaining ARCH effects

Passing a test for normality isn't strictly required; the estimates are still consistent even under broader assumptions, as discussed on page 663. However, rejections in the other two will show some inadequacy in the model. The most commonly used test for normality is Jarque-Bera, which is included in the output from the standard STATISTICS instruction. We can do both the tests for serial correlation and for remaining ARCH using the @REGCORRS procedure, one applied to the standardized residuals, one to the squared standardized residuals:

  stats ustd
  @regcorrs(number=36,dfc=3,nocrits,qstat,$
    title="Standardized Residuals") ustd
  disp "Q for Residual Serial Correlation" %qstat $
    "significance level" %qsignif
  set ustd2 = ustd^2
  @regcorrs(number=36,dfc=2,nocrits,qstat,$
    title="Standardized Squared Residuals") ustd2
  disp "McLeod-Li for Residual ARCH=" %qstat $
    "significance level" %qsignif

The degrees of freedom corrections are 3 for the test for serial correlation (because of the 3 AR lags) and 2 for the test for residual ARCH (because of the 2 ARCH lags). This produces:

  Statistics on Series USTD
  Monthly Data From 1955:04 To 2000:12
  Observations            549
  Sample Mean          0.103853    Variance              0.991020
  Standard Error       0.995500    SE of Sample Mean     0.042487
  t-Statistic (Mean=0) 2.444346    Signif Level (Mean=0) 0.014826
  Skewness             0.102856    Signif Level (Sk=0)   0.326503
  Kurtosis (excess)    2.978447    Signif Level (Ku=0)   0.000000
  Jarque-Bera        203.895458    Signif Level (JB=0)   0.000000

  Q for Residual Serial Correlation 117.53281 significance level 2.06067e-011
  McLeod-Li for Residual ARCH= 93.04870 significance level 2.15408e-007

All three tests are overwhelmingly rejected. The normality test isn't a surprise given the very large standardized residuals. The pattern in the serial correlation is, however, unexpected given the results of the West-Cho test on the least squares residuals, with Figure 21.6 showing a long string of individually significant coefficients starting at lag 1. The difference, however, is that the West-Cho test uses a separate adjustment for heteroscedasticity at each lag in the autocorrelation, while the standardized residuals are adjusted using the specific ARCH model.
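
As a quick sanity check, the Jarque-Bera statistic can be reproduced directly from the reported skewness and excess kurtosis:

  # Jarque-Bera from the reported moments: JB = T*(S^2/6 + K^2/24),
  # with S the skewness and K the excess kurtosis
  T, S, K = 549, 0.102856, 2.978447
  jb = T * (S**2 / 6 + K**2 / 24)
  print(jb)   # about 203.9, matching the STATISTICS output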

[Figure 21.6: Residual Correlations of Standardized Residuals from ARCH(2); Q = 117.53, P-value 0.00000]

If we hadn't already rejected the model based upon the explosive ARCH process, we would reject it now for the failure of the diagnostics. Both the graph of the standardized residuals and the results from the Jarque-Bera test suggest that the assumption of an ARCH model with Gaussian residuals is untenable. We thus try a different conditional density.

Maximum Likelihood Estimation with Non-Gaussian v_t

The GARCH instruction offers two alternatives to the Normal for the conditional density of the residuals: the Student-t and the generalized error distribution (GED). The t has strictly fatter tails than the Normal, which is its limit distribution as the degrees of freedom \nu \to \infty. The GED family includes both fatter- and thinner-tailed densities. Of the two, the t is much more commonly used. However, you have to be a bit careful in using these, as the formula (21.1) generates the variance of the residuals. The variance of a standard t with \nu degrees of freedom is \nu/(\nu-2), so the t density has to have its scale reparameterized to give the required variance. Because the variance of a t doesn't exist when \nu \le 2, 2 is the lower limit on the degrees of freedom; the rescaled likelihood is undefined below that.
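
The rescaling is simple to state: if u_t is to have conditional variance h_t with a t shape, draw from a standard t with \nu degrees of freedom and scale by \sqrt{h_t(\nu-2)/\nu}. A Python check of the variance arithmetic (illustrative values):

  import numpy as np
  from scipy import stats

  nu, h = 5.0, 2.0                      # illustrative shape and target variance
  scale = np.sqrt(h * (nu - 2) / nu)    # rescale so the variance is h, not h*nu/(nu-2)

  u = stats.t.rvs(df=nu, scale=scale, size=200_000, random_state=0)
  print(u.var())                        # close to 2.0
  print(stats.t.var(df=nu))             # nu/(nu-2) = 5/3 for the unscaled t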

The model is estimated by adding the DISTRIB=T option to the GARCH instruction. We will also save the variances into a different series than before, this one called HT. The results are in Table 21.4.

  garch(p=0,q=2,regressors,distrib=t,hseries=ht) / ffed

Table 21.4: ARCH(2) with Student-t Errors

  GARCH Model - Estimation by BFGS
  Convergence in 58 Iterations. Final criterion was 0.0000079 <= 0.0000100
  Dependent Variable FFED
  Monthly Data From 1955:04 To 2000:12
  Usable Observations 549
  Log Likelihood -104.6074

  Variable      Coeff    Std Error   T-Stat   Signif
  1. Constant   0.0469    0.0219     2.1406   0.0323
  2. FFED{1}    1.3480    0.0486    27.7134   0.0000
  3. FFED{2}   -0.2122    0.0775    -2.7378   0.0062
  4. FFED{3}   -0.1426    0.0419    -3.4031   0.0007
  5. C          0.0371    0.0120     3.0795   0.0021
  6. A{1}       1.1553    0.4076     2.8343   0.0046
  7. A{2}       0.5892    0.2319     2.5406   0.0111
  8. Shape      2.8268    0.4038     6.9998   0.0000

The new model gives a dramatic improvement to the likelihood over the ARCH(2) with Gaussian errors, now -104.6 versus -173.6 before. However, the ARCH process is even more unstable than before. Rather than worry too much about this, we will move on to the (much) more commonly used GARCH processes, in hopes that a GARCH will work better.

GARCH Models

The ARCH(m) process is different in behavior from the (apparently) similar MA(q) process. In the MA(q), correlation is zero for data separated by more than q periods. One might think that the volatility relationship in the ARCH(m) would similarly cut off after m periods. However, that's not the case. The difference is that the building block of the ARCH is u_t^2, which has a non-zero expected value, unlike the zero-mean u_t used in the MA. Instead, we can rewrite the ARCH process in terms of the zero-mean building blocks u_t^2 - h_t. If we look at the ARCH(2) process, we can rearrange it to

  h_t = E u_t^2 = \zeta + \alpha_1 (u_{t-1}^2 - h_{t-1}) + \alpha_2 (u_{t-2}^2 - h_{t-2}) + \alpha_1 h_{t-1} + \alpha_2 h_{t-2}    (21.3)

Assuming the \alpha are positive (the model makes little sense if they aren't), the shocks which increase the variance going forward are ones where u_s^2 - h_s > 0, that is, where a residual is bigger than one standard deviation according to the ARCH recursion.

The problem, in practice, with the low-order ARCH model is that variance processes often seem to be fairly persistent. Now, (21.3) can show a high degree of persistence if \alpha_1 and \alpha_2 are non-negative and sum to a number near (but less than) one, due to the behavior of the second-order difference equation in h. The problem is that those same fairly large \alpha coefficients also show up in the transient terms u_s^2 - h_s. As a result, the lower-order ARCH seems to require overstating the immediate impact of a shock in order to get the persistence correct. Bringing the impact down requires a longer ARCH process, but that runs into the problem faced by Engle in his original paper: even an ARCH(4) process is likely to have negative lag coefficients if run unconstrained, since they are squeezed by the need to sum to less than one to keep the process stable. (This is again a decided difference between ARCH and ARMA processes: there is nothing wrong with negative coefficients in ARMA models, because they apply to data which can take either sign, while the ARCH recursions apply to positive data and have to produce positive values.)

The GARCH (Generalized ARCH) process of Bollerslev (1986) corrects the problem by directly including a persistence term for the variance. This very quickly supplanted the ARCH model, to the extent that ARCH models themselves are rarely used except in specialized situations (like switching models) where the GARCH recursion is hard to handle. While it's possible to have higher orders, the vast majority of empirical work with GARCH uses a 1,1 model, which means one ARCH term on the lagged squared residual and one term on the lagged variance:

  h_t = \zeta + \alpha_1 u_{t-1}^2 + \beta_1 h_{t-1}    (21.4)

If we use the same technique of replacing u_{t-1}^2 with itself minus h_{t-1}, we get

  h_t = \zeta + \alpha_1 (u_{t-1}^2 - h_{t-1}) + (\alpha_1 + \beta_1) h_{t-1}    (21.5)

Now, for stability of the variance we need (\alpha_1 + \beta_1) < 1, but with the GARCH the value of that sum is largely decoupled from the coefficient on (u_{t-1}^2 - h_{t-1}): we can have a persistent variance process with a large \beta_1 and a small \alpha_1.

The one technical issue with fitting GARCH models is that, while the ARCH variance formula (21.1) can be computed exactly given the data and \beta, the variance in (21.4) depends upon an unobservable pre-sample value for h. Different ways of handling this give different likelihoods and thus different estimators. The RATS GARCH instruction uses for pre-sample h the same \hat\sigma^2 value used for pre-sample u_s^2 in (21.2). Thus h_1 = \zeta + (\alpha_1 + \beta_1)\hat\sigma^2, with all later values being computed using (21.4). If \beta_1 is close to one, the results can be sensitive to the handling of the pre-sample, so if you have some concern, you can experiment with other values using the option PRESAMPLE, which feeds in a value to use for \hat\sigma^2. (To see why the sensitivity dies out, run two parallel recursions h_t and \tilde{h}_t which differ only in the pre-sample value: h_t = \zeta + \alpha_1 u_{t-1}^2 + \beta_1 h_{t-1} and \tilde{h}_t = \zeta + \alpha_1 u_{t-1}^2 + \beta_1 \tilde{h}_{t-1}. Since the first two terms are common to the expressions, h_t - \tilde{h}_t = \beta_1 (h_{t-1} - \tilde{h}_{t-1}), so the difference declines geometrically at the rate \beta_1.)
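
A numerical sketch of that geometric decay (Python, illustrative parameter values): two GARCH(1,1) variance recursions fed the same squared residuals but different pre-sample values for h converge at rate \beta_1.

  import numpy as np

  rng = np.random.default_rng(1)
  zeta, a1, b1 = 0.01, 0.10, 0.85        # illustrative GARCH(1,1) parameters
  u2 = rng.standard_normal(200)**2       # stand-in for squared residuals

  def h_path(h0):
      # h_t = zeta + a1*u_{t-1}^2 + b1*h_{t-1}, started at pre-sample value h0
      h = [h0]
      for t in range(1, len(u2)):
          h.append(zeta + a1 * u2[t-1] + b1 * h[-1])
      return np.array(h)

  diff = h_path(1.0) - h_path(5.0)
  print(diff[0], diff[10], diff[50])     # shrinks by a factor of b1 = 0.85 per step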

Estimation of GARCH models is in Example 21.3. You estimate a GARCH(1,1) model with the GARCH instruction with options P=1 (number of lagged variance terms) and Q=1 (number of lagged squared residuals), along with other options as needed. In our case:

  garch(p=1,q=1,regressors,hseries=h) / ffed

which produces Table 21.5. The log likelihood is much better than for the corresponding ARCH(2) model (Table 21.3). The model is still unstable (\alpha_1 + \beta_1 \approx 1.08), though that's much better than we saw with the ARCH model.

Table 21.5: GARCH(1,1) with Gaussian Errors

  GARCH Model - Estimation by BFGS
  Convergence in 62 Iterations. Final criterion was 0.0000011 <= 0.0000100
  Dependent Variable FFED
  Monthly Data From 1955:04 To 2000:12
  Usable Observations 549
  Log Likelihood -142.4728

  Variable      Coeff    Std Error   T-Stat   Signif
  1. Constant   0.1317    0.0272     4.8482   0.0000
  2. FFED{1}    1.3443    0.0520    25.8604   0.0000
  3. FFED{2}   -0.2838    0.0807    -3.5161   0.0004
  4. FFED{3}   -0.0857    0.0480    -1.7844   0.0744
  5. C          0.0061    0.0015     4.0911   0.0000
  6. A          0.4960    0.0676     7.3349   0.0000
  7. B          0.5832    0.0378    15.4471   0.0000

The standardized residuals (Figure 21.7) aren't quite as extreme as for the ARCH, but are still far too fat-tailed to be conditionally Normal. So again, we re-estimate with t errors, with results in Table 21.6.

[Figure 21.7: Standardized Residuals from GARCH(1,1)]

Table 21.6: GARCH(1,1) with t Errors

  GARCH Model - Estimation by BFGS
  Convergence in 84 Iterations. Final criterion was 0.0000043 <= 0.0000100
  Dependent Variable FFED
  Monthly Data From 1955:04 To 2000:12
  Usable Observations 549
  Log Likelihood -91.3657

  Variable      Coeff    Std Error   T-Stat   Signif
  1. Constant   0.0400    0.0193     2.0771   0.0378
  2. FFED{1}    1.3191    0.0502    26.3016   0.0000
  3. FFED{2}   -0.2015    0.0753    -2.6743   0.0075
  4. FFED{3}   -0.1237    0.0416    -2.9719   0.0030
  5. C          0.0090    0.0040     2.2826   0.0225
  6. A          0.7927    0.2699     2.9373   0.0033
  7. B          0.5347    0.0617     8.6701   0.0000
  8. Shape      2.8765    0.4629     6.2138   0.0000

The standard diagnostics for the GARCH with t-distributed errors are

  Statistics on Series USTD
  Monthly Data From 1955:04 To 2000:12
  Observations            549
  Sample Mean          0.031867    Variance              0.773440
  Standard Error       0.879455    SE of Sample Mean     0.037534
  t-Statistic (Mean=0) 0.849016    Signif Level (Mean=0) 0.396243
  Skewness             0.041049    Signif Level (Sk=0)   0.695366
  Kurtosis (excess)    6.104147    Signif Level (Ku=0)   0.000000
  Jarque-Bera        852.490771    Signif Level (JB=0)   0.000000

  Q for Residual Serial Correlation 55.92241 significance level 0.00761
  McLeod-Li for Residual ARCH= 32.72032 significance level 0.53028

As with the ARCH model, shifting to the t produces an even more unstable model than is estimated with the Gaussian errors. We can look at the estimates of the volatility using

  set stddev = sqrt(h)
  graph(footer="GARCH(1,1) Standard Deviation Estimate")
  # stddev

This graphs the standard deviations rather than the variances (Figure 21.8); a graph of the variance itself would show as effectively zero relative to the spike in the early 1980s. The blip at the beginning is an artifact of using the overall sample variance for the pre-sample value; obviously, the full-sample value isn't representative of the early part of the sample.

It's clear from looking at this that there is only one really major clustering of large residuals, in the period 1980-1982, which is when the Federal Reserve was switching policy targets to combat inflation. We can try to fix our GARCH model (which so far hasn't produced a satisfactory model) by including a variance shift dummy for this period. This is done using the XREGRESSORS option, which puts regressors into the variance equation. If, as here, you need both mean regressors and variance regressors, you list the mean regressors on the first supplementary line.

[Figure 21.8: Standard Deviations from GARCH(1,1)]

  set policy = t>=1980:1.and.t<=1982:12
  garch(p=1,q=1,regressors,xregressors,dist=t,hseries=h) / ffed
  # policy

The results are in Table 21.7. The shift dummy is quite large (.6367 versus the standard variance intercept of just .0107) but statistically insignificant, and the log likelihood really isn't that much better. This might seem surprising, except that the whole point of the GARCH model is to explain just these sorts of clusters without the dummy: the first one or two large residuals might be a surprise, but the next three years aren't.

It's possible for the type of data that we see here to be generated by a stationary GARCH process with (slightly) explosive roots. Nelson (1990) showed that you can have a stationary GARCH process with infinite variance if (\alpha_1 + \beta_1) > 1, particularly if \beta_1 is well less than one. The intuition behind this is that the mode of the residual process remains zero even if the variance is large. If much of the persistence comes from the u_{t-1}^2 term, the process can rather quickly reset itself if you just get a few shocks in a row near zero. If you simulate one of these explosive-yet-stationary processes, you will see very occasional clusters of huge variances surrounded by long periods of (relative) calm.

On the other hand, it's possible that this interest rate series is governed by a change in regime, and an attempt to model it with a single GARCH across the whole sample is unrealistic. For instance,

  set test 1 500 = %if(t>=250.and.t<=300,%ran(10.0),%ran(1.0))
  @archtest test
  garch(p=1,q=1) / test

will generate a series which seems to have ARCH even though it simply has two variance regimes.

Table 21.7: GARCH(1,1) with Variance Shift Dummy

  GARCH Model - Estimation by BFGS
  Convergence in 91 Iterations. Final criterion was 0.0000027 <= 0.0000100
  Dependent Variable FFED
  Monthly Data From 1955:04 To 2000:12
  Usable Observations 549
  Log Likelihood -89.7676

  Variable      Coeff    Std Error   T-Stat   Signif
  1. Constant   0.0400    0.0201     1.9881   0.0468
  2. FFED{1}    1.3234    0.0474    27.9432   0.0000
  3. FFED{2}   -0.2052    0.0730    -2.8095   0.0050
  4. FFED{3}   -0.1241    0.0416    -2.9813   0.0029
  5. C          0.0107    0.0038     2.8092   0.0050
  6. A          0.7467    0.0647    11.5389   0.0000
  7. B          0.4848    0.0582     8.3320   0.0000
  8. POLICY     0.6367    0.5182     1.2286   0.2192
  9. Shape      3.0479    0.4071     7.4870   0.0000

GARCH models are much more frequently applied to financial returns data than to macroeconomic data, and are usually fit to data observed at a greater frequency (weekly, daily or even finer). One problem with a monthly macro series like the interest rate is that the data are almost always averaged across the period (monthly interest rate series in macroeconomic databases are almost always reported as monthly averages of daily values), which leads to problems with the timing of events. A spike in interest rates coming near the end of the month will barely affect the reporting period in which it actually occurred, since the average will be dominated by the lower values early in the month. Thus, the data you have will show the surprise one month later, which could upset the timing of the GARCH model.

Example 21.1  ARCH Model: Preliminaries

This selects the mean model and tests for ARCH effects in the residuals.

  cal(m) 1955
  open data fedfund.rat
  data(format=rats) 1955:1 2000:12
  graph(footer=$
   "Figure 21.1 U.S. federal funds rate (monthly averages)")
  # ffed

Choose the lag length using @ARAutoLags:

  @ARAutoLags(crit=bic,maxlags=12,table) ffed

The "mean model" used throughout this is a third-order AR with an intercept:

  linreg ffed
  # constant ffed{1 to 3}

Test for residual serial correlation with a standard Q and a Q adjusted for possible heteroscedasticity:

  @regcorrs(number=36,qstat,footer="Residuals from AR(3)")
  @westchotest(number=36) %resids

Check for ARCH effects in the residuals by regressing the squared residuals on their lags. This is described on page 664. The exclusion tests would seem to indicate that ARCH(2) is appropriate:

  set u = %resids
  set u2 = u^2
  linreg u2
  # constant u2{1 to 4}
  exclude(title="ARCH Test: F Variant")
  # u2{1 to 4}
  exclude
  # u2{3 4}
  cdf(title="ARCH Test: Chi-Squared Variant") chisqr %trsquared 4

Use the @ARCHTEST procedure to do tests for all lags from 1 to 4:

  @ARCHtest(lags=4,span=1) u

McLeod-Li test:

  @McLeodLi(number=36) u
  @regcorrs(number=36,nocrits,$
    title="Correlations of Squared Residuals") u2

Example 21.2  ARCH Model: Estimation

This estimates an ARCH model and does diagnostics on the results.

  cal(m) 1955
  open data fedfund.rat
  data(format=rats) 1955:1 2000:12

ARCH(2) with Gaussian errors:

  garch(p=0,q=2,regressors,hseries=h) / ffed
  graph(footer="ARCH(2) Residuals")
  # %resids
  set ustd = %resids/sqrt(h)
  graph(footer="ARCH(2) Standardized Residuals")
  # ustd

Diagnostics:

  stats ustd
  @regcorrs(number=36,dfc=3,nocrits,qstat,$
    title="Standardized Residuals") ustd
  disp "Q for Residual Serial Correlation" %qstat $
    "significance level" %qsignif
  set ustd2 = ustd^2
  @regcorrs(number=36,dfc=2,nocrits,qstat,$
    title="Standardized Squared Residuals") ustd2
  disp "McLeod-Li for Residual ARCH=" %qstat $
    "significance level" %qsignif

ARCH(2) with t-distributed errors:

  garch(p=0,q=2,regressors,distrib=t,hseries=ht) / ffed

ARCH(2) using QMLE methods. This uses the Gaussian errors but corrects the covariance matrix for misspecification:

  garch(p=0,q=2,regressors,robusterrors) / ffed

Example 21.3  GARCH Model: Estimation

This estimates a GARCH model and does diagnostics on the results.

  cal(m) 1955
  open data fedfund.rat
  data(format=rats) 1955:1 2000:12

GARCH(1,1):

  garch(p=1,q=1,regressors,hseries=h) / ffed
  set ustd = %resids/sqrt(h)
  graph(footer="GARCH(1,1) Standardized Residuals")
  # ustd

Diagnostics:

  stats ustd
  @regcorrs(number=36,dfc=3,nocrits,qstat,$
    title="Standardized Residuals") ustd
  disp "Q for Residual Serial Correlation" %qstat $
    "significance level" %qsignif
  set ustd2 = ustd^2
  @regcorrs(number=36,dfc=2,nocrits,qstat,$
    title="Standardized Squared Residuals") ustd2
  disp "McLeod-Li for Residual ARCH=" %qstat $
    "significance level" %qsignif

GARCH(1,1) with t errors:

  garch(p=1,q=1,regressors,dist=t,hseries=h) / ffed

Diagnostics:

  set ustd = %resids/sqrt(h)
  stats ustd
  @regcorrs(number=36,dfc=3,nocrits,qstat,$
    title="Standardized Residuals") ustd
  disp "Q for Residual Serial Correlation" %qstat $
    "significance level" %qsignif
  set ustd2 = ustd^2
  @regcorrs(number=36,dfc=2,nocrits,qstat,$
    title="Standardized Squared Residuals") ustd2
  disp "McLeod-Li for Residual ARCH=" %qstat $
    "significance level" %qsignif
  set stddev = sqrt(h)
  graph(footer="GARCH(1,1) Standard Deviation Estimate")
  # stddev

Add a variance shift dummy:

  set policy = t>=1980:1.and.t<=1982:12
  garch(p=1,q=1,regressors,xregressors,dist=t,hseries=h) / ffed
  # policy