Lecture 6: Univariate Volatility Modelling: ARCH and GARCH Models

Prof. Massimo Guidolin
20192 Financial Econometrics
Winter/Spring 2018

Overview
ARCH models and their limitations
Generalized ARCH models and the reasons for their success
Integrated GARCH
Exponential GARCH and asymmetric effects in volatility

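ARCH Models
The key idea of ARCH is that conditional forecasts are generally vastly superior to unconditional forecasts
Because the model is based on the decomposition $\varepsilon_{t+1} = \sigma_{t+1|t} z_{t+1}$, with $z_{t+1}$ IID with mean 0 and variance 1, and $\sigma^2_{t+1|t}$ is time-varying provided that at least one of the coefficients $\alpha_1, \alpha_2, \ldots, \alpha_p$ is positive, by Jensen's inequality we have
(*) $E[\varepsilon^4_{t+1}] = E[z^4_{t+1}]\,E[\sigma^4_{t+1|t}] > E[z^4_{t+1}]\,(E[\sigma^2_{t+1|t}])^2 = E[z^4_{t+1}]\,(E[\varepsilon^2_{t+1}])^2$
so the unconditional kurtosis of $\varepsilon_{t+1}$ exceeds that of $z_{t+1}$
Because ARCH(p) generates a symmetric return distribution and, for the density to integrate to 1, the inflated tails must be compensated by the absence of probability mass in the intermediate range, ARCH captures the leptokurtic nature of asset returns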
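The leptokurtosis claim can be checked numerically. The following is a minimal sketch, assuming Gaussian standardized shocks and an ARCH(1) with illustrative parameter values (alpha0 = 0.1, alpha1 = 0.4, not estimates from the lecture); all function names are hypothetical. It compares the sample kurtosis of simulated residuals with the Gaussian benchmark of 3 and with the closed-form kurtosis of a Gaussian ARCH(1).

```python
import random

def simulate_arch1(alpha0, alpha1, n, seed=0):
    """Simulate n residuals from an ARCH(1) with Gaussian shocks (an assumption)."""
    rng = random.Random(seed)
    prev_sq = alpha0 / (1.0 - alpha1)        # start from the long-run variance
    eps = []
    for _ in range(n):
        sigma2 = alpha0 + alpha1 * prev_sq   # conditional variance forecast
        e = sigma2 ** 0.5 * rng.gauss(0.0, 1.0)
        eps.append(e)
        prev_sq = e * e
    return eps

def sample_kurtosis(x):
    """Fourth central moment divided by the squared variance."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2)

def arch1_kurtosis(alpha1):
    """Kurtosis of a Gaussian ARCH(1); finite only if 3 * alpha1**2 < 1."""
    return 3.0 * (1.0 - alpha1 ** 2) / (1.0 - 3.0 * alpha1 ** 2)

# Illustrative values only
eps = simulate_arch1(alpha0=0.1, alpha1=0.4, n=100_000, seed=42)
print("sample kurtosis:     ", sample_kurtosis(eps))
print("theoretical kurtosis:", arch1_kurtosis(0.4))
```

Even though every shock is drawn from a normal distribution, the time-varying variance alone pushes the kurtosis of the residuals above 3.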

ARCH Models
ARCH captures volatility clustering: large past squared innovations lead to large forecasts of subsequent variance when all or most of the coefficients $\alpha_1, \alpha_2, \ldots, \alpha_p$ are positive and non-negligible
However, ARCH models cannot capture the asymmetric reaction of conditional variance to positive vs. negative shocks
ARCH(p), and in particular ARCH(1), differs from RiskMetrics in 2 ways:
1. It features no memory for recent, past variance forecasts
2. It features a constant $\alpha_0$ that was absent in RiskMetrics
o When we set $\alpha_0 = 0$ and $\alpha_i = 1/W$, an ARCH(W) model simply becomes a rolling window variance model
o Appendix A collects the moments and key properties of an ARCH(1)
o Algebra in this Appendix establishes that the long-run, ergodic variance from ARCH(p) is:
$\bar{\sigma}^2 = \alpha_0 / (1 - \sum_{i=1}^{p} \alpha_i)$
Even though conditional variance changes over time, the model can be (covariance) stationary and the unconditional variance exists
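Two statements on this slide are easy to verify by direct computation: the ARCH(W) model with $\alpha_0 = 0$ and $\alpha_i = 1/W$ reproduces a zero-mean rolling-window variance, and the long-run variance of an ARCH(p) is $\alpha_0 / (1 - \sum_i \alpha_i)$. The sketch below uses made-up residuals and hypothetical function names.

```python
def arch_forecast(alpha0, alphas, eps):
    """One-step ARCH(p) forecast: alpha0 + sum_i alphas[i-1] * eps_{t+1-i}^2."""
    p = len(alphas)
    recent = eps[-p:][::-1]               # eps_t, eps_{t-1}, ..., eps_{t+1-p}
    return alpha0 + sum(a * e * e for a, e in zip(alphas, recent))

def rolling_variance(eps, window):
    """Zero-mean rolling-window variance: average of the last `window` squares."""
    return sum(e * e for e in eps[-window:]) / window

def arch_long_run_variance(alpha0, alphas):
    """Unconditional variance alpha0 / (1 - sum(alphas)); requires sum(alphas) < 1."""
    total = sum(alphas)
    if total >= 1.0:
        raise ValueError("sum of ARCH coefficients must be below 1")
    return alpha0 / (1.0 - total)

eps = [0.5, -1.2, 0.3, 2.0, -0.7, 0.1]    # made-up residuals
W = 4
forecast = arch_forecast(0.0, [1.0 / W] * W, eps)
rolling = rolling_variance(eps, W)
print(forecast, rolling)                   # the two numbers coincide
print(arch_long_run_variance(0.2, [0.3, 0.2]))
```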

ARCH Models: One Example
Even though ARCH represents progress vs. a simple rolling window, it has one limitation: in many applications, the specification is richly parameterized
o Given the empirical success of RiskMetrics, the need to pick a large p does not come as a surprise, because such a selection surrogates the role played by $\sigma^2_{t|t-1}$ on the RHS of RiskMetrics
Consider 1963-2016 CRSP daily stock excess returns
o SACF/SPACF and information criteria analyses suggest an MA(1) mean
o SACF/SPACF of squared residuals give evidence of at least an AR(5)

ARCH Models: One Example
o The BIC for MA(1)/ARCH(6) is .446 and for MA(1)/ARCH(5) is .4513; therefore, we select the former model
o Such a BIC is lower than the .8070 from a homoskedastic MA(1) model
o The MA(1)/ARCH(6) model is estimated by ML in EViews
[Equation: estimated MA(1)/ARCH(6) model]
o Each of the estimated coefficients is positive and statistically significant and their sum is 0.784, which establishes stationarity
o One wonders if a more parsimonious way can be found

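Are ARCH Models Enough?
ARCH models are not set up or estimated to imply that the unconditional variance equals the sample variance, and this may be embarrassing
One constraint often imposed in estimation is variance targeting:
$\alpha_0 = \hat{\sigma}^2 (1 - \sum_{i=1}^{p} \alpha_i)$, where $\hat{\sigma}^2$ is the sample variance
o It guarantees that ARCH(p) yields unconditional = sample variance
How do we assess whether a conditional heteroskedastic (CH) model is adequate for a given application/data set?
1. If a CH model is correctly specified, then the standardized residuals from the model should reflect any assumptions made when the model has been specified and estimated
o E.g., from $z_t$ IID, where $z_t = \varepsilon_t / \sigma_{t|t-1}$ are the standardized residuals, check whether $\mathrm{Cov}[h(z_t), g(z_{t-i})] = 0$ holds
o Testable by using the sample correlations between $h(z_t)$ and $g(z_{t-i})$ (i = 1, 2, ...) for sensible choices of the functions $h: \mathbb{R} \to \mathbb{R}$ and $g: \mathbb{R} \to \mathbb{R}$ (not necessarily identical)
2. A good CH model should accurately predict future variance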
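Variance targeting amounts to fixing the intercept rather than estimating it: given ARCH coefficients, $\alpha_0$ is set so that the implied long-run variance $\alpha_0 / (1 - \sum_i \alpha_i)$ equals the sample variance. A minimal sketch with made-up residuals and coefficients (the function names and all numbers are hypothetical):

```python
def sample_variance(residuals):
    """Sample variance with divisor n."""
    n = len(residuals)
    mean = sum(residuals) / n
    return sum((e - mean) ** 2 for e in residuals) / n

def targeted_alpha0(residuals, alphas):
    """Intercept implied by variance targeting for given ARCH coefficients."""
    persistence = sum(alphas)
    if not 0.0 <= persistence < 1.0:
        raise ValueError("sum of ARCH coefficients must lie in [0, 1)")
    return sample_variance(residuals) * (1.0 - persistence)

residuals = [0.4, -1.1, 0.9, -0.2, 1.5, -0.8]   # made-up data
alphas = [0.25, 0.15, 0.10]                      # illustrative coefficients
alpha0 = targeted_alpha0(residuals, alphas)
implied_long_run = alpha0 / (1.0 - sum(alphas))
print(alpha0, implied_long_run, sample_variance(residuals))
```

In an actual estimation the constraint would be imposed inside the likelihood maximization, so that only the $\alpha_i$ are free parameters.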

Are ARCH Models Enough?
o What does it mean that a CH model yields good forecasts? A requirement is that on average the realized squared residuals must equal the variance forecasts that a model offers:
$\varepsilon^2_{t+1} = \sigma^2_{t+1|t} + e_{t+1}$, with $e_{t+1}$ white noise
o Empirically, it implies that two simple restrictions must be satisfied in the regression $\varepsilon^2_{t+1} = a + b\,\sigma^2_{t+1|t} + e_{t+1}$
o a = 0 and b = 1, jointly (when this occurs, $\sigma^2_{t+1|t}$ offers an unbiased predictor of squared residuals, used as a proxy of realized variance)
o The regression $R^2$ must be large
o However, this test of predictive performance may be fallacious: the process $\varepsilon^2_t$ invariably provides a poor proxy for the process followed by the true but unobserved time-varying variance, $\sigma^2_t$
o This follows from $\mathrm{Var}_t[\varepsilon^2_{t+1}] = E_t[(\varepsilon^2_{t+1} - \sigma^2_{t+1|t})^2] = \sigma^4_{t+1|t}(\kappa_z - 1)$, where $\kappa_z$ is the kurtosis of the standardized residuals

Are ARCH Models Enough?
o When either $\sigma^2_{t+1|t}$ (hence, $\sigma^4_{t+1|t}$) or the kurtosis of the standardized residuals is high, $\mathrm{Var}[\varepsilon^2_{t+1}]$ will be large, and using squared residuals to proxy instantaneous variances exposes a researcher to a lot of noise
o This choice is almost guaranteed to yield a low regression $R^2$
o We shall examine a few suggested remedies later on
o Compare ARCH and RiskMetrics for daily stock returns, 1963-2016
o ARCH(6) forecasts are spikier
o However, there are significant departures from IID-ness in the squared standardized residuals from both models

Are ARCH Models Enough?
o All forecasting power of past squared residuals is well captured
o It seems that past US equity losses lead to subsequent higher variance, the leverage effect
o Predictive accuracy regressions give (std errors in parentheses):
[Equation: estimated predictive regressions]
o It is crucial to report standard errors and not p-values because the simple null hypothesis of b = 1 requires that we calculate the t ratios $(\hat{b} - 1)/\mathrm{s.e.}(\hat{b})$
o The null of a = 0 may be rejected with p-values close to 0.000
o Given the individual rejections, it is pointless to apply F-tests of the joint hypothesis
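The predictive accuracy regression of squared residuals on variance forecasts, and the reason its $R^2$ tends to be low, can be illustrated with a short sketch. It assumes classical (homoskedastic) OLS standard errors and simulated data in which the forecast is the true variance; the data-generating choices and the function name are hypothetical, not the lecture's.

```python
import math
import random

def predictive_regression(realized, forecast):
    """OLS of realized squared residuals on variance forecasts: y = a + b*x + e.

    Standard errors are the classical homoskedastic OLS ones (an assumption).
    """
    n = len(realized)
    mx = sum(forecast) / n
    my = sum(realized) / n
    sxx = sum((x - mx) ** 2 for x in forecast)
    sxy = sum((x - mx) * (y - my) for x, y in zip(forecast, realized))
    b = sxy / sxx
    a = my - b * mx
    ssr = sum((y - a - b * x) ** 2 for x, y in zip(forecast, realized))
    sst = sum((y - my) ** 2 for y in realized)
    s2 = ssr / (n - 2)                       # residual variance
    se_b = math.sqrt(s2 / sxx)
    se_a = math.sqrt(s2 * (1.0 / n + mx * mx / sxx))
    return {"a": a, "b": b, "se_a": se_a, "se_b": se_b,
            "t_a": a / se_a,                 # tests a = 0
            "t_b": (b - 1.0) / se_b,         # tests b = 1, not b = 0
            "r2": 1.0 - ssr / sst}

# Simulated case: the forecast IS the true variance, yet R^2 stays low because
# eps^2 = sigma^2 * z^2 is a very noisy proxy of sigma^2.
rng = random.Random(1)
true_var = [rng.uniform(0.5, 1.5) for _ in range(5000)]
squared_resid = [v * rng.gauss(0.0, 1.0) ** 2 for v in true_var]
fit = predictive_regression(squared_resid, true_var)
print(fit)
```

The t ratio for the slope is computed against 1, which is why standard errors rather than p-values (which usually refer to a null of zero) are needed.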

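Generalized ARCH Models
o Although the $R^2$s are not irrelevant, the positive and significant estimates of the intercepts imply that predicted variance is too low vs. realized variance
o The two slope coefficients are significantly less than 1, which implies that realized variance moves over time less than what is predicted
Because of its limitations, ARCH was soon generalized from an AR(p)-style model for squared residuals to an ARMA(max[p,q], q):
$\varepsilon^2_{t+1} = \omega + \sum_{i=1}^{\max(p,q)} (\alpha_i + \beta_i)\,\varepsilon^2_{t+1-i} + \eta_{t+1} - \sum_{j=1}^{q} \beta_j \eta_{t+1-j}$, where $\eta_{t+1} = \varepsilon^2_{t+1} - \sigma^2_{t+1|t}$, $\alpha_i = 0$ for $i > p$, and $\beta_i = 0$ for $i > q$
Bollerslev (1987) observed that such a process may be written as
$\sigma^2_{t+1|t} = \omega + \sum_{i=1}^{p} \alpha_i \varepsilon^2_{t+1-i} + \sum_{j=1}^{q} \beta_j \sigma^2_{t+1-j|t-j}$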

Generalized ARCH Models
A key issue with GARCH models is keeping variance forecasts positive
o $\omega > 0$, $\alpha_1, \alpha_2, \ldots, \alpha_p \ge 0$, and $\beta_1, \beta_2, \ldots, \beta_q \ge 0$ are only sufficient
o Under technical conditions on the lag polynomials characterized by $\alpha_1, \alpha_2, \ldots, \alpha_p$ and $\beta_1, \beta_2, \ldots, \beta_q$ (provided the roots of the polynomial defined by the $\beta$s lie outside the unit circle), the positivity constraint is satisfied
o As for all ARMA processes, GARCH is (covariance) stationary if and only if the roots of the characteristic polynomial associated with the coefficients $\alpha_1 + \beta_1, \alpha_2 + \beta_2, \ldots, \alpha_{\max(p,q)} + \beta_{\max(p,q)}$ lie outside the unit circle
o As far as strict stationarity goes, in the case of a GARCH(1,1), the condition $E[\ln(\alpha z_t^2 + \beta)] < 0$ is sufficient (see Lumsdaine, 1996)
o However, because $E[\ln(\alpha z_t^2 + \beta)] \le \ln E[\alpha z_t^2 + \beta] = \ln(\alpha + \beta) < 0$ under covariance stationarity, for a GARCH(1,1), covariance stationarity guarantees strict stationarity
GARCH is a highly successful and resilient empirical model because, with fewer parameters than an ARCH, it may lead to a more parsimonious representation of volatility clustering

GARCH(1,1): The Reasons for Its Success
ARCH(p) is simply a GARCH(p,0) model in which there is no memory in the process for past conditional variance predictions
Because in forecasting applications it has proven very hard to beat (Hansen and Lunde, 2005), practitioners usually resort to simple GARCH(1,1) models:
$\sigma^2_{t+1|t} = \omega + \alpha\varepsilon^2_t + \beta\sigma^2_{t|t-1}$
o ARMA(1,1) for squared errors
o In the case of GARCH(1,1), positivity comes from the restrictions $\omega > 0$, $\alpha \ge 0$, and $\beta \ge 0$, and stationarity from the constraint $\alpha + \beta < 1$
o Exploiting the equivalence to ARMA(1,1), the stationary long-run variance is: $\bar{\sigma}^2 = \omega / (1 - \alpha - \beta)$
Under stationarity, by Wold's theorem, GARCH(1,1) is a sample variance that downweights distant lagged squared errors:
$\sigma^2_{t+1|t} = \omega/(1-\beta) + \alpha \sum_{j=0}^{\infty} \beta^j \varepsilon^2_{t-j}$
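The link between the GARCH(1,1) recursion $\sigma^2_{t+1|t} = \omega + \alpha\varepsilon^2_t + \beta\sigma^2_{t|t-1}$ and a geometrically weighted sum of past squared errors can be verified numerically. The sketch below assumes the recursion is started at the long-run variance and uses made-up residuals and illustrative parameters; the function names are hypothetical. With a finite sample the unrolled form carries a start-up term $\beta^T \sigma^2_{1|0}$ that vanishes as the sample grows.

```python
def garch11_forecasts(eps, omega, alpha, beta):
    """Recursive forecasts sigma2_{t+1|t}, started at the long-run variance."""
    long_run = omega / (1.0 - alpha - beta)
    sigma2 = [long_run]                          # sigma2_{1|0}
    for e in eps:
        sigma2.append(omega + alpha * e * e + beta * sigma2[-1])
    return sigma2

def garch11_as_weighted_sum(eps, omega, alpha, beta):
    """Last forecast written as a weighted sum of all past squared errors."""
    T = len(eps)
    long_run = omega / (1.0 - alpha - beta)
    const = omega * sum(beta ** j for j in range(T))     # -> omega / (1 - beta)
    arch_part = alpha * sum(beta ** j * eps[T - 1 - j] ** 2 for j in range(T))
    startup = beta ** T * long_run                       # vanishes as T grows
    return const + arch_part + startup

eps = [0.8, -1.5, 0.2, 2.1, -0.4, 0.9, -1.0]             # made-up residuals
omega, alpha, beta = 0.05, 0.10, 0.85                    # illustrative values
recursive = garch11_forecasts(eps, omega, alpha, beta)[-1]
unrolled = garch11_as_weighted_sum(eps, omega, alpha, beta)
print(recursive, unrolled)                               # the two coincide
```

The weights $\alpha\beta^j$ decline geometrically, so two parameters do the work of a long ARCH lag structure.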

GARCH(1,1): The Reasons for Its Success
o The reason for the success of GARCH(1,1) over complex ARCH(p) with relatively large p is that GARCH(1,1) can be shown to be equivalent to an ARCH($\infty$) model with a special structure of decaying weights!
o A GARCH(1,1) prediction is a weighted average of long-term variance (the constant), the most recent forecast (GARCH term), and information about volatility observed in the previous period (ARCH term):
$\sigma^2_{t+1|t} = (1-\alpha-\beta)\,\bar{\sigma}^2 + \alpha\varepsilon^2_t + \beta\sigma^2_{t|t-1}$, where $\bar{\sigma}^2 = \omega/(1-\alpha-\beta)$ is the long-run variance
o Let's study weekly 1982-2016 returns on 10-year US Treasury notes
o A BIC-based specification leads to a simple AR(1) mean model
o SACF and SPACF of squared residuals show evidence of ARMA

GARCH(1,1): A Fixed Income Example
o An attempt to use ARCH leads to a large, possibly ARCH(11), specification
o GARCH(1,1) offers the best trade-off between simplicity and in-sample fit
o The sum of the coefficients is 0.983, which implies (covariance) stationarity
o Evidence in favor of GARCH(1,1) is strong: the SACF of squared standardized residuals is characterized by the absence of additional structure
o A regression that tests whether GARCH(1,1) can forecast squared residuals gives (standard errors in parentheses):
[Equation: estimated predictive regression]
o Intercept is not significant, while [...]
o An F-test of the hypothesis a = 0, b = 1 gives 1.687, which with (2, 18) d.f. implies a p-value of 0.185 and leads to a failure to reject

The Persistence of Shocks in GARCH(1,1) Models
Although the persistence index of a GARCH(p,q) model is given by $\sum_{i=1}^{p}\alpha_i + \sum_{j=1}^{q}\beta_j$, different coefficients contribute to increase $\sigma^2_{t+1|t}$ in different ways
o The larger the $\alpha_i$s, the larger the response of $\sigma^2_{t+1|t}$ to new information; the larger the $\beta_j$s, the longer and stronger the memory of conditional variance for past (forecasts of) variance
For any given persistence index, it is possible for different stationary GARCH models to behave rather differently and therefore yield heterogeneous economic insights
[Figure: simulated volatility scenarios]
o This plot performs simulations on a baseline estimate (sum = 0.984) on monthly UK stock returns, sample period 1977-2016
o The volatility scenarios other than the solid blue one fix the persistence but impute it to alternative $\alpha$ and $\beta$
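The point that equal persistence can hide different dynamics can be sketched by tracing the expected variance path after a single large shock for two GARCH(1,1) parameterizations with the same $\alpha + \beta$. The parameter values below are illustrative (persistence 0.98, not the 0.984 estimate mentioned on the slide), and the function name is hypothetical.

```python
def expected_variance_path(omega, alpha, beta, shock_sq, horizons):
    """Expected sigma2_{t+h|t+h-1}, h = 1..horizons, after a squared shock
    `shock_sq` hits a GARCH(1,1) that was sitting at its long-run variance."""
    persistence = alpha + beta
    long_run = omega / (1.0 - persistence)
    path = [omega + alpha * shock_sq + beta * long_run]    # h = 1
    for _ in range(horizons - 1):
        # beyond h = 1, E[eps^2] equals the variance forecast itself
        path.append(omega + persistence * path[-1])
    return long_run, path

# Same persistence, different split between alpha and beta (illustrative)
lr_a, reactive = expected_variance_path(0.02, 0.20, 0.78, shock_sq=9.0, horizons=50)
lr_b, smooth = expected_variance_path(0.02, 0.05, 0.93, shock_sq=9.0, horizons=50)

print("first response, high alpha:", reactive[0])
print("first response, high beta: ", smooth[0])
print("decay ratio:", (reactive[1] - lr_a) / (reactive[0] - lr_a))
```

The high-$\alpha$ model jumps further on impact, while both deviations from the long-run variance then shrink at the common rate $\alpha + \beta$ per period.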

Integrated GARCH Model
In many applications to high-frequency financial data, the estimate of $\sum_{i=1}^{p}\alpha_i + \sum_{j=1}^{q}\beta_j$ turns out to be close to 1
This is the empirical motivation for the IGARCH(p,q) model, where $\sum_{i=1}^{p}\alpha_i + \sum_{j=1}^{q}\beta_j = 1$ (a unit root in the ARMA for conditional variance)
Consequently, a shock to the conditional variance is infinitely persistent: it remains equally important at all horizons
IGARCH may be strictly stationary (under appropriate conditions) but is not covariance stationary
In the case $\alpha + \beta = 1$, so that $\alpha = 1 - \beta$, i.e., IGARCH(1,1), this is no news:
$\sigma^2_{t+1|t} = \omega + (1-\beta)\,\varepsilon^2_t + \beta\sigma^2_{t|t-1}$
This is just RiskMetrics in which $\lambda = \beta$ and with an intercept, which establishes that RiskMetrics is not covariance stationary
o The long-run variance does not exist
o Yet, RiskMetrics should then be generalized to include an intercept and to have ARMA complexity dimensions p and q that should be either estimated or at least selected on the basis of the data
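The correspondence between IGARCH(1,1) and RiskMetrics, and the lack of mean reversion, can be made concrete with a few lines. This sketch uses $\lambda = 0.94$ purely as an illustrative smoothing value, with made-up inputs and hypothetical function names.

```python
def riskmetrics_update(sigma2_prev, eps, lam):
    """RiskMetrics: sigma2_{t+1|t} = (1 - lam) * eps_t^2 + lam * sigma2_{t|t-1}."""
    return (1.0 - lam) * eps * eps + lam * sigma2_prev

def igarch11_update(sigma2_prev, eps, omega, beta):
    """IGARCH(1,1): alpha is pinned down as 1 - beta."""
    return omega + (1.0 - beta) * eps * eps + beta * sigma2_prev

def igarch11_expected_path(sigma2_next, omega, horizons):
    """With alpha + beta = 1, E_t[sigma2_{t+h|t+h-1}] = sigma2_{t+1|t} + (h-1)*omega."""
    return [sigma2_next + h * omega for h in range(horizons)]

lam = 0.94                                    # illustrative value
rm = riskmetrics_update(1.2, -2.0, lam)
ig = igarch11_update(1.2, -2.0, omega=0.0, beta=lam)
print(rm, ig)                                 # identical when omega = 0

# No decay toward a long-run level: the forecast stays where the shock put it
print(igarch11_expected_path(ig, omega=0.0, horizons=5))
```

With a positive intercept the expected path drifts upward linearly instead of converging, which is another way to see that no finite long-run variance exists.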

Exponential GARCH Model
Similarly to ARCH, GARCH captures thick-tailed returns and volatility clustering, but it is not well suited to capture the leverage effect because $\sigma^2_{t+1|t}$ is only a function of the squared residuals $\varepsilon^2_t$ and not of their signs
In the exponential GARCH (EGARCH) model of Nelson (1991), $\ln\sigma^2_{t+1|t}$ depends on both the size and the sign of lagged residuals and therefore can capture asymmetries

Exponential GARCH Model
Because $\sigma^2_{t+1|t} = \exp(\ln\sigma^2_{t+1|t})$ and $\exp(\cdot) > 0$, EGARCH always yields positive variance forecasts without imposing restrictions
o $\ln\sigma^2_{t+1|t}$ is a function of both the magnitude and the sign of past standardized residuals, and it allows the conditional variance process to respond asymmetrically to rises and falls in asset prices
o It can be rewritten as:
[Equation: rewritten form]
o Nelson's EGARCH has another advantage: in a GARCH, the parameter restrictions needed to ensure moment existence become increasingly stringent as the order of the moment grows
o E.g., in the case of ARCH(1), for an integer r, the 2r-th moment exists if and only if $\alpha_1^r \prod_{j=1}^{r}(2j-1) < 1$; for r = 2, existence of the unconditional kurtosis requires $\alpha_1 < (1/3)^{1/2}$
o In the EGARCH(p,q) case, if the error process $\eta_t$ in the ARMA representation of the model has all moments and [...], then all moments of an EGARCH process exist
How much better can EGARCH fare versus a standard GARCH model? How important are asymmetries in conditional variance?
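Two ideas from this slide lend themselves to a small numerical sketch: positivity and asymmetry of EGARCH forecasts, and the tightening moment condition for ARCH(1). The EGARCH(1,1) form used here, $\ln\sigma^2_{t+1|t} = \omega + \beta\ln\sigma^2_{t|t-1} + \theta z_t + \delta(|z_t| - E|z_t|)$, is one common parameterization and is an assumption, as are Gaussian shocks, the parameter values, and the function names.

```python
import math

E_ABS_Z = math.sqrt(2.0 / math.pi)    # E|z| for a standard normal z (assumption)

def egarch11_update(log_sigma2_prev, z, omega, beta, theta, delta):
    """Log-variance update: sign effect via theta*z, size effect via delta*(|z| - E|z|)."""
    return omega + beta * log_sigma2_prev + theta * z + delta * (abs(z) - E_ABS_Z)

def arch1_moment_exists(alpha1, r):
    """Check alpha1^r * prod_{j=1..r}(2j - 1) < 1, the 2r-th moment condition
    for an ARCH(1) with Gaussian shocks."""
    product = 1
    for j in range(1, r + 1):
        product *= 2 * j - 1
    return alpha1 ** r * product < 1.0

# Hypothetical parameters; theta < 0 makes negative shocks raise variance more
params = dict(omega=-0.1, beta=0.95, theta=-0.08, delta=0.15)
after_loss = math.exp(egarch11_update(0.0, -2.0, **params))
after_gain = math.exp(egarch11_update(0.0, 2.0, **params))
print(after_loss, after_gain)         # both positive, larger after the loss

# Fourth moment (r = 2) requires alpha1 < (1/3)**0.5, roughly 0.577
print(arch1_moment_exists(0.5, 2), arch1_moment_exists(0.6, 2))
```

Because the recursion is written for the log of the variance, a negative intercept or a negative $\theta$ never produces a negative variance forecast.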

EGARCH and Asymmetries: One Example
o Let's return to the 1963-2016 CRSP daily stock excess return data
o A model specification search is based on information criteria in the space of GARCH(p,q) and EGARCH(p,q) models
o The ICs select large models: a GARCH(2,2) in the GARCH family and an even more complex EGARCH(3,3), the latter being preferred
[Table: GARCH(1,1)]

EGARCH and Asymmetries: One Example
[Table: EGARCH(3,3)]
o This process implies an odd, mixed leverage effect, because negative returns from the previous business day increase predicted variance, but negative returns from two business days earlier depress it

EGARCH and Asymmetries: One Example
o Although variance forecasts are not radically different, the scatter plot shows that when volatility is predicted to be high, GARCH(1,1) often predicts a higher level than EGARCH(3,3) does
o We have tested the two models for their ability to predict squared realized residuals, obtaining:
[Equation: estimated predictive regressions]
o While in the case of GARCH(1,1) we obtain the same result as before, in the case of EGARCH the $R^2$ increases but the results on the intercept and slope point towards a rejection of model accuracy
A preview of topics to follow:
Are alternative, possibly more complex GARCH structures useful?
How do you estimate a GARCH-type model?
Is there any gain in specifying $\varepsilon_t$ to be anything but conditionally $N(0, \sigma^2_{t|t-1})$?
How do you test whether your data are affected by GARCH?

Appendix A: Key Properties of ARCH(1)

