Lecture 8: Multivariate GARCH and Conditional Correlation Models

Lecture 8: Multivariate GARCH and Conditional Correlation Models Prof. Massimo Guidolin 20192 Financial Econometrics Winter/Spring 2018

Overview Three issues in multivariate modelling of CH covariances Naïve models that extend univariate models to covariances Full Multivariate GARCH, part 1: VECH GARCH Full Multivariate GARCH, part 2: BEKK GARCH Constant Conditional Correlations (CCC) models Dynamic Conditional Correlations (DCC) models Hints to Estimation and Inferences 2

Motivation: Finance is Multivariate Most relevant and realistic applications in empirical finance are multivariate, that is, they involve N 2 assets/securities/portfolios 1 2 o Collect the N returns in the Nx1 vector, RR tt+1 [RR tt+1 RR tt+1 RR NN tt+1 ] o Then CH is about a matrix of second moments, that is, variances and covariances: o All variances are collected on the main diagonal (they are N), while all covariances are collected off the main diagonal (they are N(N -1)/2) o Because, VVVVVV[RR tt+1 ] is NxN symmetric o See 20135, ptf. choice lectures for examples of applications In the same way in which we have developed CH models for σσ tt+1 tt we now perform the same operation for VVVVVV tt [RR tt+1 ] o The models will be multivariate in nature o They imply a need to model and forecast conditional covariances 3 2

Three Issues with Multivariate VarCov Modelling This goal raises 3 issues: 1 Because there are N + N(N 1)/2 = N(N + 1)/2 moments to be estimated, the availability of sufficiently long-time series may be an issue o E.g., with 15 assets in a ptf. as common in asset mgmt need (i) 15 volatility and (ii) 15x14/2=105 covariance forecasts, for a total of 120 o Even when variances and covariances are constant, with 15 series of returns and 120 parameters 120/15 = 8 data points per series o Yet, this is a unit saturation ratio! To get a ratio of 20, you need 160 data points per series, i.e., 7+ months of daily data, 3+ years of weekly data, 13+ years of monthly data, 40 years of quarterly data o These data requirements are moderate, but not negligible for OTC instruments or recently floated stocks in the aftermath of IPOs o Try the example before with 100 assets you will need to estimate 100 + 100x99/2=5,050 parameters o Because the size of VVVVVV[RR tt+1 ] grows as a function of N 2, the size of the estimation problem and the data requirements grow as a quadratic function of the number of assets (i.e., very quickly) 4

Three Issues with Multivariate VarCov Modelling o To some extent models help in reducing the estimation burden, through multivariate CH models, concerning VVVVVV tt [RR tt+1 ] 2 But models face a unique problem: unless restrictions are imposed, it is difficult to guarantee that VVVVVV tt [RR tt+1 ] be a positive definite (PD) matrix o A PD matrix VVVVVV tt [RR tt+1 ] is one such that for any Nx1 vector (of weights) ωω, the ptf. Return variance ωω VVVVVV tt [RR tt+1 ]ωω > 0 o A PD matrix positive variances and all correlations < 1 3 Many CH models themselves are often over-parameterized and characterized by low saturation ratios o Models react to 1), however face issue 2) unless they are rich, issue 3) Early idea (late 1980s): each element of VVVVVV tt [RR tt+1 ] can modelled separately and using the same models as in lecture 6 o For instance, rolling window moving avg. the avg. of the cross-residual products for a pair i j o Alternative idea that has had some impact on the practice of risk mgmt consists of extending RiskMetrics, with possibly set at 0.94 5

Simple Models Compared: Stock-Bond Correlation o Why not GARCH for covariances? o RiskMetrics is a non-stationary GARCH, unconditional covariance fails to exist, and if tomorrow s covariance is high (low) then it is predicted to remain high (low), rather than reverting back to its mean o Suppose you are in charge of modelling and forecasting the dynamics of both covariances and correlations between weekly returns on US value-weighted stock index and 10-year Treasury notes o The sample is January 1982-December 2016 o Claim that stocks and bonds are negatively correlated is simplistic Estimates on residuals Estimates on stdzed residuals 6

Simple Models Compared: Stock-Bond Correlation o In all cases, correlations computed for each asset using predicted standard deviations generated by a GARCH(1,1) o The estimate of λ is higher than the classical 0.94, in excess of 0.99 o Without restrictions (left panel), the RiskMetrics forecast goes below -1 in a few weeks of 2013 the resulting VVVVVV tt [RR tt+1 ] is not PD o One way to correct problem is to estimate RiskMetrics on stdzed residuals (right panel) + predict correlations in RW case with RW stdev The example shows already one case of non-pd predicted VVVVVV tt [RR tt+1 ] Unfortunately, unless α ij = α and β ij = β for all possible pairs i, j (ω ij is allowed to depend on the pair i and j), even though σσ iiii,tt+1 tt can be anyway predicted, when one organizes such predictions into a covariance matrix VVVVVV tt [RR tt+1 ], this is not guaranteed to be PD o Lecture notes provide a heuristic proof referred to RiskMetrics o Note that the restriction applies to both variances and covariances, e.g., in the GARCH(1,1) case: 7

Stock-Bond Correlation under Restricted GARCH o There is nothing special about a GARCH(1,1), and this can be extended to more general GARCH(p, q) or RiskMetrics(q) structures o Unfortunately, the restrictions α ij = α and β ij = β are often contrary to the evidence in the data o For the stock-bond case, fitting a threshold GARCH(1,1) to capture any asymmetries (also in dynamic covariances), we obtain Asymmetries 65 2 o The common persistence index is 0.985 o The asymmetry term means that when shocks had a different sign one period ago, this increases the negative impact of the current product of shocks on the predicted covariance, a sort of accelerator effect 8

Stock-Bond Correlation under Restricted GARCH Need to impose restrictions paves the way to adoption of complex fully-fledged multivariate GARCH models Key concept: as N grows, such models become over-parameterized and difficult to estimate, even though they allow PD-restrictions 9

Fully-Fledged Multivariate GARCH In this lecture only a pair of examples to be treated as such In a N-dimensional generalization, our framework is: o MD indicates a generic multivariate density parameterized by θ D, such that the multivariate normal or the multivariate t-student o is the square-root, or Cholesky decomposition, of the covariance matrix, such that o This matrix is in no way the matrix of square roots of elements of the full covariance matrix (how would deal with negative covariances?) A first, highly parameterized multi-garch is the VECH-GARCH: where vech( ) ( vector half ) converts the unique upper triangular elements of a symmetric matrix into a 0.5N(N 1) column vector that removes any duplicates, e.g.: 10

VECH GARCH Model o In an unrestricted VECH model, each element of ΣΣ tt+1 tt is a linear function of the lagged squared errors, the cross-products of past errors for all assets, and the lagged values of the elements of ΣΣ tt tt 1 o A and B are 0.5N(N-1)-dimensional square matrices, whereas C is an NxN symmetric matrix, for a total number of parameters equal to: Grows at the same speed as o E.g., for N = 100, a VECH-GARCH(1,1) model has 51,010,050 parameters to be estimated! o If you need a saturation ratio of at least 20 20x51,010,050/100 = 10,202,010 obs. per series or a daily history >> 40,484 yrs per series! o More generally, VECH-GARCH(p, q) models, that naively generalize GARCH to the multivariate case, tend to generate a serious curse of dimensionality problem o Moreover, worse, many of such O(N 4 ) parameters need to be restricted for them to yield forecasts of the covariance matrix that are SP 11

o But then conditional variances depend only on own lags and own lagged squared residuals, and conditional covariances depend only on own lags and own lagged cross products of residuals o This is restrictive because it prevents the detection of any causality in variance, that past shocks to some variable forecast variances of others o Even the diagonal GARCH framework results in O(N 2 ) parameters o Other issue: the coefficients are not restricted to be the same across different assets and pairs constraints will have to be imposed 12 VECH GARCH Model o Tricks such as covariance targeting are often invoked to deal with the curse of dimensionality in VECH GARCH: o Appropriate restrictions on A and B reduction in the number of parameters, e.g., a diagonal multivariate VECH GARCH, A and B diagonal o Algebra shows that then each element of the varcov matrix follows:

Stock-Bond Correlation under VECH GARCH o Appendix A gives a taste for such conditions in a VECH ARCH(1) case o We now apply VECH-GARCH models to investigate the dynamics of the correlation between stock and bond returns o We specify a restricted bivariate VAR(1) as a conditional mean model, in which only lagged bond returns forecast both stocks and bonds o Using BIC, a full C diagonal t-student VECH GARCH(1,1) is preferred: t-student parameter v not a problem, uncond. covariance can be <0 o Conditional variances react to shocks much more (0.074 and 0.062) than conditional covariances do (0.028) o After printing ML estimates and their standard errors, E-Views warns us that Coefficient matrix is not SPD, and this is a reason for concern 13

BEKK (Baba-Engle-Kraft-Kroner) GARCH Because of the limitations of VECH, during the 1990s, one multivariate GARCH model surged to popularity, Engle and Kroner s (1995) BEKK (p, q): Stock-bond corr under VECH GARCH t+1-i t+1-i Non-negative NxN matrices 14

BEKK (Baba-Engle-Kraft-Kroner) GARCH The special sandwich structure of the coefficient matrices guarantees that ΣΣ tt+1 tt is (semi-)pd without imposing other restrictions o The popular BEKK that many empiricists have come to appreciate is a simpler (1,1) diagonal BEKK that restricts the matrices A and B BEKK models possess three attractive properties: 1 When symmetry of A and B is imposed, a BEKK is a truncated, lowdimensional application of a theorem by which all nonnegative, symmetric NxN matrices (say, M) can be decomposed as: for appropriately selected vectors m k,j 2 BEKK ensures (S)PD-ness of ΣΣ tt+1 tt, because by construction, the sandwich form and outer vector products have this property 3 BEKK is invariant to linear combinations, i.e., if R t+1 follows a BEKK GARCH(p, q), then any ptf. of the N assets in R t+1 will also follow a BEKK, see lecture notes for examples and counterexamples under VECH ARCH o However, the number of parameters in BEKK remains rather large 15

Stock-Bond Correlation under BEKK o In the application to US stock and bond returns, BIC leads to select a t- Student diagonal BEKK(1,1): t-student parameter v o No p-values for second moments as products are in sandwich form 16

Conditional Correlation (DCC and CCC) Models o Stationarity and moment convergence criteria for multi-garch are complex and explicit results are only available for a few special cases In the practice of risk and asset management, most chattering is about time-varying correlations and not covariances o E.g., to debate evidence that correlations increase during crises On the one hand, as already exploited in many examples, obvious that given any type of model to forecast covariances and variances, one can compute the implied dynamic (prediction of) correlation: On the other hand, fruitful to directly model correlations although there is one obvious problem, to forecast Any dynamic estimator implies a need to constrain parameters Engle (2002) offers a nifty trick: appealing to model an appropriate auxiliary variable, q ij,t+1, than correlations directly o An auxiliary variable is a by-product of modeling and estimation that has no direct meaning but that can be used in subsequent steps 17

Conditional Correlation (DCC and CCC) Models o DCC approach is based on a generalization of the standard result that to matrices: o Here DD tt+1 is an NxN matrix of predicted standard deviations, σ i,t+1, the ith element of the diagonal and 0 everywhere else o ΓΓ tt+1 tt is a matrix of predicted correlations, ρ ij,t+1 t with 1s on its main diagonal, for instance: The key step of the DCC approach is based on the ability to disentangle the estimation and prediction of DD tt+1 from the estimation and prediction of ΓΓ tt+1 tt ; we proceed in two steps: 1 The volatility of each asset is estimated/predicted through a GARCH or one of the other methods considered in lectures 6-7 o E.g., NAGARCH(p, q) for some assets and GARCH(1,1) for others o Homoskedasticity (constant variance is not ruled out) 2 Model the conditional covariances of the resulting standardized residuals, z i,t+1 i,t+1 /σ i,t+1 t, derived from the first-step GARCH 18

o Complex, asymmetric or nonlinear (e.g., power) GARCH are possible o These models apply to all pairs of assets even when i = j To go from a forecast of auxiliary variable q ij,t+1 to correlations use : 19 Conditional Correlation (DCC and CCC) Models o We exploit the fact that the conditional covariance of the standardized residuals equals the conditional correlation of original residuals: GARCH-type modeling in the second step will not directly concern the covariance of stdzed residuals, but an auxiliary variable q ij,t+1 Typically, the most popular model used in this second DCC step is: a GARCH(1,1) for the auxiliary variable, written in deviations from the unconditional, long-run mean correlation, An alternative is a RiskMetrics-type model: Common across assets

Conditional Correlation (DCC and CCC) Models o This transformation guarantees that o The RiskMetrics/DCC and GARCH/DCC models can be written in matrix form as: o Q t+1 is an NxN symmetric (S)PD matrix that collects the values/predictions of the auxiliary variables q ij,t+1 : o Q t+1 is (S)PD because it is a weighted average of (S)PD and positive definite matrices o This will ensure that the correlation matrix ΓΓ tt+1 tt and the covariance matrix, ΣΣ tt+1 tt will be (S)PD o When i = j, q ij,t+1 in general differs from σ 2 i,t+1 from the first step this represents the approximation burden of two-step DCC estimation Covariance stationarity of all the GARCH processes that populate DD tt+1 along with covariance stationarity of the (matrix) process for QQ tt+1 are sufficient for a DCC model to be weakly stationary and for unconditional variances, covariances, and correlations to exist 20

Stock-Bond Correlation under RiskMetrics DCC 21

Stock-Bond Correlation under TARCH(1,1,1) DCC 22

Estimation and Inference One restricted case of DCC had appeared already in 1990: Bollerslev s constant conditional correlation (CCC) multivariate model, which is a DCC with constant correlation matrix, o This model simply avoids defining and modeling with GARCH-type processes the q ij,t+1 auxiliary variable Multivariate GARCH and DCC estimation applies (Q)ML to jointly estimate the parameters of (conditional) mean variance equations o E.g., in the multivariate normal case, the log-likelihood contributions (i.e., the PDF values for each of the sample observations) is: o The asymptotic properties of ML (and QML) estimators in multivariate GARCH models are not yet firmly established: while consistency has been proven, asymptotic normality of the QMLE is not established o DCC and CCC models are estimated by QMLE by construction: because the model is implemented in three different steps, even though in each of these stages a log-likelihood function is written and maximized o QML efficiency loss derives from treating z i,t+1 as data and not estimates 23

Appendix A: Positive Definiteness in VECH ARCH(1) 24

Appendix A: Positive Definiteness in VECH ARCH(1) 25

Appendix A: Positive Definiteness in VECH ARCH(1) 26

Appendix A: Positive Definiteness in VECH ARCH(1) 27