Very preliminary, please do not cite without permission.

ESTIMATION AND FORECASTING LARGE REALIZED COVARIANCE MATRICES

LAURENT A. F. CALLOT
VU University Amsterdam, The Netherlands, CREATES, and the Tinbergen Institute.

ANDERS B. KOCK
CREATES, Aarhus University, Denmark.

MARCELO C. MEDEIROS
Department of Economics, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil.

E-mail addresses: l.callot@vu.nl, akock@creates.au.dk, mcm@econ.puc-rio.br. Parts of the research for this paper were done while the first and second authors were visiting the Department of Economics at the Pontifical Catholic University of Rio de Janeiro, Brazil. Its hospitality is gratefully appreciated. MCM's research is partially supported by CNPq/Brazil.

Abstract. We consider forecasting large realized covariance matrices by penalized vector autoregressive models.

Keywords: Realized covariance; vector autoregression; shrinkage; Lasso; forecasting; portfolio allocation.

JEL codes: C22.

1. Introduction

This paper deals with modeling and forecasting large time-varying covariance matrices of daily returns on financial assets. Modern portfolio selection, as well as risk management and empirical asset pricing, relies strongly on precise forecasts of the covariance matrix of the assets involved. For instance, the traditional mean-variance approach of Markowitz requires the estimation or modeling of all variances and covariances, leading to unstable results when applied to a large set of assets. As financial markets evolve, the number of assets grows, making the traditional approach less suitable for practitioners. Typical multivariate ARCH-type models fail to deliver reliable estimates due to the curse of dimensionality and a large computational burden. Possible solutions frequently used in practice are (1) a weighted average of past squared returns, as in the RiskMetrics methodology, or (2) the construction of factor models. In this paper we take a different route and consider the estimation of vast vector autoregressive models for realized covariance matrices. To avoid the curse of dimensionality we advocate the use of the Least Absolute Shrinkage and Selection Operator (Lasso).

The contributions of this paper are as follows. First, we put forward a methodology to model and forecast large time-varying realized covariance matrices with a minimum number of restrictions.

Our method can also shed some light on the drivers of the dynamics of these realized covariance matrices, as the Lasso also performs variable selection. Second, we derive an upper bound for the forecast error which is valid even in finite samples. Third, we show how this bound translates into a bound for the forecast error of the time-varying variance of a portfolio constructed from this large number of assets. Finally, we apply our methodology to the selection of a portfolio with mean-variance preferences.

The rest of the paper is organized as follows. Section 2 describes the problem setup, defines notation, and briefly presents the Lasso and some key assumptions. In Section 3 we present some theoretical results. The dataset and computational issues are discussed in Section 4. The empirical results are presented in Section 5. Finally, Section 6 concludes the paper.

2. Setup

In this section we put forward our methodology and present a finite sample upper bound on the forecast error of our procedure. Let \Sigma_t denote the n_T \times n_T population conditional covariance matrix as of time t when conditioning on the \sigma-field \sigma(\{\Sigma_s : s < t\}). Note that the dimension n_T of \Sigma_t is indexed by the sample size T. This reflects the fact that n_T may be large compared to T, and hence standard asymptotics, which take the dimension n_T as a fixed number, may not accurately reflect the actual performance in finite samples. Since \Sigma_t is allowed to depend on its past, it is a function of many variables. Defining y_t = \mathrm{vech}\, \Sigma_t, we shall assume that it follows a vector autoregression of order p_T, i.e.,

(1)    y_t = \sum_{i=1}^{p_T} \Phi_i y_{t-i} + \epsilon_t,    t = 1, \ldots, T,

where \Phi_i, i = 1, \ldots, p_T, are the k_T \times k_T dimensional parameter matrices with k_T = n_T(n_T + 1)/2 and \epsilon_t \sim N_{k_T}(0, \Omega). Note that the dimension k_T of the parameter matrices increases quadratically in n_T. So even for conditional covariance matrices \Sigma_t of a moderate dimension the number of parameters in (1) may be very large. Hence, standard estimation techniques such as least squares may provide very imprecise parameter estimates or even be infeasible if the number of variables is greater than the number of observations. To circumvent this problem we use the Least Absolute Shrinkage and Selection Operator (Lasso) of Tibshirani (1996), which is feasible even when the number of parameters to be estimated is (much) larger than the sample size. We suppress the dependence of n_T, k_T and p_T on T to simplify notation.

As mentioned in the introduction, we are concerned with stationary VARs, meaning that the roots of \det(I_k - \sum_{j=1}^{p} \Phi_j z^j) lie outside the unit circle. Equivalently, all roots of the companion matrix F must lie inside the unit disc. Let \rho (the dependence on T is suppressed) denote the largest of these roots in absolute value.

It is convenient to write the model in stacked form. To do so, let Z_t = (y_{t-1}', \ldots, y_{t-p}')' be the kp \times 1 vector of explanatory variables at time t in each equation and X = (Z_T, \ldots, Z_1)' the T \times kp matrix of covariates, common to each equation. Let y_i = (y_{T,i}, \ldots, y_{1,i})' be the T \times 1 vector of observations on the ith variable (i = 1, \ldots, k) and \epsilon_i = (\epsilon_{T,i}, \ldots, \epsilon_{1,i})' the corresponding vector of error terms. The fact that y_i inherits the gaussianity of the error terms is particularly useful since this means that y_i has slim tails. Finally, \beta_i^* is the kp-dimensional vector of true parameters for equation i, which also implicitly depends on T.
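The construction of y_t and of the common regressor matrix X is mechanical; the following minimal R sketch (our own illustration, not the authors' code, and all function names are ours) builds them from a list of realized covariance matrices.

```r
## Minimal sketch (our illustration): build y_t = vech(Sigma_t) and the
## common (T-p) x kp regressor matrix X for a VAR(p), from a length-T list
## `Sigma` of n x n realized covariance matrices.
build_var_design <- function(Sigma, p) {
  vech <- function(M) M[lower.tri(M, diag = TRUE)]  # half-vectorization
  Y <- t(sapply(Sigma, vech))                       # T x k, with k = n(n+1)/2
  T_obs <- nrow(Y)
  # Z_t = (y'_{t-1}, ..., y'_{t-p})': lag l contributes rows p+1-l, ..., T-l
  # (row order is immaterial for estimation)
  X <- do.call(cbind, lapply(1:p, function(l) Y[(p + 1 - l):(T_obs - l), , drop = FALSE]))
  list(y = Y[(p + 1):T_obs, , drop = FALSE],        # one column per equation i = 1, ..., k
       X = X)
}
```

Each of the k columns of y is then regressed on the same matrix X, one equation at a time, as in (2) below.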

Hence, we may write (1) equivalently as

(2)    y_i = X \beta_i^* + \epsilon_i,    i = 1, \ldots, k,

such that each equation in (1) may be modeled separately. Or, taking one step back, each element in \Sigma_t is modeled as in (2). The length of \beta_i^*, namely kp, may be much greater than the sample size if the original conditional covariance matrix \Sigma_t is large. If, for example, n = 30, one has k = 465, which amounts to a total of 2325 parameters per equation if p = 5. As a consequence, traditional methods such as least squares will be inadequate in such situations and we will turn to the Lasso instead.

2.1. Notation. Let J_i = \{j : \beta_{i,j}^* \neq 0\} \subseteq \{1, \ldots, kp\} denote the set of non-zero parameters in equation i and s_i = |J_i| its cardinality. Let \bar{s} = \max\{s_1, \ldots, s_k\} and let \Psi_T = X'X/T be the kp \times kp scaled Gram matrix of X. For any x \in R^m, \|x\| = (\sum_{i=1}^{m} x_i^2)^{1/2}, \|x\|_{\ell_1} = \sum_{i=1}^{m} |x_i| and \|x\|_{\ell_\infty} = \max_{1 \le i \le m} |x_i| denote the \ell_2, \ell_1 and \ell_\infty norms, respectively (most often m = kp or m = s_i in the sequel). When regarding the m \times m matrix A as a linear operator from R^m to R^m equipped with either the \ell_1- or the \ell_2-norm, \|A\| and \|A\|_{\ell_1} denote the induced operator norms. \|A\|_\infty shall denote the maximum absolute entry of A; note that it is not induced by the \ell_\infty-norm. For any vector \delta in R^n and a subset J \subseteq \{1, \ldots, n\} we shall let \delta_J denote the vector consisting only of those elements of \delta indexed by J. For any two real numbers a and b, a \vee b = \max(a, b) and a \wedge b = \min(a, b). Let \sigma_{i,y}^2 denote the variance of y_{t,i} and \sigma_{i,\epsilon}^2 the variance of \epsilon_{t,i}, 1 \le i \le k. Then define \sigma_T = \max_{1 \le i \le k} (\sigma_{i,y} \vee \sigma_{i,\epsilon}).

2.2. The Lasso. The Lasso was proposed by Tibshirani (1996). Its theoretical properties have been studied intensively since then; see e.g. Zhao and Yu (2006), Meinshausen and Bühlmann (2006), Bickel et al. (2009), and Bühlmann and Van De Geer (2011), to mention just a few. It is known that it only selects the correct model asymptotically under rather restrictive conditions on the dependence structure of the covariates. However, we shall see that it can still serve as an effective screening device in these situations. Put differently, it can remove many irrelevant covariates while still retaining the relevant ones and estimating their coefficients with high precision. We investigate the properties of the Lasso when applied to each equation i = 1, \ldots, k separately. The Lasso estimates \beta_i^* in (2) by minimizing the objective function

(3)    L(\beta_i) = \frac{1}{T} \|y_i - X\beta_i\|^2 + 2\lambda_T \|\beta_i\|_{\ell_1},

where \lambda_T is a sequence to be defined exactly below. (3) is basically the least squares objective function plus an extra term penalizing parameters that are different from zero.
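Since Section 4 reports that estimation is done equation by equation with glmnet and a BIC-selected penalty, a sketch along the following lines reproduces the minimization of (3) for a single equation; the BIC construction here is our own approximation of what the lassovar package does, not its actual internals.

```r
## Sketch: Lasso estimation of one equation of (2), penalty chosen by BIC.
## glmnet minimizes (1/(2T))||y - Xb||^2 + lambda*||b||_1, a rescaling of (3).
library(glmnet)

lasso_equation_bic <- function(X, y) {
  fit <- glmnet(X, y, family = "gaussian", alpha = 1,  # alpha = 1: pure Lasso penalty
                intercept = FALSE, standardize = FALSE)
  T_obs <- length(y)
  rss <- colSums((y - predict(fit, X))^2)              # RSS along the lambda path
  bic <- T_obs * log(rss / T_obs) + fit$df * log(T_obs)
  as.numeric(coef(fit, s = fit$lambda[which.min(bic)]))[-1]  # drop the intercept slot
}
```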

4 4 L. CALLOT, A. B. KOCK, AND M. C. MEDEIROS smaller set. The same is the case for the LASSO in the VAR since we have written the VAR as a regression model. In particular we shall make use of the restricted eigenvalue condition { } κ 2 δ Ψ T δ (5) Ψ T (s i ) = min δ R : 2 R s i, δ R kp \ {0}, δ R c l1 3 δ R l1 > 0 where R {1,..., kp} and R is its cardinality. Note that instead of minimizing over all of δ R kp \ {0} as in (4) the minimum in (5) is restricted to those vectors satisfying δ R c l1 3 δ R l1. As a result, κ 2 Ψ T (r) can be positive even when the Rayleigh-Ritz ratio in (4) is zero. Note that whenever Ψ T has full rank κ 2 Ψ T (s i ) will be positive. Letting Γ = E(Ψ T ) = E(Z t Z t) denote the population covariance matrix we similarly define { } κ 2 i = κ 2 δ Γδ (6) Γ(s i ) = min δ R : 2 R s i, δ R kp \ {0}, δ R c l1 3 δ R l1 > 0 We shall assume throughout that Γ has full rank which implies that κ 2 i > 0. This is a rather standard assumption which is independent of whether T > kp or not. For more details on the restricted eigenvalue condition we refer to Kock and Callot (2012). 3. Theoretical Results Letting w R n denote a set of portfolio weights the true conditional variance of the portfolio is given by while the estimated variance is σ 2 t = w Σ t w ˆσ 2 t = w ˆΣt w As a consequence, one might be interested in measuring the precision of ˆΣ t by considering how much ˆσ 2 and σ 2 t deviate from each other. In the presence of an upper bound on the positions one may take this can be done by bounding ˆΣ t Σ t l2. The following theorem makes this claim precise Theorem 1. Assume that w l2 c for some c > 0. Then, 2 ˆσ t σt 2 ˆΣt Σ t c 2 Hence, in the presence of a restriction on the positions one can take, an upper bound on ˆΣ t Σ t implies an upper bound on the distance between ˆσ t 2 and σt 2. We shall next ( give an upper bound ) on ˆΣ t Σ t. To this end we define π q (s) = 4k 2 p 2 ζt exp + 2(k 2 p 2 ) 1 log(t ) for ζ = s 2 i log(t )(log(k2 p 2 )+1) With this notation in place we have the following theorem. (1 q) 2 κ 4 i ( Γ T i=0 F i ) 2. Theorem 2. Let λ T = 8 ln(1 + T ) 5 ln(1 + k) 4 ln(1 + p) 2 ln(k 2 p)σt 4 /T and 0 < q < 1. Then with probability at least 1 2(k 2 p) 1 ln(1+t ) 2(1 + T ) 1/A π q ( s) 2[k(p + 1)] 1 ln(t ) one has ( ˆΣT +1 Σ 16 ) T +1 2σT 2 ln(k(p + 1)) ln(t ) s qκ 2 i λ T + 1 i
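A toy numerical check of Theorem 1 (our own illustration; the matrices and weights are made up) confirms the inequality for an equal-weight portfolio, for which \|w\|_{\ell_1} = 1:

```r
## Toy check of Theorem 1: |sigma_hat^2 - sigma^2| <= max-entry norm * c^2
## whenever ||w||_1 <= c. Here w is the equal-weight portfolio, so c = 1.
set.seed(1)
n <- 30
A <- matrix(rnorm(n * n), n, n)
Sigma <- crossprod(A) / n                              # a PSD "true" covariance matrix
E <- matrix(rnorm(n * n, sd = 0.01), n, n)
Sigma_hat <- Sigma + (E + t(E)) / 2                    # a symmetric "forecast"
w <- rep(1 / n, n)
lhs <- abs(drop(t(w) %*% (Sigma_hat - Sigma) %*% w))   # |sigma_hat^2 - sigma^2|
rhs <- max(abs(Sigma_hat - Sigma)) * sum(abs(w))^2     # ||.||_inf times c^2
stopifnot(lhs <= rhs)
```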

We shall next give an upper bound on \|\hat{\Sigma}_{T+1} - \Sigma_{T+1}\|_\infty. To this end we define

\pi_q(s) = 4 k^2 p^2 \exp(-\zeta T) + 2 (k^2 p^2)^{1 - \log(T)}  with  \zeta = \frac{(1-q)^2 \kappa_i^4}{s_i^2 \log(T) (\log(k^2 p^2) + 1) (\|\Gamma\| \sum_{i=0}^{T} \|F^i\|)^2}.

With this notation in place we have the following theorem.

Theorem 2. Let \lambda_T = \sqrt{8 \ln(1+T)^5 \ln(1+k)^4 \ln(1+p)^2 \ln(k^2 p) \sigma_T^4 / T} and 0 < q < 1. Then, with probability at least 1 - 2(k^2 p)^{1 - \ln(1+T)} - 2(1+T)^{-1/A} - \pi_q(\bar{s}) - 2[k(p+1)]^{1 - \ln(T)}, one has

\|\hat{\Sigma}_{T+1} - \Sigma_{T+1}\|_\infty \le \sqrt{2 \sigma_T^2 \ln(k(p+1)) \ln(T)} \left( \max_{1 \le i \le k} \frac{16 s_i \lambda_T}{q \kappa_i^2} + 1 \right).

Theorem 2 gives an upper bound on the forecast error of \hat{\Sigma}_{T+1} which is valid even in finite samples. Note that even if we knew the true parameter vector \beta_i^* we could never expect a forecast error which tends to zero, since the error terms \epsilon_{T+1} are unforecastable. By combining Theorems 1 and 2 one may achieve the following upper bound on the forecast error of \hat{\sigma}_{T+1}^2.

Corollary 1. Under the assumptions of Theorems 1 and 2 one has that

|\hat{\sigma}_{T+1}^2 - \sigma_{T+1}^2| \le \sqrt{2 \sigma_T^2 \ln(k(p+1)) \ln(T)} \left( \max_{1 \le i \le k} \frac{16 s_i \lambda_T}{q \kappa_i^2} + 1 \right) c^2.

Corollary 1 provides a finite sample upper bound on the error of the forecast of the portfolio variance under a short selling constraint. Note that this short selling constraint is the only restriction we place on the portfolio weights.

4. Computations

A first subsection describes the practical implementation of the forecasts; the following section discusses variable selection using the Dow Jones data; and the final section presents forecast results.

4.1. Data. The data we use consist of 437 stocks of the S&P 500, with a total of 1465 daily observations from 2006 to 2011. The realized covariances are constructed from 5-minute returns. [To do: thank and cite Asger.] We consider two transformations of the data, both aimed at ensuring that the fitted and forecasted covariance matrices have a positive diagonal after the transformation is reversed (see the R sketch at the end of this section):

log-covariance transformation (lcov): take the logarithm of the variances and do not transform the covariances. This transformation has the effect of smoothing the variance series relative to the covariance series.

log-matrix transformation (lmat): compute the matrix logarithm of the covariance matrix. The reverse transformation, the matrix exponential, ensures that the resulting matrix is positive semi-definite and smooths both diagonal and off-diagonal elements. The drawback is that under this transformation the diagonal and off-diagonal parameters cannot be interpreted as pertaining to variances and covariances, respectively.

4.2. Censoring. The sample we consider covers the financial crisis of 2008 as well as the flash crashes of 2010 and 2011. These events lead to highly correlated returns, and hence to many extreme values in the stock return correlation series. The Lasso is fragile to this kind of outlier since it works under normality assumptions. In practice, we flag for censoring every day on which more than 25% of the entries of the upper diagonal of the covariance matrix are more than 4 standard deviations away from their sample average. These observations are replaced by an average of the nearest 5 preceding and 5 following non-flagged observations. Under this censoring rule, the flagged observations are concentrated in October 2008; the flash crashes of 2010 and 2011 are also flagged.

4.3. Implementation. All computations are carried out in R using the lassovar package, a wrapper for glmnet, an implementation of the coordinate descent algorithm of Friedman et al. (2010). The VARs are estimated equation by equation, using the Bayesian Information Criterion to select the penalty parameter.
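The two transformations are straightforward to implement; the sketch below (our own, assuming strictly positive eigenvalues of each realized covariance matrix) shows the core operations.

```r
## Sketch of the transformations of Section 4.1 (our own implementation).
## Matrix logarithm and its inverse, via the eigendecomposition:
logm_sym <- function(S) {
  e <- eigen(S, symmetric = TRUE)                     # requires positive eigenvalues
  e$vectors %*% diag(log(e$values)) %*% t(e$vectors)
}
expm_sym <- function(L) {
  e <- eigen(L, symmetric = TRUE)
  e$vectors %*% diag(exp(e$values)) %*% t(e$vectors)  # positive semi-definite by construction
}
## log-covariance transformation: log the variances, covariances untouched
lcov_transform <- function(S) { diag(S) <- log(diag(S)); S }
```

Forecasts are produced on the transformed scale and mapped back, with expm_sym under lmat or the exponential of the diagonal under lcov, before evaluation.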

5. Empirical results

This section reports our empirical results; the first part focuses on the variable selection patterns of the different versions of the Lasso, the second part reports forecast results.

5.1. Variable selection. In this section and the next, we focus on the 30 stocks belonging to the Dow Jones index. These stocks can be classified into 8 broad categories, highlighted in Table 1.

Table 1. Number of stocks per category, 30 Dow Jones stocks (categories: Basic Materials, Communications, Consumer Cyclical, Consumer Non-cyclical, Energy, Financial, Industrial, Technology).

We estimate over 400 models using a training sample with a rolling window of 1000 observations. In the tables below we report the average (across data samples) number of variables from a given category (in rows) selected in equations for stocks belonging to a given category (in columns). The sums are also divided between diagonal (D, the variances) and off-diagonal (O, the covariances) equations and covariates. The five tables below all report results for a VAR(1) estimated by the Lasso or the adaptive Lasso using OLS or the Lasso as initial estimator. These models are in Tables 2, 4, and 5, using the log-variance transformation on censored data. Table 3 considers the Lasso estimator on log-matrix transformed censored data. Finally, Table 6 considers the Lasso on un-censored data.

Let us consider Table 2 as our benchmark model. The selected model for the diagonal equations is very sparse for most categories. It is striking that the model selected for off-diagonal equations contains many off-diagonal covariates; this is partly due to the large number of potential off-diagonal covariates. When considering the log-matrix transformation in Table 3, the number of selected variables is similar to the benchmark model, except for off-diagonal covariates of off-diagonal equations, where fewer covariates are selected and a clear diagonal pattern emerges. Using the adaptive Lasso with OLS as a first-step estimator, Table 4, results in a model that is more sparse than the benchmark model, except again for the off-diagonal covariates of off-diagonal equations, where many more covariates are selected. This is in sharp contrast with the results obtained using the Lasso as an initial estimator, Table 5, where the model is overall more sparse than the benchmark model. Finally, when considering uncensored data in Table 6, notice that the models for diagonal equations are comparable to those of the benchmark model, whereas the off-diagonal equation models present a very different pattern: very few diagonal covariates are selected while a very large number of off-diagonal covariates is selected, larger than with any of the other models considered.

The wide fluctuations in the selection pattern of off-diagonal covariates in off-diagonal equations across models can be explained by considering the large number of very noisy off-diagonal covariates the Lasso has to select from. Furthermore, large market-wide shocks lead to sharp increases in the covariances of stock returns that are broadly correlated across covariances. These large correlated shocks to the covariances are partially eliminated by censoring or smoothed by the log-matrix transform, which results in relatively fewer variables being selected. Using the Lasso instead of OLS as the initial estimator of the adaptive Lasso provides a first-step screening which seems to help the second-step Lasso perform variable selection.

Table 2. Number of variables selected by category. VAR(1) estimated by the Lasso, censored data, the log-variance transformation.

Table 3. Number of variables selected by category. VAR(1) estimated by the Lasso, censored data, the log-matrix transformation.

Table 4. Number of variables selected by category. VAR(1) estimated by the adaptive Lasso using OLS as initial estimator, censored data, the log-variance transformation.

Table 5. Number of variables selected by category. VAR(1) estimated by the adaptive Lasso using the Lasso as initial estimator, censored data, the log-variance transformation.

Table 6. Number of variables selected by category. VAR(1) estimated by the Lasso, un-censored data, the log-variance transformation.

5.2. Forecasts. The forecasts are computed recursively for horizons greater than 1, as sketched below; all forecast errors are computed based on the de-transformed forecasts.
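As an illustration of the recursion, a VAR(1) fitted equation by equation yields a stacked k x k coefficient matrix, and the h-step forecast iterates it forward; this sketch is our own and not the lassovar interface.

```r
## Sketch (ours): recursive h-step-ahead forecast from a VAR(1) with
## k x k coefficient matrix Phi_hat, starting from the last transformed
## observation y_T = vech(transformed Sigma_T).
forecast_var1 <- function(Phi_hat, y_T, h) {
  y_hat <- y_T
  for (i in seq_len(h)) y_hat <- drop(Phi_hat %*% y_hat)  # y_{T+i|T} = Phi_hat y_{T+i-1|T}
  y_hat                                                   # forecast on the transformed scale
}
```

The forecast is then de-transformed (exponential of the diagonal under lcov, matrix exponential under lmat) before forecast errors are computed.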

We consider 3 levels of aggregation of the data: daily, weekly, and monthly. The daily forecasts are computed using a rolling window of 1000 observations, leading to 437 forecasts. The weekly models are estimated using 263 observations and the monthly models using 60 observations, which results in 52 and 12 forecasts, respectively.

We forecast with vector autoregressive (VAR) models, autoregressive (AR) models, and random walk (No Change) models on both transformations (lcov, lmat) of the data. The estimators for the VARs are the Lasso, the adaptive Lasso (with OLS or the Lasso as initial estimator), and OLS. The AR models are estimated by OLS only.

Key to the tables. The forecast tables below report a number of statistics for forecasts computed with several models; below we detail these statistics, followed by a short sketch of their computation. Let t := t_0 + h, where t_0 is the last observation of the estimation sample and h the horizon, and let \hat{\epsilon}_{t_0}^h be the vector of forecast errors at horizon h forecasted from time t_0.

Primary column header.

beat bmk: frequency at which the absolute forecast error of a given model is lower than the corresponding absolute forecast error of the benchmark. The benchmark model is the one for which this statistic is reported as NA.

pft risk: the difference between the realized and forecasted risk of an equal-weight portfolio: rsk_{t_0}^h := w' \Sigma_{t_0+h} w - w' \hat{\Sigma}_{t_0+h} w.

Buy n Hold: the cumulative pft risk: bh_{t_0}^H := \sum_{h=1}^{H} rsk_{t_0}^h.

Frobenius: the Frobenius norm of the forecast error: \sqrt{\sum_{i,j=1}^{n} (\sigma_{ij,t_0+h} - \hat{\sigma}_{ij,t_0+h})^2}.

Med SFE: the median square forecast error: MedSFE^h = \frac{1}{T} \sum_{t_0=1}^{T} \mathrm{med}((\hat{\epsilon}_{t_0}^h)^2).

RMSFE: RMSFE^h = \frac{1}{T} \sum_{t_0=1}^{T} \sqrt{\mathrm{mean}((\hat{\epsilon}_{t_0}^h)^2)}.

MaxSFE: MaxSFE^h = \frac{1}{T} \sum_{t_0=1}^{T} \max((\hat{\epsilon}_{t_0}^h)^2).

Secondary column header. h: the forecast horizon. A: the full matrix. O: off-diagonal entries. D: diagonal entries.

Colors. Green: No Change forecasts. Blue: censored data. Red: autoregressive models estimated by OLS.
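The per-origin statistics of the key above can be computed as in the following sketch (our own implementation of the stated formulas; the function and argument names are ours, with Sigma_real and Sigma_fc the realized and forecast covariance matrices at t_0 + h).

```r
## Sketch: evaluation statistics for one forecast origin t0 and horizon h.
eval_forecast <- function(Sigma_real, Sigma_fc) {
  n <- nrow(Sigma_real)
  w <- rep(1 / n, n)                          # equal-weight portfolio
  err <- Sigma_real - Sigma_fc
  e2 <- err[lower.tri(err, diag = TRUE)]^2    # squared errors on the vech scale
  c(pft_risk  = drop(t(w) %*% err %*% w),     # realized minus forecasted risk
    frobenius = sqrt(sum(err^2)),
    med_sfe   = median(e2),
    rmsfe     = sqrt(mean(e2)),
    max_sfe   = max(e2))
}
```

Averaging these across forecast origins, and cumulating pft_risk over h = 1, ..., H, gives the Buy n Hold column and the table entries.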

Interpretation of the tables. Tables 7 and 8 report results for No Change forecasts, forecasts from VAR(1) models estimated using the Lasso, the adaptive Lasso, and OLS, and AR forecasts estimated by OLS. The models are evaluated on censored and un-censored data, using either the lmat or the lcov transformation. Note that for No Change forecasts the results are identical for both transformations, since the errors are based on de-transformed forecasts.

Table 7. Summary statistics, daily forecasts, 1000-observation training sample, h-step-ahead recursive forecasts. All statistics are averaged across forecast iterations.

The first striking result is that models estimated on uncensored data tend to be explosive with the lcov transformation but not with the lmat transformation. VARs estimated by OLS tend to be explosive even using censored data with the lcov transformation. This is further evidence that both OLS and the Lasso are sensitive to extreme observations. When the data are smoother, as is the case with censoring and with the lmat transformation, the models are stable and often outperform the No Change benchmark.

Table 9 reports results for weekly aggregated data. At this frequency stability of the VARs is no longer an issue, and in this setting the Lasso and its variants consistently outperform the No Change forecasts. In particular, models estimated using the lcov transformation provide the most accurate forecasts of the covariance matrix, resulting in less risky portfolios even over longer horizons. At the monthly level of aggregation (results in Table 10) both transformations are equivalent and dominate (though not uniformly) the No Change forecasts. Note that at both levels of aggregation the small number of observations available relative to the number of parameters of the unrestricted model renders OLS infeasible.

Table 8. Summary statistics, daily forecasts, 1000 training observations, h-step-ahead recursive forecasts. All statistics are averaged across forecast iterations.

6. Conclusions and Further Work

In this paper we considered modeling and forecasting large realized covariance matrices. Our approach was based on the estimation of a large vector autoregressive model by the least absolute shrinkage and selection operator, which simultaneously shrinks irrelevant parameters towards zero and conducts variable selection. Therefore, we avoided the problems related to the curse of dimensionality which abound in the related literature. We also derived upper bounds for the forecast error. In an empirical application focused on 30 stocks of the Dow Jones Industrial Average we evaluated the performance of the Lasso and the adaptive Lasso at different levels of aggregation. Compared to random walk forecasts and, when feasible, OLS forecasts, our empirical application shows that our methodology is promising in that it provides better forecasts than the benchmarks even at long horizons.

Table 9. Summary statistics for weekly aggregated data, 263 training observations, h-step-ahead recursive forecasts. All statistics are averaged across forecast iterations.

7. Appendix

Proof of Theorem 1. By the definitions of \sigma_t^2 and \hat{\sigma}_t^2 one has

|\hat{\sigma}_t^2 - \sigma_t^2| = |w'(\hat{\Sigma}_t - \Sigma_t)w| \le \|(\hat{\Sigma}_t - \Sigma_t)w\|_{\ell_\infty} \|w\|_{\ell_1} \le \|\hat{\Sigma}_t - \Sigma_t\|_\infty \|w\|_{\ell_1}^2 \le \|\hat{\Sigma}_t - \Sigma_t\|_\infty c^2,

where the last step uses \|w\|_{\ell_1} \le c (note also that \|w\|_{\ell_1}^2 \le n \|w\|_{\ell_2}^2, so an \ell_2 constraint on the weights implies an \ell_1 constraint).

Before proving Theorem 2 we recall the following result, which is an extract of Theorem 2 in Kock and Callot (2012).

Lemma 1 (Theorem 2 in Kock and Callot (2012)). Let \lambda_T = \sqrt{8 \ln(1+T)^5 \ln(1+k)^4 \ln(1+p)^2 \ln(k^2 p) \sigma_T^4 / T} and 0 < q < 1. Then, with probability at least 1 - 2(k^2 p)^{1 - \ln(1+T)} - 2(1+T)^{-1/A} - \pi_q(s_i), the following inequality holds for each i = 1, \ldots, k, for some positive constant A:

(7)    \|\hat{\beta}_i - \beta_i^*\|_{\ell_1} \le \frac{16}{q \kappa_i^2} s_i \lambda_T.

Furthermore, all the above statements hold on one and the same set, which has probability at least 1 - 2(k^2 p)^{1 - \ln(1+T)} - 2(1+T)^{-1/A} - \pi_q(\bar{s}).

Table 10. Summary statistics for monthly aggregated data, 60 training observations, h-step-ahead recursive forecasts. All statistics are averaged across forecast iterations.

Proof of Theorem 2. Since

\|\hat{\Sigma}_{T+1} - \Sigma_{T+1}\|_\infty = \|\mathrm{vech}\, \hat{\Sigma}_{T+1} - \mathrm{vech}\, \Sigma_{T+1}\|_{\ell_\infty} = \|\hat{y}_{T+1|T} - y_{T+1}\|_{\ell_\infty},

we shall bound each entry of \hat{y}_{T+1|T} - y_{T+1}. By assumption,

y_{T+1,i} = Z_{T+1}' \beta_i^* + \epsilon_{T+1,i},

while

\hat{y}_{T+1|T,i} = Z_{T+1}' \hat{\beta}_i,

such that

|y_{T+1,i} - \hat{y}_{T+1|T,i}| = |Z_{T+1}'(\beta_i^* - \hat{\beta}_i) + \epsilon_{T+1,i}| \le \|Z_{T+1}\|_{\ell_\infty} \|\hat{\beta}_i - \beta_i^*\|_{\ell_1} + |\epsilon_{T+1,i}|.

Using Lemma 1 this yields

|y_{T+1,i} - \hat{y}_{T+1|T,i}| \le \|Z_{T+1}\|_{\ell_\infty} \frac{16}{q \kappa_i^2} s_i \lambda_T + |\epsilon_{T+1,i}|  for all  i = 1, \ldots, k,

with probability at least 1 - 2(k^2 p)^{1 - \ln(1+T)} - 2(1+T)^{-1/A} - \pi_q(\bar{s}). Next, note that by the gaussianity of the covariates and the error terms,

P(|y_{T+1-l,i}| \ge x) \le 2 e^{-x^2 / (2 \sigma_T^2)}  for all  1 \le i \le k  and  1 \le l \le p,

and

P(|\epsilon_{T+1,i}| \ge x) \le 2 e^{-x^2 / (2 \sigma_T^2)}  for all  1 \le i \le k.

This implies

P(\|Z_{T+1}\|_{\ell_\infty} \vee \max_{1 \le i \le k} |\epsilon_{T+1,i}| \ge L) \le 2 k p e^{-L^2 / (2 \sigma_T^2)} + 2 k e^{-L^2 / (2 \sigma_T^2)} = 2 k (p+1) e^{-L^2 / (2 \sigma_T^2)}.

Choosing L^2 = 2 \sigma_T^2 \ln(k(p+1)) \ln(T) yields

(8)    P(\|Z_{T+1}\|_{\ell_\infty} \vee \max_{1 \le i \le k} |\epsilon_{T+1,i}| \ge L) \le 2 [k(p+1)]^{1 - \ln(T)},

and so

|y_{T+1,i} - \hat{y}_{T+1|T,i}| \le \sqrt{2 \sigma_T^2 \ln(k(p+1)) \ln(T)} \left( \frac{16}{q \kappa_i^2} s_i \lambda_T + 1 \right)  for all  i = 1, \ldots, k,

with probability at least 1 - 2(k^2 p)^{1 - \ln(1+T)} - 2(1+T)^{-1/A} - \pi_q(\bar{s}) - 2[k(p+1)]^{1 - \ln(T)}.

References

Bickel, P. J., Y. Ritov, and A. B. Tsybakov (2009). Simultaneous analysis of Lasso and Dantzig selector. The Annals of Statistics 37(4), 1705-1732.

Bühlmann, P. and S. Van De Geer (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer-Verlag, New York.

Friedman, J., T. Hastie, and R. Tibshirani (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33(1), 1-22.

Kock, A. and L. Callot (2012). Oracle inequalities for high dimensional vector autoregressions. Aarhus University, CREATES Research Paper 2012-16.

Meinshausen, N. and P. Bühlmann (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics 34(3), 1436-1462.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58(1), 267-288.

Zhao, P. and B. Yu (2006). On model selection consistency of Lasso. The Journal of Machine Learning Research 7, 2541-2563.


More information

Estimating Covariance Using Factorial Hidden Markov Models

Estimating Covariance Using Factorial Hidden Markov Models Estimating Covariance Using Factorial Hidden Markov Models João Sedoc 1,2 with: Jordan Rodu 3, Lyle Ungar 1, Dean Foster 1 and Jean Gallier 1 1 University of Pennsylvania Philadelphia, PA joao@cis.upenn.edu

More information

Technical Vignette 5: Understanding intrinsic Gaussian Markov random field spatial models, including intrinsic conditional autoregressive models

Technical Vignette 5: Understanding intrinsic Gaussian Markov random field spatial models, including intrinsic conditional autoregressive models Technical Vignette 5: Understanding intrinsic Gaussian Markov random field spatial models, including intrinsic conditional autoregressive models Christopher Paciorek, Department of Statistics, University

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Portfolio Allocation using High Frequency Data. Jianqing Fan

Portfolio Allocation using High Frequency Data. Jianqing Fan Portfolio Allocation using High Frequency Data Princeton University With Yingying Li and Ke Yu http://www.princeton.edu/ jqfan September 10, 2010 About this talk How to select sparsely optimal portfolio?

More information

Backtesting Marginal Expected Shortfall and Related Systemic Risk Measures

Backtesting Marginal Expected Shortfall and Related Systemic Risk Measures Backtesting Marginal Expected Shortfall and Related Systemic Risk Measures Denisa Banulescu 1 Christophe Hurlin 1 Jérémy Leymarie 1 Olivier Scaillet 2 1 University of Orleans 2 University of Geneva & Swiss

More information

On Model Selection Consistency of Lasso

On Model Selection Consistency of Lasso On Model Selection Consistency of Lasso Peng Zhao Department of Statistics University of Berkeley 367 Evans Hall Berkeley, CA 94720-3860, USA Bin Yu Department of Statistics University of Berkeley 367

More information

Uncertainty quantification and visualization for functional random variables

Uncertainty quantification and visualization for functional random variables Uncertainty quantification and visualization for functional random variables MascotNum Workshop 2014 S. Nanty 1,3 C. Helbert 2 A. Marrel 1 N. Pérot 1 C. Prieur 3 1 CEA, DEN/DER/SESI/LSMR, F-13108, Saint-Paul-lez-Durance,

More information

Sparse PCA with applications in finance

Sparse PCA with applications in finance Sparse PCA with applications in finance A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon 1 Introduction

More information

regression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered,

regression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered, L penalized LAD estimator for high dimensional linear regression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered, where the overall number of variables

More information

Cointegrated VAR s. Eduardo Rossi University of Pavia. November Rossi Cointegrated VAR s Financial Econometrics / 56

Cointegrated VAR s. Eduardo Rossi University of Pavia. November Rossi Cointegrated VAR s Financial Econometrics / 56 Cointegrated VAR s Eduardo Rossi University of Pavia November 2013 Rossi Cointegrated VAR s Financial Econometrics - 2013 1 / 56 VAR y t = (y 1t,..., y nt ) is (n 1) vector. y t VAR(p): Φ(L)y t = ɛ t The

More information

Volatility. Gerald P. Dwyer. February Clemson University

Volatility. Gerald P. Dwyer. February Clemson University Volatility Gerald P. Dwyer Clemson University February 2016 Outline 1 Volatility Characteristics of Time Series Heteroskedasticity Simpler Estimation Strategies Exponentially Weighted Moving Average Use

More information

High-dimensional Statistical Models

High-dimensional Statistical Models High-dimensional Statistical Models Pradeep Ravikumar UT Austin MLSS 2014 1 Curse of Dimensionality Statistical Learning: Given n observations from p(x; θ ), where θ R p, recover signal/parameter θ. For

More information

Forecasting 1 to h steps ahead using partial least squares

Forecasting 1 to h steps ahead using partial least squares Forecasting 1 to h steps ahead using partial least squares Philip Hans Franses Econometric Institute, Erasmus University Rotterdam November 10, 2006 Econometric Institute Report 2006-47 I thank Dick van

More information

Chapter 3. Linear Models for Regression

Chapter 3. Linear Models for Regression Chapter 3. Linear Models for Regression Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Linear

More information

Economics 883 Spring 2016 Tauchen. Jump Regression

Economics 883 Spring 2016 Tauchen. Jump Regression Economics 883 Spring 2016 Tauchen Jump Regression 1 Main Model In the jump regression setting we have X = ( Z Y where Z is the log of the market index and Y is the log of an asset price. The dynamics are

More information

Robust methods and model selection. Garth Tarr September 2015

Robust methods and model selection. Garth Tarr September 2015 Robust methods and model selection Garth Tarr September 2015 Outline 1. The past: robust statistics 2. The present: model selection 3. The future: protein data, meat science, joint modelling, data visualisation

More information

A Comparative Framework for Preconditioned Lasso Algorithms

A Comparative Framework for Preconditioned Lasso Algorithms A Comparative Framework for Preconditioned Lasso Algorithms Fabian L. Wauthier Statistics and WTCHG University of Oxford flw@stats.ox.ac.uk Nebojsa Jojic Microsoft Research, Redmond jojic@microsoft.com

More information

INSTITUTE AND FACULTY OF ACTUARIES. Curriculum 2019 SPECIMEN SOLUTIONS

INSTITUTE AND FACULTY OF ACTUARIES. Curriculum 2019 SPECIMEN SOLUTIONS INSTITUTE AND FACULTY OF ACTUARIES Curriculum 09 SPECIMEN SOLUTIONS Subject CSA Risk Modelling and Survival Analysis Institute and Faculty of Actuaries Sample path A continuous time, discrete state process

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon

More information

ON VARIANCE COVARIANCE COMPONENTS ESTIMATION IN LINEAR MODELS WITH AR(1) DISTURBANCES. 1. Introduction

ON VARIANCE COVARIANCE COMPONENTS ESTIMATION IN LINEAR MODELS WITH AR(1) DISTURBANCES. 1. Introduction Acta Math. Univ. Comenianae Vol. LXV, 1(1996), pp. 129 139 129 ON VARIANCE COVARIANCE COMPONENTS ESTIMATION IN LINEAR MODELS WITH AR(1) DISTURBANCES V. WITKOVSKÝ Abstract. Estimation of the autoregressive

More information

Markowitz Efficient Portfolio Frontier as Least-Norm Analytic Solution to Underdetermined Equations

Markowitz Efficient Portfolio Frontier as Least-Norm Analytic Solution to Underdetermined Equations Markowitz Efficient Portfolio Frontier as Least-Norm Analytic Solution to Underdetermined Equations Sahand Rabbani Introduction Modern portfolio theory deals in part with the efficient allocation of investments

More information

Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor

Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor Y. Wang M. J. Daniels wang.yanpin@scrippshealth.org mjdaniels@austin.utexas.edu

More information

The Constrained Lasso

The Constrained Lasso The Constrained Lasso Gareth M. ames, Courtney Paulson and Paat Rusmevichientong Abstract Motivated by applications in areas as diverse as finance, image reconstruction, and curve estimation, we introduce

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance CESIS Electronic Working Paper Series Paper No. 223 A Bootstrap Test for Causality with Endogenous Lag Length Choice - theory and application in finance R. Scott Hacker and Abdulnasser Hatemi-J April 200

More information

Principles of forecasting

Principles of forecasting 2.5 Forecasting Principles of forecasting Forecast based on conditional expectations Suppose we are interested in forecasting the value of y t+1 based on a set of variables X t (m 1 vector). Let y t+1

More information

Regularization: Ridge Regression and the LASSO

Regularization: Ridge Regression and the LASSO Agenda Wednesday, November 29, 2006 Agenda Agenda 1 The Bias-Variance Tradeoff 2 Ridge Regression Solution to the l 2 problem Data Augmentation Approach Bayesian Interpretation The SVD and Ridge Regression

More information

Sparse representation classification and positive L1 minimization

Sparse representation classification and positive L1 minimization Sparse representation classification and positive L1 minimization Cencheng Shen Joint Work with Li Chen, Carey E. Priebe Applied Mathematics and Statistics Johns Hopkins University, August 5, 2014 Cencheng

More information