Very preliminary, please do not cite without permission.

ESTIMATION AND FORECASTING LARGE REALIZED COVARIANCE MATRICES

LAURENT A. F. CALLOT
VU University Amsterdam, The Netherlands, CREATES, and the Tinbergen Institute.

ANDERS B. KOCK
CREATES, Aarhus University, Denmark.

MARCELO C. MEDEIROS
Department of Economics, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil.

E-mail addresses: l.callot@vu.nl, akock@creates.au.dk, mcm@econ.puc-rio.br. Parts of the research for this paper were done while the first and second authors were visiting the Department of Economics at the Pontifical Catholic University of Rio de Janeiro, Brazil. Its hospitality is gratefully appreciated. MCM's research is partially supported by CNPq/Brazil.

Abstract. We consider forecasting large realized covariance matrices by penalized vector autoregressive models.

Keywords: Realized covariance; vector autoregression; shrinkage; Lasso; forecasting; portfolio allocation.

JEL codes: C22.

1. Introduction

This paper deals with modeling and forecasting large time-varying covariance matrices of daily returns on financial assets. Modern portfolio selection, as well as risk management and empirical asset pricing, relies strongly on precise forecasts of the covariance matrix of the assets involved. For instance, the traditional mean-variance approach of Markowitz requires the estimation or modeling of all variances and covariances, leading to unstable results when applied to a large set of assets. As financial markets evolve, the number of assets grows, making the traditional approach less suitable for practitioners. Typical multivariate ARCH-type models fail to deliver reliable estimates due to the curse of dimensionality and a large computational burden. Possible solutions frequently used in practice are (1) a weighted average of past squared returns, as in the RiskMetrics methodology, or (2) the construction of factor models. In this paper we take a different route and consider the estimation of vast vector autoregressive models for realized covariance matrices. To avoid the curse of dimensionality we advocate the use of the Least Absolute Shrinkage and Selection Operator (Lasso).

The contributions of this paper are as follows. First, we put forward a methodology to model and forecast large time-varying realized covariance matrices with a minimum number of restrictions.

Our method can also shed some light on the drivers of the dynamics of these realized covariance matrices, as the Lasso also performs variable selection. Second, we derive an upper bound for the forecast error which is valid even in finite samples. Third, we show how this bound translates into a bound for the forecast error of the time-varying variance of a portfolio constructed from this large number of assets. Finally, we apply our methodology to the selection of a portfolio with mean-variance preferences.

The rest of the paper is organized as follows. Section 2 describes the problem setup, defines notation, and briefly presents the Lasso and some key assumptions. In Section 3 we present some theoretical results. The dataset and computational issues are discussed in Section 4. The empirical results are presented in Section 5. Finally, Section 6 concludes the paper.

2. Setup

In this section we put forward our methodology and present a finite sample upper bound on the forecast error of our procedure. Let \Sigma_t denote the n_T \times n_T population conditional covariance matrix as of time t when conditioning on the \sigma-field \sigma(\{\Sigma_s : s < t\}). Note that the dimension n_T of \Sigma_t is indexed by the sample size T. This reflects the fact that n_T may be large compared to T, and hence standard asymptotics, which take the dimension n_T as a fixed number, may not accurately reflect the actual performance in finite samples. Since \Sigma_t is allowed to depend on its past, it is a function of many variables. Defining y_t = \mathrm{vech}\, \Sigma_t, we shall assume that it follows a vector autoregression of order p_T, i.e.,

(1)    y_t = \sum_{i=1}^{p_T} \Phi_i y_{t-i} + \epsilon_t,    t = 1, \ldots, T,

where \Phi_i, i = 1, \ldots, p_T, are the k_T \times k_T dimensional parameter matrices with k_T = n_T(n_T + 1)/2 and \epsilon_t \sim N_{k_T}(0, \Omega). Note that the dimension k_T of the parameter matrices increases quadratically in n_T. So even for conditional covariance matrices \Sigma_t of a moderate dimension the number of parameters in (1) may be very large. Hence, standard estimation techniques such as least squares may provide very imprecise parameter estimates or even be infeasible if the number of variables is greater than the number of observations. To circumvent this problem we use the Least Absolute Shrinkage and Selection Operator (Lasso) of Tibshirani (1996), which is feasible even when the number of parameters to be estimated is (much) larger than the sample size. We suppress the dependence of n_T, k_T and p_T on T to simplify notation.

As mentioned in the introduction, we are concerned with stationary VARs, meaning that the roots of \det(I_k - \sum_{j=1}^{p} \Phi_j z^j) lie outside the unit circle. Equivalently, all roots of the companion matrix F must lie inside the unit disc. Let \rho (the dependence on T is suppressed) denote the largest of these roots in absolute value.

It is convenient to write the model in stacked form. To do so, let Z_t = (y_{t-1}', \ldots, y_{t-p}')' be the kp \times 1 vector of explanatory variables at time t in each equation and X = (Z_T, \ldots, Z_1)' the T \times kp matrix of covariates, common to each equation. Let y_i = (y_{T,i}, \ldots, y_{1,i})' be the T \times 1 vector of observations on the ith variable (i = 1, \ldots, k) and \epsilon_i = (\epsilon_{T,i}, \ldots, \epsilon_{1,i})' the corresponding vector of error terms. The fact that y_i inherits the gaussianity of the error terms is particularly useful since this means that y_i has slim tails. Finally, \beta_i^* is the kp-dimensional vector of true parameters for equation i, which also implicitly depends on T.
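The construction of y_t and of the common regressor matrix X is mechanical; the following minimal R sketch (our own illustration, not the authors' code, and all function names are ours) builds them from a list of realized covariance matrices.

```r
## Minimal sketch (our illustration): build y_t = vech(Sigma_t) and the
## common (T-p) x kp regressor matrix X for a VAR(p), from a length-T list
## `Sigma` of n x n realized covariance matrices.
build_var_design <- function(Sigma, p) {
  vech <- function(M) M[lower.tri(M, diag = TRUE)]  # half-vectorization
  Y <- t(sapply(Sigma, vech))                       # T x k, with k = n(n+1)/2
  T_obs <- nrow(Y)
  # Z_t = (y'_{t-1}, ..., y'_{t-p})': lag l contributes rows p+1-l, ..., T-l
  # (row order is immaterial for estimation)
  X <- do.call(cbind, lapply(1:p, function(l) Y[(p + 1 - l):(T_obs - l), , drop = FALSE]))
  list(y = Y[(p + 1):T_obs, , drop = FALSE],        # one column per equation i = 1, ..., k
       X = X)
}
```

Each of the k columns of y is then regressed on the same matrix X, one equation at a time, as in (2) below.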

Hence, we may write (1) equivalently as

(2)    y_i = X \beta_i^* + \epsilon_i,    i = 1, \ldots, k,

such that each equation in (1) may be modeled separately. Or, taking one step back, each element in \Sigma_t is modeled as in (2). The length of \beta_i^*, namely kp, may be much greater than the sample size if the original conditional covariance matrix \Sigma_t is large. If, for example, n = 30, one has k = 465, which amounts to a total of 2325 parameters per equation if p = 5. As a consequence, traditional methods such as least squares will be inadequate in such situations and we will turn to the Lasso instead.

2.1. Notation. Let J_i = \{j : \beta_{i,j}^* \neq 0\} \subseteq \{1, \ldots, kp\} denote the set of non-zero parameters in equation i and s_i = |J_i| its cardinality. Let \bar{s} = \max\{s_1, \ldots, s_k\} and let \Psi_T = X'X/T be the kp \times kp scaled Gram matrix of X. For any x \in R^m, \|x\| = (\sum_{i=1}^{m} x_i^2)^{1/2}, \|x\|_{\ell_1} = \sum_{i=1}^{m} |x_i| and \|x\|_{\ell_\infty} = \max_{1 \le i \le m} |x_i| denote the \ell_2, \ell_1 and \ell_\infty norms, respectively (most often m = kp or m = s_i in the sequel). When regarding the m \times m matrix A as a linear operator from R^m to R^m equipped with either the \ell_1- or the \ell_2-norm, \|A\| and \|A\|_{\ell_1} denote the induced operator norms. \|A\|_\infty shall denote the maximum absolute entry of A; note that it is not induced by the \ell_\infty-norm. For any vector \delta in R^n and a subset J \subseteq \{1, \ldots, n\} we shall let \delta_J denote the vector consisting only of those elements of \delta indexed by J. For any two real numbers a and b, a \vee b = \max(a, b) and a \wedge b = \min(a, b). Let \sigma_{i,y}^2 denote the variance of y_{t,i} and \sigma_{i,\epsilon}^2 the variance of \epsilon_{t,i}, 1 \le i \le k. Then define \sigma_T = \max_{1 \le i \le k} (\sigma_{i,y} \vee \sigma_{i,\epsilon}).

2.2. The Lasso. The Lasso was proposed by Tibshirani (1996). Its theoretical properties have been studied intensively since then; see e.g. Zhao and Yu (2006), Meinshausen and Bühlmann (2006), Bickel et al. (2009), and Bühlmann and Van De Geer (2011), to mention just a few. It is known that it only selects the correct model asymptotically under rather restrictive conditions on the dependence structure of the covariates. However, we shall see that it can still serve as an effective screening device in these situations. Put differently, it can remove many irrelevant covariates while still retaining the relevant ones and estimating their coefficients with high precision. We investigate the properties of the Lasso when applied to each equation i = 1, \ldots, k separately. The Lasso estimates \beta_i^* in (2) by minimizing the objective function

(3)    L(\beta_i) = \frac{1}{T} \|y_i - X\beta_i\|^2 + 2\lambda_T \|\beta_i\|_{\ell_1},

where \lambda_T is a sequence to be defined exactly below. (3) is basically the least squares objective function plus an extra term penalizing parameters that are different from zero.
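Since Section 4 reports that estimation is done equation by equation with glmnet and a BIC-selected penalty, a sketch along the following lines reproduces the minimization of (3) for a single equation; the BIC construction here is our own approximation of what the lassovar package does, not its actual internals.

```r
## Sketch: Lasso estimation of one equation of (2), penalty chosen by BIC.
## glmnet minimizes (1/(2T))||y - Xb||^2 + lambda*||b||_1, a rescaling of (3).
library(glmnet)

lasso_equation_bic <- function(X, y) {
  fit <- glmnet(X, y, family = "gaussian", alpha = 1,  # alpha = 1: pure Lasso penalty
                intercept = FALSE, standardize = FALSE)
  T_obs <- length(y)
  rss <- colSums((y - predict(fit, X))^2)              # RSS along the lambda path
  bic <- T_obs * log(rss / T_obs) + fit$df * log(T_obs)
  as.numeric(coef(fit, s = fit$lambda[which.min(bic)]))[-1]  # drop the intercept slot
}
```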

4 4 L. CALLOT, A. B. KOCK, AND M. C. MEDEIROS smaller set. The same is the case for the LASSO in the VAR since we have written the VAR as a regression model. In particular we shall make use of the restricted eigenvalue condition { } κ 2 δ Ψ T δ (5) Ψ T (s i ) = min δ R : 2 R s i, δ R kp \ {0}, δ R c l1 3 δ R l1 > 0 where R {1,..., kp} and R is its cardinality. Note that instead of minimizing over all of δ R kp \ {0} as in (4) the minimum in (5) is restricted to those vectors satisfying δ R c l1 3 δ R l1. As a result, κ 2 Ψ T (r) can be positive even when the Rayleigh-Ritz ratio in (4) is zero. Note that whenever Ψ T has full rank κ 2 Ψ T (s i ) will be positive. Letting Γ = E(Ψ T ) = E(Z t Z t) denote the population covariance matrix we similarly define { } κ 2 i = κ 2 δ Γδ (6) Γ(s i ) = min δ R : 2 R s i, δ R kp \ {0}, δ R c l1 3 δ R l1 > 0 We shall assume throughout that Γ has full rank which implies that κ 2 i > 0. This is a rather standard assumption which is independent of whether T > kp or not. For more details on the restricted eigenvalue condition we refer to Kock and Callot (2012). 3. Theoretical Results Letting w R n denote a set of portfolio weights the true conditional variance of the portfolio is given by while the estimated variance is σ 2 t = w Σ t w ˆσ 2 t = w ˆΣt w As a consequence, one might be interested in measuring the precision of ˆΣ t by considering how much ˆσ 2 and σ 2 t deviate from each other. In the presence of an upper bound on the positions one may take this can be done by bounding ˆΣ t Σ t l2. The following theorem makes this claim precise Theorem 1. Assume that w l2 c for some c > 0. Then, 2 ˆσ t σt 2 ˆΣt Σ t c 2 Hence, in the presence of a restriction on the positions one can take, an upper bound on ˆΣ t Σ t implies an upper bound on the distance between ˆσ t 2 and σt 2. We shall next ( give an upper bound ) on ˆΣ t Σ t. To this end we define π q (s) = 4k 2 p 2 ζt exp + 2(k 2 p 2 ) 1 log(t ) for ζ = s 2 i log(t )(log(k2 p 2 )+1) With this notation in place we have the following theorem. (1 q) 2 κ 4 i ( Γ T i=0 F i ) 2. Theorem 2. Let λ T = 8 ln(1 + T ) 5 ln(1 + k) 4 ln(1 + p) 2 ln(k 2 p)σt 4 /T and 0 < q < 1. Then with probability at least 1 2(k 2 p) 1 ln(1+t ) 2(1 + T ) 1/A π q ( s) 2[k(p + 1)] 1 ln(t ) one has ( ˆΣT +1 Σ 16 ) T +1 2σT 2 ln(k(p + 1)) ln(t ) s qκ 2 i λ T + 1 i
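A toy numerical check of Theorem 1 (our own illustration; the matrices and weights are made up) confirms the inequality for an equal-weight portfolio, for which \|w\|_{\ell_1} = 1:

```r
## Toy check of Theorem 1: |sigma_hat^2 - sigma^2| <= max-entry norm * c^2
## whenever ||w||_1 <= c. Here w is the equal-weight portfolio, so c = 1.
set.seed(1)
n <- 30
A <- matrix(rnorm(n * n), n, n)
Sigma <- crossprod(A) / n                              # a PSD "true" covariance matrix
E <- matrix(rnorm(n * n, sd = 0.01), n, n)
Sigma_hat <- Sigma + (E + t(E)) / 2                    # a symmetric "forecast"
w <- rep(1 / n, n)
lhs <- abs(drop(t(w) %*% (Sigma_hat - Sigma) %*% w))   # |sigma_hat^2 - sigma^2|
rhs <- max(abs(Sigma_hat - Sigma)) * sum(abs(w))^2     # ||.||_inf times c^2
stopifnot(lhs <= rhs)
```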

We shall next give an upper bound on \|\hat{\Sigma}_{T+1} - \Sigma_{T+1}\|_\infty. To this end we define

\pi_q(s) = 4 k^2 p^2 \exp(-\zeta T) + 2 (k^2 p^2)^{1 - \log(T)}  with  \zeta = \frac{(1-q)^2 \kappa_i^4}{s_i^2 \log(T) (\log(k^2 p^2) + 1) (\|\Gamma\| \sum_{i=0}^{T} \|F^i\|)^2}.

With this notation in place we have the following theorem.

Theorem 2. Let \lambda_T = \sqrt{8 \ln(1+T)^5 \ln(1+k)^4 \ln(1+p)^2 \ln(k^2 p) \sigma_T^4 / T} and 0 < q < 1. Then, with probability at least 1 - 2(k^2 p)^{1 - \ln(1+T)} - 2(1+T)^{-1/A} - \pi_q(\bar{s}) - 2[k(p+1)]^{1 - \ln(T)}, one has

\|\hat{\Sigma}_{T+1} - \Sigma_{T+1}\|_\infty \le \sqrt{2 \sigma_T^2 \ln(k(p+1)) \ln(T)} \left( \max_{1 \le i \le k} \frac{16 s_i \lambda_T}{q \kappa_i^2} + 1 \right).

Theorem 2 gives an upper bound on the forecast error of \hat{\Sigma}_{T+1} which is valid even in finite samples. Note that even if we knew the true parameter vector \beta_i^* we could never expect a forecast error which tends to zero, since the error terms \epsilon_{T+1} are unforecastable. By combining Theorems 1 and 2 one may achieve the following upper bound on the forecast error of \hat{\sigma}_{T+1}^2.

Corollary 1. Under the assumptions of Theorems 1 and 2 one has that

|\hat{\sigma}_{T+1}^2 - \sigma_{T+1}^2| \le \sqrt{2 \sigma_T^2 \ln(k(p+1)) \ln(T)} \left( \max_{1 \le i \le k} \frac{16 s_i \lambda_T}{q \kappa_i^2} + 1 \right) c^2.

Corollary 1 provides a finite sample upper bound on the error of the forecast of the portfolio variance under a short selling constraint. Note that this short selling constraint is the only restriction we place on the portfolio weights.

4. Computations

A first subsection describes the practical implementation of the forecasts; the following section discusses variable selection using the Dow Jones data; and the final section presents forecast results.

4.1. Data. The data we use consist of 437 stocks of the S&P 500, with a total of 1465 daily observations from 2006 to 2011. The realized covariances are constructed from 5-minute returns. [To do: thank and cite Asger.] We consider two transformations of the data, both aimed at ensuring that the fitted and forecasted covariance matrices have a positive diagonal after the transformation is reversed (see the R sketch at the end of this section):

log-covariance transformation (lcov): take the logarithm of the variances and do not transform the covariances. This transformation has the effect of smoothing the variance series relative to the covariance series.

log-matrix transformation (lmat): compute the matrix logarithm of the covariance matrix. The reverse transformation, the matrix exponential, ensures that the resulting matrix is positive semi-definite and smooths both diagonal and off-diagonal elements. The drawback is that under this transformation the diagonal and off-diagonal parameters cannot be interpreted as pertaining to variances and covariances, respectively.

4.2. Censoring. The sample we consider covers the financial crisis of 2008 as well as the flash crashes of 2010 and 2011. These events lead to highly correlated returns, and hence to many extreme values in the stock return correlation series. The Lasso is fragile to this kind of outlier since it works under normality assumptions. In practice, we flag for censoring every day on which more than 25% of the entries of the upper diagonal of the covariance matrix are more than 4 standard deviations away from their sample average. These observations are replaced by an average of the nearest 5 preceding and 5 following non-flagged observations. Under this censoring rule, the flagged observations are concentrated in October 2008; the flash crashes of 2010 and 2011 are also flagged.

4.3. Implementation. All computations are carried out in R using the lassovar package, a wrapper for glmnet, an implementation of the coordinate descent algorithm of Friedman et al. (2010). The VARs are estimated equation by equation, using the Bayesian Information Criterion to select the penalty parameter.
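The two transformations are straightforward to implement; the sketch below (our own, assuming strictly positive eigenvalues of each realized covariance matrix) shows the core operations.

```r
## Sketch of the transformations of Section 4.1 (our own implementation).
## Matrix logarithm and its inverse, via the eigendecomposition:
logm_sym <- function(S) {
  e <- eigen(S, symmetric = TRUE)                     # requires positive eigenvalues
  e$vectors %*% diag(log(e$values)) %*% t(e$vectors)
}
expm_sym <- function(L) {
  e <- eigen(L, symmetric = TRUE)
  e$vectors %*% diag(exp(e$values)) %*% t(e$vectors)  # positive semi-definite by construction
}
## log-covariance transformation: log the variances, covariances untouched
lcov_transform <- function(S) { diag(S) <- log(diag(S)); S }
```

Forecasts are produced on the transformed scale and mapped back, with expm_sym under lmat or the exponential of the diagonal under lcov, before evaluation.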

5. Empirical results

This section reports our empirical results; the first part focuses on the variable selection patterns of the different versions of the Lasso, the second part reports forecast results.

5.1. Variable selection. In this section and the next, we focus on the 30 stocks belonging to the Dow Jones index. These stocks can be classified into 8 broad categories, highlighted in Table 1.

Table 1. Number of stocks per category, 30 Dow Jones stocks (categories: Basic Materials, Communications, Consumer Cyclical, Consumer Non-cyclical, Energy, Financial, Industrial, Technology).

We estimate over 400 models using a training sample with a rolling window of 1000 observations. In the tables below we report the average (across data samples) number of variables from a given category (in rows) selected in equations for stocks belonging to a given category (in columns). The sums are also divided between diagonal (D, the variances) and off-diagonal (O, the covariances) equations and covariates. The five tables below all report results for a VAR(1) estimated by the Lasso or the adaptive Lasso using OLS or the Lasso as initial estimator. These models are in Tables 2, 4, and 5, using the log-variance transformation on censored data. Table 3 considers the Lasso estimator on log-matrix transformed censored data. Finally, Table 6 considers the Lasso on un-censored data.

Let us consider Table 2 as our benchmark model. The selected model for the diagonal equations is very sparse for most categories. It is striking that the model selected for off-diagonal equations contains many off-diagonal covariates; this is partly due to the large number of potential off-diagonal covariates. When considering the log-matrix transformation in Table 3, the number of selected variables is similar to the benchmark model, except for off-diagonal covariates of off-diagonal equations, where fewer covariates are selected and a clear diagonal pattern emerges. Using the adaptive Lasso with OLS as a first-step estimator, Table 4, results in a model that is more sparse than the benchmark model, except again for the off-diagonal covariates of off-diagonal equations, where many more covariates are selected. This is in sharp contrast with the results obtained using the Lasso as an initial estimator, Table 5, where the model is overall more sparse than the benchmark model. Finally, when considering uncensored data in Table 6, notice that the models for diagonal equations are comparable to those of the benchmark model, whereas the off-diagonal equation models present a very different pattern: very few diagonal covariates are selected while a very large number of off-diagonal covariates is selected, larger than with any of the other models considered.

The wide fluctuations in the selection pattern of off-diagonal covariates in off-diagonal equations across models can be explained by considering the large number of very noisy off-diagonal covariates the Lasso has to select from. Furthermore, large market-wide shocks lead to sharp increases in the covariances of stock returns that are broadly correlated across covariances. These large correlated shocks to the covariances are partially eliminated by censoring or smoothed by the log-matrix transform, which results in relatively fewer variables being selected. Using the Lasso instead of OLS as the initial estimator of the adaptive Lasso provides a first-step screening which seems to help the second-step Lasso perform variable selection.

Table 2. Number of variables selected by category. VAR(1) estimated by the Lasso, censored data, the log-variance transformation.

Table 3. Number of variables selected by category. VAR(1) estimated by the Lasso, censored data, the log-matrix transformation.

Table 4. Number of variables selected by category. VAR(1) estimated by the adaptive Lasso using OLS as initial estimator, censored data, the log-variance transformation.

Table 5. Number of variables selected by category. VAR(1) estimated by the adaptive Lasso using the Lasso as initial estimator, censored data, the log-variance transformation.

Table 6. Number of variables selected by category. VAR(1) estimated by the Lasso, un-censored data, the log-variance transformation.

5.2. Forecasts. The forecasts are computed recursively for horizons greater than 1, as sketched below; all forecast errors are computed based on the de-transformed forecasts.
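As an illustration of the recursion, a VAR(1) fitted equation by equation yields a stacked k x k coefficient matrix, and the h-step forecast iterates it forward; this sketch is our own and not the lassovar interface.

```r
## Sketch (ours): recursive h-step-ahead forecast from a VAR(1) with
## k x k coefficient matrix Phi_hat, starting from the last transformed
## observation y_T = vech(transformed Sigma_T).
forecast_var1 <- function(Phi_hat, y_T, h) {
  y_hat <- y_T
  for (i in seq_len(h)) y_hat <- drop(Phi_hat %*% y_hat)  # y_{T+i|T} = Phi_hat y_{T+i-1|T}
  y_hat                                                   # forecast on the transformed scale
}
```

The forecast is then de-transformed (exponential of the diagonal under lcov, matrix exponential under lmat) before forecast errors are computed.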

We consider 3 levels of aggregation of the data: daily, weekly, and monthly. The daily forecasts are computed using a rolling window of 1000 observations, leading to 437 forecasts. The weekly models are estimated using 263 observations and the monthly models using 60 observations, which results in 52 and 12 forecasts, respectively.

We forecast with vector autoregressive (VAR) models, autoregressive (AR) models, and random walk (No Change) models on both transformations (lcov, lmat) of the data. The estimators for the VARs are the Lasso, the adaptive Lasso (with OLS or the Lasso as initial estimator), and OLS. The AR models are estimated by OLS only.

Key to the tables. The forecast tables below report a number of statistics for forecasts computed with several models; below we detail these statistics, followed by a short sketch of their computation. Let t := t_0 + h, where t_0 is the last observation of the estimation sample and h the horizon, and let \hat{\epsilon}_{t_0}^h be the vector of forecast errors at horizon h forecasted from time t_0.

Primary column header.

beat bmk: frequency at which the absolute forecast error of a given model is lower than the corresponding absolute forecast error of the benchmark. The benchmark model is the one for which this statistic is reported as NA.

pft risk: the difference between the realized and forecasted risk of an equal-weight portfolio: rsk_{t_0}^h := w' \Sigma_{t_0+h} w - w' \hat{\Sigma}_{t_0+h} w.

Buy n Hold: the cumulative pft risk: bh_{t_0}^H := \sum_{h=1}^{H} rsk_{t_0}^h.

Frobenius: the Frobenius norm of the forecast error: \sqrt{\sum_{i,j=1}^{n} (\sigma_{ij,t_0+h} - \hat{\sigma}_{ij,t_0+h})^2}.

Med SFE: the median square forecast error: MedSFE^h = \frac{1}{T} \sum_{t_0=1}^{T} \mathrm{med}((\hat{\epsilon}_{t_0}^h)^2).

RMSFE: RMSFE^h = \frac{1}{T} \sum_{t_0=1}^{T} \sqrt{\mathrm{mean}((\hat{\epsilon}_{t_0}^h)^2)}.

MaxSFE: MaxSFE^h = \frac{1}{T} \sum_{t_0=1}^{T} \max((\hat{\epsilon}_{t_0}^h)^2).

Secondary column header. h: the forecast horizon. A: the full matrix. O: off-diagonal entries. D: diagonal entries.

Colors. Green: No Change forecasts. Blue: censored data. Red: autoregressive models estimated by OLS.
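The per-origin statistics of the key above can be computed as in the following sketch (our own implementation of the stated formulas; the function and argument names are ours, with Sigma_real and Sigma_fc the realized and forecast covariance matrices at t_0 + h).

```r
## Sketch: evaluation statistics for one forecast origin t0 and horizon h.
eval_forecast <- function(Sigma_real, Sigma_fc) {
  n <- nrow(Sigma_real)
  w <- rep(1 / n, n)                          # equal-weight portfolio
  err <- Sigma_real - Sigma_fc
  e2 <- err[lower.tri(err, diag = TRUE)]^2    # squared errors on the vech scale
  c(pft_risk  = drop(t(w) %*% err %*% w),     # realized minus forecasted risk
    frobenius = sqrt(sum(err^2)),
    med_sfe   = median(e2),
    rmsfe     = sqrt(mean(e2)),
    max_sfe   = max(e2))
}
```

Averaging these across forecast origins, and cumulating pft_risk over h = 1, ..., H, gives the Buy n Hold column and the table entries.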

Interpretation of the tables. Tables 7 and 8 report results for No Change forecasts, forecasts from VAR(1) models estimated using the Lasso, the adaptive Lasso, and OLS, and AR forecasts estimated by OLS. The models are evaluated on censored and un-censored data, using either the lmat or the lcov transformation. Note that for No Change forecasts the results are identical for both transformations, since the errors are based on de-transformed forecasts.

Table 7. Summary statistics, daily forecasts, 1000-observation training sample, h-step-ahead recursive forecasts. All statistics are averaged across forecast iterations.

The first striking result is that models estimated on uncensored data tend to be explosive with the lcov transformation but not with the lmat transformation. VARs estimated by OLS tend to be explosive even using censored data with the lcov transformation. This is further evidence that both OLS and the Lasso are sensitive to extreme observations. When the data are smoother, as is the case with censoring and with the lmat transformation, the models are stable and often outperform the No Change benchmark.

Table 9 reports results for weekly aggregated data. At this frequency stability of the VARs is no longer an issue, and in this setting the Lasso and its variants consistently outperform the No Change forecasts. In particular, models estimated using the lcov transformation provide the most accurate forecasts of the covariance matrix, resulting in less risky portfolios even over longer horizons. At the monthly level of aggregation (results in Table 10) both transformations are equivalent and dominate (though not uniformly) the No Change forecasts. Note that at both levels of aggregation the small number of observations available relative to the number of parameters of the unrestricted model renders OLS infeasible.

Table 8. Summary statistics, daily forecasts, 1000 training observations, h-step-ahead recursive forecasts. All statistics are averaged across forecast iterations.

6. Conclusions and Further Work

In this paper we considered modeling and forecasting large realized covariance matrices. Our approach was based on the estimation of a large vector autoregressive model by the least absolute shrinkage and selection operator, which simultaneously shrinks irrelevant parameters towards zero and conducts variable selection. Therefore, we avoided the problems related to the curse of dimensionality which abound in the related literature. We also derived upper bounds for the forecast error. In an empirical application focused on 30 stocks of the Dow Jones Industrial Average we evaluated the performance of the Lasso and the adaptive Lasso at different levels of aggregation. Compared to random walk forecasts and, when feasible, OLS forecasts, our empirical application shows that our methodology is promising in that it provides better forecasts than the benchmarks even at long horizons.

Table 9. Summary statistics for weekly aggregated data, 263 training observations, h-step-ahead recursive forecasts. All statistics are averaged across forecast iterations.

7. Appendix

Proof of Theorem 1. By the definitions of \sigma_t^2 and \hat{\sigma}_t^2 one has

|\hat{\sigma}_t^2 - \sigma_t^2| = |w'(\hat{\Sigma}_t - \Sigma_t)w| \le \|(\hat{\Sigma}_t - \Sigma_t)w\|_{\ell_\infty} \|w\|_{\ell_1} \le \|\hat{\Sigma}_t - \Sigma_t\|_\infty \|w\|_{\ell_1}^2 \le \|\hat{\Sigma}_t - \Sigma_t\|_\infty c^2,

where the last step uses \|w\|_{\ell_1} \le c (note also that \|w\|_{\ell_1}^2 \le n \|w\|_{\ell_2}^2, so an \ell_2 constraint on the weights implies an \ell_1 constraint).

Before proving Theorem 2 we recall the following result, which is an extract of Theorem 2 in Kock and Callot (2012).

Lemma 1 (Theorem 2 in Kock and Callot (2012)). Let \lambda_T = \sqrt{8 \ln(1+T)^5 \ln(1+k)^4 \ln(1+p)^2 \ln(k^2 p) \sigma_T^4 / T} and 0 < q < 1. Then, with probability at least 1 - 2(k^2 p)^{1 - \ln(1+T)} - 2(1+T)^{-1/A} - \pi_q(s_i), the following inequality holds for each i = 1, \ldots, k, for some positive constant A:

(7)    \|\hat{\beta}_i - \beta_i^*\|_{\ell_1} \le \frac{16}{q \kappa_i^2} s_i \lambda_T.

Furthermore, all the above statements hold on one and the same set, which has probability at least 1 - 2(k^2 p)^{1 - \ln(1+T)} - 2(1+T)^{-1/A} - \pi_q(\bar{s}).

Table 10. Summary statistics for monthly aggregated data, 60 training observations, h-step-ahead recursive forecasts. All statistics are averaged across forecast iterations.

Proof of Theorem 2. Since

\|\hat{\Sigma}_{T+1} - \Sigma_{T+1}\|_\infty = \|\mathrm{vech}\, \hat{\Sigma}_{T+1} - \mathrm{vech}\, \Sigma_{T+1}\|_{\ell_\infty} = \|\hat{y}_{T+1|T} - y_{T+1}\|_{\ell_\infty},

we shall bound each entry of \hat{y}_{T+1|T} - y_{T+1}. By assumption,

y_{T+1,i} = Z_{T+1}' \beta_i^* + \epsilon_{T+1,i},

while

\hat{y}_{T+1|T,i} = Z_{T+1}' \hat{\beta}_i,

such that

|y_{T+1,i} - \hat{y}_{T+1|T,i}| = |Z_{T+1}'(\beta_i^* - \hat{\beta}_i) + \epsilon_{T+1,i}| \le \|Z_{T+1}\|_{\ell_\infty} \|\hat{\beta}_i - \beta_i^*\|_{\ell_1} + |\epsilon_{T+1,i}|.

Using Lemma 1 this yields

|y_{T+1,i} - \hat{y}_{T+1|T,i}| \le \|Z_{T+1}\|_{\ell_\infty} \frac{16}{q \kappa_i^2} s_i \lambda_T + |\epsilon_{T+1,i}|  for all  i = 1, \ldots, k,

with probability at least 1 - 2(k^2 p)^{1 - \ln(1+T)} - 2(1+T)^{-1/A} - \pi_q(\bar{s}). Next, note that by the gaussianity of the covariates and the error terms,

P(|y_{T+1-l,i}| \ge x) \le 2 e^{-x^2 / (2 \sigma_T^2)}  for all  1 \le i \le k  and  1 \le l \le p,

and

P(|\epsilon_{T+1,i}| \ge x) \le 2 e^{-x^2 / (2 \sigma_T^2)}  for all  1 \le i \le k.

This implies

P(\|Z_{T+1}\|_{\ell_\infty} \vee \max_{1 \le i \le k} |\epsilon_{T+1,i}| \ge L) \le 2 k p e^{-L^2 / (2 \sigma_T^2)} + 2 k e^{-L^2 / (2 \sigma_T^2)} = 2 k (p+1) e^{-L^2 / (2 \sigma_T^2)}.

Choosing L^2 = 2 \sigma_T^2 \ln(k(p+1)) \ln(T) yields

(8)    P(\|Z_{T+1}\|_{\ell_\infty} \vee \max_{1 \le i \le k} |\epsilon_{T+1,i}| \ge L) \le 2 [k(p+1)]^{1 - \ln(T)},

and so

|y_{T+1,i} - \hat{y}_{T+1|T,i}| \le \sqrt{2 \sigma_T^2 \ln(k(p+1)) \ln(T)} \left( \frac{16}{q \kappa_i^2} s_i \lambda_T + 1 \right)  for all  i = 1, \ldots, k,

with probability at least 1 - 2(k^2 p)^{1 - \ln(1+T)} - 2(1+T)^{-1/A} - \pi_q(\bar{s}) - 2[k(p+1)]^{1 - \ln(T)}.

References

Bickel, P. J., Y. Ritov, and A. B. Tsybakov (2009). Simultaneous analysis of Lasso and Dantzig selector. The Annals of Statistics 37(4), 1705-1732.

Bühlmann, P. and S. Van De Geer (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer-Verlag, New York.

Friedman, J., T. Hastie, and R. Tibshirani (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33(1), 1-22.

Kock, A. and L. Callot (2012). Oracle inequalities for high dimensional vector autoregressions. Aarhus University, CREATES Research Paper 2012-16.

Meinshausen, N. and P. Bühlmann (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics 34(3), 1436-1462.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58(1), 267-288.

Zhao, P. and B. Yu (2006). On model selection consistency of Lasso. The Journal of Machine Learning Research 7, 2541-2563.


More information

Estimating Covariance Using Factorial Hidden Markov Models

Estimating Covariance Using Factorial Hidden Markov Models Estimating Covariance Using Factorial Hidden Markov Models João Sedoc 1,2 with: Jordan Rodu 3, Lyle Ungar 1, Dean Foster 1 and Jean Gallier 1 1 University of Pennsylvania Philadelphia, PA joao@cis.upenn.edu

More information

Technical Vignette 5: Understanding intrinsic Gaussian Markov random field spatial models, including intrinsic conditional autoregressive models

Technical Vignette 5: Understanding intrinsic Gaussian Markov random field spatial models, including intrinsic conditional autoregressive models Technical Vignette 5: Understanding intrinsic Gaussian Markov random field spatial models, including intrinsic conditional autoregressive models Christopher Paciorek, Department of Statistics, University

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Portfolio Allocation using High Frequency Data. Jianqing Fan

Portfolio Allocation using High Frequency Data. Jianqing Fan Portfolio Allocation using High Frequency Data Princeton University With Yingying Li and Ke Yu http://www.princeton.edu/ jqfan September 10, 2010 About this talk How to select sparsely optimal portfolio?

More information

Backtesting Marginal Expected Shortfall and Related Systemic Risk Measures

Backtesting Marginal Expected Shortfall and Related Systemic Risk Measures Backtesting Marginal Expected Shortfall and Related Systemic Risk Measures Denisa Banulescu 1 Christophe Hurlin 1 Jérémy Leymarie 1 Olivier Scaillet 2 1 University of Orleans 2 University of Geneva & Swiss

More information

On Model Selection Consistency of Lasso

On Model Selection Consistency of Lasso On Model Selection Consistency of Lasso Peng Zhao Department of Statistics University of Berkeley 367 Evans Hall Berkeley, CA 94720-3860, USA Bin Yu Department of Statistics University of Berkeley 367

More information

Uncertainty quantification and visualization for functional random variables

Uncertainty quantification and visualization for functional random variables Uncertainty quantification and visualization for functional random variables MascotNum Workshop 2014 S. Nanty 1,3 C. Helbert 2 A. Marrel 1 N. Pérot 1 C. Prieur 3 1 CEA, DEN/DER/SESI/LSMR, F-13108, Saint-Paul-lez-Durance,

More information

Sparse PCA with applications in finance

Sparse PCA with applications in finance Sparse PCA with applications in finance A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon 1 Introduction

More information

regression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered,

regression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered, L penalized LAD estimator for high dimensional linear regression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered, where the overall number of variables

More information

Cointegrated VAR s. Eduardo Rossi University of Pavia. November Rossi Cointegrated VAR s Financial Econometrics / 56

Cointegrated VAR s. Eduardo Rossi University of Pavia. November Rossi Cointegrated VAR s Financial Econometrics / 56 Cointegrated VAR s Eduardo Rossi University of Pavia November 2013 Rossi Cointegrated VAR s Financial Econometrics - 2013 1 / 56 VAR y t = (y 1t,..., y nt ) is (n 1) vector. y t VAR(p): Φ(L)y t = ɛ t The

More information

Volatility. Gerald P. Dwyer. February Clemson University

Volatility. Gerald P. Dwyer. February Clemson University Volatility Gerald P. Dwyer Clemson University February 2016 Outline 1 Volatility Characteristics of Time Series Heteroskedasticity Simpler Estimation Strategies Exponentially Weighted Moving Average Use

More information

High-dimensional Statistical Models

High-dimensional Statistical Models High-dimensional Statistical Models Pradeep Ravikumar UT Austin MLSS 2014 1 Curse of Dimensionality Statistical Learning: Given n observations from p(x; θ ), where θ R p, recover signal/parameter θ. For

More information

Forecasting 1 to h steps ahead using partial least squares

Forecasting 1 to h steps ahead using partial least squares Forecasting 1 to h steps ahead using partial least squares Philip Hans Franses Econometric Institute, Erasmus University Rotterdam November 10, 2006 Econometric Institute Report 2006-47 I thank Dick van

More information

Chapter 3. Linear Models for Regression

Chapter 3. Linear Models for Regression Chapter 3. Linear Models for Regression Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Linear

More information

Economics 883 Spring 2016 Tauchen. Jump Regression

Economics 883 Spring 2016 Tauchen. Jump Regression Economics 883 Spring 2016 Tauchen Jump Regression 1 Main Model In the jump regression setting we have X = ( Z Y where Z is the log of the market index and Y is the log of an asset price. The dynamics are

More information

Robust methods and model selection. Garth Tarr September 2015

Robust methods and model selection. Garth Tarr September 2015 Robust methods and model selection Garth Tarr September 2015 Outline 1. The past: robust statistics 2. The present: model selection 3. The future: protein data, meat science, joint modelling, data visualisation

More information

A Comparative Framework for Preconditioned Lasso Algorithms

A Comparative Framework for Preconditioned Lasso Algorithms A Comparative Framework for Preconditioned Lasso Algorithms Fabian L. Wauthier Statistics and WTCHG University of Oxford flw@stats.ox.ac.uk Nebojsa Jojic Microsoft Research, Redmond jojic@microsoft.com

More information

INSTITUTE AND FACULTY OF ACTUARIES. Curriculum 2019 SPECIMEN SOLUTIONS

INSTITUTE AND FACULTY OF ACTUARIES. Curriculum 2019 SPECIMEN SOLUTIONS INSTITUTE AND FACULTY OF ACTUARIES Curriculum 09 SPECIMEN SOLUTIONS Subject CSA Risk Modelling and Survival Analysis Institute and Faculty of Actuaries Sample path A continuous time, discrete state process

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon

More information

ON VARIANCE COVARIANCE COMPONENTS ESTIMATION IN LINEAR MODELS WITH AR(1) DISTURBANCES. 1. Introduction

ON VARIANCE COVARIANCE COMPONENTS ESTIMATION IN LINEAR MODELS WITH AR(1) DISTURBANCES. 1. Introduction Acta Math. Univ. Comenianae Vol. LXV, 1(1996), pp. 129 139 129 ON VARIANCE COVARIANCE COMPONENTS ESTIMATION IN LINEAR MODELS WITH AR(1) DISTURBANCES V. WITKOVSKÝ Abstract. Estimation of the autoregressive

More information

Markowitz Efficient Portfolio Frontier as Least-Norm Analytic Solution to Underdetermined Equations

Markowitz Efficient Portfolio Frontier as Least-Norm Analytic Solution to Underdetermined Equations Markowitz Efficient Portfolio Frontier as Least-Norm Analytic Solution to Underdetermined Equations Sahand Rabbani Introduction Modern portfolio theory deals in part with the efficient allocation of investments

More information

Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor

Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor Y. Wang M. J. Daniels wang.yanpin@scrippshealth.org mjdaniels@austin.utexas.edu

More information

The Constrained Lasso

The Constrained Lasso The Constrained Lasso Gareth M. ames, Courtney Paulson and Paat Rusmevichientong Abstract Motivated by applications in areas as diverse as finance, image reconstruction, and curve estimation, we introduce

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance CESIS Electronic Working Paper Series Paper No. 223 A Bootstrap Test for Causality with Endogenous Lag Length Choice - theory and application in finance R. Scott Hacker and Abdulnasser Hatemi-J April 200

More information

Principles of forecasting

Principles of forecasting 2.5 Forecasting Principles of forecasting Forecast based on conditional expectations Suppose we are interested in forecasting the value of y t+1 based on a set of variables X t (m 1 vector). Let y t+1

More information

Regularization: Ridge Regression and the LASSO

Regularization: Ridge Regression and the LASSO Agenda Wednesday, November 29, 2006 Agenda Agenda 1 The Bias-Variance Tradeoff 2 Ridge Regression Solution to the l 2 problem Data Augmentation Approach Bayesian Interpretation The SVD and Ridge Regression

More information

Sparse representation classification and positive L1 minimization

Sparse representation classification and positive L1 minimization Sparse representation classification and positive L1 minimization Cencheng Shen Joint Work with Li Chen, Carey E. Priebe Applied Mathematics and Statistics Johns Hopkins University, August 5, 2014 Cencheng

More information