Regularization of Portfolio Allocation

Benjamin Bruder, Quantitative Research, Lyxor Asset Management, Paris
Jean-Charles Richard, Quantitative Research, Lyxor Asset Management, Paris
Nicolas Gaussel, Chief Investment Officer, Lyxor Asset Management, Paris
Thierry Roncalli, Quantitative Research, Lyxor Asset Management, Paris

June 2013

Abstract

The mean-variance optimization (MVO) theory of Markowitz (1952) for portfolio selection is one of the most important methods used in quantitative finance. This portfolio allocation needs two input parameters, the vector of expected returns and the covariance matrix of asset returns. Estimating these inputs introduces estimation errors, which may have a large impact on portfolio weights. In this paper we review different methods which aim to stabilize the mean-variance allocation. In particular, we consider recent results from machine learning theory to obtain more robust allocations.

Keywords: Portfolio optimization, active management, estimation error, shrinkage estimator, resampling methods, eigendecomposition, norm constraints, Lasso regression, ridge regression, information matrix, hedging portfolio, sparsity.

JEL classification: G11, C60.

1 Introduction

The mean-variance optimization (MVO) framework developed by Markowitz (1952) is certainly the most famous model used in asset management. This model is generally associated with the CAPM theory of Sharpe (1964). This explains why Harry M. Markowitz and William F. Sharpe shared [1] the Nobel Prize in 1990. However, the two models are used differently by practitioners. The CAPM theory considers the Markowitz model from the viewpoint of micro analysis in order to deduce the price formation for financial assets. In this model, the key concept is the market portfolio, which is uniquely defined. In the Markowitz model, optimized portfolios depend on expected returns and risks. Moreover, the optimal portfolio is not unique and depends on the investor's risk aversion.
As a consequence, these two models pursue different purposes. While the CAPM theory is the foundation of passive management, the Markowitz model is the relevant framework for active management. Nevertheless, even if the Markowitz model is a powerful tool for transforming the views of the portfolio manager into investment bets, it has attracted a lot of criticism because it is particularly sensitive to estimation errors (Michaud, 1989). In fact, the Markowitz model is an aggressive model of active management (Roncalli, 2013). By construction, it does not distinguish between real arbitrage factors and noisy arbitrage factors. The goal of portfolio regularization is then to produce less aggressive portfolios by reducing noisy bets.

* We are grateful to Clément Le Bars for his helpful comments.
[1] With Merton H. Miller.
Electronic copy available at:

The paper is organized as follows. Section two presents the motivations for using regularization methods. In particular, we illustrate the instability of mean-variance optimized portfolios. In section three, we review the different approaches to portfolio regularization. They concern the introduction of weight constraints, the use of resampling techniques and the shrinkage of covariance matrices. We also consider penalization methods of the objective function like Lasso or ridge regression and show how these methods may be used to regularize the inverse of the covariance matrix, which is the most important quantity in portfolio optimization. In section four, we consider different applications in order to illustrate the impact of regularization on portfolio optimization. Section five offers some concluding remarks.

2 Motivations

2.1 The mean-variance portfolio

Let us consider a universe of $n$ risky assets. Let $\mu$ and $\Sigma$ be the vector of expected returns and the covariance matrix of asset returns [2]. We denote by $r$ the return of the risk-free asset. A portfolio allocation consists of a vector of weights $x = (x_1, \ldots, x_n)$ where $x_i$ is the percentage of the wealth invested in the $i$-th asset.
Sometimes, we may assume that all the wealth is invested, meaning that the sum of the weights is equal to one. We may also add other constraints on the weights; for instance, we may impose that the portfolio is long-only. Let us now define the quadratic utility function $U$ of the investor, which only depends on the expected returns $\mu$ and the covariance matrix $\Sigma$ of the assets:

$$U(x) = x^\top (\mu - r\mathbf{1}) - \frac{\phi}{2} x^\top \Sigma x$$

where $\phi$ is the risk tolerance of the investor. The mean-variance optimized (or MVO) portfolio $x^\star$ is the portfolio which maximizes the investor's utility. The optimization problem can be reformulated equivalently as a standard QP problem:

$$x^\star = \arg\min \frac{1}{2} x^\top \Sigma x - \gamma x^\top (\mu - r\mathbf{1})$$

where $\gamma = \phi^{-1}$. Without any constraints, the solution is given by the well-known formula:

$$x^\star = \frac{1}{\phi} \Sigma^{-1} (\mu - r\mathbf{1}) = \gamma \Sigma^{-1} (\mu - r\mathbf{1})$$

It follows that the Sharpe ratio of the MVO portfolio is:

$$\mathrm{SR}(x^\star \mid r) = \frac{x^{\star\top} (\mu - r\mathbf{1})}{\sqrt{x^{\star\top} \Sigma x^\star}} = \sqrt{(\mu - r\mathbf{1})^\top \Sigma^{-1} (\mu - r\mathbf{1})}$$

[2] In this paper, we adopt the formulation presented in the book of Roncalli (2013).

We deduce that the optimal utility of the investor is:

$$U(x^\star) = x^{\star\top} (\mu - r\mathbf{1}) - \frac{\phi}{2} x^{\star\top} \Sigma x^\star = \frac{1}{2\phi} (\mu - r\mathbf{1})^\top \Sigma^{-1} (\mu - r\mathbf{1}) = \frac{1}{2\phi} \mathrm{SR}^2(x^\star \mid r)$$

Maximizing the mean-variance utility function is then equivalent to maximizing the ex-ante Sharpe ratio of the allocation.

Remark 1 Without loss of generality, we assume that the risk-free rate $r$ is equal to zero in the rest of the paper.

In practice, we cannot reach the optimal allocation because we do not know $\mu$ and $\Sigma$. That is why we have to estimate these two quantities. Let $R_t = (R_{1,t}, \ldots, R_{n,t})$ be the vector of historical returns for the different assets at time $t$. We then estimate $\mu$ and $\Sigma$ by the maximum likelihood method:

$$\hat{\mu} = \frac{1}{T} \sum_{t=1}^{T} R_t$$

$$\hat{\Sigma} = \frac{1}{T} \sum_{t=1}^{T} (R_t - \hat{\mu}) (R_t - \hat{\mu})^\top$$

We can therefore use the estimates $\hat{\mu}$ and $\hat{\Sigma}$ in place of $\mu$ and $\Sigma$ in mean-variance optimization. This estimation step is very easy. As mentioned by Roncalli (2013), we could think that the job is complete. However, the story does not end here.

2.2 Evidence of mean-variance instability

Estimating the input parameters of the optimization program necessarily introduces estimation errors and instability in the optimal solution. This stability issue with estimators based on historical figures has been widely studied by academics [3]. Before going into the details of this subject, we illustrate the stability problem of the MVO portfolio with the following example.

Example 1 We consider a universe of four assets. The expected returns are $\hat{\mu}_1$ = 5%, $\hat{\mu}_2$ = 6%, $\hat{\mu}_3$ = 7% and $\hat{\mu}_4$ = 8% whereas the volatilities are equal to $\hat{\sigma}_1$ = 10%, $\hat{\sigma}_2$ = 12%, $\hat{\sigma}_3$ = 14% and $\hat{\sigma}_4$ = 15%. We assume that the correlations are the same and we have $\hat{\rho}_{i,j} = \hat{\rho}$ = 70%.

We solve the mean-variance problem without constraints using the parameters given in Example 1. The risk tolerance parameter $\phi$ is calibrated in order to target an ex-ante volatility [4] equal to 10%.
In this case, the optimal portfolio is $x_1^\star$ = 23.49%, $x_2^\star$ = 19.57%, $x_3^\star$ = 16.78% and $x_4^\star$ = 28.44%. In Table 1, we indicate how a small perturbation of the input parameters changes the optimized solution. For instance, if the volatility of the second asset increases by 3 percentage points (from 12% to 15%), the weight on this asset becomes -14.04% instead of 19.57%. If the realized return of the first asset is 6% and not 5%, the optimal weight of the first asset is almost three times larger (63.19% versus 23.49%). As a consequence, the optimized solution is very sensitive to estimation errors.

[3] See for instance Michaud (1989), Jorion (1992), Broadie (1993), Ledoit and Wolf (2004) or more recently DeMiguel et al. (2011).
[4] Let $\sigma^\star$ be the ex-ante volatility. We have: $\phi = \sqrt{\hat{\mu}^\top \hat{\Sigma}^{-1} \hat{\mu}} \, / \, \sigma^\star$.

Table 1: Sensitivity of the MVO portfolio to input parameters

ρ̂        70%       80%       70%       80%       70%
σ̂_2      12%       12%       15%       15%       12%
µ̂_1      5%        5%        5%        5%        6%
x_1      23.49%    19.43%    36.55%    39.56%    63.19%
x_2      19.57%    16.19%    -14.04%   -32.11%   8.14%
x_3      16.78%    13.88%    26.11%    28.26%    6.98%
x_4      28.44%    32.97%    37.17%    45.87%    18.38%

The stability problem comes from the structure of the solution. Indeed, the solution involves the inverse of the covariance matrix $\mathcal{I} = \hat{\Sigma}^{-1}$, called the information matrix. The eigenvectors of the two matrices are the same, but the eigenvalues of $\mathcal{I}$ are equal to the inverse of the eigenvalues of $\hat{\Sigma}$ (Roncalli, 2013).

Example 2 We consider our previous example but with another correlation matrix $\hat{C}$ (the entries of $\hat{C}$ are missing from this copy).

In Table 2, we consider Example 2 and report the eigenvectors $v_j$ and the eigenvalues $\lambda_j$ of the covariance and information matrices. Results show that the most important factor [5] of the information matrix is the least important factor of the covariance matrix. However, the smallest factors of the covariance matrix are generally considered noise factors because they represent a small part of the total variance. This explains why MVO portfolios are sensitive to input parameters: small changes in the covariance matrix dramatically modify the nature of the smallest factors. Despite the simplicity of mean-variance optimization, the stability of the allocation is therefore a real problem.
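The sensitivity reported in Table 1 can be checked numerically. The following is a minimal sketch, assuming the unconstrained solution $x^\star = \gamma \hat{\Sigma}^{-1} \hat{\mu}$ with $\gamma$ calibrated to a 10% ex-ante volatility as described above; the function name `mvo_weights` is ours and not part of the paper.

```python
import numpy as np

def mvo_weights(mu, sigma, rho, target_vol=0.10):
    """Unconstrained MVO weights, scaled to reach a target ex-ante volatility.

    Assumes a constant correlation rho between all pairs of assets.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n = len(mu)
    corr = np.full((n, n), rho)
    np.fill_diagonal(corr, 1.0)
    cov = corr * np.outer(sigma, sigma)
    raw = np.linalg.solve(cov, mu)            # Sigma^{-1} mu
    gamma = target_vol / np.sqrt(mu @ raw)    # gamma = 1/phi (footnote on phi)
    return gamma * raw

mu = [0.05, 0.06, 0.07, 0.08]
sigma = [0.10, 0.12, 0.14, 0.15]

base = mvo_weights(mu, sigma, rho=0.70)
bumped_rho = mvo_weights(mu, sigma, rho=0.80)
bumped_vol = mvo_weights(mu, [0.10, 0.15, 0.14, 0.15], rho=0.70)
bumped_mu = mvo_weights([0.06, 0.06, 0.07, 0.08], sigma, rho=0.70)

for label, w in [("base", base), ("rho = 80%", bumped_rho),
                 ("sigma_2 = 15%", bumped_vol), ("mu_1 = 6%", bumped_mu)]:
    print(f"{label:>14}: {np.round(100 * w, 2)}")
```

Each scenario changes a single input of Example 1, so the printed rows can be compared column by column with Table 1.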
In this context, Michaud suggested that mean-variance maximization is in fact error maximization:

"The unintuitive character of many optimized portfolios can be traced to the fact that MV optimizers are, in a fundamental sense, estimation error maximizers. Risk and return estimates are inevitably subject to estimation error. MV optimization significantly overweights (underweights) those securities that have large (small) estimated returns, negative (positive) correlations and small (large) variances. These securities are, of course, the ones most likely to have large estimation errors" (Michaud, 1989, page 33).

In a dynamic framework, estimation errors can then dramatically change the weights, leading to high turnover and/or high transaction costs. Moreover, the diversifiable risk, which is supposed to be reduced by the optimization, can be underestimated. Aware of these problems, academics and practitioners have developed techniques to reduce the impact of estimation errors.

[5] The $j$-th factor is represented by the eigenvector $v_j$ and the importance of the factor is given by the eigenvalue $\lambda_j$.

Table 2: Eigendecomposition of the covariance and information matrices (*) (in %)
Columns: eigenvectors $v_1$ to $v_4$ of the covariance matrix $\hat{\Sigma}$ and of the information matrix $\mathcal{I}$; last row: eigenvalues $\lambda_j$.
(*) The eigenvalues of the information matrix are not expressed in %, but as decimals.

Remark 2 In Section 3.3.1, we will see how to interpret the eigenvectors and the eigenvalues of the covariance matrix in the MVO framework.

2.3 Input parameters versus estimation errors

After estimating the input parameters, the optimization is done as if these quantities were perfectly certain, implying that estimation errors are introduced into the allocation process. Various solutions exist to stabilize the optimization, from the simplest to the most complicated, but we generally distinguish two ways to regularize the solution. The first one consists in reducing the estimation errors of the input parameters using econometric methods. For instance, Michaud (1998) uses the resampling approach to reduce the impact of estimation noise. Ledoit and Wolf (2003) propose to replace the covariance estimator by a shrinkage version, whereas Laloux et al. (1999) clean the covariance matrix using random matrix theory. Another route is chosen by Black and Litterman (1992), who suggest combining manager views and market equilibrium to modify the expected returns [6]. The second way is to directly shrink the portfolio weights using weight bounds, penalization of the objective function or regularization of input parameters. Jagannathan and Ma (2003) show that imposing constraints on the mean-variance optimization can be interpreted as a modification of the covariance matrix. In particular, lower bounds (resp. upper bounds) decrease (resp. increase) asset return volatilities.
Constraints on weights then reduce the degrees of freedom of the optimization and the allocation is forced to remain in certain intervals. Instead of using constraints, we can also use other values of the input parameters than those estimated with historical figures. For instance, we can consider a diagonal matrix instead of the full covariance matrix or we can use a unique value for the expected returns. This is the case of the equally-weighted (or EW) portfolio, which is the solution of the mean-variance problem when $\Sigma = I_n$ and $\mu = \mathbf{1}$. This solution is obtained using wrong estimators. However, these estimators have zero variance and minimize the impact of estimation errors on the optimized portfolio.

[6] See DeMiguel et al. (2011) for a review of shrinkage estimators of the covariance matrix of asset returns and the vector of expected returns.

The correction of estimation errors is such a difficult task that several studies tend to show that heuristic allocations perform better than mean-variance allocations in terms of the Sharpe ratio. For example, DeMiguel et al. (2009) compare the performance of 14 different portfolio models and the equally-weighted portfolio on different datasets and conclude that sophisticated models are not better than the EW portfolio. More recently, Tu and Zhou (2011) propose to combine the EW portfolio with an optimized allocation to outperform naive strategies. In a similar way, Dupleich et al. (2012) combine MVO portfolios with different lag windows to remove model uncertainty. By mixing stable noisy portfolios, the authors seek to improve the stability of the allocation. In fact, we will see that most mixing schemes are equivalent to denoising the input parameters.

3 Regularization methods for portfolio optimization

In what follows, we present the most popular techniques used to solve the problem of estimation errors. The first three paragraphs concern weight constraints, resampling methods and shrinkage procedures of the covariance matrix. We then consider the penalization approach of the objective function. Finally, the stability of hedging portfolios based on the information matrix is explained in the last paragraph.

3.1 Using weight constraints

Adding constraints is certainly the first approach that has been used by portfolio managers to regularize optimized portfolios, and it remains today the most frequent method to avoid mean-variance instability. Let us consider the optimization problem with the normalization constraint:

$$x^\star(\gamma) = \arg\min \frac{1}{2} x^\top \hat{\Sigma} x - \gamma x^\top \hat{\mu} \quad \text{u.c.} \quad \mathbf{1}^\top x = 1$$

The constraint $\mathbf{1}^\top x = 1$ means that the sum of the weights is equal to one.
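It is easy to show that the optimized portfolio is then:

$$x^\star(\gamma; \lambda) = \gamma \hat{\Sigma}^{-1} \tilde{\mu}$$

where $\tilde{\mu} = \hat{\mu} + (\lambda / \gamma) \mathbf{1}$ and $\lambda$ is the Lagrange coefficient associated with the constraint. Indeed, the first-order condition of the Lagrangian is $\hat{\Sigma} x - \gamma \hat{\mu} - \lambda \mathbf{1} = 0$. We notice that imposing a portfolio that is fully invested with a leverage equal to exactly one is equivalent to regularizing the vector of expected returns. The constraint $\sum_{i=1}^{n} x_i = 1$ is then already a regularization method.

Example 3 We consider a universe of four assets. The expected returns are $\hat{\mu}_1$ = 8%, $\hat{\mu}_2$ = 9%, $\hat{\mu}_3$ = 10% and $\hat{\mu}_4$ = 8% whereas the volatilities are equal to $\hat{\sigma}_1$ = 15%, $\hat{\sigma}_2$ = 20%, $\hat{\sigma}_3$ = 25% and $\hat{\sigma}_4$ = 30%. The correlation matrix $\hat{C}$ is given with the example (its entries are missing from this copy).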

If we suppose that $\gamma = 0.5$, we obtain the results reported in Table 3. If there is no constraint, the portfolio is highly leveraged. For instance, the weight of the first asset is equal to [...]%. By adding the simple constraint $\sum_{i=1}^{n} x_i = 1$, the dispersion of the optimized weights is smaller [7]. We also notice that the regularized expected returns are lower, because $\lambda$ is equal to -2.65%, so that $\lambda / \gamma$ = -5.31%.

Table 3: Optimized portfolio with the constraint $\sum_{i=1}^{n} x_i = 1$

         Unconstrained          Constrained
Asset    µ̂_i      x*(γ)        µ̃_i      x*(γ; λ)
1        8.00%    [...]        2.69%    64.61%
2        9.00%    [...]        3.69%    42.06%
3        10.00%   -22.75%      4.69%    11.55%
4        8.00%    -32.28%      2.69%    -18.22%

The previous framework may be generalized to other constraints. For instance, Jagannathan and Ma (2003) show that adding a long-only constraint is equivalent to regularizing the covariance matrix. This result also holds for any equality or inequality constraints (Roncalli, 2013). If we consider our previous example and add a long-only constraint, the optimized portfolio is $x_1^\star$ = 56.23%, $x_2^\star$ = 42.41%, $x_3^\star$ = 1.36% and $x_4^\star$ = 0.00%. In this case, the regularized vector of expected returns $\tilde{\mu}$ and the regularized covariance matrix $\tilde{\Sigma}$ are given in Table 4. We notice that the long-only constraint is equivalent to decreasing the volatility and the correlations of the fourth asset in order to eliminate its short exposure.

Table 4: Regularized parameters $\tilde{\mu}$ and $\tilde{\Sigma}$

Asset    µ̃_i     σ̃_i      ρ̃_{i,j}
1        [...]   15.00%   100.00%
2        [...]   20.00%   10.00%    100.00%
3        [...]   25.00%   40.00%    70.00%    100.00%
4        [...]   26.72%   32.90%    27.49%    53.43%    100.00%

Remark 3 Portfolio managers generally find the optimal portfolio by sequential steps. They perform the portfolio optimization, analyze the solution to define some regularization constraints, design a new optimization problem by considering these constraints, analyze the new solution and add more satisfying constraints, etc. This step-by-step approach is very popular, because portfolio managers implicitly regularize the parameters in a way that is coherent with their expectations for the solution.
The drawback may be that the regularized parameters are no longer coherent with the initial parameters. Moreover, the constrained solution is generally overfitted.

3.2 Resampling methods

Resampling techniques are based on Monte Carlo and bootstrapping methods. Jorion (1992) was the first to apply these techniques to portfolio optimization. The idea is to create a more realistic allocation by introducing uncertainty into the decision process of the allocation. For that, we consider a universe of $n$ assets. Let $\hat{\mu}$ and $\hat{\Sigma}$ be the estimates of the expected returns and the covariance matrix of asset returns. The efficient frontier computed with these statistics is an estimate of the true efficient frontier. Michaud (1998) then proposed averaging many realizations of optimized MV solutions to improve out-of-sample performance thanks to statistical diversification.

[7] The weight of the first asset is then equal to 64.61%.

The procedure is the following. We generate $K$ samples of asset returns from the original data using Monte Carlo or bootstrap methods:

Monte Carlo: The returns are simulated according to a multivariate Gaussian distribution with mean $\hat{\mu}$ and covariance matrix $\hat{\Sigma}$.
Bootstrap: The returns are drawn randomly from the original sample with replacement.

We assume that the MV solution is computed for a given value of the risk tolerance. We then calculate the mean $\hat{\mu}^{(k)}$ and the covariance matrix $\hat{\Sigma}^{(k)}$ of the $k$-th simulated sample. We also calculate the MVO portfolios for a grid of risk tolerances. Finally, we average the weights with respect to the grid and estimate the resampled efficient frontier.

Example 4 We consider a universe of four assets. The expected returns are $\hat{\mu}_1$ = 5%, $\hat{\mu}_2$ = 9%, $\hat{\mu}_3$ = 7% and $\hat{\mu}_4$ = 6% whereas the volatilities are equal to $\hat{\sigma}_1$ = 4%, $\hat{\sigma}_2$ = 15%, $\hat{\sigma}_3$ = 5% and $\hat{\sigma}_4$ = 10%. The correlation matrix $\hat{C}$ is given with the example (its entries are missing from this copy).

We illustrate the resampling procedure in Figure 1 by considering Example 4. MVO portfolios are computed under the constraints $\mathbf{1}^\top x = 1$ and $0 \le x_i \le 1$. We consider 500 simulated samples and 60 points for the grid. The estimated frontier is calculated with the $\hat{\mu}$ and $\hat{\Sigma}$ statistics. The averaged frontier corresponds to the average of the different efficient frontiers obtained for each sample of simulated asset returns. It is different from the resampled frontier, which corresponds to the frontier of resampled portfolios.
For instance, we report one optimal resampled portfolio (designated by the red star symbol), which is the average of the 500 resampled portfolios (indicated with the blue cross symbol). The resampled efficient frontier in Figure 2 is computed with S&P 100 asset returns during the period from January 1, 2011 to December 31, [...]. The resampled frontier is largely below the estimated and averaged efficient frontiers. Moreover, portfolios with high returns are unattainable on the resampled frontier, meaning that these portfolios are extreme points on the estimated efficient frontier and are purely due to estimation noise.

Remark 4 Resampling techniques have faced some criticisms (Scherer, 2002). The first one concerns the procedure itself, because the resampled portfolio always contains estimation errors since it is computed with the initial parameters $\hat{\mu}$ and $\hat{\Sigma}$. The second criticism is the lack of theory. Resampling is more an empirical method which seems to correct some biases, because portfolio averaging produces more diversified portfolios. However, it does not solve the robustness question concerning optimized portfolios.
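The Monte Carlo variant of the resampling procedure can be sketched as follows. This is a simplified illustration and not the setup used for Figure 1: it uses the unconstrained closed-form solution for a single risk tolerance instead of a constrained QP over a grid, and the constant correlation of 30%, the value `gamma = 0.5`, the sample length `n_obs = 260` and the function names are assumptions made for the example (the correlation matrix of Example 4 is not available here).

```python
import numpy as np

def mvo(mu, cov, gamma):
    """Unconstrained MVO portfolio x = gamma * Sigma^{-1} mu."""
    return gamma * np.linalg.solve(cov, mu)

def resampled_weights(mu, cov, gamma, n_obs, n_samples, seed=0):
    """Average MVO weights over simulated Gaussian samples (Monte Carlo)."""
    rng = np.random.default_rng(seed)
    weights = np.empty((n_samples, len(mu)))
    for k in range(n_samples):
        # Simulate one sample of returns from N(mu_hat, Sigma_hat)
        returns = rng.multivariate_normal(mu, cov, size=n_obs)
        # Re-estimate the inputs on the simulated sample (ML estimators)
        mu_k = returns.mean(axis=0)
        cov_k = np.cov(returns, rowvar=False, bias=True)
        weights[k] = mvo(mu_k, cov_k, gamma)
    return weights.mean(axis=0), weights

# Expected returns and volatilities of Example 4
mu_hat = np.array([0.05, 0.09, 0.07, 0.06])
vol_hat = np.array([0.04, 0.15, 0.05, 0.10])
# Hypothetical constant correlation, NOT the matrix of Example 4
corr = np.full((4, 4), 0.30)
np.fill_diagonal(corr, 1.0)
cov_hat = corr * np.outer(vol_hat, vol_hat)

avg_w, all_w = resampled_weights(mu_hat, cov_hat, gamma=0.5,
                                 n_obs=260, n_samples=500)
print("plug-in MVO weights:", np.round(mvo(mu_hat, cov_hat, 0.5), 3))
print("resampled weights  :", np.round(avg_w, 3))
print("weight dispersion  :", np.round(all_w.std(axis=0), 3))
```

The dispersion of the simulated weights gives a rough idea of how much of the plug-in allocation is driven by sampling noise. A bootstrap version would replace the Gaussian draw by sampling rows of the historical return matrix with replacement.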

Figure 1: Simulated resampled efficient frontier (Monte Carlo approach)

Figure 2: S&P 100 resampled efficient frontier (Bootstrap approach)

3.3 Regularization of the covariance matrix

3.3.1 The eigendecomposition approach

The goal of this method is to reduce the instability of the covariance matrix estimator $\hat{\Sigma}$. For that, we consider the eigendecomposition $\hat{\Sigma} = V \Lambda V^\top$ where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ is the diagonal matrix of the eigenvalues with $\lambda_1 > \lambda_2 > \cdots > \lambda_n$ and $V$ is an orthogonal matrix where each column $v_j$ is an eigenvector. With this decomposition, also known as principal component analysis, we can build endogenous factors $F_t = \Lambda^{-1/2} V^\top R_t$. In this case, the cleaning process consists in deleting some noise factors $F_{j,t}$. Let $m$ be the number of relevant factors. We can then keep the most informative factors, i.e. the factors with the largest eigenvalues. In this case, factors with low eigenvalues are considered as noise factors and we have:

$$m = \max \left\{ j : \lambda_j \ge (\lambda_1 + \ldots + \lambda_n) / n \right\}$$

Another solution consists in computing the implicit exposures of the portfolio to these factors. For instance, we can reformulate the MVO portfolio as follows:

$$x^\star = \gamma \hat{\Sigma}^{-1} \hat{\mu} = V \Lambda^{-1/2} \beta$$

where $\beta = \gamma \Lambda^{-1/2} V^\top \hat{\mu}$. If we compute the return of the investor's portfolio, we obtain:

$$R_t(x^\star) = x^{\star\top} R_t = \beta^\top \Lambda^{-1/2} V^\top R_t = \beta^\top F_t$$

We deduce that $\beta_j$ is exactly the exposure of the investor to the PCA factor $F_{j,t}$. We notice that the weights $\beta$ of the factors in the MVO portfolio are inversely proportional to the square root of the eigenvalues: $\beta_j \propto 1 / \sqrt{\lambda_j}$. Thus the optimized portfolio can be strongly exposed to low-variance factors, meaning that some noise factors may have a high impact on the MVO solution.

Example 5 Using the returns of the S&P 100 universe from [...], we perform the PCA decomposition of the sample correlation matrix. We also compute the implicit exposure $\beta_j$ to each factor with respect to the MVO portfolio when $\phi$ is set to 5.

Table 5: Factor exposures of the MVO portfolio (in %)
Rows: rank of the factor; loadings on Oil & Gas, Basic Materials, Industrials, Consumer Goods, Health Care, Consumer Services, Telecommunications, Utilities, Financials and Technology; exposure $\beta_j$.

Results are reported in Table 5. For each factor, we give the loading coefficients with respect to the ICB classification [8]. The second column is the eigenvector with the largest eigenvalue. We see that it is a proxy for a sector-weighted portfolio [9]. It may be viewed as a market factor. The other reported factors are the five most important factors of the MVO portfolio in terms of beta exposures. These factors correspond to long-short portfolios of industry sectors. We notice that the factor with the highest beta is ranked 7, whereas the second most important factor corresponds to the factor ranked 97, which is certainly a noise factor. We verify that the beta exposure of the market factor is very small [10]. This example illustrates how some factors can introduce noise into the MVO solution and how an investor can be exposed to non-significant factors. A way to reduce this noise is then to set the weight of these noisy factors to 0.

A last solution consists in using random matrix theory to regularize the eigenvalues of the correlation matrix. Using random matrix theory, Laloux et al. (1999) showed that the eigenvalues of the estimated correlation matrix are generally more dispersed than the true ones. A first consequence for the MVO allocation is the overweighting of some assets. Indeed, the optimization focuses on some low eigenvalues, whereas these eigenvalues were equal to the others in the true correlation matrix. Random matrix theory allows us to test whether the dispersion of the eigenvalues is significant or just due to noise. As a consequence, regularizing the estimated correlation matrix amounts to either deleting or equalizing the eigenvalues which are not significant. Laloux et al. (1999) studied the estimated correlation matrix of $n$ independent and identically distributed asset returns based on $T$ observations and showed that the eigenvalues follow a Marcenko-Pastur (MP) distribution [11]:

$$\rho(\lambda) = \frac{Q}{2 \pi \sigma^2} \frac{\sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}}{\lambda}$$

where $Q = T / n$.
The maximum and minimum eigenvalues are then given by:

$$\lambda_{\max / \min} = \sigma^2 \left( 1 \pm \sqrt{1 / Q} \right)^2$$

It is therefore difficult to distinguish the true eigenvalues from noisy eigenvalues for a matrix whose eigenvalue distribution looks like the MP distribution. On the other hand, the part of the eigenvalue spectrum outside this distribution could represent real information.

Example 6 We compute the theoretical distribution of $\lambda$ for different values of $T$ when $n = 100$. We also simulate the eigenvalue distribution of independent asset returns with $n = 100$ and $T = 260$. We finally consider the eigenvalue distribution of S&P 100 asset returns for the year [...].

In the first panel of Figure 3, we report the Marcenko-Pastur distribution of the eigenvalues. In the second panel, we compare the histogram of simulated independent asset returns (red bars) and the theoretical MP distribution (blue line). The last panel corresponds to the eigenvalues of the correlation matrix in the case of the S&P 100 universe. For that, we remove the first eigenvalue, which represents 60% of the total variance. If we consider the 99 remaining eigenvalues, we observe that their histogram is close to the MP distribution, except for seven eigenvalues which are outside the dashed blue line. Denoising the correlation matrix can then be performed by replacing all the eigenvalues under the dashed blue line by their mean.

[8] The ICB repartition of the 100 stocks is the following: Energy (11), Basic Materials (4), Industrials (15), Consumer Goods (12), Health Care (10), Consumer Services (13), Telecommunications (3), Utilities (4), Financials (16) and Technology (12).
[9] The weight of the sector is close to the frequency of stocks belonging to it.
[10] It is equal to 0.16%.
[11] See Marcenko and Pastur (1967).

Figure 3: Eigenvalue distribution

3.3.2 The shrinkage approach

This method was popularized by Ledoit and Wolf (2003). They propose to define the shrinkage estimator of the covariance matrix as a combination of the sample estimator of the covariance matrix $\hat{\Sigma}$ and a target covariance matrix $\hat{\Phi}$:

$$\hat{\Sigma}_\alpha = \alpha \hat{\Phi} + (1 - \alpha) \hat{\Sigma}$$

where $\alpha$ is a constant between 0 and 1. We know that $\hat{\Sigma}$ is an unbiased estimator, but its convergence is slow. The underlying idea is then to combine it with a biased estimator $\hat{\Phi}$ which converges faster. As a result, the mean squared error of the estimator is reduced. This approach is very close to the principle of the bias-variance trade-off, well known in regression analysis (James and Hastie, 1997). Ledoit and Wolf (2003) use the bias-variance decomposition with respect to the Frobenius norm to propose an optimal shrinkage parameter $\alpha^\star$. The loss function considered by Ledoit and Wolf is the following:

$$L(\alpha) = \left\| \alpha \hat{\Phi} + (1 - \alpha) \hat{\Sigma} - \Sigma \right\|^2$$

By solving the minimization problem $\alpha^\star = \arg\min \mathbb{E}[L(\alpha)]$, they give an analytical expression of $\alpha^\star$.

Ledoit and Wolf (2003) consider the single-factor model of Sharpe (1963). In this case, the vector of asset returns $R_t$ can be written as a function of the market return $R_{m,t}$ and uncorrelated Gaussian residuals $\varepsilon_t \sim \mathcal{N}(0, D)$:

$$R_t = \beta R_{m,t} + \varepsilon_t$$

where $\beta$ is the vector of market betas, $\sigma_m$ is the volatility of the market portfolio and $D = \mathrm{diag}(\tilde{\sigma}_1^2, \ldots, \tilde{\sigma}_n^2)$ is the covariance matrix of specific risks. The covariance matrix $\hat{\Phi}$ of the single-factor model is then:

$$\hat{\Phi} = \sigma_m^2 \beta \beta^\top + D$$

Assuming that the first eigenvector of $\hat{\Sigma}$ is the market factor, we obtain [12]:

$$\hat{\Sigma}_\alpha \approx \lambda_1 v_1 v_1^\top + \sum_{i=2}^{n} \left( (1 - \alpha) \lambda_i + \alpha \tilde{\sigma}^2 \right) v_i v_i^\top$$

The expression $(1 - \alpha) \lambda_i + \alpha \tilde{\sigma}^2$ shows that shrinking toward the single-factor matrix is equivalent to modifying the distribution of the eigenvalues. The highest eigenvalue is unchanged, whereas the other eigenvalues are forced to be closer to the specific risks. Other models can be considered, like the constant correlation matrix (Ledoit and Wolf, 2004), but the result of the shrinkage approach is always to reduce the dispersion of the eigenvalues.

3.4 Penalization methods

The idea of using penalizations comes from the regularization problem of linear regressions. These techniques have been widely used in machine learning in order to improve out-of-sample forecasting (Tibshirani, 1996; Zou and Hastie, 2005). Since mean-variance optimization is related to linear regression (Scherer, 2007), regularization may improve the performance of MVO portfolios. For instance, DeMiguel et al. (2010) consider the following norm-constrained problem:

$$x^\star(\lambda) = \arg\min \frac{1}{2} x^\top \hat{\Sigma} x + \lambda \|x\| \quad \text{u.c.} \quad \mathbf{1}^\top x = 1$$

where $\|x\|$ is the norm of the portfolio $x$. In particular, they proved that the solution of the $L_1$ norm-constrained MV problem is the same as the short-sale constrained minimum-variance portfolio analyzed in Jagannathan and Ma (2003). They also demonstrate that using the $L_2$ norm is equivalent to combining the MV and EW portfolios.

3.4.1 The L1 constrained portfolio

The $L_1$ norm or Lasso approach is one of the most famous regularization procedures. The penalty consists in constraining the sum of the absolute values of the weights.
We have [13]:

$$x^\star(\gamma, \lambda) = \arg\min \frac{1}{2} x^\top \hat{\Sigma} x - \gamma x^\top \hat{\mu} + \lambda \|x\|_1 \qquad (1)$$

The $L_1$ penalty has useful properties. It improves the sparsity and thus the selection of assets in large portfolios. Moreover, it stabilizes the problem by imposing a size restriction on the weights. Even though there is no closed-form solution of Equation (1), it can be easily solved with a QP algorithm. If the covariance matrix is the identity, we obtain an analytical formula which gives insight into the effect of the $L_1$ norm. The solution is [14]:

$$x^\star(\gamma, \lambda) = \mathrm{sgn}(\hat{\mu}) \odot \left( \gamma |\hat{\mu}| - \lambda \right)_+$$

The $L_1$ norm then corresponds to a soft-thresholding operator applied to the expected returns.

[12] See Appendix A.1 for computational details. We also assume that the idiosyncratic volatilities are equal ($\tilde{\sigma}_1 = \ldots = \tilde{\sigma}_n = \tilde{\sigma}$).
[13] The $L_1$ norm is defined as follows: $\|x\|_1 = \sum_{i=1}^{n} |x_i|$. It may be interpreted as the portfolio leverage.

The $L_1$ penalty is also well adapted to portfolio optimization under transaction or liquidity costs (Scherer, 2007). Let $c$ be the vector of transaction costs and $x_0$ the initial portfolio. The transaction cost paid by the investor is $c^\top |x - x_0|$ and may be easily introduced into the mean-variance optimization. Another way to use the $L_1$ norm is to perform asset selection. The investor may then choose the parameter $\lambda$ which corresponds to a given number $m$ of selected assets.

Example 7 We consider the asset returns of the S&P 100 universe for the period January 2011 to December [...]. We compute the regularized $L_1$ MVO portfolio for different values of $\lambda$.

Results are reported in Figure 4. In the first panel, we indicate the number of selected stocks. The optimized value of the utility function (or the ex-ante Sharpe ratio) is given in the second panel. We also report the weight evolution of the consumer services stocks. Finally, we indicate the leverage $\sum_{i=1}^{n} |x_i|$ of the portfolio in the last panel. This example illustrates the sparsity property when $\lambda$ increases. We also notice the impact of $\lambda$ on the leverage of the portfolio. For instance, if $\lambda$ = 0.2%, the leverage is divided by a factor larger than six, whereas the decrease of the utility function is equal to 28%. As a result, we may obtain sparser portfolios with a limited impact on the ex-ante Sharpe ratio.

3.4.2 The L2 constrained portfolio

The $L_2$ constrained MVO problem is defined as follows:

$$x^\star(\gamma, \lambda) = \arg\min \frac{1}{2} x^\top \hat{\Sigma} x - \gamma x^\top \hat{\mu} + \frac{\lambda}{2} x^\top x = \arg\min \frac{1}{2} x^\top \left( \hat{\Sigma} + \lambda I_n \right) x - \gamma x^\top \hat{\mu}$$

$x^\star(\gamma, \lambda)$ is then an MVO portfolio with a modified covariance matrix $\tilde{\Sigma} = \hat{\Sigma} + \lambda I_n$.
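Imposing the $L_2$ constraint is equivalent to adding the same amount $\lambda$ to the diagonal elements of the covariance matrix. This approach is therefore very close to the shrinkage method of Ledoit and Wolf (2003).

Remark 5 The $L_2$ constraint may be viewed as an eigenvalue shrinkage method. Indeed, we have $\hat\Sigma = V \Lambda V^\top$ and $\tilde\Sigma = V (\Lambda + \lambda I_n) V^\top$ because $V V^\top = I_n$. The parameter $\lambda$ is thus useful to stabilize the small eigenvalues of the covariance matrix.

14 We notice that $\operatorname{sgn}(x^\star) = \operatorname{sgn}(\hat\mu)$. For the non-zero weights, the first order condition is $x^\star - \gamma \hat\mu + \lambda \operatorname{sgn}(\hat\mu) = 0$. We deduce that, provided $\gamma |\hat\mu_i| \geq \lambda$ (otherwise $x_i^\star = 0$):
If $\hat\mu_i \geq 0$, the first order condition becomes $x_i^\star - \gamma \hat\mu_i + \lambda = 0$ and we have $x_i^\star = \gamma \hat\mu_i - \lambda = \operatorname{sgn}(\hat\mu_i) \, (\gamma |\hat\mu_i| - \lambda)_+$.
If $\hat\mu_i < 0$, the first order condition becomes $x_i^\star - \gamma \hat\mu_i - \lambda = 0$ and we have $x_i^\star = \gamma \hat\mu_i + \lambda = \operatorname{sgn}(\hat\mu_i) \, (\gamma |\hat\mu_i| - \lambda)_+$.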
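The soft-thresholding solution of footnote 14 can be checked numerically. The following is a minimal sketch, assuming an identity covariance matrix; the expected returns, $\gamma$ and $\lambda$ are illustrative values that do not come from the paper, and the function names are hypothetical.

```python
import numpy as np

def soft_threshold_weights(mu, gamma, lam):
    """L1-penalised MVO weights when the covariance matrix is the identity."""
    mu = np.asarray(mu, dtype=float)
    return np.sign(mu) * np.maximum(gamma * np.abs(mu) - lam, 0.0)

def penalized_objective(x, mu, gamma, lam):
    """Objective of Equation (1) with the covariance set to the identity."""
    return 0.5 * x @ x - gamma * x @ mu + lam * np.abs(x).sum()

# Illustrative inputs (assumptions, not data from the paper)
mu = np.array([0.10, -0.04, 0.01, 0.00])
gamma, lam = 1.0, 0.02

x_star = soft_threshold_weights(mu, gamma, lam)

# Sanity check: no random perturbation of x_star improves the objective
rng = np.random.default_rng(0)
trial_values = [
    penalized_objective(x_star + 0.01 * rng.standard_normal(mu.size), mu, gamma, lam)
    for _ in range(1000)
]
is_minimum = penalized_objective(x_star, mu, gamma, lam) <= min(trial_values)
print(x_star, is_minimum)
```

Assets whose scaled expected return $\gamma |\hat\mu_i|$ is below $\lambda$ receive a zero weight, which is the sparsity effect described in the text.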

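Figure 4: Illustration of the $L_1$ norm-constrained portfolio optimization

The solution can be written as a linear combination of the MVO solution $x^\star(\gamma)$:

$$x^\star(\gamma, \lambda) = \gamma \left(\hat\Sigma + \lambda I_n\right)^{-1} \hat\mu = \left(I_n + \lambda \hat\Sigma^{-1}\right)^{-1} x^\star(\gamma)$$

Using the eigendecomposition $\hat\Sigma = V \Lambda V^\top$, the solution can be expressed in a simple form:

$$x^\star(\gamma, \lambda) = \left(V V^\top + \lambda V \Lambda^{-1} V^\top\right)^{-1} x^\star(\gamma) = V \tilde\Lambda V^\top x^\star(\gamma)$$

where $\tilde\Lambda$ is a diagonal matrix with elements $\tilde\Lambda_j = \Lambda_j / (\Lambda_j + \lambda)$. We notice that the weights are equal to 0 when $\lambda = +\infty$. Instead of using the identity matrix, we can consider a general matrix $A$ to define the $L_2$ norm:

$$x^\star(\gamma, \lambda) = \arg\min \frac{1}{2} x^\top \hat\Sigma x - \gamma x^\top \hat\mu + \frac{1}{2} \lambda x^\top A x = \arg\min \frac{1}{2} x^\top \left(\hat\Sigma + \lambda A\right) x - \gamma x^\top \hat\mu$$

The solution is then:

$$x^\star(\gamma, \lambda) = \gamma \left(\hat\Sigma + \lambda A\right)^{-1} \hat\mu = \left(I_n + \lambda \hat\Sigma^{-1} A\right)^{-1} x^\star(\gamma)$$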
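The equivalence between the ridge-type solution and the eigenvalue filter $\Lambda_j / (\Lambda_j + \lambda)$ can be verified with a few lines of NumPy. This is only a sketch under stated assumptions: the covariance matrix and expected returns are randomly generated for illustration and the penalty matrix is the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Illustrative inputs (assumptions): a random positive definite covariance
M = rng.standard_normal((n, n))
sigma = M @ M.T / n + 0.05 * np.eye(n)
mu = rng.uniform(0.0, 0.1, n)
gamma, lam = 0.5, 0.1

# Unregularised MVO portfolio and its L2-regularised counterpart
x_mvo = gamma * np.linalg.solve(sigma, mu)
x_ridge = gamma * np.linalg.solve(sigma + lam * np.eye(n), mu)

# Same portfolio obtained by filtering x_mvo in the eigenvector basis
eigval, V = np.linalg.eigh(sigma)
shrink = eigval / (eigval + lam)          # diagonal of the filter matrix
x_filtered = V @ (shrink * (V.T @ x_mvo))

print(np.allclose(x_ridge, x_filtered))
```

The filter leaves directions with large eigenvalues almost unchanged and damps directions with small eigenvalues, which are the ones that inflate the MVO weights.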

In the case of the $L_2$ identity constraint, the covariance matrix is shrunk toward the identity matrix. If $A$ is the diagonal matrix of asset variances ($A = \upsilon$), the shrinkage is based on the correlation matrix (see Appendix A.2.1). This approach is sometimes used by portfolio managers when they reduce the correlations, even if they don't realize it. Indeed, we have:

$$x^\star(\gamma, \lambda) = \gamma \left(\hat\Sigma + \lambda \upsilon\right)^{-1} \hat\mu = \frac{\gamma}{1 + \lambda} \left(\eta \hat\Sigma + (1 - \eta) \upsilon\right)^{-1} \hat\mu$$

with $\eta = (1 + \lambda)^{-1}$. In this case, the solution $x^\star(\gamma, \lambda)$ is an optimized portfolio where we keep a percentage $\eta$ of the correlations.

Example 8 To illustrate the $L_2$ approach, we consider a universe of four assets. The correlation matrix is:

$\hat C = $ [...]

The expected returns are 10%, 0%, 5% and 10% whereas the volatilities are the same and are equal to 10%.

Figure 5: Weights with respect to $\lambda$
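The step behind the correlation-shrinkage formula with $A = \upsilon$ can be written out explicitly. The derivation below is a sketch that only uses the definitions given in the text; $\sigma_i$ and $\rho_{i,j}$ denote the volatilities and correlations implied by $\hat\Sigma$.

```latex
\begin{align*}
\hat\Sigma + \lambda \upsilon
  &= (1 + \lambda) \left( \frac{1}{1 + \lambda} \hat\Sigma
     + \frac{\lambda}{1 + \lambda} \upsilon \right)
   = (1 + \lambda) \left( \eta \hat\Sigma + (1 - \eta) \upsilon \right),
   \qquad \eta = \frac{1}{1 + \lambda} \\
\left( \hat\Sigma + \lambda \upsilon \right)^{-1}
  &= \frac{1}{1 + \lambda}
     \left( \eta \hat\Sigma + (1 - \eta) \upsilon \right)^{-1} \\
\left[ \eta \hat\Sigma + (1 - \eta) \upsilon \right]_{i,i}
  &= \eta \sigma_i^2 + (1 - \eta) \sigma_i^2 = \sigma_i^2 \\
\left[ \eta \hat\Sigma + (1 - \eta) \upsilon \right]_{i,j}
  &= \eta \rho_{i,j} \sigma_i \sigma_j \qquad (i \neq j)
\end{align*}
```

The variances are unchanged while every correlation is multiplied by $\eta$, which is why the solution keeps a percentage $\eta$ of the correlations.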

Figure 6: Effect of the penalty matrix

We assume that $\gamma = 0.5\%$. In Figure 5, we report the evolution of the weights when $A = I_n$. We verify that the solution is the MVO portfolio if $\lambda = 0$ and tends to 0 if $\lambda$ increases. In Figure 6, we consider that $A = \operatorname{diag}(\kappa \hat\sigma_1^2, \hat\sigma_2^2, \hat\sigma_3^2, \hat\sigma_4^2)$. We observe the impact on the weight $x_1$ of the first asset when the uncertainty $\kappa$ on this asset increases. The $L_2$ portfolio optimization can also be used when investors target a portfolio $x_0$:

$$x^\star(\gamma, \lambda) = \arg\min \frac{1}{2} x^\top \hat\Sigma x - \gamma x^\top \hat\mu + \frac{1}{2} \lambda (x - x_0)^\top A (x - x_0) \tag{2}$$

The parameter $\lambda$ controls the distance between the MVO portfolio and the target portfolio. For instance, the target portfolio could be a heuristic allocation like the EW, MV or ERC portfolio (Roncalli, 2013) or it could be the actual portfolio in order to limit the turnover. In this case, we interpret $\lambda$ as risk aversion with respect to the MVO portfolio. We notice that the analytical solution is:

$$x^\star(\gamma, \lambda) = \left(\hat\Sigma + \lambda A\right)^{-1} \left(\gamma \hat\mu + \lambda A x_0\right)$$

If $A = I_n$, the optimal portfolio becomes:

$$x^\star(\gamma, \lambda) = \left(\hat\Sigma + \lambda I_n\right)^{-1} \left(\gamma \hat\mu + \lambda x_0\right) = \gamma \tilde\Sigma^{-1} \tilde\mu$$

where $\tilde\Sigma = \hat\Sigma + \lambda I_n$ and $\tilde\mu = \hat\mu + (\lambda / \gamma) x_0$. This approach corresponds to a double shrinkage of the covariance matrix $\hat\Sigma$ and the vector of expected returns $\hat\mu$ (Candelon et al., 2012). We can also reformulate the solution as follows (see Appendix A.2.2):

$$x^\star(\gamma, \lambda) = B x^\star(\gamma) + (I_n - B) x_0$$

where $B = \left(I_n + \lambda \hat\Sigma^{-1}\right)^{-1}$. The optimal portfolio is then a linear combination of the MVO portfolio and the target portfolio, and coincides with a classical allocation policy. For example, when an investor considers a 50/50 allocation policy, $B$ is equal to $I_n / 2$ and we obtain:

$$x^\star(\gamma, \lambda) = \frac{1}{2} x^\star(\gamma) + \frac{1}{2} x_0$$

Example 9 We consider a universe of three assets. The expected returns are 5%, 6% and 7% whereas the volatilities are 10%, 15% and 20%. The correlation matrix is:

$\hat C = $ [...]

The risk aversion parameter $\gamma$ is set to 30%. We assume that the target portfolio is the EW portfolio.

Table 6: $L_2$ portfolio with a target allocation (columns: asset, $x_0$, $x^\star(\gamma)$, and $x^\star(\gamma, \lambda)$ for $\lambda = 0.01$, $\lambda = 0.10$ and $\lambda = $ [...])

In Table 6, we report the solution for different values of $\lambda$. When $\lambda$ is small, the diagonal elements of $B$ are high and the MVO portfolio $x^\star(\gamma)$ dominates the target portfolio $x_0$. For instance, if $\lambda = 1\%$, we obtain:

$B = $ [...]

If we are interested in reducing the turnover, we can use a time-varying regularization:

$$x_t^\star(\gamma, \lambda) = \arg\min \frac{1}{2} x_t^\top \hat\Sigma_t x_t - \gamma x_t^\top \hat\mu_t + \frac{1}{2} \lambda (x_t - x_{t-1})^\top A (x_t - x_{t-1}) \tag{3}$$

where $t - 1$ and $t$ are two successive rebalancing dates and $x_{t-1}$ is the previous allocation. The analytical solution is:

$$x_t^\star(\gamma, \lambda) = \gamma \left(\hat\Sigma_t + \lambda A\right)^{-1} \hat\mu_t + \lambda \left(\hat\Sigma_t + \lambda A\right)^{-1} A x_{t-1} = B_t x_t^\star(\gamma) + (I_n - B_t) x_{t-1}$$

with $B_t = \left(I_n + \lambda \hat\Sigma_t^{-1} A\right)^{-1}$. If we assume that $x_{t-1} = x_{t-1}^\star(\gamma, \lambda)$, it follows that the current allocation is a moving average of past unconstrained MVO portfolios:

$$x_t^\star(\gamma, \lambda) = B_t x_t^\star(\gamma) + \sum_{i=1}^{t} \left( \prod_{j=0}^{i-1} (I_n - B_{t-j}) \right) B_{t-i} \, x_{t-i}^\star(\gamma)$$
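The target-portfolio solution and its use as a rebalancing rule can be illustrated as follows. This is a minimal sketch: the expected returns, volatilities, $\gamma$ and the EW target follow Example 9, but the correlation matrix is an assumption (the one used in the paper is not reproduced here), and `target_regularized_weights` is a hypothetical helper name.

```python
import numpy as np

def target_regularized_weights(sigma, mu, x_target, gamma, lam, A):
    """Closed-form solution of the L2 problem with a target portfolio."""
    return np.linalg.solve(sigma + lam * A, gamma * mu + lam * (A @ x_target))

vol = np.array([0.10, 0.15, 0.20])
mu = np.array([0.05, 0.06, 0.07])
corr = np.array([[1.0, 0.5, 0.2],     # assumed correlations, for illustration
                 [0.5, 1.0, 0.4],
                 [0.2, 0.4, 1.0]])
sigma = corr * np.outer(vol, vol)
gamma, lam = 0.30, 0.01
A = np.eye(3)
x_ew = np.full(3, 1.0 / 3.0)

x_mvo = gamma * np.linalg.solve(sigma, mu)
x_reg = target_regularized_weights(sigma, mu, x_ew, gamma, lam, A)

# Same portfolio written as a matrix-weighted average of x_mvo and the target
B = np.linalg.inv(np.eye(3) + lam * np.linalg.solve(sigma, A))
x_mix = B @ x_mvo + (np.eye(3) - B) @ x_ew

# Rebalancing rule: each date uses the previous allocation as the target
x_path = [x_ew]
for _ in range(50):
    x_path.append(target_regularized_weights(sigma, mu, x_path[-1], gamma, lam, A))

print(np.allclose(x_reg, x_mix))
```

With constant inputs, the successive allocations move gradually from the initial portfolio toward the MVO portfolio, so $\lambda$ controls the speed of adjustment and therefore the turnover.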

Remark 6 Suppose that $\hat\Sigma_t = \hat\Sigma_{t-1}$ and $A = \operatorname{diag}(\hat\sigma_1^2, \ldots, \hat\sigma_n^2)$. If asset returns are not correlated, we obtain:

$$x_{i,t}^\star(\gamma, \lambda) = \gamma \frac{\hat\mu_{i,t}}{\hat\sigma_i^2 + \lambda \hat\sigma_i^2} + \frac{\lambda \hat\sigma_i^2}{\hat\sigma_i^2 + \lambda \hat\sigma_i^2} x_{i,t-1}^\star(\gamma, \lambda) = \alpha x_{i,t}^\star(\gamma) + (1 - \alpha) x_{i,t-1}^\star(\gamma, \lambda)$$

where $\alpha = 1 / (1 + \lambda)$. The solution is an exponentially weighted moving average filter. Calibrating $\lambda$ is then equivalent to choosing the holding period to turn the portfolio.

3.5 Information matrix and hedging portfolios

The previous methods are focused on the covariance matrix. However, the important quantity in mean-variance optimization is the information matrix $\mathcal{I} = \hat\Sigma^{-1}$, i.e. the inverse of the covariance matrix (Scherer, 2007; Roncalli, 2013). Stevens (1998) gives a new interpretation of the information matrix using the following regression framework:

$$R_{i,t} = \beta_0 + \beta_i^\top R_t^{(-i)} + \varepsilon_{i,t} \tag{4}$$

where $R_t^{(-i)}$ denotes the vector of asset returns $R_t$ excluding the $i$-th asset and $\varepsilon_{i,t} \sim \mathcal{N}(0, s_i^2)$. Let $R_i^2$ be the R-squared of the linear regression (4) and $\hat\beta$ be the matrix of OLS coefficients with rows $\hat\beta_i^\top$. Stevens (1998) shows that the diagonal elements of the information matrix are given by:

$$\mathcal{I}_{i,i} = \frac{1}{\hat\sigma_i^2 (1 - R_i^2)}$$

whereas the off-diagonal elements are:

$$\mathcal{I}_{i,j} = -\frac{\hat\beta_{i,j}}{\hat\sigma_i^2 (1 - R_i^2)} = -\frac{\hat\beta_{j,i}}{\hat\sigma_j^2 (1 - R_j^2)}$$

Using this expression of $\mathcal{I}$, we obtain a new formula of the MVO portfolio:

$$x_i^\star(\gamma) = \gamma \frac{\hat\mu_i - \hat\beta_i^\top \hat\mu^{(-i)}}{\hat\sigma_i^2 (1 - R_i^2)}$$

Scherer (2007) interprets $\hat\mu_i - \hat\beta_i^\top \hat\mu^{(-i)}$ as the excess return after regression hedging and $\hat\sigma_i^2 (1 - R_i^2)$ as the non-hedging risk. We recall that $R_i^2 = 1 - \hat s_i^2 / \hat\sigma_i^2$. We finally obtain:

$$x_i^\star(\gamma) = \gamma \frac{\hat\mu_i - \hat\beta_i^\top \hat\mu^{(-i)}}{\hat s_i^2}$$

From this equation, we deduce the following conclusions:
1. The better the hedge, the higher the exposure. This is why highly correlated assets produce unstable MVO portfolios.
2. The long-short position is defined by the sign of $\hat\mu_i - \hat\beta_i^\top \hat\mu^{(-i)}$. If the expected return of the asset is lower than the conditional expected return of the hedging portfolio, the weight is negative.
It has been shown that the linear regression can be improved using norm constraints (Hastie et al., 2009). For example, we can use the $L_2$ regression to improve the predictive power of the hedging relationships. We can also estimate the hedging portfolios with the $L_1$ penalty.
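The link between hedging regressions and the information matrix can be reproduced from a covariance matrix alone. The sketch below uses population regression formulas (partitioned covariance), the expected returns and volatilities of a four-asset universe, and an assumed correlation matrix with $\rho_{3,4} = 40\%$; the other correlations are illustrative and the function names are hypothetical.

```python
import numpy as np

def hedging_regression(sigma, mu, i):
    """Regress asset i on the other assets using the covariance matrix only."""
    others = np.array([j for j in range(len(mu)) if j != i])
    cov_others = sigma[np.ix_(others, others)]
    cov_with_i = sigma[others, i]
    beta = np.linalg.solve(cov_others, cov_with_i)   # hedging coefficients
    resid_var = sigma[i, i] - cov_with_i @ beta      # s_i^2
    r2 = 1.0 - resid_var / sigma[i, i]               # R_i^2
    hedged_return = mu[i] - beta @ mu[others]        # return after hedging
    return others, beta, r2, resid_var, hedged_return

def information_matrix_from_hedges(sigma, mu):
    """Assemble the information matrix row by row from the hedging regressions."""
    n = len(mu)
    info = np.zeros((n, n))
    hedged = np.zeros(n)
    for i in range(n):
        others, beta, r2, resid_var, hedged[i] = hedging_regression(sigma, mu, i)
        info[i, i] = 1.0 / resid_var
        info[i, others] = -beta / resid_var
    return info, hedged

mu = np.array([0.07, 0.08, 0.09, 0.10])
vol = np.array([0.15, 0.18, 0.20, 0.25])
corr = np.full((4, 4), 0.5)            # assumed correlations, for illustration
np.fill_diagonal(corr, 1.0)
corr[2, 3] = corr[3, 2] = 0.40
sigma = corr * np.outer(vol, vol)
gamma = 0.5

info, hedged = information_matrix_from_hedges(sigma, mu)
x_hedge = gamma * hedged * np.diag(info)   # gamma * hedged return / s_i^2
x_mvo = gamma * np.linalg.solve(sigma, mu)
print(np.allclose(info, np.linalg.inv(sigma)), np.allclose(x_hedge, x_mvo))
```

Replacing the OLS coefficients in `hedging_regression` by ridge or Lasso estimates is one way to obtain a regularized information matrix, at the cost of losing the exact identity with $\hat\Sigma^{-1}$.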

Example 10 We consider a universe of four assets. The expected returns are $\hat\mu_1 = 7\%$, $\hat\mu_2 = 8\%$, $\hat\mu_3 = 9\%$ and $\hat\mu_4 = 10\%$ whereas the volatilities are equal to $\hat\sigma_1 = 15\%$, $\hat\sigma_2 = 18\%$, $\hat\sigma_3 = 20\%$ and $\hat\sigma_4 = 25\%$. The correlation matrix is the following:

$\hat C = $ [...]

In Table 7, we have reported the results of the hedging portfolios. The OLS coefficients $\hat\beta_i$, the coefficient of determination $R_i^2$ and the standard error $\hat s_i$ of residuals are computed using the formulas given in Appendix A.4. We have also computed the conditional expected return $\bar\mu_i = \hat\mu_i - \hat\beta_i^\top \hat\mu^{(-i)}$, which is also equal to the intercept $\hat\beta_0$ of the linear regression. We can then deduce the corresponding information matrix $\mathcal{I}$ and the MVO portfolio $x^\star$ for $\gamma = 0.5$. We finally obtain a very well balanced allocation, because the weights range between 19.28% and 69.80%. Let us now change the value of the correlation between the third and fourth assets. If $\rho_{3,4} = 95\%$, we obtain the results given in Table 8. In this case, the story is different, because the optimized portfolio is not well balanced. Indeed, because two assets are strongly correlated, some hedging relationships present a high value of $R^2$. The information matrix is then very sensitive to these hedging portfolios. This explains why the weights are now in the range between [...]% and [...]%!

Table 7: Hedging portfolios when $\rho_{3,4} = 40\%$
Asset | $\hat\beta_i$ | $R_i^2$ | $\hat s_i$ | $\bar\mu_i$ | $x_i^\star$
1 | [...] | [...]% | 11.04% | 1.70% | 69.80%
2 | [...] | [...]% | 14.20% | 2.06% | 51.18%
3 | [...] | [...]% | 16.31% | 2.85% | 53.66%
4 | [...] | [...]% | 19.12% | 1.41% | 19.28%

Table 8: Hedging portfolios when $\rho_{3,4} = 95\%$
Asset | $\hat\beta_i$ | $R_i^2$ | $\hat s_i$ | $\bar\mu_i$ | $x_i^\star$
1 | [...] | [...]% | 10.88% | 3.16% | [...]%
2 | [...] | [...]% | 14.66% | 2.23% | 52.01%
3 | [...] | [...]% | 5.89% | 1.66% | [...]%
4 | [...] | [...]% | 6.90% | 1.61% | [...]%

4 Some applications

In this section, we look at three applications which are directly linked to the previous framework. The first application concerns the relationship between the MVO portfolio and the principal portfolios derived from PCA analysis. The second application shows the usefulness of the $L_2$ covariance matrix regularization. In the third application, we finally illustrate how the Lasso approach may improve the robustness of the information matrix and hedging portfolios.

4.1 Principal portfolios

We first consider the problem of mean-variance optimization in a multi-asset universe. We have shown that this portfolio has implicit exposure to arbitrage and risk factors that we call principal portfolios (Meucci, 2009). The universe is composed of ten indices (four developed market equity indexes, one emerging market equity index, two bond indexes, two currency indexes and one commodity index): S&P 500 index, Eurostoxx 50 index, Topix index, Russell 2000 index, MSCI EM index, Merrill Lynch US High Yield index, JP Morgan Emerging Bonds index, EUR/USD and JPY/USD exchange rates and S&P GSCI Commodity index. To better understand the importance of principal portfolios, we look at the mean-variance optimization at the end of 2006. The covariance matrix is estimated using historical daily returns from January 2006 to December 2006. We then assume that this portfolio is held until the end of 2007. For the purpose of the study, we will consider perfect views of the market. The expected returns are therefore the realized returns of each asset from January 2007 to December 2007. We report in Table 9 the expected return, the volatility and the Sharpe ratio of each asset. The mean-variance optimized portfolio is computed under a 10% volatility constraint (see footnote 18). The optimized weights are given in the last column of Table 9.

Table 9: Statistics and MVO portfolio at the end of 2006
Asset | $\hat\mu_i$ | $\hat\sigma_i$ | $SR_i$ | $x_i^\star$
SPX | 1.27% | 10.04% | [...] | [...]%
SX5E | 6.24% | 14.65% | [...] | [...]%
TPX | 10.60% | 18.85% | [...] | [...]%
RTY | 4.84% | 17.37% | [...] | [...]%
MSCI EM | 30.15% | 18.11% | [...] | [...]%
US HY | -3.07% | 1.69% | [...] | [...]%
EMBI | 0.50% | 4.08% | [...] | [...]%
EUR/USD | 9.27% | 7.72% | [...] | [...]%
JPY/USD | 2.04% | 8.28% | [...] | [...]%
GSCI | 24.76% | 21.17% | [...] | [...]%

Let us now study the decomposition of the MVO portfolio by its principal portfolios. Their weights are given in Table 10. By construction, each principal portfolio is uncorrelated with the others and is composed of all the assets (Meucci, 2009). The MVO portfolio is then a combination of these principal portfolios.
It follows that the optimal weight $x_i^\star$ of an asset $i$ is the sum of the exposure $\beta_j$ of the principal portfolio $F_j$ in the MVO portfolio multiplied by the weight $w_{i,j}$ of the asset $i$ in the portfolio $F_j$:

$$x_i^\star = \sum_{j=1}^{10} \beta_j w_{i,j}$$

We have also reported the expected return $\mu_j$ and the ex-ante volatility $\sigma_j$ associated with each principal portfolio $F_j$. The first portfolio $F_1$ is the riskiest portfolio. We notice that it is a long-only portfolio and it has the profile of a risk weighted portfolio. Assets with a lower volatility then have a lower weight. As a result, the other principal portfolios can be considered as neutral risk weighted portfolios, because they are uncorrelated with the first principal portfolio. Nevertheless, if we consider the exposures $\beta_j$, we observe that the first principal portfolio is underweighted compared to the other principal portfolios.

18 In this case, $\gamma$ is equal to 2.39%.

Table 10: Decomposition of the MVO portfolio at the end of 2006 (rows: SPX, SX5E, TPX, RTY, MSCI EM, US HY, EMBI, EUR/USD, JPY/USD, GSCI, $\beta_j$, $\mu_j$, $\sigma_j$, $SR_j$; columns: $F_1$ to $F_{10}$)

Let us consider the other principal portfolios. For instance, the principal portfolio $F_8$ can be interpreted as an arbitrage portfolio which bets on large cap versus small cap equities whereas the principal portfolio $F_7$ is an arbitrage portfolio on the FX spread. Principal portfolio $F_{10}$ seems to be an arbitrage portfolio between the two bond indexes. This is the portfolio with the highest expected Sharpe ratio $SR_j$ (see footnote 19) and with the highest weight $\beta_j$ in the MVO portfolio (see footnote 20). By using mean-variance optimization, we implicitly have a high exposure to this principal portfolio. Indeed, the US HY index has a high negative expected Sharpe ratio ($-1.82$) whereas the EMBI index has a weak positive expected Sharpe ratio (0.12), resulting in a principal portfolio which is short the US HY index and long the EMBI index. Since the correlation between the two indexes is about 60% and their volatilities are low, the principal portfolio also has a low volatility and its weight in the MVO portfolio is dramatically high ($\beta_{10} = $ [...]%). In this case, the performance of the MVO portfolio is strongly dependent on the performance of the tenth principal portfolio. We have computed the realized performance of these portfolios over 2007. Results are reported in Table 11. The realized volatility of the MVO portfolio is 14.63%, which is above the targeted volatility of 10%. This suggests that the allocation defined at the end of 2006 was probably too optimistic. We also notice that the riskiest portfolio is the tenth principal portfolio.
This result is not surprising because, even if this portfolio was the least risky portfolio from an ex-ante viewpoint, it was also the most leveraged portfolio. In Figure 7, we have represented the cumulative performance of the MVO and $F_{10}$ portfolios. We can see that their behavior is very close (see footnote 21). If we consider the Sharpe ratio, the best portfolio is the principal portfolio $F_4$, and not the principal portfolio $F_{10}$, even if the investor has the right views on expected returns (see footnote 22).

19 It is equal to [...].
20 The weight of the principal portfolio $F_{10}$ is equal to [...]%.
21 The correlation between the MVO and $F_{10}$ (resp. $F_4$) portfolios is 73% (resp. 21%) in 2007.
22 We recall that the expected returns are exactly equal to the realized returns in 2007.
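The decomposition of an MVO portfolio into principal portfolios can be sketched with an eigendecomposition of the covariance matrix. The example below relies on assumptions: the four-asset inputs are illustrative rather than the ten indices of the study, the exposures are defined as $\beta_j = \gamma \mu_j / \sigma_j^2$ (the normalization used in the tables may differ), and the sign of each eigenvector is arbitrary.

```python
import numpy as np

def principal_decomposition(sigma, mu, gamma):
    """Decompose the MVO portfolio on the eigenvectors of the covariance matrix."""
    eigval, W = np.linalg.eigh(sigma)        # columns of W: principal portfolios
    order = np.argsort(eigval)[::-1]         # riskiest principal portfolio first
    eigval, W = eigval[order], W[:, order]
    mu_pp = W.T @ mu                         # expected return of each F_j
    vol_pp = np.sqrt(eigval)                 # ex-ante volatility of each F_j
    beta = gamma * mu_pp / eigval            # exposure of the MVO portfolio to F_j
    return W, beta, mu_pp, vol_pp

# Illustrative inputs (assumptions): two equity-like and two bond-like assets
vol = np.array([0.15, 0.18, 0.02, 0.04])
mu = np.array([0.05, 0.07, -0.03, 0.005])
corr = np.array([[1.0, 0.8, 0.2, 0.3],
                 [0.8, 1.0, 0.2, 0.3],
                 [0.2, 0.2, 1.0, 0.6],
                 [0.3, 0.3, 0.6, 1.0]])
sigma = corr * np.outer(vol, vol)
gamma = 0.05

W, beta, mu_pp, vol_pp = principal_decomposition(sigma, mu, gamma)
x_mvo = gamma * np.linalg.solve(sigma, mu)
x_rebuilt = W @ beta                          # x_i = sum_j beta_j * w_{i,j}

# Principal portfolios are uncorrelated: W' Sigma W is diagonal
cov_pp = W.T @ sigma @ W
print(np.allclose(x_rebuilt, x_mvo), np.round(beta, 2))
```

Because $\beta_j$ divides by the variance of the principal portfolio, a low-volatility arbitrage portfolio with even a modest expected return can dominate the allocation, which is the mechanism discussed for $F_{10}$.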

Table 11: Performance of MVO and principal portfolios in 2007 (rows: $\mu_j$, $\sigma_j$, $SR_j$; columns: MVO, $F_1$ to $F_{10}$)

Figure 7: Cumulative performance of MVO and principal portfolios in 2007

Remark 7 As mentioned earlier, the problem comes from the fact that the MVO portfolio is sensitive to the information matrix. Let us now consider the risk budgeting construction, which is not sensitive to the information matrix but to the covariance matrix. We obtain the results given in Table 12 in the case of the ERC portfolio. The implied expected returns are then equal to $\tilde\mu = \phi \hat\Sigma x_{\mathrm{erc}}$ (Roncalli, 2013), where the value of $\phi$ is scaled to target a 10% volatility. We can see that the risk allocated to the first principal portfolio is the highest. We retrieve the fact that risk budgeting portfolios make fewer active bets than MVO portfolios (Roncalli, 2013).

Table 12: Decomposition of the ERC portfolio at the end of 2006 (rows: $\beta_j$, $\mu_j$, $\sigma_j$, $SR_j$; columns: $F_1$ to $F_{10}$)
