Eigen Portfolio Selection: A Robust Approach to Sharpe Ratio Maximization


Danqiao Guo¹, Phelim Boyle², Chengguo Weng¹, and Tony Wirjanto¹

¹ Department of Statistics & Actuarial Science, University of Waterloo
² Lazaridis School of Business & Economics, Wilfrid Laurier University

This version: January 2018

Abstract

This paper shows how to pick optimal portfolios by modulating the impact of estimation risk in large covariance matrices. The portfolios are selected to maximize their Sharpe ratios. Each eigenvector of the covariance matrix corresponds to a maximum Sharpe ratio (MSR) portfolio for a different set of expected returns. Assuming that the portfolio manager has views on future expected returns, a portfolio consistent with her views can be approximated by the first few eigenvectors of the sample covariance matrix. Since the estimation error in a large sample covariance matrix tends to be most severe in the eigenvectors associated with the smallest eigenvalues, eliminating the tail eigenvectors reduces estimation error. We substitute the vector of expected excess returns with its lower-dimensional approximation, so that the MSR portfolio is not contaminated by the estimation errors in the tail. To strike a balance between the approximation error and the estimation error, we set a tolerance limit for the former and do our best to control the latter. We further introduce a more general spectral selection method, which uses non-consecutive eigenvectors to approximate the expected excess returns. According to simulation and real-data studies, the advantage of the spectral selection method becomes apparent when the number of assets is large compared with the sample size.

We thank Raymond Kan and seminar participants at the University of Toronto for useful comments and suggestions on an early version of this paper, circulated under the title "Enhanced Sharpe Ratio via Eigen Portfolios Selection". Any remaining errors and omissions are ours alone.

1 Introduction

Markowitz's mean-variance theory, despite its theoretical appeal, has not been widely used in its original form in practice. One of the main reasons (DeMiguel et al. [2007], Kan and Zhou [2007]) is that estimation errors in the expected returns and the covariance matrix often interact adversely with the optimization. As Merton [1980] points out, estimating expected returns from the time series of realized returns is very difficult, while the covariance matrix can be estimated much more accurately from historical data. The erratic nature of financial returns encourages investors to forgo estimating expected returns from history and instead resort to either the minimum variance (MV) portfolio or the maximum Sharpe ratio (MSR) portfolio with a better input for the expected returns. The former can be viewed as a special case of the latter in which the expected returns of all assets are set equal. Another example of the latter type is Martellini [2008], who proposes using total volatility as a model-free estimate of a stock's expected excess return. In practice, sophisticated investors often possess adequate resources, e.g. proprietary data and models, to form their own perspective on cross-sectional expected returns, while relying on realized returns to estimate the covariance matrix. Our goal in this paper is to devise a robust approach to constructing a vast MSR portfolio, given that the investor has formed his/her own view of expected returns. Many attempts have been made to guard portfolios against the proliferation of estimation errors in the inputs. Some of these approaches do not explicitly aim at improving the MSR portfolio but address a broader set of portfolio optimization problems. As we will see, they provide insightful ideas which can be applied to improving MSR portfolios.
Jagannathan and Ma [2003], for instance, point out that imposing no-short-sale constraints helps reduce portfolio risk and explain why such constraints are effective. Fan et al. [2012] introduce a gross exposure constraint which bridges the gap between the no-short-sale constrained portfolios of Jagannathan and Ma [2003] and the unconstrained problem of Markowitz [1952]. Tu and Zhou [2011] demonstrate that an optimal combination of the naive 1/N portfolio and one of the more sophisticated strategies generally outperforms the 1/N portfolio. Another strand of research focuses on improving the quality of the covariance matrix

estimator used in the optimization. Notably, Ledoit and Wolf [2004] propose shrinking the sample covariance matrix towards a multiple of the identity matrix so that the over-dispersed sample eigenvalues are pushed back towards their grand mean. In a related study, Ledoit and Wolf [2014] propose a more flexible covariance matrix estimator which shrinks the sample eigenvalues in a nonlinear manner. Frahm and Memmel [2010] derive two estimators for the global minimum variance portfolio that dominate the traditional estimator with respect to the out-of-sample variance of portfolio returns. More recently, Fan et al. [2013] develop a Principal Orthogonal complement Thresholding (POET) method to deal with the estimation of a high-dimensional covariance matrix with a conditional sparse structure and fast-diverging eigenvalues. Lastly, Carrasco and Noumon [2011] investigate four regularization techniques to stabilize the inverse of the covariance matrix in a Sharpe ratio maximization problem and derive a data-driven method for selecting the tuning parameter optimally. Most existing plug-in methods treat the estimation of the expected returns and the covariance matrix as two separate tasks. This estimation strategy may explain why the Sharpe ratio maximization problem with given expected returns has not become a more popular research topic, even though it is closely related to the problem of covariance matrix estimation. An example helps support our claim that the Sharpe ratio maximization problem with a given vector of expected returns is more than just a problem of covariance matrix estimation. Here is a simple argument to convey the intuition. Suppose that we have a sample covariance matrix such that the eigenvector corresponding to the largest eigenvalue can be estimated perfectly, while the remaining eigenvectors are subject to significant estimation error.
Then, if the vector of expected excess returns is an exact multiple of the eigenvector associated with the largest eigenvalue (hereafter referred to as the dominant eigenvector), we can show that the sample-based estimator for the MSR portfolio is exactly the true MSR portfolio. However, if the vector of expected excess returns differs from the dominant eigenvector, it is easy to verify that the resulting estimator for the MSR portfolio will be susceptible to the estimation errors in the non-dominant eigenvectors. Therefore, the quality of the sample-based MSR portfolio depends not only on the quality of the covariance matrix estimator, but also on whether the vector of expected excess returns is spanned by the

dominant eigenvector or not. Admittedly, our assumption about the distribution of estimation error in the sample covariance matrix is extreme, but when relaxed it leads to a fruitful extension: the estimation error becomes progressively more important as we move away from the dominant principal component. Shen et al. [2016] examine the consistency of principal component analysis under different asymptotics. They show that in the case of a high-dimensional K-factor model, which is commonly assumed (Bai and Ng [2002], Fan et al. [2013]), the first K sample eigenvalues and eigenvectors are consistently estimated, while very little can be said about the tail sample eigenvalues and eigenvectors. This distribution of estimation errors, together with the earlier example, suggests an expected returns approximation approach to improving the MSR portfolio. To appreciate this claim, suppose that we have p assets. We replace the vector of expected excess returns with its projection onto a lower-dimensional space spanned by the first K eigenvectors¹ of the sample covariance matrix, so that the resulting MSR portfolio is not contaminated by estimation errors in the tail eigenvalues and eigenvectors. If the true expected excess returns vector is very close to its projection, little information about the expected returns is lost, and at the same time severe estimation errors are avoided. In a nutshell, the key idea behind the expected returns approximation method is to introduce some approximation error in order to mitigate estimation errors. Moreover, we control the maximum amount of approximation error to be introduced and do as much as possible to avoid the estimation errors.
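For concreteness, the projection step just described can be sketched in a few lines of numpy. This is an illustrative sketch of ours, not code from the paper; the simulated returns and the choices p = 50, n = 250, and K = 5 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.standard_normal((250, 50)) * 0.01     # n = 250 observations of p = 50 returns
S = np.cov(R, rowvar=False)                   # sample covariance matrix
_, U = np.linalg.eigh(S)
U = U[:, ::-1]                                # eigenvectors, descending eigenvalue order

mu_excess = rng.standard_normal(50) * 0.05    # a hypothetical expected excess return vector
K = 5
UK = U[:, :K]
approx = UK @ (UK.T @ mu_excess)              # projection onto the first K eigenvectors

# relative approximation error: the share of mu_excess outside the retained subspace
err = np.linalg.norm(mu_excess - approx) / np.linalg.norm(mu_excess)
print(0.0 <= err <= 1.0)                      # -> True
```

The residual mu_excess - approx is orthogonal to the retained eigenvectors, which is what makes the projection the optimal L2 approximation.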
This method of projecting the expected excess returns vector onto the subspace spanned by the first K sample eigenvectors is shown to be equivalent to keeping the expected excess returns vector unchanged and replacing the inverse of the covariance matrix estimator with an alternative matrix constructed from the first K principal components of the original covariance matrix. In the literature, the method of discarding the eigenvectors associated with the smallest eigenvalues is known as the spectral cut-off method (Carrasco et al. [2007]). In a related work, Carrasco and Noumon [2011] propose a data-driven method to select the optimal number of principal components to be kept in the spectral cut-off method. The equivalence between the expected returns approximation method and the spectral cut-off method

¹ Eigenvectors are sorted according to their associated eigenvalues, from the largest to the smallest.

motivates us to transform the problem of determining K into a problem of determining a new parameter δ that measures the deviation between the expected returns vector and its projection. This deviation is captured geometrically by the sine of the angle between the original expected excess returns vector and its projection. As a tuning parameter, δ is preferable to K because it has both an economic and a geometric interpretation: on one hand, it measures the maximum distortion in expected returns that can be tolerated; on the other hand, it specifies the maximum sine of the angle representing the approximation error. Thus, even without a fine-tuning procedure, we have a rough idea about the size of δ. Another appealing feature of δ is that, once we have decided on its value, as we update the forecasted returns and the sample covariance matrix, the resulting K automatically changes according to the spatial relationship between the updated expected returns vector and the eigenvectors of the updated sample covariance matrix. For example, if the current expected returns vector is very close to the dominant eigenvector of the current covariance matrix, we would set the value of K to 1. However, if on the next rebalancing date the updated expected returns vector cannot be well approximated even by the first 100 eigenvectors of the updated covariance matrix, we would choose a much larger value of K. Therefore, δ can be viewed as a dynamic selector of K. More importantly, during the rebalancing procedure, the expected returns approximation error is guaranteed to stay below δ, so the amount of approximation error we introduce is always under our control. In this paper, we go beyond approximating the expected returns using consecutive eigenvectors and propose a spectral selection method, which allows the expected returns vector to be approximated by a few non-consecutive eigenvectors.
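Before turning to the spectral selection method, the role of δ as a dynamic selector of K can be made concrete with a short numpy sketch (ours, with arbitrary simulated inputs): given the sample eigenvectors, choose the smallest K whose projection error, measured as the sine of the angle between the expected excess return vector and its projection, is at most δ.

```python
import numpy as np

def select_K(mu_excess, U_hat, delta):
    """Smallest K such that projecting mu_excess onto the first K sample
    eigenvectors leaves a relative error (sine of the angle) of at most delta."""
    norm = np.linalg.norm(mu_excess)
    for K in range(1, U_hat.shape[1] + 1):
        UK = U_hat[:, :K]                    # first K eigenvectors (descending eigenvalues)
        resid = mu_excess - UK @ (UK.T @ mu_excess)
        if np.linalg.norm(resid) / norm <= delta:
            return K
    return U_hat.shape[1]

# Sanity check: if mu_excess is a multiple of the dominant eigenvector, K = 1
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
S = A.T @ A / 200                            # a stand-in sample covariance matrix
_, U = np.linalg.eigh(S)
U = U[:, ::-1]                               # sort eigenvectors by descending eigenvalue
mu_excess = 0.3 * U[:, 0]
print(select_K(mu_excess, U, delta=0.1))     # -> 1
```

With a fixed δ, the returned K adapts automatically as the forecast and the sample eigenvectors are updated, which is exactly the "dynamic selector" behavior described above.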
However, we still want the tail eigenvectors to be excluded for the sake of robustness. It will be seen that this objective can be translated into a LASSO problem with coefficient-varying penalty factors. The spectral selection method generalizes the spectral cut-off method and will be shown to outperform the latter in numerical examples. In particular, we will show that there can be a blind spot in the spectral cut-off method, in which case we resort to the spectral selection method. In the real world, most investors, even sophisticated ones, find it challenging to make precise predictions about future asset returns. Instead, they are more comfortable

with predicting a range for the returns. This return uncertainty problem is discussed thoroughly in Garlappi et al. [2006], which illustrates several possible specifications of the uncertainty. In this paper, we point out that the spectral cut-off and spectral selection methods remain applicable when return uncertainty is present. If we are provided with a vector of expected returns as well as a confidence region², we just need to search for the lowest-dimensional approximation of the expected excess returns over the confidence region. In this scenario, we do not need to specify the value of δ; instead, it is implied by the size of the confidence region. A tighter confidence region implies a smaller value of δ. This paper makes four contributions to the literature. First, we establish a connection between an eigen portfolio and an MSR portfolio and clarify the rationale behind holding an eigen portfolio. Second, by showing the equivalence between the expected returns approximation approach and the spectral cut-off method in the literature, we cast new light on the economic interpretation of the latter. Third, we introduce a judicious tuning parameter δ into the spectral cut-off method. This new tuning parameter has both a geometric and an economic interpretation and acts as a dynamic selector of the currently used parameter K. Lastly, inspired by the LASSO method for variable selection, we propose a spectral selection method for safeguarding an MSR portfolio against pervasive estimation errors in the less informative dimensions. The spectral selection method is a generalized version of the spectral cut-off method in that it allows the less informative dimensions to be represented by non-consecutive, and not necessarily tail, eigenvectors. The rest of this paper is organized as follows. Section 2 establishes the connection between an eigen portfolio and a Sharpe ratio maximizing portfolio and lays the foundation of the spectral methods.
Section 3 introduces two expected returns approximation approaches, the spectral cut-off method and the novel spectral selection method, derives some of their theoretical properties, and illustrates the tuning parameter selection procedure. In Section 4 we use four simulated cases to demonstrate the effectiveness of the spectral cut-off and spectral selection methods in improving out-of-sample Sharpe ratios of portfolios. In Section 5 we use empirical returns from three major equity markets to evaluate

² A confidence region is the high-dimensional counterpart of a confidence interval. Here we slightly abuse the concept, because we use it to describe all the expected excess returns the investor considers possible, not necessarily a statistical confidence region for a multi-dimensional estimator.

the out-of-sample performance of different portfolios. It is shown that the Sharpe ratios of the spectral-method portfolios consistently exceed those of the equally weighted portfolio, especially in the Japanese market. In addition, we justify a heuristic value for the tuning parameter.

2 From eigen portfolio to MSR portfolio

In this section we explain the close connection between portfolios based on the eigenvectors of the covariance matrix and portfolios that maximize the Sharpe ratio (MSR portfolios). The key result is that if the expected excess returns vector is proportional to a given eigenvector of the covariance matrix, then the portfolio with weights proportional to this eigenvector is the MSR portfolio. Conversely, if a portfolio whose weights are proportional to an eigenvector is the MSR portfolio, then the expected excess return vector is a simple multiple of the portfolio weights. We use a numerical example to illustrate this idea. Portfolios based on re-scaling the eigenvectors of the covariance matrix are called eigen portfolios. An appealing feature of eigen portfolios is that their returns are uncorrelated, since the eigenvectors of the covariance matrix are mutually orthogonal. This orthogonality property has been exploited by Steele [1995], Partovi et al. [2004], Avellaneda and Lee [2010], and Boyle [2014], among others. A discussion of the economic interpretation of eigen portfolios can be found in Laloux et al. [1999] and Gopikrishnan et al. [2001]. However, it is not obvious from the literature why an investor would hold an arbitrary eigen portfolio. In this section, we find conditions under which this can be optimal. Specifically, we show that each eigen portfolio is an MSR portfolio for a specific set of expected return vectors. Before we develop our approach, it is useful to explain our notation.
Throughout the paper, we denote matrices by bold capital letters, vectors by bold lower-case letters, and scalars by plain lower-case letters. We let Σ denote the true covariance matrix and µ the vector of expected returns. Let UΛU^T = Σ represent the eigen decomposition of the true covariance matrix, where Λ is a diagonal matrix with diagonal elements λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_p > 0 and U = (u^(1), u^(2), …, u^(p)) is an orthogonal matrix.

Similarly, let Û Λ̂ Û^T = Σ̂ denote the eigen decomposition of the sample covariance matrix, where Λ̂ is a diagonal matrix containing the sample eigenvalues λ̂_1 ≥ λ̂_2 ≥ ⋯ ≥ λ̂_p > 0 and Û = (û^(1), û^(2), …, û^(p)). Throughout the paper, we focus on situations where Σ and Σ̂ are invertible, for the sake of clarity and without loss of generality. Furthermore, ‖·‖ denotes the spectral norm of a square matrix and ‖·‖_p denotes the L_p norm of a vector. An MSR portfolio maximizes the ratio of excess return to standard deviation:

    w_msr = argmax_w (w^T µ − r_f) / (w^T Σ w)^{1/2},  s.t. w^T e = 1.  (1)

It is easy to check that the solution to this problem is

    w_msr = Σ^{-1}(µ − r_f e) / (e^T Σ^{-1}(µ − r_f e)),  (2)

as long as the expected return of the global minimum variance portfolio is higher than the risk-free rate³, which is equivalent to requiring that e^T Σ^{-1}(µ − r_f e) > 0. It may be helpful to first discuss the connection between the MSR portfolio and the eigen portfolio in a simplified setting before presenting and discussing the main result. Suppose that there is no risk-free asset, so that the MSR portfolio is given by w_msr = Σ^{-1}µ / (e^T Σ^{-1}µ). This portfolio lies on the efficient frontier, and every portfolio (not necessarily on the frontier) which is orthogonal to it has an expected return equal to the zero-beta return; such portfolios lie on a horizontal line in mean-variance space (see Roll [1980]). Now assume that we have p eigen portfolios Z = (z^(1), z^(2), …, z^(p)) whose weights are multiples of the eigenvectors of the covariance matrix, i.e., z^(i) = u^(i)/(e^T u^(i)). These portfolios are mutually orthogonal. Suppose that µ happens to be proportional to z^(i). Since Σ and Σ^{-1} have the same eigenvectors, the weights of this MSR portfolio will correspond to z^(i). The other portfolios based on the remaining eigenvectors will be orthogonal to the

³ If this condition is not satisfied, the MSR portfolio does not exist, and the portfolio calculated from eq. (2) corresponds to the minimum Sharpe ratio portfolio.

MSR portfolio. We can run through all the eigenvectors in the same way by selecting the expected returns to correspond to each eigenvector. In each case our chosen portfolio will be on the frontier and correspond to the MSR portfolio. We can use an example based on this discussion to illustrate the concept. Assume that there is no risk-free security, and consider a covariance matrix Σ whose eigenvectors, sorted in descending order of their corresponding eigenvalues, form the columns of a matrix U; the corresponding eigen portfolios form the columns of a matrix Z. Note that the eigen portfolio weights sum to one. The first portfolio corresponds to the dominant eigenvector and has positive weights. We select an expected return vector which is proportional to this dominant eigenvector; by a suitable scaling, we can arrange for the corresponding MSR frontier portfolio to have an expected return of 0.05. We can instead focus on the fourth eigen portfolio.

The corresponding expected return vector in this case is again a scalar multiple of the chosen eigenvector. Observe that the second asset, which has a negative expected return, has a negative weight. As before, we have scaled the return vector so that the MSR frontier portfolio has an expected return of 0.05. This example shows that if we pick the expected return vector to be a multiple of a given eigenvector, the corresponding eigen portfolio is an MSR portfolio. The next proposition provides a formal statement of the connection between an eigen portfolio and an MSR portfolio.

Proposition 2.1. If the vector of expected excess returns is a non-zero scalar multiple of the ith eigenvector of Σ, and this vector of expected excess returns sums to a positive number, i.e. µ − r_f e = a u^(i) with a ∈ {a ≠ 0 : a e^T u^(i) > 0}, then the MSR portfolio is exactly the ith eigen portfolio, i.e. w_msr = z^(i) = u^(i)/(e^T u^(i)).

All of the proofs are given in the Appendix. We now provide some comments on this result. An interesting implication of Proposition 2.1 is that when the vector of expected excess returns is a scalar multiple of an eigenvector and sums to a positive number, the MSR portfolio has a return-preserving property: the investment in a given asset is directly proportional to its expected excess return. As our numerical example illustrates, a short position corresponds to a negative expected excess return. It is easy to come up with a concrete case in which a Sharpe ratio maximizing investor would like to hold an eigen portfolio. Suppose that an investor believes that asset returns are generated from a single-factor model with constant residual variance (Partovi [2013]):

    r_t − r_f e = β(r_{m,t} − r_f) + ε_t,  Var(r_{m,t}) = σ_m²,  ε_t ~ iid (0, σ_ε² I_p),  (3)

where β is a p × 1 vector of factor loadings, r_{m,t} is the factor return, (E[r_{m,t}] − r_f) e^T β > 0, and 0 and I_p denote a p × 1 zero vector and a p-dimensional identity matrix, respectively.
If this is the case, the investor believes that the expected return vector is µ = r_f e + β(E[r_{m,t}] − r_f)

and the covariance matrix is Σ = σ_m² β β^T + σ_ε² I_p. It is easy to verify that the dominant eigenvector of Σ is u^(1) = β/‖β‖_2, while the remaining eigenvectors cannot be uniquely determined. Thus, the vector of expected excess returns is a scalar multiple of the dominant eigenvector and sums to a positive number. According to Proposition 2.1, this investor's optimal choice is to hold the dominant eigen portfolio. Proposition 2.1 can be readily extended to scenarios where the vector of expected excess returns is a linear combination of eigenvectors. The following proposition discusses this more general case.

Proposition 2.2. If the vector of expected excess returns is expressed in terms of the eigenvectors of Σ as µ − r_f e = Σ_{i=1}^p a_i u^(i) and the inequality e^T Σ^{-1}(µ − r_f e) > 0 is satisfied, the MSR portfolio has weights

    w_msr = (Σ_{i=1}^p (a_i/λ_i) u^(i)) / (Σ_{i=1}^p (a_i/λ_i) e^T u^(i)).

An implication of Proposition 2.2 is that the vector of expected excess returns lies in the same subspace of the eigenvector space as the resulting MSR portfolio. This proposition serves as a theoretical foundation, and provides an intrinsic motivation, for the expected returns approximation method formally introduced in Section 3. The next result shows that if the MSR portfolio is an eigen portfolio, the expected excess return vector is a multiple of this portfolio's weights. This result can be viewed as the converse of Proposition 2.1.

Proposition 2.3. If the ith eigen portfolio is the MSR portfolio, the implied vector of expected returns of the p assets is µ = r_f e + a u^(i) for some a ∈ {a ≠ 0 : a e^T u^(i) > 0}.

This proposition enables us to pin down the expected return vector when a Sharpe ratio maximizing investor holds a single eigen portfolio. In practice this rarely happens; in general, investors hold combinations of the eigen portfolios. As we now explain, the basic idea can be extended to cover the case of a portfolio of eigen portfolios.
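Proposition 2.1 is easy to verify numerically. The following numpy sketch is ours, not code from the paper; the random covariance matrix, the choice i = 1, and the scaling a = 0.2 are arbitrary. It builds expected excess returns proportional to an eigenvector, computes the MSR weights from eq. (2), and checks that they coincide with the corresponding eigen portfolio.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 4))
Sigma = A.T @ A / 100                         # a toy covariance matrix
lam, U = np.linalg.eigh(Sigma)
lam, U = lam[::-1], U[:, ::-1]                # descending eigenvalue order
e = np.ones(4)
rf = 0.01

# Expected excess returns proportional to the i-th eigenvector (Proposition 2.1);
# the sign of a is chosen so that a * e^T u^(i) > 0.
i = 0
a = 0.2 * np.sign(e @ U[:, i])
mu = rf * e + a * U[:, i]

w = np.linalg.solve(Sigma, mu - rf * e)
w = w / (e @ w)                               # MSR weights, eq. (2)
z = U[:, i] / (e @ U[:, i])                   # i-th eigen portfolio

print(np.allclose(w, z))                      # -> True
```

The same script also illustrates the converse (Proposition 2.3): starting from the eigen portfolio z, the implied expected excess return is a multiple of u^(i).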
Note from Proposition 2.3 that if all investors have homogeneous beliefs about the covariance matrix, any (Sharpe ratio maximizing) investor's belief about the expected returns is reflected in the portfolio he/she holds. This is because Proposition 2.3 states that, under the Sharpe ratio maximization framework, each eigen portfolio is associated with a vector of expected excess returns. In addition, these vectors of expected excess returns are

pairwise orthogonal. Thus, we can imagine p different expected return vectors corresponding to the p eigenvectors. These p expected return vectors are pairwise orthogonal and provide a basis for any arbitrary expected returns vector. Therefore, any portfolio that a Sharpe ratio maximizing investor holds reflects his/her perspective on expected excess returns, and the investor's portfolio will be a linear combination of the p basis portfolios. This insight is formalized in the next proposition.

Proposition 2.4. If a Sharpe ratio maximizing investor holds a portfolio which can be expressed as a linear combination of eigenvectors, w_msr = Σ_{i=1}^p a_i u^(i) with Σ_{i=1}^p a_i e^T u^(i) = 1, then the implied expected return vector is µ = r_f e + c Σ_{i=1}^p a_i λ_i u^(i), where c can be any constant such that e^T Σ^{-1}(µ − r_f e) > 0 holds. In addition, a specific value of c and the corresponding µ are related in that c equals the excess return to variance ratio of the minimum variance portfolio.

Sharpe [1974] was the first to propose a reverse optimization approach to deduce expected returns from the observed portfolio composition of an investor. Proposition 2.4 goes beyond his assertion by pointing out that we can calculate the loading of the implied expected excess return on each of the p basis directions.

3 Expected returns approximation approach

In the last section it was assumed that there was no estimation error in the inputs to the portfolio selection problem. In practice, estimation errors in the expected returns vector and the covariance matrix are ubiquitous; there is an extensive literature on this topic (for a contemporary review, see Kolm et al. [2014]). In this section we explain how eigen portfolios can be used to tackle the estimation risk problem in portfolio selection. The basic idea is as follows.
Following the same logic as in Proposition 2.2, we can show that in the presence of estimation errors, the vector of expected excess returns lies in the same subspace of the sample eigenvector space as the resulting sample MSR portfolio. Therefore, if the expected excess return vector can be well approximated by a few sample eigenvectors that are more precisely estimated, we can replace the original returns vector with its lower-dimensional approximation, so that the MSR portfolio is not contaminated by the more severe estimation errors in the

excluded principal components. Inevitably, we introduce approximation error by ignoring some eigenvectors, and we discuss how to make this trade-off. We describe two approaches. The first uses the first K sample eigenvectors to approximate the expected excess returns and will be shown to be equivalent to the spectral cut-off method in the literature. The second approach, the spectral selection method, has more of a regression and variable selection flavor: under it, the eigen portfolios used to approximate the return vector need not correspond to consecutive sample eigenvalues.

3.1 Another look at the spectral cut-off method

We now describe the first approximation approach. Suppose that we want to construct an MSR portfolio of p assets, and that we believe the vector of expected returns of the p assets is µ. Note that µ here is not necessarily estimated from the assets' price history; it can be obtained by any method, such as fundamental analysis. Recall that the sample covariance matrix we observe is Σ̂, with sample eigenvalues and eigenvectors denoted by Λ̂ = diag(λ̂_1, λ̂_2, …, λ̂_p) and Û = (û^(1), û^(2), …, û^(p)), respectively. If the conventional sample-covariance-based method is used to derive the MSR portfolio, we get

    ŵ_msr = Σ̂^{-1}(µ − r_f e) / (e^T Σ̂^{-1}(µ − r_f e)).

However, our concern is that there may be severe estimation errors in Σ̂ and that inverting the matrix further amplifies these errors, especially those in the tail sample eigenvectors (Shen et al. [2016]). To mitigate the influence of estimation errors on ŵ_msr, we propose approximating µ − r_f e in the K-dimensional space spanned by the first K eigenvectors of Σ̂. If the approximation is very close to the original vector, we use it to replace µ − r_f e.
In this way, only the first K sample eigenvalues and eigenvectors are involved in ŵ_msr; thus we avoid the estimation errors in the smallest p − K sample eigenvalues and eigenvectors (recall Proposition 2.2). Next, we find the optimal approximation of µ − r_f e in the K-dimensional space. We say an approximation is optimal if its L_2 distance from the original vector is minimized. We let Σ_{i=1}^K a_i û^(i) denote the approximated vector of expected excess returns in the K-dimensional space and solve for the a_i's from the following minimization problem.

    min_{a_1,…,a_K} ‖µ − r_f e − Σ_{i=1}^K a_i û^(i)‖_2²,  (4)

where the objective is the (squared) approximation error. For clarity, let Û_K = (û^(1), …, û^(K)) denote the matrix of the first K sample eigenvectors, and let Λ̂_K = diag(λ̂_1, …, λ̂_K) denote the diagonal matrix containing the first K sample eigenvalues. In addition, a_K = (a_1, …, a_K)^T denotes the K × 1 vector to be determined. It is easy to check that the solution to the minimization problem in eq. (4) is a_K = Û_K^T(µ − r_f e). As a result, the approximated vector of expected excess returns is

    µ^(K) − r_f e = Û_K Û_K^T (µ − r_f e),

which is the projection of µ − r_f e onto the space spanned by the first K sample eigenvectors. Replacing µ − r_f e by µ^(K) − r_f e, we obtain the following MSR portfolio, according to Proposition 2.2:

    ŵ_K^msr = Σ̂_K^{-1}(µ − r_f e) / (e^T Σ̂_K^{-1}(µ − r_f e)),  (5)

where Σ̂_K^{-1} = Û_K Λ̂_K^{-1} Û_K^T. The logic behind this method is that, in practice, the investor needs to sacrifice some of his/her belief about the level of the expected returns in order to avoid the estimation errors in the small sample eigenvalues and their corresponding sample eigenvectors. The extent of the sacrifice is quantified by the L_2 distance between µ − r_f e and its approximation, and for a given K it is desirable for this distance to be as small as possible. If µ − r_f e is exactly a linear combination of the first K sample eigenvectors, we can select a_K to make the distance equal to 0. In this case, even if the conventional method for constructing the MSR portfolio is used, the smallest p − K sample eigenvalues and their corresponding eigenvectors play no role in determining the optimal portfolio.
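The spectral cut-off estimator in eq. (5) amounts to a few linear algebra steps. The sketch below is ours; the simulated return matrix, the flat expected return vector, and the choices K = 3 and r_f = 0.01 are arbitrary illustrations.

```python
import numpy as np

def spectral_cutoff_msr(mu, Sigma_hat, K, rf=0.0):
    """MSR weights from eq. (5): use only the first K sample eigenvalues and
    eigenvectors, i.e. the spectral cut-off matrix Sigma_K^{-1}."""
    e = np.ones(len(mu))
    lam, U = np.linalg.eigh(Sigma_hat)
    lam, U = lam[::-1], U[:, ::-1]            # descending eigenvalue order
    UK, lamK = U[:, :K], lam[:K]
    x = UK @ ((UK.T @ (mu - rf * e)) / lamK)  # Sigma_K^{-1} (mu - rf e)
    return x / (e @ x)

# Illustrative use on simulated data
rng = np.random.default_rng(2)
R = rng.standard_normal((60, 10)) * 0.02      # n = 60 return observations, p = 10 assets
Sigma_hat = np.cov(R, rowvar=False)
mu = np.full(10, 0.05)                        # a flat (hypothetical) return forecast
w = spectral_cutoff_msr(mu, Sigma_hat, K=3, rf=0.01)
print(round(w.sum(), 10))                     # weights sum to one -> 1.0
```

Note that the resulting weight vector lies entirely in the span of the first K sample eigenvectors, so the tail eigenvalues and eigenvectors never enter the computation.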

An interesting observation based on eq. (5) is worth pointing out. In the Sharpe ratio maximization framework, the method of approximating µ − r_f e in the space spanned by the first K eigenvectors is equivalent to keeping µ − r_f e unchanged and replacing Σ̂^{-1} with Σ̂_K^{-1}. Recall that Σ̂_K^{-1} is a modified inverse covariance matrix which discards the eigenvectors associated with the smallest p − K sample eigenvalues. This specific way of modifying an inverse covariance matrix, as well as the motivation behind it, is introduced in Carrasco et al. [2007], Carrasco and Noumon [2011], and Guo et al. [2017]. We hereafter refer to Σ̂_K^{-1} as the spectral cut-off regularized precision matrix, or the spectral cut-off matrix for short. The spectral cut-off method was originally introduced as a stabilizing technique for inverting an ill-posed⁴ sample covariance matrix. In the context of Sharpe ratio maximization, however, the spectral cut-off method has an economic interpretation: by cutting off the tail principal components, it effectively modifies the vector of expected excess returns into a projection of the original vector. Thus, as long as the vector of expected excess returns can be well approximated by a linear combination of the first K eigenvectors, the spectral cut-off method can be justified by the argument that it avoids the estimation errors in the tail sample eigenvalues and eigenvectors with little loss of information.
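This equivalence is straightforward to confirm numerically: projecting µ − r_f e onto the first K sample eigenvectors and then applying the full inverse Σ̂^{-1} gives the same vector as keeping µ − r_f e and applying Σ̂_K^{-1}. A small numpy check of ours, with arbitrary simulated inputs:

```python
import numpy as np

rng = np.random.default_rng(3)
R = rng.standard_normal((80, 6)) * 0.02
Sigma_hat = np.cov(R, rowvar=False)           # invertible here: n = 80 > p = 6
lam, U = np.linalg.eigh(Sigma_hat)
lam, U = lam[::-1], U[:, ::-1]                # descending eigenvalue order
K = 2
UK, lamK = U[:, :K], lam[:K]

mu_e = rng.standard_normal(6)                 # an arbitrary expected excess return vector

# (a) project mu_e onto the first K eigenvectors, then apply the full inverse
proj = UK @ (UK.T @ mu_e)
x_a = np.linalg.solve(Sigma_hat, proj)

# (b) keep mu_e unchanged and apply the spectral cut-off matrix Sigma_K^{-1}
x_b = UK @ ((UK.T @ mu_e) / lamK)

print(np.allclose(x_a, x_b))                  # -> True
```

The identity holds because Σ̂^{-1} Û_K = Û_K Λ̂_K^{-1}, so the full inverse acts on the projected vector exactly as the rank-K spectral cut-off matrix does.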
Since the approach of approximating the expected excess returns vector using the first few eigen portfolios is equivalent to the spectral cut-off method, in the sense that both lead to the same portfolio estimator, we do not coin a new name for the approximation method and continue to use the term spectral cut-off method when referring to it.

Consistency of ŵ_K^msr under a K-factor model

This section is devoted to showing that if the cross-sectional asset returns come from a latent K-factor model endowed with certain properties, then ŵ_K^msr, the weight estimator based on the spectral cut-off method with parameter K, converges almost surely to w_K^msr, a distortion of the true optimal weight, under the high-dimensional asymptotics where both p and n go to infinity at the same rate.

4 The sample covariance matrix can be ill-conditioned or even non-invertible, especially when multicollinearity among different assets is present or when the sample size is smaller than the number of assets.

We use the same model setup as in Fan et al. [2013] and assume without loss of generality that the risk-free rate is zero. Consider a latent K-factor model, where K is viewed as a fixed number as p increases:

$$y_{it} = b_i^T f_t + u_{it}, \qquad i = 1, \dots, p, \quad t = 1, \dots, n, \tag{6}$$

where y_it is the return of the ith asset on day t and is the only observable quantity, f_t is a K × 1 random vector containing the K latent common factors on day t and has a nondegenerate covariance matrix cov(f_t), b_i is the K × 1 factor loading vector of the ith asset, and u_t = (u_1t, u_2t, ..., u_pt)^T is the vector of residual returns on day t and has mean 0 and covariance matrix Σ_u. Note that in eq. (6) we do not require Σ_u to be diagonal. Instead, we allow some pairs of returns to be weakly correlated given the common factors. Therefore, the true covariance matrix, in matrix notation, is

$$\Sigma = B \,\mathrm{cov}(f_t)\, B^T + \Sigma_u,$$

where B = (b_1, ..., b_p)^T. The following assumptions are needed to show the consistency of the spectral cut-off based portfolio weight estimator.

Assumption 3.1. Both {f_t}_(t=1)^n and {u_t}_(t=1)^n are second-order stationary time series.

This assumption ensures that the covariance matrix is stable across time.

Assumption 3.2. The eigenvalues of p^(−1) B^T B are bounded away from zero for all large p.

This assumption implies that the first K eigenvalues of B cov(f_t) B^T diverge at a rate of O(p), which holds when all of the K factors are pervasive (Fan et al. [2013]).

Assumption 3.3. There exists a constant c_1 such that ‖Σ_u‖ ≤ c_1 as p increases.

This assumption says that there are no omitted common factors in eq. (6) which would help explain {u_t}_(t=1)^n. The residual time series represents only the idiosyncratic part of the asset returns, and the idiosyncratic risk of each asset is bounded.

Proposition 3.1. Under Assumptions 3.1–3.3, ŵ_K^msr, the portfolio weight estimator based on the spectral cut-off method with parameter K, converges almost surely

to a distortion of the target MSR portfolio

$$w_K^{\mathrm{msr}} = \frac{\Sigma^{-1}(\mu_K - r_f e)}{e^T \Sigma^{-1}(\mu_K - r_f e)},$$

where µ_K − r_f e is the projection of µ − r_f e onto the subspace spanned by the population eigenvectors {u^(1), u^(2), ..., u^(K)}, as p, n → ∞ and p/n → c ∈ (0, 1).

Proposition 3.1 conveys the key idea behind the expected returns approximation method: we intentionally introduce some bias into µ to ensure the convergence of the portfolio estimator. The appeal of this method depends critically on how much bias we must introduce to ensure the convergence. If µ_K − r_f e is close to µ − r_f e, we lose little and gain a lot; otherwise, the method yields a portfolio far from the true optimal one. If we simply choose the number of common factors K as the parameter in the expected returns approximation, or equivalently, in the spectral cut-off method, the quality of the resulting portfolio is not under our control, since we cannot control the closeness between µ_K − r_f e and µ − r_f e. This motivates a discussion of how to select the tuning parameter K when applying the spectral cut-off method to portfolio construction problems.

Selection of tuning parameter

According to the discussion in the previous section, if we want a consistent estimator for the weight of a biased version of the true MSR portfolio, the number of principal components to keep is simply the number of factors, given that the factor model defined in eq. (6) satisfies Assumptions 3.1–3.3. A great deal of study, such as Bai and Ng [2002], Onatski [2009], and Onatski [2012], has been devoted to finding a good estimator for the number of factors in a factor model. Bai and Ng [2002] proposed an estimator K̂ which is consistent when n and p both go to infinity. However, even though ŵ_K̂^msr is a consistent estimator, using this estimator in real-world portfolio construction tasks causes problems. This is because, as mentioned earlier, the bias we introduce is not under our control but implied by K̂.
Thus the MSR portfolio estimator could be far from what we desire. According to Bai and Ng's method of estimating K, in most real-world scenarios the estimate we obtain is no greater than 2, yet it is highly demanding to approximate an arbitrary expected excess returns

vector well in such a low-dimensional subspace. Thus, if we keep as few as K̂ principal components in the spectral cut-off method, we would lose a lot of useful information. Another problem is that the consistency of ŵ_K̂^msr relies on the eigenvalues of the true covariance matrix being well separated: the first few eigenvalues increase at the same rate as p, while the remaining ones are bounded. For real-world asset return covariance matrices, however, we cannot clearly distinguish between diverging and bounded eigenvalues; instead, we face some weak factors, in the sense that the associated eigenvalues increase at a lower rate than p. Onatski [2012] points out that under weakly influential factor asymptotics, the true number of factors is in general unidentified. For these reasons, we pursue other methods to determine the number of principal components to keep.

The equivalence between the spectral cut-off method and the method of approximating the expected returns vector using the first few sample eigenvectors suggests a new way to select K. The new method integrates information from µ into the tuning parameter selection procedure. The idea is to specify the maximum relative approximation error we can tolerate and then select the minimum K that respects this tolerance. The number of principal components to keep, according to our newly proposed method, is

$$K(\delta) = \min\left\{ K : \frac{\bigl\|(\mu - r_f e) - (\mu^{(K)} - r_f e)\bigr\|_2}{\|\mu - r_f e\|_2} \le \delta \right\}, \tag{7}$$

where δ is the aforementioned tolerance limit for the relative approximation error. Now δ becomes the tuning parameter. The parameter δ is much preferable to K. The most important reason is that by setting the value of δ, we control the amount of approximation error to be introduced. In contrast, when the conventional cross-validation method for selecting K is used, the implied approximation error depends on the data and can be so substantial that the resulting portfolio is far from the true MSR portfolio.
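The rule in eq. (7) amounts to a simple scan over K. A minimal sketch, with a toy orthonormal basis standing in for the sample eigenvectors (the helper name and the numbers are ours):

```python
import numpy as np

def select_K(mu_excess, eigvecs, delta):
    """Smallest K meeting the tolerance of eq. (7).

    mu_excess : (p,) expected excess returns, mu - r_f * e
    eigvecs   : (p, p) sample eigenvectors, columns sorted by
                decreasing eigenvalue
    delta     : tolerance on the relative approximation error
    """
    norm_mu = np.linalg.norm(mu_excess)
    p = len(mu_excess)
    for K in range(1, p + 1):
        U_K = eigvecs[:, :K]
        # Residual of the projection onto the first K eigenvectors.
        residual = mu_excess - U_K @ (U_K.T @ mu_excess)
        if np.linalg.norm(residual) / norm_mu <= delta:
            return K
    return p

# Toy check: an orthonormal basis Q and a mu that loads mostly on the
# first three directions, so a 10% tolerance keeps exactly K = 3.
Q, _ = np.linalg.qr(np.random.default_rng(1).normal(size=(6, 6)))
mu_excess = Q @ np.array([3.0, 1.0, 0.5, 0.1, 0.05, 0.01])
print(select_K(mu_excess, Q, delta=0.1))  # prints 3
```

Because the columns of Q are orthonormal, the relative residual after K components is exactly the norm of the remaining loadings, so the scan stops at the first K whose tail loadings fall below δ.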
Another reason is that δ has both a geometric and an economic interpretation, so we have a rough idea of a reasonable value of δ before any fine tuning is carried out, e.g., δ = 0.1. In addition, once the value of δ is determined, each time the MSR portfolio is updated according to the latest µ and Σ̂, the value of K automatically changes according to the spatial relationship between the new expected returns vector and the eigenvectors of the updated covariance

matrix estimator. Thus, δ is a dynamic selector of K. In contrast, if we pre-determine a value of K, we cannot control the approximation error in any way as µ and Σ̂ are updated. Lastly, adopting a reasonable value of δ is an effective way to avoid the high computational cost incurred by a cross-validation method. There is always a trade-off between the approximation error and the estimation error in choosing δ: if δ is too large, µ will be exposed to too much distortion; if δ is too small, the resulting weight estimator reduces to the sample covariance matrix based estimator, which is highly vulnerable to estimation errors in Σ̂. In the empirical study section, we will discuss the use of a heuristic value for δ.

3.2 Spectral selection method

In this section, we discuss another expected returns approximation approach. We start by presenting an illustrative example which motivates the new approach. Suppose that the vector of forecasted excess returns is related to the sample eigenvectors by µ − r_f e = û^(1) + û^(p), and that the spectral cut-off method is employed to obtain a more robust MSR portfolio. We consider what happens if only the last principal component is cut off, i.e., when K = p − 1. It turns out that in this case the relative approximation error between µ − r_f e and µ^(p−1) − r_f e is

$$\frac{\bigl\|(\mu - r_f e) - (\mu^{(p-1)} - r_f e)\bigr\|_2}{\|\mu - r_f e\|_2} = \frac{\|\hat{u}^{(p)}\|_2}{\|\hat{u}^{(1)} + \hat{u}^{(p)}\|_2} = \frac{1}{\sqrt{2}} \approx 0.71,$$

since the sample eigenvectors are orthonormal. This relative approximation error is obviously too large. Therefore, even though we cut off only a single principal component, the approximation error incurred is unacceptable, which means that under this particular specification of µ and Σ̂, the spectral cut-off method fails to cut off anything. The illustrative example inspires us to consider the following question: can we approximate the expected excess returns using a set of not necessarily consecutive sample eigenvectors, while still being more cautious about including the tail eigenvectors?
Bearing this objective in mind, we devise the following way of approximating µ − r_f e:

$$\min_{a_1,\dots,a_p}\; \underbrace{\Bigl\| \mu - r_f e - \sum_{i=1}^{p} a_i \hat{u}^{(i)} \Bigr\|_2^2}_{\text{approximation error}} \;+\; \gamma \underbrace{\sum_{i=1}^{p} \frac{|a_i|}{\sqrt{\hat{\lambda}_i}}}_{\text{sparsity-inducing penalty}} \tag{8}$$

The basic idea underlying problem (8) is similar to that of (4): we want to find a good approximation of the expected excess return vector µ − r_f e in a lower-dimensional space, so that we can avoid the estimation errors in some directions. We therefore impose a Lasso-type penalty to encourage sparsity in the solution. Moreover, since the tail sample eigenvalues and eigenvectors are likely to be poorly estimated compared with the leading ones, we penalize the a_i's differently, so that the eigenvectors associated with small sample eigenvalues are less likely to enter the set of eigenvectors used to approximate µ − r_f e. In addition, the eigenvectors which contribute little to approximating µ − r_f e are less likely to be included as well. In matrix notation, the optimization problem in eq. (8) can be rewritten as

$$a^*(\gamma) = \arg\min_{a}\; (\mu - r_f e - \hat{U}a)^T(\mu - r_f e - \hat{U}a) + \gamma \bigl\|\hat{\Lambda}^{-\frac{1}{2}} a\bigr\|_1, \tag{9}$$

where a = (a_1, a_2, ..., a_p)^T. If we view Λ̂^(−1/2)a as a single variable, eq. (9) is a standard Lasso objective function, which is minimized to obtain Λ̂^(−1/2)a*(γ) and thus a*(γ). Then, we replace µ − r_f e with its approximation µ^(γ) − r_f e = Ûa*(γ) and find the weights of the corresponding MSR portfolio based on Proposition 2.2. Since there is no closed-form solution to the spectral selection problem, we cannot directly connect it to a precision matrix estimation problem as in the spectral cut-off method.

Selection of tuning parameter

In the spectral selection method, γ controls the sparsity of the solution as well as how much the approximated vector of expected excess returns deviates from the original vector. When γ is zero, the penalty term vanishes and the resulting portfolio is just the sample-based MSR portfolio. As γ increases, the Lasso penalty encourages an increasingly sparse solution.
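The paper does not prescribe a solver for eq. (9). One convenient route, since the full matrix of sample eigenvectors is orthogonal, is the coordinate-wise soft-threshold solution of a Lasso with orthonormal design; the sketch below, with hypothetical toy inputs, uses that route and the penalty γ Σ_i |a_i|/√λ̂_i of eq. (8):

```python
import numpy as np

def spectral_selection_coeffs(mu_excess, eigvecs, eigvals, gamma):
    """Sketch of the weighted-Lasso coefficients of eqs. (8)-(9).

    With a full orthonormal eigenvector matrix U, the objective
    separates coordinate-wise after the change of variables
    b = U^T (mu - r_f e), and each coordinate is solved by
    soft-thresholding with threshold gamma / (2 * sqrt(lambda_i)).
    """
    b = eigvecs.T @ mu_excess
    thresh = gamma / (2.0 * np.sqrt(eigvals))  # larger for tail eigenvalues
    return np.sign(b) * np.maximum(np.abs(b) - thresh, 0.0)

# Toy example: eigenvectors = identity, eigenvalues decreasing.
eigvals = np.array([4.0, 1.0, 0.25])
a_star = spectral_selection_coeffs(np.ones(3), np.eye(3), eigvals, gamma=1.0)
```

With γ = 1 the coordinate thresholds are (0.25, 0.5, 1.0), so the tail coefficient is set exactly to zero while the leading ones shrink to 0.75 and 0.5, illustrating how the eigenvalue-weighted penalty discourages tail eigenvectors.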
For comparison purposes, we apply the same tolerance limit δ on the relative approximation error and select the maximum tolerable γ, i.e.,

$$\gamma(\delta) = \max\left\{ \gamma : \frac{\bigl\|(\mu - r_f e) - (\mu^{(\gamma)} - r_f e)\bigr\|_2}{\|\mu - r_f e\|_2} \le \delta \right\}. \tag{10}$$

Note that in the presence of the Lasso-type penalty, the approximated vector of expected excess returns is no longer a projection of the original vector. Thus, the relative approximation error is no longer the sine of an angle, but it remains a reasonable measure of the approximation error. Compared with the spectral cut-off approach, the spectral selection approach works in a wider range of scenarios, especially when the vector of expected excess returns loads heavily on some tail sample eigenvectors. This will be illustrated by a numerical example in Section 4.

3.3 Spectral methods with return uncertainty

So far we have assumed that the investor has a point view on the expected asset returns. In the real world, however, even the most sophisticated investors cannot provide an infinitely narrow prediction region (Garlappi et al. [2006]). In this section, we discuss the application of the spectral methods in scenarios where the investor provides a confidence region around the vector of expected returns. Before proceeding to the methods, we introduce a new notation, M, which denotes the collection of all possible vectors of expected excess returns according to the investor's prediction. The point prediction is still denoted by µ − r_f e. Previously, given the value of δ, which quantifies the greatest tolerable extent of expected returns distortion, we searched for the approximated expected excess returns that leads to the maximum dimension reduction within the legitimate range implied by δ. When an investor comes with an explicit set of possible expected returns, he or she has implicitly set the parameter δ. In this scenario, we can determine the value of K in the spectral cut-off method, or the value of γ in the spectral selection method, based on M.
Instead of searching for the approximated expected excess returns within the range implied by δ, we directly search within the confidence region for the expected excess returns.

Therefore, eq. (7) becomes

$$K(M) = \min\bigl\{ K : \mu^{(K)} - r_f e \in M \bigr\}, \tag{11}$$

where µ^(K) − r_f e = Û_K Û_K^T (µ − r_f e). Similarly, eq. (10) becomes

$$\gamma(M) = \max\bigl\{ \gamma : \mu^{(\gamma)} - r_f e \in M \bigr\}, \tag{12}$$

where µ^(γ) − r_f e is calculated using the solution to the optimization problem in eq. (8). Therefore, when the confidence region for the expected returns is given, we no longer need to specify the value of δ. We say that a value of δ is implied by a confidence region M if δ = max{δ : K(δ) = K(M)}. It is then easy to check that if M_1 and M_2 are two confidence regions with µ − r_f e ∈ M_1 ⊆ M_2, it follows that δ_1 ≤ δ_2, where δ_1 and δ_2 are implied by M_1 and M_2, respectively. The intuition behind this result is that if an investor is very confident about her view, so that she predicts a small region around µ − r_f e, then the investor can only tolerate a small amount of approximation error.

4 Simulation study

In this section, we use a set of simulation results to elaborate on the methods proposed in Section 3. In the simulation study, we pre-specify the true covariance matrix Σ and the expected return vector µ. Again, it is assumed without loss of generality that the risk-free rate is identically zero. For each set of parameters (sample size n and number of assets p), we repeat the experiment 500 times. In each replication, 2n random returns are independently generated from the distribution N_p(µ, Σ). We use the first n observations to train the two spectral methods and determine the MSR portfolio estimators ŵ^msr_K(δ) and ŵ^msr_γ(δ). Then we use the remaining n observations as a test set to assess these portfolios. Different portfolio methods are evaluated based on their corresponding out-of-sample Sharpe ratio distributions. Throughout this section we use δ = 0.1 as the maximum acceptable relative approximation error. We perform the simulation under four different (n, p) combinations

to compare the performance of the spectral methods across dimensionality configurations. The Σ matrices specified in the simulation studies are all calibrated from daily returns of S&P 500 stocks using the well-known Fama-French three-factor model. We show how effective the spectral cut-off method and the spectral selection method are at maximizing the portfolio Sharpe ratio under four different specifications of µ: (1) µ ∝ e; (2) µ ∝ u^(1); (3) µ ∝ u^(1) + u^(2) + ... + u^(p); and (4) µ is randomly generated. In all four cases µ is scaled so that the average annual expected return of all assets is 0.4.

Case 1: µ ∝ e

If µ is a (positive) scalar multiple of e, or alternatively, each asset has the same expected rate of return, the sample-based MSR portfolio reduces to the minimum-variance portfolio, since both portfolios have the weight estimator

$$\hat{w}^{\mathrm{msr}} = \hat{w}^{\mathrm{mv}} = \frac{\hat{\Sigma}^{-1} e}{e^T \hat{\Sigma}^{-1} e}.$$

We study this case because this special µ captures the view of an uninformed investor about µ. In addition, by setting µ proportional to the vector of ones, we are not presupposing any direct connection between µ and the eigenvectors of the (population) covariance matrix. We use this general case to convey an idea of how the different methods perform in terms of improving out-of-sample Sharpe ratios. In the subsequent cases, we will specify a concrete relationship between µ and the eigenvectors and study the behavior of the different portfolio estimators. Figure 1 presents the kernel density plot of the out-of-sample Sharpe ratios. Each of the four panels corresponds to a specific configuration of n and p, and each of the three colors listed in the legend represents a particular method. Note that the darker areas are caused by the overlapping of two colors. The vertical line in each panel is drawn at the true maximum Sharpe ratio. Comparing the four panels, we can make the following observations.
First, as the p/n ratio becomes larger, the out-of-sample Sharpe ratios produced by the sample-based MSR portfolio move further and further away from the true maximum Sharpe ratio. This phenomenon reflects the vulnerability of large-scale portfolios to estimation errors. Second, in the highest-dimensional scenario, both the spectral cut-off method and the spectral selection method outperform the sample-based method. The advantage emerges only there because in a traditional big-n-small-p scenario the loss incurred by introducing an approximation error is not compensated by the gain from avoiding estimation errors,

since the latter is limited in that regime. Third, the spectral selection method dominates the spectral cut-off approach under all of the dimensionality settings. The reason for this dominance is that the spectral selection method is more capable of discarding useless eigenvectors, since it is less restrictive about which ones may be discarded.

Figure 1: Kernel density plot of out-of-sample Sharpe ratios: µ ∝ e

Figure 2 shows the histograms of the proportion of dimensions used by the spectral methods to approximate µ. We do not include the sample-based method because it does not involve approximating µ. Again, the darker regions are a result of overlapping. According to the histograms, with the same tuning parameter δ, the spectral selection method clearly leads to a smaller proportion of eigenvectors used in approximating the expected returns.

Figure 2: Histogram of proportion of dimensions used to approximate µ: µ ∝ e

Case 2: µ ∝ u^(1)

Starting from this case, we pre-specify a relationship between µ and the eigenvectors of Σ to see whether the spectral cut-off and spectral selection methods help improve the out-of-sample Sharpe ratios under different scenarios. In this case, we let µ be proportional to the dominant eigenvector. This specification is consistent with a single-factor model with a constant residual variance, since under such a model the dominant eigenvector is proportional to the factor loading, or beta, vector. Figure 3 shows the kernel density plot of the out-of-sample Sharpe ratios. As in the previous case, as we move towards a high-dimensional setting, both spectral methods improve in terms of the out-of-sample Sharpe ratio distribution. Another notable observation is that the density curves corresponding to the two spectral methods completely overlap. The reason is that, since µ can be perfectly explained by the dominant population eigenvector, it is highly likely that µ is well approximated by the dominant sample eigenvector alone, given that the dominant sample eigenvector can be relatively accurately estimated.

Figure 3: Kernel density plot of out-of-sample Sharpe ratios: µ ∝ u^(1)

Figure 4 provides support for the above explanation of the overlap: the histograms for both spectral methods reduce to a single bar at 100/p %, since in all of the dimensionality settings and in all of the replications, the dominant sample eigenvector alone approximates µ sufficiently well.

Figure 4: Histogram of proportion of dimensions used to approximate µ: µ ∝ u^(1)

Case 3: µ ∝ u^(1) + u^(2) + ... + u^(p)

In this case, we assume that µ has an equal loading on all of the eigenvectors of Σ, i.e., we let µ ∝ u^(1) + u^(2) + ... + u^(p). We use this case to illustrate the superiority of the spectral selection method as well as to point out the scenarios where the spectral cut-off method hardly works. Figure 5 summarizes the out-of-sample Sharpe ratio distributions. As in the previous two cases, in the first two big-n-small-p settings (the top panels), the difference between the spectral methods and the sample-based method is not clear cut. In the bottom left panel, however, the spectral selection method demonstrates its superiority over the other two methods, which perform quite similarly to each other. The reason behind this similarity is that the equal-loading structure of µ makes it hard for the spectral cut-off method to cut off much, since otherwise µ would not be well approximated. As a result, the spectral cut-off method almost reduces to the sample-based method.

Figure 5: Kernel density plot of out-of-sample Sharpe ratios: µ ∝ u^(1) + u^(2) + ... + u^(p)

The above explanation is further supported by Figure 6. In the top two panels of Figure 6, the highest spike for the spectral cut-off method appears at around 100%, which means that in most replications the spectral cut-off method does not achieve any dimension reduction. In the two higher-dimensional settings, the highest spike still resides very close to 100%. In all four panels, by contrast, the highest bar for the spectral selection method is around 97%. Therefore, when µ loads heavily on one or several of the tail eigenvectors, the spectral cut-off method fails to work and almost reduces to the sample-based method. In such cases, we need to resort to the more general spectral selection method to obtain a robust MSR portfolio.

Figure 6: Histogram of proportion of dimensions used to approximate µ: µ ∝ u^(1) + u^(2) + ... + u^(p)

Case 4: µ is randomly generated

In the last case, we consider a µ each of whose elements is independently generated from N(0.4, 0.4), so that around 16% of the assets have negative expected returns. Once µ has been generated, we treat it as a fixed quantity representing the expected returns which the investor believes in, and we use it across all replications. According to Figure 7, in the two lower-dimensional settings (the top two panels), there is no significant difference among the three methods. However, in the two high-dimensional settings, the spectral selection method clearly leads to the highest average out-of-sample Sharpe ratio, followed by the spectral cut-off method. In addition, it is notable that in the two panels on the right side we observe negative out-of-sample Sharpe ratios. This happens because the estimated portfolios sometimes take extreme positions that turn out to be incorrect bets. Negative Sharpe ratios are highly undesirable. According to the bottom right panel, the spectral selection method results in the smallest area under the fitted density curve on the negative half of the x-axis. Therefore, we conclude that the spectral selection method is the most effective method for producing robust MSR portfolios.

Figure 7: Kernel density plot of out-of-sample Sharpe ratios: µ is randomly generated

Figure 8 looks quite similar to the histogram in the previous case: the spectral selection method almost always leads to some dimension reduction.

Figure 8: Histogram of proportion of dimensions used to approximate µ: µ is randomly generated
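The Monte Carlo design described at the beginning of this section can be sketched as follows. For brevity the portfolio rule shown is the sample-based MSR benchmark, and the covariance matrix, replication count, and return scale are toy stand-ins for the paper's calibrated inputs:

```python
import numpy as np

def sample_msr_weights(train_returns, mu_excess):
    """Sample-based MSR weights, the benchmark of the simulation study."""
    prec = np.linalg.pinv(np.cov(train_returns, rowvar=False))
    w = prec @ mu_excess
    return w / w.sum()

def out_of_sample_sharpe(weights, test_returns):
    """Sharpe ratio of realized portfolio returns (risk-free rate zero)."""
    port = test_returns @ weights
    return port.mean() / port.std()

rng = np.random.default_rng(42)
n, p = 120, 10
mu = np.full(p, 0.4 / 252)                  # Case 1 flavor: mu proportional to e
A = rng.normal(size=(p, p))
sigma = 0.01 * (A @ A.T / p + np.eye(p))    # toy positive-definite covariance

sharpes = []
for _ in range(100):                        # the paper uses 500 replications
    R = rng.multivariate_normal(mu, sigma, size=2 * n)
    train, test = R[:n], R[n:]              # first n to train, last n to assess
    w = sample_msr_weights(train, mu)
    sharpes.append(out_of_sample_sharpe(w, test))
```

The spectral estimators ŵ^msr_K(δ) and ŵ^msr_γ(δ) would be evaluated by swapping them in for `sample_msr_weights` inside the loop and comparing the resulting out-of-sample Sharpe ratio samples, as in Figures 1-8.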


More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Estimating Covariance Using Factorial Hidden Markov Models

Estimating Covariance Using Factorial Hidden Markov Models Estimating Covariance Using Factorial Hidden Markov Models João Sedoc 1,2 with: Jordan Rodu 3, Lyle Ungar 1, Dean Foster 1 and Jean Gallier 1 1 University of Pennsylvania Philadelphia, PA joao@cis.upenn.edu

More information

This appendix provides a very basic introduction to linear algebra concepts.

This appendix provides a very basic introduction to linear algebra concepts. APPENDIX Basic Linear Algebra Concepts This appendix provides a very basic introduction to linear algebra concepts. Some of these concepts are intentionally presented here in a somewhat simplified (not

More information

Empirical properties of large covariance matrices in finance

Empirical properties of large covariance matrices in finance Empirical properties of large covariance matrices in finance Ex: RiskMetrics Group, Geneva Since 2010: Swissquote, Gland December 2009 Covariance and large random matrices Many problems in finance require

More information

Thomas J. Fisher. Research Statement. Preliminary Results

Thomas J. Fisher. Research Statement. Preliminary Results Thomas J. Fisher Research Statement Preliminary Results Many applications of modern statistics involve a large number of measurements and can be considered in a linear algebra framework. In many of these

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

CS168: The Modern Algorithmic Toolbox Lecture #8: PCA and the Power Iteration Method

CS168: The Modern Algorithmic Toolbox Lecture #8: PCA and the Power Iteration Method CS168: The Modern Algorithmic Toolbox Lecture #8: PCA and the Power Iteration Method Tim Roughgarden & Gregory Valiant April 15, 015 This lecture began with an extended recap of Lecture 7. Recall that

More information

STATISTICAL LEARNING SYSTEMS

STATISTICAL LEARNING SYSTEMS STATISTICAL LEARNING SYSTEMS LECTURE 8: UNSUPERVISED LEARNING: FINDING STRUCTURE IN DATA Institute of Computer Science, Polish Academy of Sciences Ph. D. Program 2013/2014 Principal Component Analysis

More information

Bayesian Optimization

Bayesian Optimization Practitioner Course: Portfolio October 15, 2008 No introduction to portfolio optimization would be complete without acknowledging the significant contribution of the Markowitz mean-variance efficient frontier

More information

Forecast comparison of principal component regression and principal covariate regression

Forecast comparison of principal component regression and principal covariate regression Forecast comparison of principal component regression and principal covariate regression Christiaan Heij, Patrick J.F. Groenen, Dick J. van Dijk Econometric Institute, Erasmus University Rotterdam Econometric

More information

A Test of Cointegration Rank Based Title Component Analysis.

A Test of Cointegration Rank Based Title Component Analysis. A Test of Cointegration Rank Based Title Component Analysis Author(s) Chigira, Hiroaki Citation Issue 2006-01 Date Type Technical Report Text Version publisher URL http://hdl.handle.net/10086/13683 Right

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery

Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Jorge F. Silva and Eduardo Pavez Department of Electrical Engineering Information and Decision Systems Group Universidad

More information

SPARSE signal representations have gained popularity in recent

SPARSE signal representations have gained popularity in recent 6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying

More information

Inference on Risk Premia in the Presence of Omitted Factors

Inference on Risk Premia in the Presence of Omitted Factors Inference on Risk Premia in the Presence of Omitted Factors Stefano Giglio Dacheng Xiu Booth School of Business, University of Chicago Center for Financial and Risk Analytics Stanford University May 19,

More information

Theory and Applications of High Dimensional Covariance Matrix Estimation

Theory and Applications of High Dimensional Covariance Matrix Estimation 1 / 44 Theory and Applications of High Dimensional Covariance Matrix Estimation Yuan Liao Princeton University Joint work with Jianqing Fan and Martina Mincheva December 14, 2011 2 / 44 Outline 1 Applications

More information

15 Singular Value Decomposition

15 Singular Value Decomposition 15 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

Discrete quantum random walks

Discrete quantum random walks Quantum Information and Computation: Report Edin Husić edin.husic@ens-lyon.fr Discrete quantum random walks Abstract In this report, we present the ideas behind the notion of quantum random walks. We further

More information

Data Abundance and Asset Price Informativeness. On-Line Appendix

Data Abundance and Asset Price Informativeness. On-Line Appendix Data Abundance and Asset Price Informativeness On-Line Appendix Jérôme Dugast Thierry Foucault August 30, 07 This note is the on-line appendix for Data Abundance and Asset Price Informativeness. It contains

More information

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).

More information

Threading Rotational Dynamics, Crystal Optics, and Quantum Mechanics to Risky Asset Selection. A Physics & Pizza on Wednesdays presentation

Threading Rotational Dynamics, Crystal Optics, and Quantum Mechanics to Risky Asset Selection. A Physics & Pizza on Wednesdays presentation Threading Rotational Dynamics, Crystal Optics, and Quantum Mechanics to Risky Asset Selection A Physics & Pizza on Wednesdays presentation M. Hossein Partovi Department of Physics & Astronomy, California

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized

More information

Eigenvalues, Eigenvectors, and an Intro to PCA

Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Changing Basis We ve talked so far about re-writing our data using a new set of variables, or a new basis.

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

Probability and Statistics

Probability and Statistics Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT

More information

Lecture: Local Spectral Methods (3 of 4) 20 An optimization perspective on local spectral methods

Lecture: Local Spectral Methods (3 of 4) 20 An optimization perspective on local spectral methods Stat260/CS294: Spectral Graph Methods Lecture 20-04/07/205 Lecture: Local Spectral Methods (3 of 4) Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning: these notes are still very rough. They provide

More information

4 Bias-Variance for Ridge Regression (24 points)

4 Bias-Variance for Ridge Regression (24 points) Implement Ridge Regression with λ = 0.00001. Plot the Squared Euclidean test error for the following values of k (the dimensions you reduce to): k = {0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500,

More information

5742 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER /$ IEEE

5742 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER /$ IEEE 5742 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER 2009 Uncertainty Relations for Shift-Invariant Analog Signals Yonina C. Eldar, Senior Member, IEEE Abstract The past several years

More information

Maximum variance formulation

Maximum variance formulation 12.1. Principal Component Analysis 561 Figure 12.2 Principal component analysis seeks a space of lower dimensionality, known as the principal subspace and denoted by the magenta line, such that the orthogonal

More information

Zhaoxing Gao and Ruey S Tsay Booth School of Business, University of Chicago. August 23, 2018

Zhaoxing Gao and Ruey S Tsay Booth School of Business, University of Chicago. August 23, 2018 Supplementary Material for Structural-Factor Modeling of High-Dimensional Time Series: Another Look at Approximate Factor Models with Diverging Eigenvalues Zhaoxing Gao and Ruey S Tsay Booth School of

More information

Eigenvalues, Eigenvectors, and an Intro to PCA

Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Changing Basis We ve talked so far about re-writing our data using a new set of variables, or a new basis.

More information

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis Sílvia Gonçalves and Benoit Perron Département de sciences économiques,

More information

Does l p -minimization outperform l 1 -minimization?

Does l p -minimization outperform l 1 -minimization? Does l p -minimization outperform l -minimization? Le Zheng, Arian Maleki, Haolei Weng, Xiaodong Wang 3, Teng Long Abstract arxiv:50.03704v [cs.it] 0 Jun 06 In many application areas ranging from bioinformatics

More information

Can Random Matrix Theory resolve Markowitz Optimization Enigma? The impact of noise filtered covariance matrix on portfolio selection.

Can Random Matrix Theory resolve Markowitz Optimization Enigma? The impact of noise filtered covariance matrix on portfolio selection. Can Random Matrix Theory resolve Markowitz Optimization Enigma? The impact of noise filtered covariance matrix on portfolio selection. by Kim Wah Ng A Dissertation submitted to the Graduate School-Newark

More information

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = 30 MATHEMATICS REVIEW G A.1.1 Matrices and Vectors Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = a 11 a 12... a 1N a 21 a 22... a 2N...... a M1 a M2... a MN A matrix can

More information

Log Covariance Matrix Estimation

Log Covariance Matrix Estimation Log Covariance Matrix Estimation Xinwei Deng Department of Statistics University of Wisconsin-Madison Joint work with Kam-Wah Tsui (Univ. of Wisconsin-Madsion) 1 Outline Background and Motivation The Proposed

More information

PCA, Kernel PCA, ICA

PCA, Kernel PCA, ICA PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per

More information

ISyE 691 Data mining and analytics

ISyE 691 Data mining and analytics ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)

More information

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance

A Bootstrap Test for Causality with Endogenous Lag Length Choice. - theory and application in finance CESIS Electronic Working Paper Series Paper No. 223 A Bootstrap Test for Causality with Endogenous Lag Length Choice - theory and application in finance R. Scott Hacker and Abdulnasser Hatemi-J April 200

More information

MFM Practitioner Module: Risk & Asset Allocation. John Dodson. January 25, 2012

MFM Practitioner Module: Risk & Asset Allocation. John Dodson. January 25, 2012 MFM Practitioner Module: Risk & Asset Allocation January 25, 2012 Optimizing Allocations Once we have 1. chosen the markets and an investment horizon 2. modeled the markets 3. agreed on an objective with

More information

ZHAW Zurich University of Applied Sciences. Bachelor s Thesis Estimating Multi-Beta Pricing Models With or Without an Intercept:

ZHAW Zurich University of Applied Sciences. Bachelor s Thesis Estimating Multi-Beta Pricing Models With or Without an Intercept: ZHAW Zurich University of Applied Sciences School of Management and Law Bachelor s Thesis Estimating Multi-Beta Pricing Models With or Without an Intercept: Further Results from Simulations Submitted by:

More information

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu

More information

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting

More information

Principal Component Analysis (PCA) Our starting point consists of T observations from N variables, which will be arranged in an T N matrix R,

Principal Component Analysis (PCA) Our starting point consists of T observations from N variables, which will be arranged in an T N matrix R, Principal Component Analysis (PCA) PCA is a widely used statistical tool for dimension reduction. The objective of PCA is to find common factors, the so called principal components, in form of linear combinations

More information

CS168: The Modern Algorithmic Toolbox Lecture #7: Understanding Principal Component Analysis (PCA)

CS168: The Modern Algorithmic Toolbox Lecture #7: Understanding Principal Component Analysis (PCA) CS68: The Modern Algorithmic Toolbox Lecture #7: Understanding Principal Component Analysis (PCA) Tim Roughgarden & Gregory Valiant April 0, 05 Introduction. Lecture Goal Principal components analysis

More information

IDENTIFICATION OF TREATMENT EFFECTS WITH SELECTIVE PARTICIPATION IN A RANDOMIZED TRIAL

IDENTIFICATION OF TREATMENT EFFECTS WITH SELECTIVE PARTICIPATION IN A RANDOMIZED TRIAL IDENTIFICATION OF TREATMENT EFFECTS WITH SELECTIVE PARTICIPATION IN A RANDOMIZED TRIAL BRENDAN KLINE AND ELIE TAMER Abstract. Randomized trials (RTs) are used to learn about treatment effects. This paper

More information

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions

More information

Factor Analysis. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA

Factor Analysis. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA Factor Analysis Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA 1 Factor Models The multivariate regression model Y = XB +U expresses each row Y i R p as a linear combination

More information

Lecture 6: Univariate Volatility Modelling: ARCH and GARCH Models

Lecture 6: Univariate Volatility Modelling: ARCH and GARCH Models Lecture 6: Univariate Volatility Modelling: ARCH and GARCH Models Prof. Massimo Guidolin 019 Financial Econometrics Winter/Spring 018 Overview ARCH models and their limitations Generalized ARCH models

More information

Area I: Contract Theory Question (Econ 206)

Area I: Contract Theory Question (Econ 206) Theory Field Exam Summer 2011 Instructions You must complete two of the four areas (the areas being (I) contract theory, (II) game theory A, (III) game theory B, and (IV) psychology & economics). Be sure

More information

ECON4515 Finance theory 1 Diderik Lund, 5 May Perold: The CAPM

ECON4515 Finance theory 1 Diderik Lund, 5 May Perold: The CAPM Perold: The CAPM Perold starts with a historical background, the development of portfolio theory and the CAPM. Points out that until 1950 there was no theory to describe the equilibrium determination of

More information

Cover Page. The handle holds various files of this Leiden University dissertation

Cover Page. The handle  holds various files of this Leiden University dissertation Cover Page The handle http://hdl.handle.net/1887/39637 holds various files of this Leiden University dissertation Author: Smit, Laurens Title: Steady-state analysis of large scale systems : the successive

More information

Robustness of Principal Components

Robustness of Principal Components PCA for Clustering An objective of principal components analysis is to identify linear combinations of the original variables that are useful in accounting for the variation in those original variables.

More information

Cross-regional Spillover Effects in the Korean Housing Market

Cross-regional Spillover Effects in the Korean Housing Market 1) Cross-regional Spillover Effects in the Korean Housing Market Hahn Shik Lee * and Woo Suk Lee < Abstract > In this paper, we examine cross-regional spillover effects in the Korean housing market, using

More information

Singular Value Decomposition

Singular Value Decomposition Chapter 6 Singular Value Decomposition In Chapter 5, we derived a number of algorithms for computing the eigenvalues and eigenvectors of matrices A R n n. Having developed this machinery, we complete our

More information

Discrete Mathematics and Probability Theory Fall 2015 Lecture 21

Discrete Mathematics and Probability Theory Fall 2015 Lecture 21 CS 70 Discrete Mathematics and Probability Theory Fall 205 Lecture 2 Inference In this note we revisit the problem of inference: Given some data or observations from the world, what can we infer about

More information

Appendix to Portfolio Construction by Mitigating Error Amplification: The Bounded-Noise Portfolio

Appendix to Portfolio Construction by Mitigating Error Amplification: The Bounded-Noise Portfolio Appendix to Portfolio Construction by Mitigating Error Amplification: The Bounded-Noise Portfolio Long Zhao, Deepayan Chakrabarti, and Kumar Muthuraman McCombs School of Business, University of Texas,

More information

The prediction of house price

The prediction of house price 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

The Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility

The Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility The Slow Convergence of OLS Estimators of α, β and Portfolio Weights under Long Memory Stochastic Volatility New York University Stern School of Business June 21, 2018 Introduction Bivariate long memory

More information

A General Overview of Parametric Estimation and Inference Techniques.

A General Overview of Parametric Estimation and Inference Techniques. A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying

More information

Information Choice in Macroeconomics and Finance.

Information Choice in Macroeconomics and Finance. Information Choice in Macroeconomics and Finance. Laura Veldkamp New York University, Stern School of Business, CEPR and NBER Spring 2009 1 Veldkamp What information consumes is rather obvious: It consumes

More information

FORMULATION OF THE LEARNING PROBLEM

FORMULATION OF THE LEARNING PROBLEM FORMULTION OF THE LERNING PROBLEM MIM RGINSKY Now that we have seen an informal statement of the learning problem, as well as acquired some technical tools in the form of concentration inequalities, we

More information

Principal Components Analysis. Sargur Srihari University at Buffalo

Principal Components Analysis. Sargur Srihari University at Buffalo Principal Components Analysis Sargur Srihari University at Buffalo 1 Topics Projection Pursuit Methods Principal Components Examples of using PCA Graphical use of PCA Multidimensional Scaling Srihari 2

More information

Linear Regression (9/11/13)

Linear Regression (9/11/13) STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter

More information

Circling the Square: Experiments in Regression

Circling the Square: Experiments in Regression Circling the Square: Experiments in Regression R. D. Coleman [unaffiliated] This document is excerpted from the research paper entitled Critique of Asset Pricing Circularity by Robert D. Coleman dated

More information

arxiv: v1 [math.na] 5 May 2011

arxiv: v1 [math.na] 5 May 2011 ITERATIVE METHODS FOR COMPUTING EIGENVALUES AND EIGENVECTORS MAYSUM PANJU arxiv:1105.1185v1 [math.na] 5 May 2011 Abstract. We examine some numerical iterative methods for computing the eigenvalues and

More information

Homogeneity Pursuit. Jianqing Fan

Homogeneity Pursuit. Jianqing Fan Jianqing Fan Princeton University with Tracy Ke and Yichao Wu http://www.princeton.edu/ jqfan June 5, 2014 Get my own profile - Help Amazing Follow this author Grace Wahba 9 Followers Follow new articles

More information

ON MEASURING HEDGE FUND RISK

ON MEASURING HEDGE FUND RISK ON MEASURING HEDGE FUND RISK Alexander Cherny Raphael Douady Stanislav Molchanov Department of Probability Theory Faculty of Mathematics and Mechanics Moscow State University 119992 Moscow Russia E-mail:

More information