Meta Algorithms for Portfolio Selection. Technical Report

Meta Algorithms for Portfolio Selection
Technical Report TR 10-022
Department of Computer Science and Engineering, University of Minnesota
4-192 EECS Building, 200 Union Street SE, Minneapolis, MN 55455-0159 USA
Puja Das and Arindam Banerjee
September 20, 2010

Meta Algorithms for Portfolio Selection

Puja Das, Dept. of Computer Science & Engg., University of Minnesota, Twin Cities, Minneapolis, MN 55455, pdas@cs.umn.edu
Arindam Banerjee, Dept. of Computer Science & Engg., University of Minnesota, Twin Cities, Minneapolis, MN 55455, banerjee@cs.umn.edu

Abstract

We consider the problem of sequential portfolio selection in the stock market. There are theoretically well-grounded algorithms for the problem, such as Universal Portfolio (UP), Exponentiated Gradient (EG), and Online Newton Step (ONS). Such algorithms enjoy the property of being universal, i.e., having low regret with respect to the best constant rebalanced portfolio. However, the practical performance of such popular algorithms is sobering compared to heuristics such as Anticor, which have no theoretical guarantees but perform surprisingly well in practice. Motivated by such discrepancies, in this paper we focus on designing meta algorithms for portfolio selection which can leverage the best of both worlds. Such algorithms work with a pool of base algorithms and use online learning to redistribute wealth among the base algorithms. We develop two meta algorithms: MA_EG, which uses online gradient descent following EG, and MA_ONS, which uses the online Newton step following ONS. If one of the base algorithms is universal, it follows that the meta algorithm is universal. Through extensive experiments on two real stock market datasets, we show that the meta algorithms are competitive with and often better than the best base algorithm, including heuristics, while maintaining the guarantee of being a universal algorithm.

1 Introduction

Algorithms for automatically designing portfolios based on historical stock market data have been extensively investigated in the literature for the past five decades [15, 13]. Earlier literature on the topic advocated a statistical treatment of the problem, and important results were established [13, 8, 5].
With the realization that any statistical assumptions regarding the stock market may be inappropriate and eventually counterproductive, over the past two decades new methods of portfolio selection have been designed which make no statistical assumptions regarding the movement of stocks [6, 7, 11]. In a well-defined technical sense, such methods are guaranteed to perform competitively with certain families of adaptive portfolios even in an adversarial market. From the theoretical perspective, algorithm design for portfolio selection has been largely a success story [6, 11, 5]. The motivation for this paper comes from the sobering practical performance of theoretically well-motivated algorithms. Sometimes even simple heuristic approaches can outperform these theoretically motivated strategies in terms of empirical performance. Yet heuristics which gather admirable amounts of wealth in one market run the risk of starkly poor performance in an adversarial situation. Ideally, we would want portfolio selection algorithms which have good empirical performance but at the same time have theoretical guarantees regarding their performance. In this paper we bring both of these worlds together by designing meta algorithms for online portfolio selection. The objective of any online portfolio selection algorithm is maximization of wealth. But absolute wealth maximization is not possible due to the temporal nature of the problem. Recent theoretically motivated online algorithms usually demonstrate competitiveness with a class of target strategies called Constant

Rebalanced Portfolios (CRPs). A CRP ensures that a fixed proportion of wealth is invested in each stock by daily redistribution of wealth. The performance of a theoretically motivated online strategy is usually measured by regret, the difference in logarithmic wealth between the online strategy and the best CRP in hindsight. It is important to note that while the best CRP in hindsight takes advantage of knowing the market beforehand and hence forms a strong baseline, it cannot be implemented in practice. The regret analysis provides a guarantee on the worst-case performance of an online algorithm w.r.t. the best CRP in hindsight. The theory-based approaches that require mention here are Universal Portfolios (UP) [6, 7], Exponentiated Gradient (EG) [11], and Online Newton Step (ONS) [3]. UP achieves $O(\log T)$ regret, EG achieves $O(\sqrt{T})$ regret under the no-junk-bond assumption (to be explained shortly), and ONS achieves $O(\log T)$ regret under the same assumption. In recent work, [1] proposed a heuristic approach called Anticor which was shown to beat the theoretically motivated EG, UP, and the best CRP in practice on several datasets by an overwhelming margin. Therefore, it seems logical for an investor to use Anticor and maximize his wealth. The reason why one cannot use Anticor without any qualms is the absence of any worst-case performance guarantees. Although Anticor does well on certain datasets, a real market sequence can prove to be unsuitable and even adversarial for Anticor. While there is always the temptation to maximize wealth based on a heuristic that has worked well on certain markets, it is highly desirable to have worst-case performance guarantees. In this paper, we introduce new meta algorithms for online portfolio selection which achieve a balance between good empirical performance and theoretical guarantees.
In particular, we present two meta algorithms, their difference owing only to the weight update strategy: MA_EG, relying on gradient-based updates inspired by EG, and MA_ONS, relying on Newton-step-based updates inspired by ONS. The meta algorithms maintain a pool of base algorithms and shift wealth to the better performing base algorithms by online learning over time. The novelty is their ability to be competitive with the best of the heuristics in empirical performance while having theoretical guarantees, i.e., regret bounds w.r.t. the best CRP in hindsight. The meta algorithm achieves $O(\sqrt{T})$ regret with the EG-based update under the no-junk-algorithms assumption (to be explained shortly). With the ONS-based update, it can achieve $O(\log T)$ regret under the same assumption. Through comprehensive experiments on two historical stock market datasets, spanning 14 and 22 years at a daily resolution, we show that the two versions of the meta algorithm outperform the existing algorithms with theoretical guarantees by several orders of magnitude while maintaining the exact same guarantees.

The rest of the paper is organized as follows. In Section 2, we review the relevant literature on portfolio selection. Specifically, we introduce UP, EG, ONS, and Anticor. We present the new meta algorithms (MA_EG and MA_ONS) and analyze their properties in Section 3. In Section 4 we demonstrate the effectiveness of the meta algorithms through empirical results on the widely used NYSE dataset and a new S&P500 dataset. We conclude in Section 5 with a discussion and directions for future work.

2 Related Work

We consider a stock market consisting of $n$ stocks $\{s_1, \ldots, s_n\}$ over $T$ periods. For ease of exposition, we will consider a period to be a day, but the analysis presented in the paper holds for any valid definition of a period, such as an hour or a month.
Let $x_t(i)$ denote the price relative of stock $s_i$ on day $t$, i.e., the multiplicative factor by which the price of $s_i$ changes on day $t$. Hence, $x_t(i) > 1$ implies a gain, $x_t(i) < 1$ implies a loss, and $x_t(i) = 1$ implies the price remained unchanged. Further, $x_t(i) > 0$ for all $i, t$. Let $x_t = (x_t(1), \ldots, x_t(n))$ denote the vector of price relatives for day $t$, and let $x_{1:t}$ denote the collection of such price relative vectors up to and including day $t$. A portfolio $p_t = (p_t(1), \ldots, p_t(n))$ on day $t$ can be viewed as a probability distribution over the stocks that prescribes investing a fraction $p_t(i)$ of the current wealth in stock $s_i$. Note that the portfolio $p_t$ has to be decided before knowing $x_t$, which will be revealed only at the end of the day. The multiplicative gain in wealth at the end of day $t$ is then simply $p_t^T x_t = \sum_{i=1}^n p_t(i) x_t(i)$.
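As a concrete illustration of the definitions above (a minimal sketch; the two-stock data below are made up for the example), the daily gain $p_t^T x_t$ and its compounding over a sequence of days can be computed as:

```python
def daily_gain(p, x):
    """Multiplicative wealth gain p^T x for portfolio p and price relatives x."""
    return sum(pi * xi for pi, xi in zip(p, x))

def total_wealth(portfolios, relatives, initial=1.0):
    """Compound the per-day gains p_t^T x_t over the whole sequence."""
    wealth = initial
    for p, x in zip(portfolios, relatives):
        wealth *= daily_gain(p, x)
    return wealth

# Two stocks, two days, a uniform (1/2, 1/2) portfolio held on both days:
ps = [[0.5, 0.5], [0.5, 0.5]]
xs = [[1.0, 0.5], [1.0, 2.0]]  # stock 1 flat; stock 2 halves, then doubles
print(total_wealth(ps, xs))    # 0.75 * 1.5 = 1.125
```

Note that rebalancing matters: the per-day gains multiply, so the order and mix of price relatives interact with the chosen portfolios.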

For a sequence of price relatives $x_{1:t-1} = \{x_1, \ldots, x_{t-1}\}$ up to day $(t-1)$, the sequential portfolio selection problem on day $t$ is to determine a portfolio $p_t$ based on past performance of the stocks. At the end of day $t$, $x_t$ is revealed and the actual performance of $p_t$ gets determined by $p_t^T x_t$. Over a period of $T$ days, for a sequence of portfolios $p_{1:T} = \{p_1, \ldots, p_T\}$, the multiplicative gain in wealth is then

    $S(p_{1:T}, x_{1:T}) = \prod_{t=1}^T p_t^T x_t$.    (1)

In the literature, one often looks at the logarithm of the multiplicative gain, given by

    $LS(p_{1:T}, x_{1:T}) = \sum_{t=1}^T \log(p_t^T x_t)$.    (2)

Ideally, we would like to maximize $S(p_{1:T}, x_{1:T})$ over $p_{1:T}$. Unfortunately, portfolio selection cannot be posed as an optimization problem due to the temporal nature of the choices: $x_t$ is not available when one has to decide on $p_t$. Further, in a stock market, (statistical) assumptions regarding $x_t$ can be difficult to make. The existing literature on theoretically motivated portfolio selection [6, 11, 5] focuses on designing algorithms whose overall gain in wealth is guaranteed to be competitive with reasonable strategies for the problem.

Constant Rebalanced Portfolios: One goal could be to compete against the best Constant Rebalanced Portfolio (CRP). The CRP investment strategy maintains a fixed fraction of total wealth in each of the stocks. So, a CRP has a fixed portfolio vector $p_{crp} = (p(1), \ldots, p(n))$ which is employed every day. Such a strategy requires vast amounts of trading every day to ensure that the investment proportions are rebalanced back to the vector $p_{crp}$. To understand the strength of this family of strategies, consider the example of a market consisting of two stocks. The first stock is a no-growth stock and has price relatives $1, 1, 1, 1, \ldots$ over time. The second stock doubles in value on even days and its value gets halved on odd days. So, its price relatives are $\frac{1}{2}, 2, \frac{1}{2}, 2, \ldots$.
The sequence of market vectors in this case is $(1, \frac{1}{2}), (1, 2), (1, \frac{1}{2}), (1, 2), (1, \frac{1}{2}), \ldots$. A Buy-and-Hold strategy will not make any gains in this market. On the other hand, if we use a CRP of $p_{crp} = (\frac{1}{2}, \frac{1}{2})$, then the growth of wealth every 2 days will be $\frac{9}{8}$, so that after $T$ days the multiplicative gain in wealth will be $(\frac{9}{8})^{T/2}$. It is easy to see that the wealth accumulated by the best CRP will be at least as big as that of the best Buy-and-Hold strategy, and hence of the single best stock. CRP is also known to have certain optimality properties when certain statistical assumptions regarding the price relatives can be made [8].

Universal Algorithms: Theoretically motivated strategies are guaranteed to be competitive with the best CRP by achieving small regret. For any sequence $x_1, \ldots, x_T$ of price relatives, let $p_1, \ldots, p_T$ be the sequence of portfolios selected by the algorithm. The regret of a portfolio selection algorithm ALG is given by:

    $\mathrm{Regret}(\mathrm{ALG}) \triangleq \max_p \sum_{t=1}^T \log(p^T x_t) - \sum_{t=1}^T \log(p_t^T x_t)$.    (3)

An investment strategy is deemed universal if it has sublinear regret, i.e., $\mathrm{Regret}(\mathrm{ALG}) = o(T)$. We now briefly review a set of important sequential portfolio selection strategies from the literature, both universal and non-universal.

Universal Portfolios (UP): The seminal work of Cover [6] introduced Universal Portfolios (UP), the first algorithm which can be shown to be competitive with the best CRP. The algorithm has since been extended to various practical scenarios, such as investment with side information [7]. The key idea behind UP is to maintain a distribution over all CRPs and perform a Bayesian update after observing every $x_t$. Since each CRP $q$ is a distribution over $n$ stocks and hence lies in the $n$-simplex, one uses a distribution $\mu(q)$ over the $n$-simplex. A popular choice is the Dirichlet prior $\mu(q) = \mathrm{Dir}(\frac{1}{2}, \ldots, \frac{1}{2})$ over the simplex.
For any CRP $q$, let $S_{t-1}(q, x_{1:t-1}) = \prod_{t'=1}^{t-1} q^T x_{t'}$ denote the wealth accumulated by $q$ over $(t-1)$ days. Then, the

universal portfolio $p_t$ is defined as the weighted average over all such $q$:

    $p_t(i) = \frac{\int_q q(i)\, S_{t-1}(q, x_{1:t-1})\, \mu(q)\, dq}{\int_q S_{t-1}(q, x_{1:t-1})\, \mu(q)\, dq}$.    (4)

UP has a regret of $O(\log T)$ with respect to the best CRP in hindsight. However, the updates for UP are computationally prohibitive. Discrete approximation or recursive series expansion is used to evaluate the above integrals. However, in either case, the time and space complexity for finding the new universal portfolio vector grows exponentially in the dimensionality of the simplex, i.e., the number of stocks.

Exponentiated Gradient (EG) portfolios: The key motivation behind the Exponentiated Gradient (EG) strategy [11] was to design a computationally efficient portfolio selection algorithm which stays competitive with the best CRP. The EG algorithm scales linearly with the number of stocks and the portfolios are guaranteed to stay competitive with the best CRP. However, the regret is $O(\sqrt{T})$ under the no-junk-bond assumption and is weaker than that of UP. The no-junk-bond assumption states that all the $x_t(i)$ are bounded from below, i.e., $x_t(i) > \alpha > 0$ for all $i, t$.^1 The EG investment strategy was introduced and analyzed by [11]. Their framework for updating a portfolio vector is analogous to the framework developed by [14] for online regression. In the online learning framework, the portfolio vector itself encapsulates the necessary information from all previous price relatives. At the start of day $t$, the algorithm computes its new portfolio vector $p_t$ such that it stays close to $p_{t-1}$ and does well on the price relatives $x_{t-1}$ for the previous day. In particular, the new portfolio vector $p_t$ is chosen so as to maximize

    $F(p_t) = \eta \log(p_t^T x_{t-1}) - \mathrm{KL}(p_t \,\|\, p_{t-1})$,    (5)

where $\eta > 0$ is a parameter called the learning rate and $\mathrm{KL}(\cdot \,\|\, \cdot)$ is the KL-divergence ensuring $p_t$ stays close to $p_{t-1}$.
Using an approximation of $F$ based on a Taylor expansion, the updated portfolio turns out to be

    $p_t(i) = \frac{p_{t-1}(i) \exp\left(\eta \frac{x_{t-1}(i)}{p_{t-1}^T x_{t-1}}\right)}{\sum_{i'=1}^n p_{t-1}(i') \exp\left(\eta \frac{x_{t-1}(i')}{p_{t-1}^T x_{t-1}}\right)}$.    (6)

Note that $p_{t-1}^T x_{t-1}$ is the average price relative, and the wealth allocated to stock $i$ relies on the ratio $\frac{x_{t-1}(i)}{p_{t-1}^T x_{t-1}}$. In particular, if the price relative for a stock is greater than the average in a particular round, the investment in that stock is increased accordingly.

Online Newton Step Method (ONS): The Online Newton Step (ONS) method for portfolio selection is an application of the Newton step to the online setting. Recent work has shown that the ONS approach can be used in online convex optimization to achieve sharper regret bounds compared to online gradient descent based methods [10, 2, 9]. Let $p_t$ be a portfolio vector, such that $p_t \in \Delta_n$, the $n$-simplex. The ONS algorithm uses the following portfolio update for round $t > 1$:

    $p_t = \Pi_{\Delta_n}^{A_{t-1}}\left(p_{t-1} - \frac{1}{\beta} A_{t-1}^{-1} \nabla_{t-1}\right)$,    (7)

where $\nabla_t = \nabla[-\log(p_t^T x_t)] = -\frac{x_t}{p_t^T x_t}$, $A_t = \sum_{\tau=1}^t \nabla_\tau \nabla_\tau^T + I_n$, $\beta$ is a non-negative constant, and $\Pi_{\Delta_n}^{A_{t-1}}$ is the projection onto the $n$-simplex $\Delta_n$ according to the norm induced by $A_{t-1}$, i.e.,

    $\Pi_{\Delta_n}^{A_{t-1}}(y) = \operatorname*{argmin}_{x \in \Delta_n} (y - x)^T A_{t-1} (y - x)$.    (8)

Under the no-junk-bond assumption, ONS achieves $O(\log T)$ regret.^2 ONS is computationally more efficient than UP. ONS has a better regret bound than EG, but is less efficient than EG.

^1 The no-junk-bond assumption can be removed with a more advanced analysis, yielding a regret of $O(T^{3/4})$ [11].
^2 Without the no-junk-bond assumption, the regret of ONS is $O(\sqrt{T})$.
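The multiplicative form of update (6) is straightforward to implement. A minimal sketch (the portfolio, price relatives, and $\eta$ below are illustrative values, not the paper's experimental settings):

```python
import math

def eg_update(p_prev, x_prev, eta=0.05):
    """One EG step (eq. (6)): reweight p_{t-1}(i) by exp(eta * x_{t-1}(i) / p^T x),
    then renormalize so the new portfolio sums to one."""
    avg = sum(pi * xi for pi, xi in zip(p_prev, x_prev))  # p_{t-1}^T x_{t-1}
    unnorm = [pi * math.exp(eta * xi / avg) for pi, xi in zip(p_prev, x_prev)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Stock 2's price relative (2.0) beats the average, so its weight rises above 1/2.
p = eg_update([0.5, 0.5], [1.0, 2.0])
```

Because the update is multiplicative and then normalized, the iterate always stays in the simplex, with no explicit projection step.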

Anticor: Anticor (AC) is a heuristic which does not conform to the universal property for portfolio selection algorithms [1]. In AC, learning the best stocks (to invest money in) is done by exploiting the volatility of the market and the statistical relationships between the stocks. It implements the reversal-to-the-mean market phenomenon rather aggressively. An important parameter for Anticor is the window length $w$. The version of Anticor implemented works with the two most recent windows of length $w$. The strategy is to move money from a stock $i$ to a stock $j$ if the growth rate of stock $i$ is greater than the growth rate of $j$ in the most recent window. An additional condition that needs to be satisfied is the existence of a positive correlation between stock $i$ in the second-last window and stock $j$ in the last window. The satisfaction of this condition is an indicator that stock $j$ will replicate stock $i$'s past behavior in the near future. The amount of money that is transferred from stock $i$ to stock $j$ depends on the strength of the correlation between the stocks and the strength of the self-anti-correlation of each stock over two consecutive windows. For a window length of $w$, $LX_1$ and $LX_2$ are defined as $LX_1 = [\log(x_{t-2w+1}), \ldots, \log(x_{t-w})]^T$ and $LX_2 = [\log(x_{t-w+1}), \ldots, \log(x_t)]^T$. Thus $LX_1$ and $LX_2$ are two $w \times n$ matrices over two consecutive time windows. The $j$th column of $LX_k$ is denoted by $LX_k(j)$, which tracks the performance of stock $j$ in window $k$. Let $\mu_k(j)$ be the mean of $LX_k(j)$ and $\sigma_k(j)$ be the corresponding standard deviation. The cross-covariance matrix between the column vectors of $LX_1$ and $LX_2$ is defined as follows:

    $M_{cov}(i, j) = \frac{1}{w-1} (LX_1(i) - \mu_1(i))^T (LX_2(j) - \mu_2(j))$.    (9)

The corresponding cross-correlation matrix is given by:

    $M_{cor}(i, j) = \begin{cases} \frac{M_{cov}(i,j)}{\sigma_1(i)\,\sigma_2(j)} & \sigma_1(i), \sigma_2(j) \neq 0 \\ 0 & \text{otherwise.} \end{cases}$
    (10)

Following the reversal-to-mean strategy, the proportion of wealth to be moved from stock $i$ to stock $j$ is determined by the claim

    $C_{i \to j} = M_{cor}(i, j) + [-M_{cor}(i, i)]_+ + [-M_{cor}(j, j)]_+$,    (11)

where $[x]_+ = \max(0, x)$. The normalized transfer is defined as

    $T_{i \to j} = p_t(i) \frac{C_{i \to j}}{\sum_j C_{i \to j}}$.    (12)

Using these transfer values, the portfolio is defined to be

    $p_{t+1}(i) = p_t(i) + \sum_{j \neq i} (T_{j \to i} - T_{i \to j})$.    (13)

For more details on the Anticor algorithm, please refer to [1].

Preliminary Experiments: Experiments with different variations of Anticor in [1] brought to the fore the exceptional empirical performance improvement that a suitable heuristic can achieve over theoretically well-grounded approaches. We ran Anticor with a window size of 30 (ANTI_30) on two datasets along with EG, UP, and ONS. The two datasets used in Figure 1 are the historical NYSE dataset and the S&P500 dataset (descriptions given in Section 4). For NYSE, Anticor's multiplicative gain is of the order of $10^6$, more than four orders of magnitude larger than the wealth gathered by the best performing universal algorithm, in this case ONS. For S&P500, the results are similar. These results highlight the key discrepancy motivating our current work: the performance of theoretically well-grounded universal algorithms is substantially worse than that of suitable heuristics such as Anticor. The meta algorithms considered in the next section combine the good empirical performance of heuristics such as Anticor with the theoretical guarantees of universal algorithms.
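The Anticor update described above can be sketched compactly. This is our own illustrative reconstruction of the claim/transfer rules (9)-(13), not the authors' code: the function and variable names are ours, the mean log-return in the latest window stands in for the "growth rate", and the toy window data in the usage are made up.

```python
import math

def anticor_portfolio(p, LX1, LX2):
    """One Anticor rebalancing step. LX1, LX2 are w x n lists of log price
    relatives for the two most recent windows; p is the current portfolio."""
    w, n = len(LX1), len(p)
    col = lambda M, j: [row[j] for row in M]
    mean = lambda v: sum(v) / len(v)
    std = lambda v: math.sqrt(sum((vi - mean(v)) ** 2 for vi in v) / (len(v) - 1))
    mu1 = [mean(col(LX1, j)) for j in range(n)]
    mu2 = [mean(col(LX2, j)) for j in range(n)]
    s1 = [std(col(LX1, j)) for j in range(n)]
    s2 = [std(col(LX2, j)) for j in range(n)]

    def mcov(i, j):  # eq. (9)
        pairs = zip(col(LX1, i), col(LX2, j))
        return sum((a - mu1[i]) * (b - mu2[j]) for a, b in pairs) / (w - 1)

    def mcor(i, j):  # eq. (10)
        return mcov(i, j) / (s1[i] * s2[j]) if s1[i] > 0 and s2[j] > 0 else 0.0

    pos = lambda x: max(0.0, x)
    # Claims C[i][j] (eq. (11)): move wealth i -> j only when i outgrew j in the
    # latest window and the cross-correlation is positive.
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and mu2[i] >= mu2[j] and mcor(i, j) > 0:
                C[i][j] = mcor(i, j) + pos(-mcor(i, i)) + pos(-mcor(j, j))
    # Normalized transfers (eq. (12)) and the rebalanced portfolio (eq. (13)).
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        total = sum(C[i])
        if total > 0:
            for j in range(n):
                T[i][j] = p[i] * C[i][j] / total
    return [p[i] + sum(T[j][i] - T[i][j] for j in range(n)) for i in range(n)]

# Toy usage: two stocks, window w = 3 (made-up log price relatives).
LX1 = [[0.02, -0.01], [0.01, 0.00], [0.03, -0.02]]
LX2 = [[0.04, -0.01], [0.02, 0.01], [0.05, -0.03]]
p_next = anticor_portfolio([0.5, 0.5], LX1, LX2)
```

By construction the transfers out of a stock never exceed its current weight and cancel in aggregate, so the output remains a valid portfolio.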

3 Meta Algorithms

The meta algorithms we consider maintain a pool of baseline portfolio selection algorithms. The meta algorithms were designed keeping in view the original objective of the paper: good empirical performance along with a provably low regret bound with respect to a class of investment strategies, viz. constant rebalanced portfolios (CRPs). The Meta Algorithms (referred to as MAs in the sequel) are in spirit also online weight update algorithms, similar to the portfolio selection algorithms we came across earlier. We maintain a distribution over a set of base algorithms (BAs), each using its own approach to portfolio selection. In each step we observe the individual performance of the BAs and update our belief in them based on their performance. It is important to note that we can include any kind of portfolio selection strategy as a BA. In particular, a BA need not necessarily be a provably universal strategy [6]. But in order for an MA to have low regret w.r.t. the best CRP in hindsight, at least one of the BAs should be universal. Let there be $m$ base algorithms. Let $r_t$ denote the vector of wealth relatives at time $t$ over all baseline algorithms, i.e., $r_t(j)$ is the multiplicative gain in wealth achieved by the $j$th BA in time step $t$. If BA $j$ maintained a portfolio $p_{t,j}$ over the stocks, then $r_t(j) = p_{t,j}^T x_t$, where $x_t$ is the vector of price relatives of the stocks under consideration. Without loss of generality, we assume $r_t(j) \leq 1$ for our analysis, since $r_t(j) \leq u$ for any constant $u$ will only add a constant additive term $\log u$ to the regret. Further, we work with a no-junk-algorithms assumption, so that $r_t(j) \geq l$ for some $l > 0$. Any MA maintains a distribution over the BAs. It adaptively moves weight to the algorithms that have shown good performance in the past, where performance here means the wealth gathered by each of the BAs.
In the context of the MAs, the BAs play the role of individual stocks, and the wealth relatives of the BAs correspond to the price relatives of stocks in the portfolio selection problem. As a result, the MAs can use one of the portfolio selection algorithms with regret guarantees to maintain a distribution over the BAs. Specifically, we discuss two versions of MA: MA_EG based on gradient-based updates and MA_ONS based on Newton updates. The total wealth accumulated by an MA using wealth distributions $w_1, \ldots, w_T$ in $T$ rounds over $m$ base algorithms is

    $S(w_{1:T}, r_{1:T}) = \prod_{t=1}^T w_t^T r_t$,    (14)

and the logarithm of the wealth achieved is

    $LS(w_{1:T}, r_{1:T}) = \sum_{t=1}^T \log(w_t^T r_t)$.    (15)

Analysis of MA_EG: Algorithm 1 describes the Meta Algorithm with EG-based updates. Following the update property of EG [11], it can be shown that the difference in logarithmic wealth gathered by MA_EG and any of its base algorithms is $O(\sqrt{T})$. The following theorem formalizes this statement.

Theorem 1 Let $u \in \Delta_m$ be a distribution vector over the base algorithms, and let $r_1, \ldots, r_T$ be a sequence of wealth relatives with $r_t(j) \geq l > 0$ for all $j, t$ and $\max_j r_t(j) = 1$ for all $t$ (no-junk-algorithms assumption). For $\eta > 0$, the logarithmic wealth due to the distribution vectors over the base algorithms produced by MA_EG is bounded from below as follows:

    $\sum_{t=1}^T \log(u^T r_t) - \sum_{t=1}^T \log(w_t^T r_t) \leq \frac{\log m}{\eta} + \frac{\eta T}{8 l^2}$.    (16)

Furthermore, if $w_1$ is chosen to be the uniform proportion vector and we set $\eta = 2l\sqrt{2 \log m / T}$, then we have

    $\sum_{t=1}^T \log(u^T r_t) - \sum_{t=1}^T \log(w_t^T r_t) \leq \frac{\sqrt{2T \log m}}{2l}$.    (17)
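As a quick numeric sanity check on Theorem 1 (the values of $m$, $T$, $l$ below are arbitrary, chosen only for illustration), the $\eta$ that minimizes the right-hand side of (16) reproduces the closed-form bound of (17):

```python
import math

def eg_meta_bound(m, T, l, eta):
    """Right-hand side of (16): log(m)/eta + eta*T/(8*l^2)."""
    return math.log(m) / eta + eta * T / (8 * l * l)

m, T, l = 4, 5000, 0.5                                  # arbitrary example values
eta_star = 2 * l * math.sqrt(2 * math.log(m) / T)       # minimizer of (16)
closed_form = math.sqrt(2 * T * math.log(m)) / (2 * l)  # bound (17)
print(eg_meta_bound(m, T, l, eta_star), closed_form)    # the two agree
```

At the optimal $\eta$ the two terms of (16) are equal, each contributing half of the $\sqrt{2T\log m}/(2l)$ bound.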

Proof: The proof follows directly from the proof of Theorem 4.1 of [11] by replacing $x_t$ with $r_t$ and $c$ with $l$. Here $w_t$ is a distribution over the base algorithms instead of a portfolio vector.

What Theorem 1 implies is that we can include any number of heuristics or universal algorithms in the pool of BAs, and the wealth gathered by MA_EG will be competitive with the best strategy. So if we create a pool with EG, UP, ONS, and Anticor and run MA_EG on the NYSE and S&P500 datasets, we expect the wealth accumulated by MA_EG to be close to that of Anticor.

Corollary 1 MA_EG has $O(\sqrt{T})$ regret with respect to any of the base algorithms. In particular, for the $j$th base algorithm,

    $\sum_{t=1}^T \log(p_{t,j}^T x_t) - \sum_{t=1}^T \log(w_t^T r_t) \leq \frac{\sqrt{2T \log m}}{2l}$.    (18)

Proof: The proof follows simply by using $u$ in Theorem 1 to pick the $j$th base algorithm only, and substituting the wealth relative $r_t(j)$ by $p_{t,j}^T x_t$.

Consider an application of MA_EG where one of the base algorithms is universal. In particular, assume that UP is used as a base algorithm. The above results readily imply that the corresponding MA_EG will be competitive with the best CRP in hindsight.

Corollary 2 Let $r_1, \ldots, r_T$ be a sequence of wealth relatives with $r_t(j) \geq l > 0$ for all $j, t$ and $\max_j r_t(j) = 1$ for all $t$. Let $p$ be any CRP over the corresponding sequence of price relatives $x_1, \ldots, x_T$. For $\eta = 2l\sqrt{2 \log m / T}$ and $w_1 = \mathrm{uniform}(1/m)$, the regret in log-wealth of MA_EG w.r.t. the best CRP in hindsight is bounded as follows:

    $\max_p \sum_{t=1}^T \log(p^T x_t) - \sum_{t=1}^T \log(w_t^T r_t) \leq \frac{\sqrt{2T \log m}}{2l} + (n-1) \log(T+1)$.    (19)

Proof: Corollary 1 implies that if we include UP, which is competitive with the best CRP in hindsight, in the pool of BAs, then MA_EG in turn will also be competitive with the best CRP in hindsight.
From [16] we have that if $p_{1:T,UP}$ denotes the sequence of portfolio vectors used by UP and $p$ is any CRP on a sequence of price relatives $x_1, \ldots, x_T$, then

    $\max_p \sum_{t=1}^T \log(p^T x_t) - \sum_{t=1}^T \log(p_{t,UP}^T x_t) \leq (n-1) \log(T+1)$.    (20)

By adding (20) to (18) and taking the $j$th BA to be UP, we have

    $\max_p \sum_{t=1}^T \log(p^T x_t) - \sum_{t=1}^T \log(w_t^T r_t) \leq \frac{\sqrt{2T \log m}}{2l} + (n-1) \log(T+1)$.    (21)

Hence, we have shown that MA_EG has $O(\sqrt{T})$ regret w.r.t. the best CRP in hindsight if UP (or any other universal algorithm) is included as one of the base algorithms.

Analysis of MA_ONS: As in MA_EG, $w_t$ in MA_ONS is a distribution over the base algorithms in round $t$. At each step, $w_t$ is learnt using the ONS update [3]. Algorithm 2 describes the steps of the algorithm. An analysis of the ONS update yields the following regret bound for MA_ONS.

Algorithm 1 MA_EG: Meta Algorithm with EG update
  Let $w_1 = (\frac{1}{m}, \ldots, \frac{1}{m})$ give the fraction of initial wealth invested in each of the base algorithms $b_1, \ldots, b_m$
  Let $p_{t,j}$ be the portfolio vector used by base algorithm $b_j$ in round $t$
  for $t = 1, 2, \ldots$
    Receive price relatives $x_t$
    Calculate $r_t(j) = p_{t,j}^T x_t$
    Update the weight on each base algorithm $b_j$ as follows:

      $w_{t+1}(j) = \frac{w_t(j) \exp(\eta\, r_t(j) / w_t^T r_t)}{\sum_{j'=1}^m w_t(j') \exp(\eta\, r_t(j') / w_t^T r_t)}$

Algorithm 2 MA_ONS: Meta Algorithm with ONS update
  Let $w_1 = (\frac{1}{m}, \ldots, \frac{1}{m})$ give the fraction of initial wealth invested in each of the base algorithms $b_1, \ldots, b_m$
  Let $\beta = l/16$
  for $t = 1, 2, \ldots$
    Receive price relatives $x_t$
    Calculate $r_t(j) = p_{t,j}^T x_t$
    Calculate the new weight vector as follows:

      $w_{t+1} = \Pi_{\Delta_m}^{A_t}\left(w_t - \frac{1}{\beta} A_t^{-1} \nabla_t\right)$,    (22)

    where $\nabla_t = \nabla[-\log(w_t^T r_t)] = -\frac{r_t}{w_t^T r_t}$, $A_t = \sum_{\tau=1}^t \nabla_\tau \nabla_\tau^T + I_m$, and $\Pi_{\Delta_m}^{A_t}$ is the projection in the norm induced by $A_t$, i.e.,

      $\Pi_{\Delta_m}^{A_t}(y) = \operatorname*{argmin}_{x \in \Delta_m} (y - x)^T A_t (y - x)$.    (23)

Theorem 2 For any sequence of wealth relatives $r_t$ with $r_t(j) \in [l, 1]$, for any $\beta \leq l/16$, and under the no-junk-algorithms assumption, the MA_ONS algorithm has the following regret:

    $\max_w \sum_{t=1}^T \log(w^T r_t) - \sum_{t=1}^T \log(w_t^T r_t) \leq \frac{m}{\beta} \log\left(\frac{mT}{l^2} + 1\right) + 4\beta$.    (24)

The result follows from the analysis presented in [9]. However, note that the bound is mildly better than that in the literature [2]. For the sake of completeness, we present a detailed analysis in Appendix A. Following the lines of Corollary 2 for MA_EG now leads us to the following result, which shows that MA_ONS will be competitive with the best CRP in hindsight.

Corollary 3 Let $r_1, \ldots, r_T$ be a sequence of wealth relatives with $r_t(j) \geq l > 0$ for all $j, t$ and $\max_j r_t(j) = 1$ for all $t$. Let $p$ be the best CRP in hindsight over the corresponding sequence of price relatives $x_1, \ldots, x_T$. For any positive $\beta \leq l/16$, the regret in logarithmic wealth of MA_ONS w.r.t. the best CRP in hindsight is bounded as follows:

    $\sum_{t=1}^T \log(p^T x_t) - \sum_{t=1}^T \log(w_t^T r_t) \leq \left(\frac{m}{\beta} + n\right) \log(T+1) + \frac{m}{\beta} \log\frac{m}{l^2} + 4\beta$.    (25)
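Algorithm 1 amounts to running the EG update of Section 2 with base-algorithm wealth relatives in place of stock price relatives. A minimal sketch (the two toy base algorithms, the market data, and $\eta$ below are illustrative, not the paper's experimental setup):

```python
import math

def ma_eg(base_portfolios, price_relatives, eta=0.05):
    """Sketch of Algorithm 1 (MA_EG).

    base_portfolios[t][j] is the portfolio used by base algorithm b_j on day t.
    Returns the meta algorithm's final wealth and its weight vector over BAs."""
    m = len(base_portfolios[0])
    w = [1.0 / m] * m                                   # w_1: uniform over the BAs
    wealth = 1.0
    for p_t, x_t in zip(base_portfolios, price_relatives):
        r = [sum(pji * xi for pji, xi in zip(p_j, x_t)) for p_j in p_t]  # r_t(j)
        gain = sum(wj * rj for wj, rj in zip(w, r))     # w_t^T r_t
        wealth *= gain
        unnorm = [wj * math.exp(eta * rj / gain) for wj, rj in zip(w, r)]
        z = sum(unnorm)
        w = [u / z for u in unnorm]                     # w_{t+1}
    return wealth, w

# Two single-stock base algorithms in a market where stock 2 keeps rising:
bas = [[[1.0, 0.0], [0.0, 1.0]]] * 5   # b_1 holds stock 1, b_2 holds stock 2
xs = [[1.0, 1.1]] * 5
wealth, w = ma_eg(bas, xs, eta=1.0)
# Weight shifts toward the better performing base algorithm b_2.
```

Replacing the exponential reweighting with the projected Newton update (22) yields the corresponding MA_ONS sketch.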

Proof: The proof is similar to that of Corollary 2, with some additional calculations combining terms involving m and n. Thus, the regret of MA_ONS is O(log T) w.r.t. the best CRP in hindsight, whereas the regret of MA_EG is O(√T).

4 Experiments and Results

In this section, we discuss experimental results obtained with different sets of base algorithms and different strategies for the meta algorithms.

Datasets: The experiments were conducted on two main datasets: the New York Stock Exchange (NYSE) dataset, widely used in the literature [6, 11, 2], and a new Standard & Poor's 500 (S&P 500) dataset we created. The NYSE dataset consists of 36 stocks with data at daily resolution accumulated over a period of 22 years, from July 3rd, 1962 to Dec 31st, 1984. The dataset captures the bear market that lasted between January 1973 and December 1974. All of the 36 stocks increase in value over the 22-year run. The S&P 500 is a capitalization-weighted index of 500 large-cap common stocks actively traded in the United States. The index is used to measure the changes in the US economy through the aggregate market value of these 500 stocks representing all major industries. The S&P 500 dataset that we used for our experiments consists of 385 stocks which were persistent in the S&P 500 from January 1995 through November 2008. The time frame includes two major financial meltdowns, viz. the dot-com bubble burst in March 2000 and the housing bubble burst in October 2007. In order to better understand the performance of the algorithms, we also ran experiments on these datasets in reverse. In particular, following [1], we reverse the day ordering and consider the reciprocals of the price relatives. The reverse datasets are denoted by a superscript -1, i.e., NYSE^-1 and S&P500^-1.

Methodology: We ran a set of base algorithms and different meta algorithms on the four datasets: NYSE, S&P500, NYSE^-1, and S&P500^-1. The meta algorithms were run with a pool of base algorithms.
The pool included UP,^3 EG, and ONS among the universal algorithms, and Anticor, AdaptiveFTL_r, and UCRP among the non-universal algorithms. AdaptiveFTL_r is a variant of the follow-the-leader strategy which uniformly distributes the wealth among the top-r stocks at any time point. UCRP is the uniform constant rebalanced portfolio, which maintains an equal proportion of wealth in all stocks. Most of the portfolio selection algorithms, universal or otherwise, required parameter choices to be made. For EG, the value of η was taken to be 0.05. We implemented ONS from [2] with the following parameter settings: η = 0, β = 1, and δ = 1/8. Anticor was used with a window length of w = 30 and is hence referred to as Anticor_30. We also experimented with different window lengths, as well as a Buy-and-Hold strategy over Anticor with window lengths from 2 to 30, referred to as BAH(Anticor_30). AdaptiveFTL_r was used with r = 5. For MA_EG, the value of η was taken to be greater than 1. MA_ONS was run with β = 1/16.

Results: Table 1 presents the monetary returns in dollars of the universal and non-universal algorithms. Amongst the non-universal algorithms, it includes results for UCRP, Anticor_30, and AdaptiveFTL_5. The winner and runner-up for each market dataset appear in bold face. The initial investment for all the algorithms is $1. MA_EG and Anticor_30 are the clear winners across all four datasets. In all cases, Anticor_30 is the best performing method, but it is not a universal algorithm. MA_EG and MA_ONS are both universal algorithms whose empirical performance is close to that of Anticor, and is always orders of magnitude better than that of the existing universal algorithms, such as UP, EG, and ONS. An interesting observation is that even on the

^3 A direct implementation of the UP algorithm is exponential in the number of stocks [12].
Blum and Kalai [4] proposed an approximate implementation based on uniform random sampling of the portfolio simplex which, in the worst case, is also exponential in the number of stocks. The UP algorithm code used for the experiments is based on [4].

Table 1: Monetary returns in dollars (per $1 investment) of universal and non-universal algorithms

Algorithm      | NYSE       | SP500        | NYSE^-1 | SP500^-1
UP             | 18.56      | 4.29         | 0.16    | 1.08
EG             | 27.10      | 7.08         | 0.23    | 1.28
ONS            | 109.17     | 346.88       | 0.53    | 5.99
UCRP           | 27.08      | 7.20         | 0.23    | 1.28
Anticor_30     | 617,754.61 | 3,799,842.29 | 5.63    | 3849.17
AdaptiveFTL_5  | 11.26      | 5.71         | 0.24    | 1.31
MA_EG          | 482,029.07 | 539,877.39   | 2.75    | 2027.87
MA_ONS         | 424,053.99 | 112,567.03   | 2.274   | 2019.28

NYSE^-1 dataset, where UP, EG, and ONS actually end up losing money (due to the intrinsic nature of the dataset), the meta algorithms and Anticor still manage to make profits. Table 2 presents the Annual Percentage Yields (APY) of the algorithms on the four datasets. The APY percentages were calculated using the standard asymptotic formula (for a large number of years)

    APY = (S_{T_years}^{1/T_years} − 1) × 100,   (26)

where T_years is the total number of years of investment, roughly 22 years for the NYSE and 13 years for the S&P 500 datasets respectively, and S_{T_years} is the wealth gathered at the end of T_years by an algorithm. As expected, Anticor_30 has the highest APY on all the datasets, with MA_EG and MA_ONS following close behind. On the S&P 500 dataset, while Anticor_30 achieves an APY of almost 221%, MA_EG's APY is 176%, which is far ahead of ONS's APY of 57%, the best amongst the existing universal algorithms.

Table 2: Annual Percentage Yield

Algorithm      | NYSE  | SP500  | NYSE^-1 | SP500^-1
UP             | 14.20 | 11.87  | -8.10   | 0.59
EG             | 16.18 | 16.24  | -6.48   | 1.89
ONS            | 23.78 | 56.82  | -2.87   | 14.76
UCRP           | 16.18 | 16.40  | -6.51   | 1.90
Anticor_30     | 83.32 | 220.73 | 8.18    | 88.71
AdaptiveFTL_5  | 11.63 | 14.34  | -6.22   | 2.08
MA_EG          | 81.27 | 176.02 | 4.71    | 79.63
MA_ONS         | 80.22 | 144.66 | 3.81    | 79.58

Figures 2 and 3 visually depict the performance of the algorithms in terms of wealth growth. Due to the exceptionally large wealth growth factors of Anticor_30 and the meta algorithms, Figures 2(a), 2(b), and 3(b) show the logarithmic wealth growth.
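Equation (26) is easy to check numerically. The helper below (the function name is ours) recovers, for instance, UP's roughly 14.2% NYSE entry of Table 2 from its $18.56 return in Table 1 over roughly 22 years:

```python
def apy_percent(final_wealth, years):
    """Annual Percentage Yield (in %) of an algorithm that turns an initial
    $1 into final_wealth dollars after the given number of years, per
    Eq. (26): APY = (S ** (1/T) - 1) * 100."""
    return (final_wealth ** (1.0 / years) - 1.0) * 100.0
```

For example, apy_percent(18.56, 22) is approximately 14.2, matching the UP/NYSE entry of Table 2.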
Additional Experiments: Based on the success of the meta algorithms with good heuristics in the pool, we ran additional experiments to investigate two questions: (i) Can we design even better heuristics as base algorithms and consequently improve the performance of the meta algorithms? (ii) Are there heuristic meta algorithms which have no guarantees but outperform MA_EG and MA_ONS empirically? The experiments reported above used a fixed window size w = 30 for the Anticor algorithm. Since it is not possible to ascertain a priori which window size of the Anticor algorithm will work best for a dataset,

we introduce a variant of Anticor called BAH(Anticor_W), which uses a Buy-and-Hold strategy to combine multiple Anticor algorithms with different window sizes w ∈ W. BAH(Anticor_W) was found to perform better than the original Anticor_w. Figure 4 shows that BAH(Anticor_30) beats Anticor_30, and of course EG, UP, and ONS, by a good margin on the NYSE as well as the S&P 500 dataset. BAH(Anticor_W) was then added to the pool of existing base algorithms for the meta algorithms. We ran the meta algorithms MA_EG and MA_ONS with BAH(Anticor_30) in the pool. Figure 5 shows that the wealth achieved by MA_EG and MA_ONS with BAH(Anticor_30) is almost as much as that of BAH(Anticor_30) itself. Further, both meta algorithms now outperform Anticor_30. Since Anticor performed well as a base algorithm, we also ran experiments using Anticor as a meta algorithm. MA_Anticor simply redistributes wealth among the pool of base algorithms following the Anticor updates, based on the wealth relatives of the individual algorithms. We experimented with different window lengths, and report results with a window length of 5, as this version was observed to perform reasonably well. The performance of MA_Anticor was seen to decrease as the window length was increased beyond 10 days. MA_Anticor showed no performance improvement over MA_EG and MA_ONS on either the NYSE or the S&P500 dataset. Figure 5 shows that the performance of MA_Anticor is actually inferior to both MA_EG and MA_ONS. Interestingly, the reversal-to-mean strategy does not seem as effective when run on base algorithms. The marked performance improvement of the Buy-and-Hold strategy over the Anticor algorithm with different window lengths inspired the final set of experiments, where a Buy-and-Hold strategy was used as a meta algorithm. We ran a Buy-and-Hold version called MA_BAH with UP, EG, ONS, AdaptiveFTL_5, Anticor_30, and BAH(Anticor_30) as base algorithms. The results are plotted in Figure 5.
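The Buy-and-Hold combination used by BAH(Anticor_W) and MA_BAH never rebalances: an equal share of the initial dollar is placed in each component once, so the final wealth is simply the average of the components' individually compounded wealths. A minimal sketch (the function name is ours):

```python
import numpy as np

def bah_wealth(returns):
    """Final wealth of a Buy-and-Hold combination for a $1 investment.

    returns : (T, m) array, returns[t, j] = wealth relative of component j
              in round t (a base algorithm, or an Anticor with window w).
    """
    per_component = np.prod(returns, axis=0)  # each component compounds on its own
    return per_component.mean()               # equal initial split, never rebalanced
```

In contrast, MA_EG and MA_ONS actively move money among the components every round, which is what the comparison with MA_BAH in Figure 5 isolates.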
We observe that although the multiplicative wealth gain of MA_BAH exceeds that of Anticor_30, it is less than that of BAH(Anticor_30). More interestingly, MA_EG and MA_ONS both outperform MA_BAH. Moving money to well performing algorithms explicitly, as done by MA_EG and MA_ONS, seems to be more effective than simply holding, as done by MA_BAH, or moving money out assuming reversal to the mean, as done by MA_Anticor. The experiments reveal interesting differences in the effectiveness of strategies at the level of base algorithms, where one works with stocks, and at the level of meta algorithms, where one works with algorithms.

5 Conclusions

In this paper, we have presented new meta algorithms for portfolio selection. The key motivation behind our work was the poor practical performance of universal algorithms such as UP, EG, and ONS. We wanted to take advantage of non-universal heuristics while holding on to the theoretical guarantees of the universal algorithms. We presented new meta algorithms which maintain a pool of base algorithms and move money among them using the EG-update and the ONS-update respectively. Through theoretical analysis we showed that the meta algorithms are competitive with the best base algorithm. Moreover, if we include a universal algorithm in the pool, the meta algorithms can be shown to be competitive with the best CRP in hindsight, i.e., they are themselves universal. With a universal algorithm in the pool, MA_EG achieves O(√T) regret, while MA_ONS achieves O(log T) regret w.r.t. the best CRP. This opens the door to including any number of good heuristics in the pool of base algorithms while still guaranteeing low regret. Through comprehensive experiments on the NYSE and S&P500 datasets and their variations, we exhibited the overwhelming performance improvement of the meta algorithms over existing universal algorithms such as UP, EG, and ONS.
The meta algorithms presented in this paper are the first of their kind in combining exceptional empirical performance with strong theoretical bounds, truly bringing the best of both worlds together. The existing literature on online portfolio selection algorithms [6, 11, 2, 1] does not take into account the commission one has to pay while trading. Most of these algorithms trade every stock every day, which is not practical, as one can incur huge commission costs. As part of our future work, we would like to investigate whether a sparse version of the meta algorithm can take care of commissions and yet achieve

good empirical performance. Further, the current models for online portfolio selection do not model risk. Modeling risk and taking into account the volatility of stocks is another interesting direction for our future work.

Acknowledgements: This research was supported by NSF CAREER grant IIS-0953274, and NSF grants IIS-0916750 and IIS-1029711.

References

[1] A. Borodin, R. El-Yaniv, and V. Gogan. Can we learn to beat the best stock. Journal of Artificial Intelligence Research, 21:579-594, 2004.
[2] A. Agarwal, E. Hazan, S. Kale, and R. Schapire. Algorithms for portfolio management based on the newton method. Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 9-16, 2006.
[3] A. Blum and A. Kalai. Universal portfolios with and without transaction costs. 1997.
[4] A. Blum and A. Kalai. Universal portfolios with and without transaction costs. 1997.
[5] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[6] T. Cover. Universal portfolios. Mathematical Finance, 1:1-29, 1991.
[7] T. Cover and E. Ordentlich. Universal portfolios with side information. IEEE Transactions on Information Theory, 42:348-363, 1996.
[8] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, 1991.
[9] E. Hazan, A. Agarwal, and S. Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169-192, 2007.
[10] E. Hazan, A. Kalai, S. Kale, and A. Agarwal. Logarithmic regret algorithms for online convex optimization. Proceedings of the 19th Annual Conference on Learning Theory, pages 499-513, 2006.
[11] D. Helmbold, R. Schapire, Y. Singer, and M. Warmuth. On-line portfolio selection using multiplicative updates. Mathematical Finance, 8(4):325-347, 1998.
[12] A. Kalai and S. Vempala. Efficient algorithms for universal portfolios. Journal of Machine Learning Research, 3(3):423-440, 2002.
[13] J. L. Kelly. A new interpretation of information rate.
Bell System Technical Journal, 35:917-926, 1956.
[14] J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1-64, 1997.
[15] H. Markowitz. Portfolio selection. Journal of Finance, 7:77-91, 1952.
[16] T. M. Cover and E. Ordentlich. Universal portfolios with side information. IEEE Transactions on Information Theory, 42(2), 1996.

A Analysis of MA_ONS

We present an analysis of the ONS update which leads to better bounds compared to those in the literature [2], in particular in terms of the dependence on the number of stocks n. Since w_t is the distribution of wealth among the m base algorithms, w_t ∈ Δ_m, the m-simplex. Let

    f_t(w) = −log(w^T r_t).   (27)

If w_t is the wealth distribution at time t, then let

    ∇_t = ∇f_t(w_t) = −(1/(w_t^T r_t)) r_t.   (28)

Note that the second derivative is

    ∇²f_t(w_t) = (1/(w_t^T r_t)²) r_t r_t^T = ∇_t ∇_t^T.   (29)

Consider the auxiliary function

    h_t(w) = −log(w_t^T r_t) + ∇_t^T (w − w_t) + (β/2) (∇_t^T (w − w_t))²,   (30)

which satisfies h_t(w_t) = −log(w_t^T r_t) = f_t(w_t).

Lemma 1 For any β ≤ l/16 < 1/2, f_t(w) ≥ h_t(w), i.e.,

    −log(w^T r_t) ≥ −log(w_t^T r_t) + ∇_t^T (w − w_t) + (β/2) (w − w_t)^T ∇_t ∇_t^T (w − w_t).   (31)

Proof: Let g_t(w) = exp(−2β f_t(w)) = (w^T r_t)^{2β}. Note that ∇g_t(w) = −2β ∇f_t(w) exp(−2β f_t(w)). Further, since β < 1/2,

    ∇²g_t(w) = 2β(2β − 1) ∇f_t(w) ∇f_t(w)^T exp(−2β f_t(w)) ⪯ 0,

implying that g_t(w) is concave. Hence,

    g_t(y) ≤ g_t(w) + (y − w)^T ∇g_t(w)
    exp(−2β f_t(y)) ≤ exp(−2β f_t(w)) (1 − 2β (y − w)^T ∇f_t(w)).

Taking logs on both sides and rearranging, we have

    f_t(y) ≥ f_t(w) − (1/(2β)) log(1 − 2β (y − w)^T ∇f_t(w)).

From Hölder's inequality, we have

    |2β (y − w)^T ∇f_t(w)| ≤ 2β ‖y − w‖_1 ‖∇f_t(w)‖_∞ ≤ 2β (‖y‖_1 + ‖w‖_1) ‖r_t‖_∞ / (w^T r_t) ≤ 2β · 2 · (1/l) ≤ 1/4,

since β ≤ l/16. For |z| ≤ 1/4, log(1 − z) ≤ −(z + (1/4) z²). Using the above inequality with z = 2β (y − w)^T ∇f_t(w), we obtain

    f_t(y) ≥ f_t(w) + (y − w)^T ∇f_t(w) + (β/2) ((y − w)^T ∇f_t(w))².

Setting w = w_t and renaming y as w, we obtain

    f_t(w) ≥ f_t(w_t) + (w − w_t)^T ∇_t + (β/2) (w − w_t)^T ∇_t ∇_t^T (w − w_t).

That completes the proof.

Since

    Σ_{t=1}^T [log(w^T r_t) − log(w_t^T r_t)] = Σ_{t=1}^T [f_t(w_t) − f_t(w)] ≤ Σ_{t=1}^T [∇_t^T (w_t − w) − (β/2) (w − w_t)^T ∇_t ∇_t^T (w − w_t)],

it is sufficient to upper bound the RHS. Following [3], we define

    A_t = Σ_{τ=1}^t ∇_τ ∇_τ^T + I = A_{t−1} + ∇_t ∇_t^T.   (32)

Let φ_t(x) = (1/2) x^T A_t x. Note that φ_t is convex, and the corresponding Bregman divergence is given by

    d_{φ_t}(w, y) = (1/2) (w − y)^T A_t (w − y).   (33)

The Online Newton Step (ONS) algorithm proceeds by taking a Newton step from the current solution w_t and then projecting it back onto the simplex Δ_m. Let y_{t+1} denote the result of an unconstrained Newton step:

    y_{t+1} = w_t − (1/β) A_t^{-1} ∇_t.   (34)

Then, the next wealth distribution w_{t+1} is the Bregman projection of y_{t+1} onto the simplex, i.e., w_{t+1} = argmin_{w ∈ Δ_m} d_{φ_t}(w, y_{t+1}). We are now ready to state the main regret result for ONS:

Theorem 3 For any sequence of wealth relatives r_t with r_t(j) ∈ [l, 1], for any β ≤ l/16, and under the no-junk-algorithm assumption, the ONS algorithm has the following regret:

    max_w Σ_{t=1}^T log(w^T r_t) − Σ_{t=1}^T log(w_t^T r_t) ≤ (m/β) log(mT/l² + 1) + 4β.   (35)

In particular, for β = l/16, the regret is given by

    max_w Σ_{t=1}^T log(w^T r_t) − Σ_{t=1}^T log(w_t^T r_t) ≤ (16m/l) log(mT/l² + 1) + l/4.   (36)

Note that the regret bound is stronger than the existing bound on ONS for the portfolio selection problem [2]. In particular, the regret bound in [2] has a dependency on n^{3/2} whereas the bound above has a dependency on n. Thus, while the algorithm is still based on ONS, our analysis leads to a mildly better regret bound. We need the following linear algebraic result (Lemma 11 in [3]) for our proof:

Lemma 2 Let ∇_t = ∇f_t(w_t) = −(1/(w_t^T r_t)) r_t with r_t(i) ∈ [l, 1]. Further, let A_t = Σ_{τ=1}^t ∇_τ ∇_τ^T + I. Then,

    Σ_{t=1}^T ∇_t^T A_t^{-1} ∇_t ≤ m log(mT/l² + 1).   (37)

Proof of Theorem: Let w* be the best wealth distribution in hindsight. Recall from Lemma 1 that

    f_t(w_t) − f_t(w*) ≤ ∇_t^T (w_t − w*) − (β/2) (w* − w_t)^T ∇_t ∇_t^T (w* − w_t),   (38)

where we denote the RHS by R_t, the regret at the t-th step. From the unconstrained update of the ONS algorithm, note that

    A_t y_{t+1} = A_t w_t − (1/β) ∇_t, so that ∇_t = −β (∇φ_t(y_{t+1}) − ∇φ_t(w_t)).

From the three-point property of Bregman divergences, we know that for any vectors x, y, z,

    (x − y)^T (∇φ(z) − ∇φ(y)) = d_φ(x, y) − d_φ(x, z) + d_φ(y, z).   (39)

For any w,

    (w − w_t)^T ∇_t = −β (w − w_t)^T (∇φ_t(y_{t+1}) − ∇φ_t(w_t))
                    = −β [d_{φ_t}(w, w_t) − d_{φ_t}(w, y_{t+1}) + d_{φ_t}(w_t, y_{t+1})]
                    ≥ −β [d_{φ_t}(w, w_t) − d_{φ_t}(w, w_{t+1}) + d_{φ_t}(w_t, y_{t+1})],

where the last inequality follows from the fact that w_{t+1} is the Bregman projection of y_{t+1} onto Δ_m, so that d_{φ_t}(w, w_{t+1}) ≤ d_{φ_t}(w, y_{t+1}). Now, note that

    d_{φ_t}(w_t, y_{t+1}) = (1/2) (w_t − y_{t+1})^T A_t (w_t − y_{t+1}) = (1/(2β²)) ∇_t^T A_t^{-1} A_t A_t^{-1} ∇_t = (1/(2β²)) ∇_t^T A_t^{-1} ∇_t.

Setting w = w*, replacing the other Bregman divergences according to the definition, summing over t = 1, ..., T, and simplifying, we obtain

    Σ_{t=1}^T (w_t − w*)^T ∇_t ≤ (1/(2β)) Σ_{t=1}^T ∇_t^T A_t^{-1} ∇_t − (β/2) (w* − w_{T+1})^T A_T (w* − w_{T+1})
                                 + (β/2) (w* − w_1)^T A_1 (w* − w_1) + (β/2) Σ_{t=2}^T (w* − w_t)^T (A_t − A_{t−1}) (w* − w_t)
                               ≤ (1/(2β)) Σ_{t=1}^T ∇_t^T A_t^{-1} ∇_t + (β/2) Σ_{t=1}^T (w* − w_t)^T ∇_t ∇_t^T (w* − w_t)
                                 + (β/2) (w* − w_1)^T (A_1 − ∇_1 ∇_1^T) (w* − w_1),

where we have used A_t − A_{t−1} = ∇_t ∇_t^T and (w* − w_{T+1})^T A_T (w* − w_{T+1}) ≥ 0. Transferring the middle term to the LHS, we obtain a bound on Σ_t R_t. In particular,

    Σ_{t=1}^T [f_t(w_t) − f_t(w*)] ≤ Σ_{t=1}^T R_t ≤ (1/(2β)) Σ_{t=1}^T ∇_t^T A_t^{-1} ∇_t + (β/2) (w* − w_1)^T (A_1 − ∇_1 ∇_1^T) (w* − w_1).

Lemma 2 gives an upper bound on the first term. Focusing on the second term, we note that A_1 − ∇_1 ∇_1^T = I, and so

    ‖w* − w_1‖² ≤ (‖w* − w_1‖_1)² ≤ (‖w*‖_1 + ‖w_1‖_1)² = 4.

Plugging these back in, the regret is bounded as

    Σ_{t=1}^T [f_t(w_t) − f_t(w*)] ≤ (m/(2β)) log(mT/l² + 1) + 2β ≤ (m/β) log(mT/l² + 1) + 4β.   (40)

That completes the proof.
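The update (34) followed by the Bregman projection onto the simplex can be sketched as below. This is a minimal illustration under our own naming; the A_t-norm projection is solved approximately by projected gradient descent with a Euclidean simplex projection as the inner step, which is adequate for small m and well-conditioned A_t but is not tuned for production use.

```python
import numpy as np

def euclidean_simplex_proj(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def project_At(y, A, iters=2000, lr=0.05):
    """Approximate argmin_{x in simplex} (y - x)^T A (y - x) by projected
    gradient descent; the objective's gradient in x is 2 A (x - y).
    The fixed step size assumes A's spectrum stays moderate."""
    x = np.ones_like(y) / len(y)
    for _ in range(iters):
        x = euclidean_simplex_proj(x - lr * 2.0 * A @ (x - y))
    return x

def ma_ons_step(w, r, A, beta):
    """One round of the MA_ONS update: gradient of f_t(w) = -log(w^T r_t),
    A_t accumulation, unconstrained Newton step (34), then projection
    in the A_t norm. Pass A = I on the first round."""
    grad = -r / np.dot(w, r)
    A = A + np.outer(grad, grad)             # A_t = A_{t-1} + grad grad^T
    y = w - np.linalg.solve(A, grad) / beta  # Newton step y_{t+1}
    return project_At(y, A), A
```

A usage note: calling ma_ons_step repeatedly with the returned A threads the A_t recursion (32) through the rounds, so no history of gradients needs to be stored.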

[Figure: log-scale wealth growth curves for EG, UP, ONS, and Anticor; (a) NYSE, 1962-1984; (b) S&P 500, 1995-2008.]

Figure 1: Wealth gathered by Anticor exceeds UP, EG, and ONS on NYSE and S&P500 datasets.

[Figure: log-scale wealth growth curves for EG, UP, UCRP, AdaptiveFTL_5, Anticor_30, ONS, MA_EG, and MA_ONS; (a) NYSE; (b) S&P 500.]

Figure 2: Monetary returns with the meta algorithms MA_EG and MA_ONS are competitive with the best base algorithm (Anticor_30) on the original datasets (best viewed in color).

[Figure: wealth growth curves for EG, UP, UCRP, AdaptiveFTL_5, Anticor_30, ONS, MA_EG, and MA_ONS on the reverse datasets; (a) NYSE^-1 (linear scale); (b) S&P500^-1 (log scale).]

Figure 3: Monetary returns of the base algorithms and meta algorithms on the reverse datasets (best viewed in color).

[Figure: log-scale wealth growth curves for EG, UP, ONS, Anticor_30, and BAH(Anticor_30); (a) NYSE; (b) S&P 500.]

Figure 4: Monetary returns of BAH(Anticor_30) for a $1 investment exceed those of UP, EG, ONS, and Anticor_30 (best viewed in color).

[Figure: log-scale wealth growth curves for Anticor_30, BAH(Anticor_30), MA_EG, MA_ONS, MA_BAH, and MA_Anticor5; (a) NYSE; (b) S&P 500.]

Figure 5: Monetary returns of the meta algorithms (MA_EG, MA_ONS, MA_Anticor, MA_BAH) for a $1 investment, when BAH(Anticor_30) is added to the pool of base algorithms. The multiplicative wealth gain of MA_EG, MA_ONS, and MA_BAH exceeds that of Anticor_30 (best viewed in color).