Meta Algorithms for Portfolio Selection. Technical Report
Meta Algorithms for Portfolio Selection
Technical Report TR
Department of Computer Science and Engineering
University of Minnesota
EECS Building, 200 Union Street SE, Minneapolis, MN, USA
Puja Das and Arindam Banerjee
September 20, 2010
Meta Algorithms for Portfolio Selection

Puja Das, Dept. of Computer Science & Engg, University of Minnesota, Twin Cities, Minneapolis, MN
Arindam Banerjee, Dept. of Computer Science & Engg, University of Minnesota, Twin Cities, Minneapolis, MN

Abstract

We consider the problem of sequential portfolio selection in the stock market. There are theoretically well grounded algorithms for the problem, such as Universal Portfolio (UP), Exponentiated Gradient (EG) and Online Newton Step (ONS). Such algorithms enjoy the property of being universal, i.e., having low regret with respect to the best constant rebalanced portfolio. However, the practical performance of such popular algorithms is sobering compared to heuristics such as Anticor, which have no theoretical guarantees but perform surprisingly well in practice. Motivated by such discrepancies, in this paper we focus on designing meta algorithms for portfolio selection which can leverage the best of both worlds. Such algorithms work with a pool of base algorithms and use online learning to redistribute wealth among the base algorithms. We develop two meta algorithms: MA_EG, which uses online gradient descent following EG, and MA_ONS, which uses the online Newton step following ONS. If one of the base algorithms is universal, it follows that the meta algorithm is universal. Through extensive experiments on two real stock market datasets, we show that the meta algorithms are competitive with, and often better than, the best base algorithm, including heuristics, while maintaining the guarantee of being a universal algorithm.

1 Introduction

Algorithms for automatically designing portfolios based on historical stock market data have been extensively investigated in the literature for the past five decades [15, 13]. Earlier literature on the topic advocated a statistical treatment of the problem, and important results were established [13, 8, 5].
With the realization that any statistical assumptions regarding the stock market may be inappropriate and eventually counterproductive, over the past two decades new methods of portfolio selection have been designed which make no statistical assumptions regarding the movement of stocks [6, 7, 11]. In a well-defined technical sense, such methods are guaranteed to perform competitively with certain families of adaptive portfolios even in an adversarial market. From the theoretical perspective, algorithm design for portfolio selection has been largely a success story [6, 11, 5]. The motivation of this paper comes from the sobering practical performance of theoretically well motivated algorithms. Sometimes even simple heuristic-based approaches can outperform these theoretically motivated strategies in terms of empirical performance. However, heuristics which can gather admirable amounts of wealth in one market run the risk of starkly poor performance in an adversarial situation. Ideally, we would want portfolio selection algorithms which have good empirical performance but at the same time have theoretical guarantees regarding their performance. In this paper we bring both of these worlds together by designing meta algorithms for online portfolio selection. The objective of any online portfolio selection algorithm is maximization of wealth. But absolute wealth maximization is not possible due to the temporal nature of the problem. Recent theoretically motivated online algorithms usually demonstrate competitiveness with a class of target strategies called Constant
Rebalanced Portfolios (CRP). A CRP ensures a fixed proportion of wealth investment amongst the stocks by daily redistribution of wealth. The performance of a theoretically motivated online strategy is usually measured by regret, the difference in logarithmic wealth between the online strategy and the best CRP in hindsight. It is important to note that while the best CRP in hindsight takes advantage of knowing the market beforehand and hence forms a strong baseline, it cannot be implemented in practice. The regret analysis provides a bound on the worst-case performance of an online algorithm w.r.t. the best CRP in hindsight. The theory-based approaches that require mention here are Universal Portfolios (UP) [6, 7], Exponentiated Gradient (EG) [11] and Online Newton Step (ONS) [3]. UP achieves O(log T) regret, EG achieves O(√T) regret under the no-junk-bond assumption (to be explained shortly), and ONS achieves O(log T) regret under the same assumption. In recent work, [1] proposed a heuristic approach called Anticor which was shown to beat the theoretically motivated EG, UP and the best CRP in practice on several datasets by an overwhelming margin. Therefore, it seems logical for an investor to use Anticor and maximize his wealth. The reason why one cannot use Anticor without any qualms is the absence of any worst-case performance guarantees. Although Anticor does well on certain datasets, a real market sequence can prove to be unsuitable and even adversarial for Anticor. While there is always the temptation to maximize wealth based on a heuristic that has worked well on certain markets, it is highly desirable to have worst-case performance guarantees. In this paper, we introduce new meta algorithms for online portfolio selection which achieve a balance between good empirical performance and theoretical guarantees.
In particular, we present two meta algorithms, their difference owing only to the weight update strategy: MA_EG, relying on gradient-based updates inspired by EG, and MA_ONS, relying on Newton-step based updates inspired by ONS. The meta algorithms maintain a pool of base algorithms and shift wealth to the better performing base algorithms by online learning over time. The novelty is their ability to be competitive with the best of heuristics in empirical performance while having theoretical guarantees, i.e., regret bounds w.r.t. the best CRP in hindsight. The meta algorithm achieves O(√T) regret with the EG-based update under the no-junk-algorithms assumption (to be explained shortly). With the ONS-based update, it can achieve O(log T) regret under the same assumption. Through comprehensive experiments on two historical stock market datasets, spanning 14 and 22 years at a daily resolution, we show that the two versions of the meta algorithm outperform the existing algorithms with theoretical guarantees by several orders of magnitude while maintaining the exact same guarantees.

The rest of the paper is organized as follows. In Section 2, we review the relevant literature on portfolio selection. Specifically, we introduce UP, EG, ONS, and Anticor. We present the new meta algorithms (MA_EG and MA_ONS) and analyze their properties in Section 3. In Section 4 we demonstrate the effectiveness of the meta algorithms through empirical results on the widely used NYSE dataset and a new S&P500 dataset. We conclude in Section 5 with a discussion and directions for future work.

2 Related Work

We consider a stock market consisting of n stocks {s_1, ..., s_n} over T periods. For ease of exposition, we will consider a period to be a day, but the analysis presented in the paper holds for any valid definition of a period, such as an hour or a month.
Let x_t(i) denote the price relative of stock s_i in day t, i.e., the multiplicative factor by which the price of s_i changes in day t. Hence, x_t(i) > 1 implies a gain, x_t(i) < 1 implies a loss, and x_t(i) = 1 implies the price remained unchanged. Further, x_t(i) > 0 for all i, t. Let x_t = (x_t(1), ..., x_t(n)) denote the vector of price relatives for day t, and let x_{1:t} denote the collection of such price relative vectors up to and including day t. A portfolio p_t = (p_t(1), ..., p_t(n)) on day t can be viewed as a probability distribution over the stocks that prescribes investing a fraction p_t(i) of the current wealth in stock s_i. Note that the portfolio p_t has to be decided before knowing x_t, which will be revealed only at the end of the day. The multiplicative gain in wealth at the end of day t is then simply p_t^T x_t = Σ_{i=1}^n p_t(i) x_t(i).
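As a quick illustration of these definitions, the multiplicative and logarithmic gains can be computed directly. The sketch below is our own (function names and toy numbers are not from the report):

```python
# Minimal sketch of the wealth definitions above; names are our own.
import math

def wealth(portfolios, price_relatives):
    """Multiplicative gain: the product over days of p_t . x_t."""
    s = 1.0
    for p, x in zip(portfolios, price_relatives):
        s *= sum(pi * xi for pi, xi in zip(p, x))
    return s

def log_wealth(portfolios, price_relatives):
    """Logarithmic gain: sum over days of log(p_t . x_t)."""
    return math.log(wealth(portfolios, price_relatives))

# Two days, two stocks, uniform portfolio each day.
p = [[0.5, 0.5], [0.5, 0.5]]
x = [[1.0, 0.5], [1.0, 2.0]]
print(wealth(p, x))  # 0.75 * 1.5 = 1.125
```

Note that the toy market here is the alternating two-stock market discussed in the CRP example below, where a uniform rebalance gains a factor of 9/8 every two days.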
For a sequence of price relatives x_{1:t-1} = {x_1, ..., x_{t-1}} up to day (t-1), the sequential portfolio selection problem in day t is to determine a portfolio p_t based on past performance of the stocks. At the end of day t, x_t is revealed and the actual performance of p_t gets determined by p_t^T x_t. Over a period of T days, for a sequence of portfolios p_{1:T} = {p_1, ..., p_T}, the multiplicative gain in wealth is then

    S(p_{1:T}, x_{1:T}) = ∏_{t=1}^T (p_t^T x_t).    (1)

In the literature, one often looks at the logarithm of the multiplicative gain, given by

    LS(p_{1:T}, x_{1:T}) = Σ_{t=1}^T log(p_t^T x_t).    (2)

Ideally, we would like to maximize S(p_{1:T}, x_{1:T}) over p_{1:T}. Unfortunately, portfolio selection cannot be posed as an optimization problem due to the temporal nature of the choices: x_t is not available when one has to decide on p_t. Further, in a stock market, (statistical) assumptions regarding x_t can be difficult to make. The existing literature on theoretically motivated portfolio selection [6, 11, 5] focuses on designing algorithms whose overall gain in wealth is guaranteed to be competitive with reasonable strategies for the problem.

Constant Rebalanced Portfolios: One goal could be to compete against the best Constant Rebalanced Portfolio (CRP). The CRP investment strategy maintains a fixed fraction of total wealth in each of the stocks. So, a CRP has a fixed portfolio vector p_crp = (p(1), ..., p(n)) which is employed every day. Such a strategy requires vast amounts of trading every day to ensure that the investment proportions are rebalanced back to the vector p_crp. To understand the strength of this family of strategies, consider the example of a market consisting of two stocks. The first stock is a no-growth stock and has price relatives 1, 1, 1, 1, ... over time. The second stock doubles in value on even days and on odd days its value gets halved. So, its price relatives are 1/2, 2, 1/2, 2, ....
The sequence of market vectors in this case is (1, 1/2), (1, 2), (1, 1/2), (1, 2), (1, 1/2), .... A Buy-and-Hold strategy will not make any gains in this market. On the other hand, if we use a CRP of p_crp = (1/2, 1/2), then the growth of wealth every 2 days will be 9/8, so that after T days the multiplicative gain in wealth will be (9/8)^{T/2}. It is easy to see that the wealth accumulated by the best CRP will be at least as big as that by the best Buy-and-Hold strategy, and hence by the single best stock. CRP is also known to have certain optimality properties when certain statistical assumptions regarding the price relatives can be made [8].

Universal Algorithms: Theoretically motivated strategies are guaranteed to be competitive with the best CRP by achieving small regret. For any sequence x_1, ..., x_T of price relatives, let p_1, ..., p_T be the sequence of portfolios selected by the algorithm. The regret of a portfolio selection algorithm ALG is given by:

    Regret(ALG) = max_p Σ_{t=1}^T log(p^T x_t) − Σ_{t=1}^T log(p_t^T x_t).    (3)

An investment strategy is deemed universal if it has sublinear regret, i.e., Regret(ALG) = o(T). We now briefly review a set of important sequential portfolio selection strategies from the literature, both universal and non-universal.

Universal Portfolios (UP): The seminal work of Cover [6] introduced Universal Portfolios (UP), the first algorithm which can be shown to be competitive with the best CRP. The algorithm has since been extended to various practical scenarios, such as investment with side information [7]. The key idea behind UP is to maintain a distribution over all CRPs and perform a Bayesian update after observing every x_t. Since each CRP q is a distribution over n stocks and hence lies in the n-simplex, one uses a distribution μ(q) over the n-simplex. A popular choice is the Dirichlet prior μ(q) = Dir(1/2, ..., 1/2) over the simplex.
For any CRP q, let S_{t-1}(q, x_{1:t-1}) = ∏_{t'=1}^{t-1} (q^T x_{t'}) denote the wealth accumulated by q over (t−1) days. Then, the
universal portfolio p_t is defined as the weighted average over all such q:

    p_t(i) = ∫_q q(i) S_{t-1}(q, x_{1:t-1}) μ(q) dq / ∫_q S_{t-1}(q, x_{1:t-1}) μ(q) dq.    (4)

UP has a regret of O(log T) with respect to the best CRP in hindsight. However, the updates for UP are computationally prohibitive. Discrete approximation or recursive series expansion are used to evaluate the above integrals. In either case, however, the time and space complexity of finding the new universal portfolio vector grows exponentially in the dimensionality of the simplex, i.e., the number of stocks.

Exponentiated Gradient (EG) portfolios: The key motivation behind the Exponentiated Gradient (EG) strategy [11] was to design a computationally efficient portfolio selection algorithm which stays competitive with the best CRP. The EG algorithm scales linearly with the number of stocks and the portfolios are guaranteed to stay competitive with the best CRP. However, the regret is O(√T) under the no-junk-bond assumption and is weaker than that of UP. The no-junk-bond assumption states that all the x_t(i) are bounded from below, i.e., x_t(i) > α > 0 for all i, t.¹ The EG investment strategy was introduced and analyzed by [11]. Their framework for updating a portfolio vector is analogous to the framework developed by [14] for online regression. In the online learning framework, the portfolio vector itself encapsulates the necessary information from all previous price relatives. At the start of day t, the algorithm computes its new portfolio vector p_t such that it stays close to p_{t-1} and does well on the price relatives x_{t-1} for the previous day. In particular, the new portfolio vector p_t is chosen so as to maximize

    F(p_t) = η log(p_t^T x_{t-1}) − KL(p_t, p_{t-1}),    (5)

where η > 0 is a parameter called the learning rate and KL(·, ·) is the KL-divergence ensuring p_t stays close to p_{t-1}.
Using an approximation of F based on a Taylor expansion, the updated portfolio turns out to be

    p_t(i) = p_{t-1}(i) exp(η x_{t-1}(i) / p_{t-1}^T x_{t-1}) / Σ_{i'=1}^n p_{t-1}(i') exp(η x_{t-1}(i') / p_{t-1}^T x_{t-1}).    (6)

Note that p_{t-1}^T x_{t-1} is the average price relative, and the wealth allocated to stock i relies on the ratio x_{t-1}(i) / p_{t-1}^T x_{t-1}. In particular, if the price relative for a stock is greater than the average in a particular round, the investment on that stock is increased accordingly.

Online Newton Step Method (ONS): The Online Newton Step (ONS) method for portfolio selection is an application of the Newton step to the online setting. Recent work has shown that the ONS approach can be used in online convex optimization to achieve sharper regret bounds compared to online gradient descent based methods [10, 2, 9]. Let p_t be a portfolio vector such that p_t ∈ Δ_n, the n-simplex. The ONS algorithm uses the following portfolio update for round t > 1:

    p_t = Π_{Δ_n}^{A_{t-1}} ( p_{t-1} + (1/β) A_{t-1}^{-1} ∇_{t-1} ),    (7)

where ∇_t = ∇[log(p_t^T x_t)] = x_t / (p_t^T x_t), A_t = Σ_{τ=1}^t ∇_τ ∇_τ^T + I, β is a non-negative constant, and Π_{Δ_n}^{A_{t-1}} is the projection onto the n-simplex Δ_n according to the norm induced by A_{t-1}, i.e.,

    Π_{Δ_n}^{A_{t-1}}(y) = argmin_{x ∈ Δ_n} (y − x)^T A_{t-1} (y − x).    (8)

Under the no-junk-bond assumption, ONS achieves O(log T) regret.² ONS is computationally more efficient than UP. ONS has a better regret bound than EG, but is less efficient than EG.

¹ The no-junk-bond assumption can be removed with a more advanced analysis, yielding a regret of O(T^{3/4}) [11].
² Without the no-junk-bond assumption, the regret of ONS is O(√T).
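The EG update in Eq. (6) above is simple enough to write out directly. The following sketch is our own illustration (function names and the parameter value are assumptions, not taken from the report):

```python
# Sketch of the EG portfolio update in Eq. (6); names are our own.
import math

def eg_update(p_prev, x_prev, eta):
    """One EG step: multiplicative update by exp(eta * x(i) / p^T x)."""
    avg = sum(pi * xi for pi, xi in zip(p_prev, x_prev))  # p_{t-1}^T x_{t-1}
    unnorm = [pi * math.exp(eta * xi / avg) for pi, xi in zip(p_prev, x_prev)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Stock 2 beat the average yesterday, so its weight increases.
p = eg_update([0.5, 0.5], [1.0, 2.0], eta=0.05)
print(p)
```

As the comment notes, the update shifts weight toward any stock whose price relative exceeded the portfolio's average price relative in the previous round.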
Anticor: Anticor (AC) is a heuristic which does not conform to the universal property for portfolio selection algorithms [1]. In AC, learning the best stocks (to invest money in) is done by exploiting the volatility of the market and the statistical relationships between the stocks. It implements the reversal-to-the-mean market phenomenon rather aggressively. An important parameter for Anticor is the window length w. The version of Anticor implemented works with the two most recent windows of length w. The strategy is to move money from a stock i to a stock j if the growth rate of stock i is greater than the growth rate of j in the most recent window. An additional condition that needs to be satisfied is the existence of a positive correlation between stock i in the second-last window and stock j in the last window. The satisfiability of this condition is an indicator that stock j will replicate stock i's past behavior in the near future. The amount of money that is transferred from stock i to stock j depends on the strength of the correlation between the stocks and the strength of the self-anti-correlations of each stock over two consecutive windows. For a window length of w, LX_1 and LX_2 are defined as LX_1 = [log(x_{t-2w+1}), ..., log(x_{t-w})]^T and LX_2 = [log(x_{t-w+1}), ..., log(x_t)]^T. Thus LX_1 and LX_2 are two w × n matrices over two consecutive time windows. The j-th column of LX_k is denoted by LX_k(j), which tracks the performance of stock j in window k. Let μ_k(j) be the mean of LX_k(j) and σ_k(j) be the corresponding standard deviation. The cross-covariance matrix between the column vectors of LX_1 and LX_2 is defined as follows:

    M_cov(i, j) = (1/(w−1)) (LX_1(i) − μ_1(i))^T (LX_2(j) − μ_2(j)).    (9)

The corresponding cross-correlation matrix is given by:

    M_cor(i, j) = M_cov(i, j) / (σ_1(i) σ_2(j))  if σ_1(i), σ_2(j) ≠ 0, and 0 otherwise.    (10)
Following the reversal-to-mean strategy, the proportion of wealth to be moved from stock i to stock j is defined as:

    C_{i→j} = M_cor(i, j) + [−M_cor(i, i)]_+ + [−M_cor(j, j)]_+,    (11)

where [x]_+ = max(0, x). The normalized transfer is defined as

    T_{i→j} = p_t(i) · C_{i→j} / Σ_{j'} C_{i→j'}.    (12)

Using these transfer values, the portfolio is defined to be

    p_{t+1}(i) = p_t(i) + Σ_{j≠i} (T_{j→i} − T_{i→j}).    (13)

For more details on the Anticor algorithm please refer to [1].

Preliminary Experiments: Experiments with different variations of Anticor in [1] brought to the fore the exceptional empirical performance improvement that a suitable heuristic can achieve over theoretically well grounded approaches. We ran Anticor with a window size of 30 (ANTI_30) on two datasets along with EG, UP and ONS. The two datasets used in Figure 1 are the historical NYSE dataset and the S&P500 dataset (descriptions given in Section 4). For NYSE, Anticor's multiplicative gain is of the order of 10^6, more than four orders of magnitude larger than the wealth gathered by the best performing universal algorithm, in this case ONS. For S&P500, the results are similar. These results highlight the key discrepancy motivating our current work: the performance of theoretically well grounded universal algorithms is substantially worse compared to suitable heuristics such as Anticor. The meta algorithms considered in the next section combine the good empirical performance of heuristics such as Anticor with the theoretical guarantees of universal algorithms.
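To make the Anticor transfer rules concrete, the sketch below implements one rebalancing step following our reading of Eqs. (9)-(13). The function names, the exact claim condition, and the toy data are our own assumptions rather than the authors' code:

```python
# Sketch of one Anticor rebalancing step (our own reading of Eqs. (9)-(13)).
import math

def mean(v):
    return sum(v) / len(v)

def std(v):
    m = mean(v)
    return math.sqrt(sum((vi - m) ** 2 for vi in v) / (len(v) - 1))

def anticor_step(p, LX1, LX2):
    """p: current portfolio; LX1, LX2: w-row lists of log price-relative
    vectors over the two most recent windows."""
    n, w = len(p), len(LX1)
    col = lambda LX, j: [row[j] for row in LX]
    mu1 = [mean(col(LX1, j)) for j in range(n)]
    mu2 = [mean(col(LX2, j)) for j in range(n)]
    s1 = [std(col(LX1, j)) for j in range(n)]
    s2 = [std(col(LX2, j)) for j in range(n)]

    def mcor(i, j):
        if s1[i] == 0 or s2[j] == 0:
            return 0.0
        cov = sum((a - mu1[i]) * (b - mu2[j])
                  for a, b in zip(col(LX1, i), col(LX2, j))) / (w - 1)
        return cov / (s1[i] * s2[j])

    # Claim wealth i -> j only if i outgrew j in the latest window
    # and the cross-correlation M_cor(i, j) is positive.
    claim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and mu2[i] >= mu2[j] and mcor(i, j) > 0:
                claim[i][j] = (mcor(i, j) + max(0.0, -mcor(i, i))
                               + max(0.0, -mcor(j, j)))

    p_new = list(p)
    for i in range(n):
        total = sum(claim[i])
        if total > 0:
            for j in range(n):
                t = p[i] * claim[i][j] / total  # normalized transfer T_{i->j}
                p_new[i] -= t
                p_new[j] += t
    return p_new

# Toy data (w = 4, n = 2): stock 0 outgrew stock 1 recently, and stock 1's
# recent pattern mirrors stock 0's earlier pattern.
LX1 = [[0.1, 0.2], [0.2, 0.1], [0.1, 0.2], [0.2, 0.1]]
LX2 = [[0.3, 0.0], [0.4, 0.1], [0.3, 0.0], [0.4, 0.1]]
p_new = anticor_step([0.5, 0.5], LX1, LX2)
print(p_new)  # here all of stock 0's wealth moves to stock 1
```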
3 Meta Algorithms

The meta algorithms we consider maintain a pool of baseline portfolio selection algorithms. The meta algorithms were designed keeping in view the original objective of the paper: good empirical performance along with a provably low regret bound with respect to a class of investment strategies, viz. constant rebalanced portfolios (CRPs). The Meta Algorithms (referred to as MAs in the sequel) are in spirit also online weight update algorithms, similar to the portfolio selection algorithms we came across earlier. We maintain a distribution over a set of base algorithms (BAs), each using its own approach to portfolio selection. In each step we observe the individual performance of the BAs and update our belief in them based on their performance. It is important to note that we can include any kind of portfolio selection strategy as a BA. In particular, a BA need not be a provably universal strategy [6]. But in order for an MA to have low regret w.r.t. the best CRP in hindsight, at least one of the BAs should be universal. Let there be m base algorithms. Let r_t denote the vector of wealth relatives at time t over all baseline algorithms, i.e., r_t(j) is the multiplicative gain in wealth achieved by the j-th BA in time step t. If BA j maintained a portfolio p_{t,j} over the stocks, then r_t(j) = p_{t,j}^T x_t, where x_t is the vector of price relatives of the stocks under consideration. Without loss of generality, we assume r_t(j) ≤ 1 for our analysis, since r_t(j) ≤ u for any constant u will only add a constant additive term log u in the regret term. Further, we work with a no-junk-algorithms assumption, so that r_t(j) ≥ l for some l > 0. Any MA maintains a distribution over the BAs. It adaptively moves weight to the algorithms that have shown good performance in the past, where performance here represents the wealth gathered by each of the BAs.
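This weight-redistribution scheme can be sketched generically with a multiplicative, EG-style update of the kind MA_EG uses. The base "algorithms" below are hypothetical stand-ins, and all names and the parameter value are our own assumptions:

```python
# Sketch of a meta algorithm loop over m base algorithms; names are ours.
import math

def meta_eg(base_portfolios, price_relatives, eta=0.05):
    """base_portfolios[t][j]: portfolio chosen by base algorithm j on day t.
    Returns the meta algorithm's log-wealth and final weight vector."""
    m = len(base_portfolios[0])
    w = [1.0 / m] * m  # uniform initial wealth split over the BAs
    log_wealth = 0.0
    for p_t, x_t in zip(base_portfolios, price_relatives):
        # Wealth relatives r_t(j) = p_{t,j}^T x_t of the base algorithms.
        r = [sum(pi * xi for pi, xi in zip(p_t[j], x_t)) for j in range(m)]
        wr = sum(wj * rj for wj, rj in zip(w, r))  # meta gain w_t^T r_t
        log_wealth += math.log(wr)
        # Multiplicative (EG-style) reweighting toward better BAs.
        unnorm = [wj * math.exp(eta * rj / wr) for wj, rj in zip(w, r)]
        z = sum(unnorm)
        w = [u / z for u in unnorm]
    return log_wealth, w

# Two hypothetical BAs: all-in on stock 0 vs. all-in on stock 1,
# in a market where stock 0 keeps gaining.
T = 6
x = [[1.1, 0.9]] * T
bp = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(T)]
lw, w = meta_eg(bp, x)
print(w)  # weight drifts toward the better base algorithm
```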
In the context of the MAs, the BAs play the role of individual stocks, and the wealth relatives of the BAs correspond to the price relatives of stocks in the portfolio selection problem. As a result, the MAs can use one of the portfolio selection algorithms with regret guarantees to maintain a distribution over the BAs. Specifically, we discuss two versions of MA: MA_EG based on gradient-based updates and MA_ONS based on Newton updates. The total wealth accumulated by an MA using wealth distributions w_1, ..., w_T in T rounds over m base algorithms is

    S(w_{1:T}, r_{1:T}) = ∏_{t=1}^T (w_t^T r_t),    (14)

and the logarithm of the wealth achieved is

    LS_T(w_{1:T}, r_{1:T}) = Σ_{t=1}^T log(w_t^T r_t).    (15)

Analysis of MA_EG: Algorithm 1 describes the meta algorithm with EG-based updates. Following the update property of EG [11], it can be shown that the difference in logarithmic wealth gathered by MA_EG and any of its base algorithms is O(√T). The following theorem formalizes this statement.

Theorem 1 Let u ∈ Δ_m be a distribution vector over the base algorithms, and let r_1, ..., r_T be a sequence of wealth relatives with r_t(j) ≥ l > 0 for all j, t and max_j r_t(j) = 1 for all t (no-junk-algorithms assumption). For η > 0, the logarithmic wealth due to the distribution vectors over the base algorithms produced by MA_EG is bounded from below as follows:

    Σ_{t=1}^T log(u^T r_t) − Σ_{t=1}^T log(w_t^T r_t) ≤ log m / η + ηT / (8l²).    (16)

Furthermore, if w_1 is chosen to be the uniform proportion vector and we set η = 2l √(2 log m / T), then we have

    Σ_{t=1}^T log(u^T r_t) − Σ_{t=1}^T log(w_t^T r_t) ≤ √(2T log m) / (2l).    (17)
Proof: The proof follows directly from the proof of Theorem 4.1 of [11] by replacing x_t by r_t and c by l. Here w_t is a distribution over the base algorithms instead of a portfolio vector.

What Theorem 1 implies is that we can include any number of heuristics or universal algorithms in the pool of BAs, and the wealth gathered by MA_EG will be competitive with the best strategy. So if we create a pool with EG, UP, ONS and Anticor and run MA_EG on the NYSE and S&P500 datasets, we expect the wealth accumulated by MA_EG to be close to that of Anticor.

Corollary 1 MA_EG has O(√T) regret with respect to any of the base algorithms. In particular, for the j-th base algorithm,

    Σ_{t=1}^T log(p_{t,j}^T x_t) − Σ_{t=1}^T log(w_t^T r_t) ≤ √(2T log m) / (2l).    (18)

Proof: The proof follows simply by using u in Theorem 1 to pick the j-th base algorithm only, and substituting the wealth relative r_t(j) by p_{t,j}^T x_t.

Consider an application of MA_EG where one of the base algorithms is universal. In particular, assume that UP is used as a base algorithm. The above results readily imply that the corresponding MA_EG will be competitive with the best CRP in hindsight.

Corollary 2 Let r_1, ..., r_T be a sequence of wealth relatives with r_t(j) ≥ l > 0 for all j, t and max_j r_t(j) = 1 for all t. Let p be any CRP over the corresponding sequence of price relatives x_1, ..., x_T. For η = 2l √(2 log m / T) and w_1 = (1/m, ..., 1/m), the regret in log-wealth of MA_EG w.r.t. the best CRP in hindsight is bounded as follows:

    max_p Σ_{t=1}^T log(p^T x_t) − Σ_{t=1}^T log(w_t^T r_t) ≤ √(2T log m) / (2l) + (n − 1) log(T + 1).    (19)
Proof: Corollary 1 implies that if we include UP, which is competitive with the best CRP in hindsight, in the pool of BAs, MA_EG in turn will also be competitive with the best CRP in hindsight. From [16] we have that if p_{1:T,UP} denotes the sequence of portfolio vectors used by UP and p is any CRP on a sequence of price relatives x_1, ..., x_T, then

    max_p Σ_{t=1}^T log(p^T x_t) − Σ_{t=1}^T log(p_{t,UP}^T x_t) ≤ (n − 1) log(T + 1).    (20)

By adding (20) to (18) and considering the j-th BA to be UP, we have

    max_p Σ_{t=1}^T log(p^T x_t) − Σ_{t=1}^T log(w_t^T r_t) ≤ √(2T log m) / (2l) + (n − 1) log(T + 1).    (21)

Hence, we have shown that MA_EG has O(√T) regret w.r.t. the best CRP in hindsight if UP (or any other universal algorithm) is included as one of the base algorithms.

Analysis of MA_ONS: Like MA_EG, w_t in MA_ONS is a distribution over the base algorithms in round t. At each step, w_t is learnt using the ONS update [3]. Algorithm 2 describes the steps for the algorithm. An analysis of the ONS update yields the following regret bound for MA_ONS.
Algorithm 1 MA_EG: Meta Algorithm with EG update
  Let w_1 = (1/m, ..., 1/m) give the fraction of initial wealth invested in each of the base algorithms b_1, ..., b_m
  Let p_{t,j} be the portfolio vector used by base algorithm b_j in round t
  for t = 1, 2, ...
    Receive price relatives x_t
    Calculate r_t(j) = p_{t,j}^T x_t
    Update the weight on each base algorithm b_j as follows:
      w_{t+1}(j) = w_t(j) exp(η r_t(j) / w_t^T r_t) / Σ_{j'=1}^m w_t(j') exp(η r_t(j') / w_t^T r_t)

Algorithm 2 MA_ONS: Meta Algorithm with ONS update
  Let w_1 = (1/m, ..., 1/m) give the fraction of initial wealth invested in each of the base algorithms b_1, ..., b_m
  Let β = l/16
  for t = 1, 2, ...
    Receive price relatives x_t
    Calculate r_t(j) = p_{t,j}^T x_t
    Calculate the new weight vector as follows:

      w_{t+1} = Π_{Δ_m}^{A_t} ( w_t + (1/β) A_t^{-1} ∇_t ),    (22)

    where ∇_t = ∇[log(w_t^T r_t)] = r_t / (w_t^T r_t), A_t = Σ_{τ=1}^t ∇_τ ∇_τ^T + I, and Π_{Δ_m}^{A_t} is the projection onto the m-simplex Δ_m in the norm induced by A_t, i.e.,

      Π_{Δ_m}^{A_t}(y) = argmin_{x ∈ Δ_m} (y − x)^T A_t (y − x).    (23)

Theorem 2 For any sequence of wealth relatives r_t with r_t(j) ∈ [l, 1] (the no-junk-algorithms assumption) and any β ≤ l/16, the MA_ONS algorithm has the following regret:

    max_w Σ_{t=1}^T log(w^T r_t) − Σ_{t=1}^T log(w_t^T r_t) ≤ (m/β) log(mT / (lβ)).    (24)

The result follows from the analysis presented in [9]. However, note that the bound is mildly better than that in the literature [2]. For the sake of completeness, we present a detailed analysis in Appendix A. Following the lines of Corollary 2 for MA_EG now leads us to the following result, which shows that MA_ONS will be competitive with the best CRP in hindsight.

Corollary 3 Let r_1, ..., r_T be a sequence of wealth relatives with r_t(j) ≥ l > 0 for all j, t and max_j r_t(j) = 1 for all t. Let p be the best CRP in hindsight over the corresponding sequence of price relatives x_1, ..., x_T. For any positive β ≤ l/16, the regret of the logarithmic wealth of MA_ONS w.r.t. the best CRP in hindsight is bounded as follows:

    Σ_{t=1}^T log(p^T x_t) − Σ_{t=1}^T log(w_t^T r_t) ≤ (m/β + n) log(T + 1) + (m/β) log(m / l²) + 4β.    (25)
Proof: The proof is similar to that of Corollary 2, with some additional calculations combining terms involving m and n.

Thus, the regret of MA_ONS is O(log T) w.r.t. the best CRP in hindsight, whereas the regret of MA_EG is O(√T).

4 Experiments and Results

In this section, we discuss experimental results obtained with different sets of base algorithms and different strategies for meta algorithms.

Datasets: The experiments were conducted on two main datasets: the New York Stock Exchange dataset (NYSE), widely used in the literature [6, 11, 2], and a new Standard & Poor's 500 (S&P 500) dataset we created. The NYSE dataset consists of 36 stocks with data at daily resolution accumulated over a period of 22 years, from July 3rd, 1962 to Dec 31st, 1984. The dataset captures the bear market that lasted between January 1973 and December 1974. All of the 36 stocks increase in value over the 22-year run. S&P 500 is a capitalization-weighted index of 500 large-cap common stocks actively traded in the United States. The index is used to measure the changes in the US economy through the aggregate market value of these 500 stocks representing all major industries. The S&P500 dataset that we used for our experiments consists of 385 stocks which were persistent in the S&P500 from January 1995 to November 2008. The time frame includes two major financial meltdowns, viz. the dot-com bubble burst in March 2000 and the housing bubble burst in October 2008. In order to better understand the performance of the algorithms, we also ran experiments on these datasets in reverse. In particular, following [1], we reverse the day ordering and consider the reciprocals of the price relatives. The reverse datasets are denoted by a superscript -1, i.e., NYSE^{-1} and S&P500^{-1}.

Methodology: We ran a set of base algorithms and different meta algorithms on the four datasets: NYSE, S&P500, NYSE^{-1}, and S&P500^{-1}. The meta algorithms were run with a pool of base algorithms.
The pool included UP,³ EG and ONS among the universal algorithms, and Anticor, AdaptiveFTL_r and UCRP among the non-universal algorithms. AdaptiveFTL_r is a variant of the follow-the-leader strategy which uniformly distributes the wealth among the top-r stocks at any time point. UCRP is the uniform constant rebalanced portfolio, which maintains an equal proportion of wealth in all stocks. Most of the portfolio selection algorithms, universal or otherwise, required parameter choices to be made. For EG, the value of η was taken to be . We implemented the ONS from [2] with the following parameter settings: η = 0, β = 1 and δ = 1/8. Anticor was used with a window length of w = 30 and is hence referred to as Anticor_30. We also experimented with different window lengths as well as a Buy-and-Hold strategy over Anticor with window lengths from 2 to 30, referred to as BAH(Anticor_30). AdaptiveFTL_r was used with r = 5. For MA_EG, the value of η used was taken to be greater than 1. MA_ONS was run with β = 1/16.

Results: Table 1 presents the monetary returns in dollars of the universal and non-universal algorithms. Amongst the non-universal algorithms, it includes results for UCRP, Anticor_30, and AdaptiveFTL_5. The winner and runner-up for each market dataset appear in bold face. The initial investment for all the algorithms is $1. MA_EG and Anticor_30 are the clear winners for all four datasets. In all cases, Anticor_30 is the best performing method, but it is not a universal algorithm. MA_EG and MA_ONS are both universal algorithms whose empirical performance is close to that of Anticor, and is always orders of magnitude better than the existing universal algorithms, such as UP, EG, and ONS. An interesting observation is that even on the

³ A direct implementation of the UP algorithm is exponential in the number of stocks [12].
Blum and Kalai [4] proposed an approximate implementation based on uniform random sampling of the portfolio simplex, which in the worst case is also exponential in the number of stocks. The UP algorithm code used for the experiments is based on [4].
Table 1: Monetary returns in dollars (per $1 investment) of universal and non-universal algorithms
(rows: UP, EG, ONS, UCRP, Anticor_30, AdaptiveFTL_5, MA_EG, MA_ONS; columns: NYSE, SP500, NYSE^{-1}, SP500^{-1})

NYSE^{-1} dataset, where UP, EG and ONS actually end up losing money (due to the intrinsic nature of the dataset), the meta algorithms and Anticor still manage to make profits. Table 2 presents the Annual Percentage Yields (APY) of the algorithms on the 4 datasets. The APY percentages were calculated using the standard asymptotic formula (for a large number of years)

    APY = exp( (1/T_years) log S_{T_years} ) − 1,    (26)

where T_years is the total number of years of investment, roughly 22 years for the NYSE and 13 years for the S&P500 datasets respectively, and S_{T_years} is the wealth gathered at the end of T_years by an algorithm. As expected, ANTI_30 has the highest APY for all the datasets, with MA_EG and MA_ONS following close behind. On the S&P 500 dataset, while ANTI_30 achieves an APY of almost 221%, MA_EG's APY is 176%, which is well ahead of ONS's APY of 57%, the best amongst the universal algorithms.

Table 2: Annual Percentage Yield
(rows: UP, EG, ONS, UCRP, Anticor_30, AdaptiveFTL_5, MA_EG, MA_ONS; columns: NYSE, SP500, NYSE^{-1}, SP500^{-1})

Figures 2 and 3 visually depict the performance of the algorithms in terms of wealth growth. Due to the exceptionally large wealth growth factor of Anticor_30 and the meta algorithms, Figures 2(a) and (b) and Figure 3(b) show the logarithmic wealth growth.

Additional Experiments: Based on the success of the meta algorithms built on good heuristics, we ran additional experiments to investigate two aspects: (i) Can we design even better heuristics as base algorithms and consequently improve the performance of the meta algorithms? and (ii) Are there heuristic meta algorithms which have no guarantees but outperform MA_EG and MA_ONS empirically? The experiments reported above worked with a fixed window size w for the Anticor algorithm, which was taken to be 30.
Since it is not possible to ascertain in advance which Anticor window size will work best for a dataset, we introduce a variant of Anticor called BAH(Anticor$_W$), which uses a Buy-and-Hold strategy to combine multiple Anticor algorithms with different window sizes $w \in W$. BAH(Anticor$_W$) was found to perform better than the original Anticor$_w$. Figure 4 shows that BAH(Anticor 30) beats Anticor 30, and of course EG, UP and ONS, by a good margin on the NYSE as well as the S&P500 dataset. BAH(Anticor$_W$) was added to the pool of existing base algorithms for the meta algorithms: we ran MA EG and MA ONS with BAH(Anticor 30) in the pool. Figure 5 shows that the wealth achieved by MA EG and MA ONS with BAH(Anticor 30) is almost as much as that of BAH(Anticor 30) itself. Further, both meta algorithms now outperform Anticor 30. Since Anticor performed well as a base algorithm, we also ran experiments using Anticor as a meta algorithm. MA Anticor simply redistributes wealth among the pool of base algorithms following the Anticor updates, based on the wealth relatives of the individual algorithms. We experimented with different window lengths and report results with a window length of 5, as this version was observed to perform reasonably well; its performance decreased as the window length was increased beyond 10 days. MA Anticor showed no performance improvement over MA EG and MA ONS on either the NYSE or the S&P500 dataset. Figure 5 shows that the performance of MA Anticor is in fact inferior to both MA EG and MA ONS. Interestingly, the reversal-to-the-mean strategy does not seem as effective when run on base algorithms. The marked improvement of the Buy-and-Hold strategy over the Anticor algorithms with different window lengths inspired the final set of experiments, where a Buy-and-Hold strategy was used as a meta algorithm. We ran a Buy-and-Hold version, called MA BAH, with UP, EG, ONS, AdaptiveFTL 5, Anticor 30 and BAH(Anticor 30) as base algorithms. The results are plotted in Figure 5.
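The Buy-and-Hold combination used by BAH(Anticor$_W$) and MA BAH can be sketched as follows (a minimal illustration with hypothetical inputs; names are ours): $1 is split equally among the experts at time 0 and never rebalanced, so the final wealth is the average of the experts' final wealths.

```python
import numpy as np

def buy_and_hold(expert_wealth_relatives):
    """Buy-and-Hold meta strategy: split $1 equally across the experts
    (e.g. Anticor_w for several window sizes w) at time 0 and never
    rebalance. Final wealth is the average of the experts' final wealths.

    expert_wealth_relatives: (T, k) array; entry (t, j) is the factor by
    which expert j's wealth grows in period t."""
    per_expert_wealth = np.prod(expert_wealth_relatives, axis=0)  # (k,)
    return per_expert_wealth.mean()

# Three hypothetical experts over two periods.
rel = np.array([[1.1, 0.9, 1.0],
                [1.1, 0.9, 1.0]])
print(buy_and_hold(rel))  # average of 1.21, 0.81, 1.0
```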
We observe that although the multiplicative wealth gain of MA BAH is more than that of Anticor 30, it is less than that of BAH(Anticor 30). More interestingly, MA EG and MA ONS both outperform MA BAH. Moving money to well-performing algorithms explicitly, as done by MA EG and MA ONS, seems to be more effective than simply holding, as done by MA BAH, or moving money out assuming reversal to the mean, as done by MA Anticor. The experiments reveal interesting differences between the effectiveness of strategies at the level of base algorithms, where one works with stocks, and at the level of meta algorithms, where one works with algorithms.
5 Conclusions
In this paper, we have presented new meta algorithms for portfolio selection. The key motivation behind our work was the poor practical performance of universal algorithms such as UP, EG, and ONS. We wanted to take advantage of non-universal heuristics while holding on to the theoretical guarantees of the universal algorithms. We presented new meta algorithms which maintain a pool of base algorithms and move money among them using the EG-update and the ONS-update respectively. Through theoretical analysis we showed that the meta algorithms are competitive with the best base algorithm. Moreover, if we include a universal algorithm in the pool, the meta algorithms can be shown to be competitive with the best CRP in hindsight, i.e., universal: with a universal algorithm in the pool, MA EG achieves $O(\sqrt{T})$ regret while MA ONS achieves $O(\log T)$ regret w.r.t. the best CRP. This opens the door to including any number of good heuristics in the pool of base algorithms while still guaranteeing low regret. Through comprehensive experiments on the NYSE and S&P500 datasets and their variations, we exhibited the overwhelming performance improvement of the meta algorithms over existing universal algorithms such as UP, EG, and ONS.
The meta algorithms presented in this paper are the first of their kind in combining exceptional empirical performance with strong theoretical bounds, truly bringing the best of both worlds together. The existing literature on online portfolio selection algorithms [6, 11, 2, 1] does not take into account the commission one has to pay while trading. Most of these algorithms trade every stock every day, which is impractical as one can incur huge commission costs. As part of our future work, we would like to investigate whether a sparse version of the meta algorithms can take care of commissions and yet achieve
good empirical performance. Further, the current models for online portfolio selection do not model risk. Modeling risk and taking account of the volatility of stocks is another interesting direction for our future work.
Acknowledgements: This research was supported by NSF CAREER grant IIS , and NSF grants IIS and IIS .
References
[1] A. Borodin, R. El-Yaniv, and V. Gogan. Can we learn to beat the best stock? Journal of Artificial Intelligence Research, 21, 2004.
[2] A. Agarwal, E. Hazan, S. Kale, and R. Schapire. Algorithms for portfolio management based on the Newton method. In Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 9-16, 2006.
[3] A. Blum and A. Kalai. Universal portfolios with and without transaction costs. Machine Learning, 35(3), 1999.
[4] A. Blum and A. Kalai. Universal portfolios with and without transaction costs. Machine Learning, 35(3), 1999.
[5] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[6] T. Cover. Universal portfolios. Mathematical Finance, 1(1):1-29, 1991.
[7] T. Cover and E. Ordentlich. Universal portfolios with side information. IEEE Transactions on Information Theory, 42(2), 1996.
[8] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, 1991.
[9] E. Hazan, A. Agarwal, and S. Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169-192, 2007.
[10] E. Hazan, A. Kalai, S. Kale, and A. Agarwal. Logarithmic regret algorithms for online convex optimization. In Proceedings of the 19th Annual Conference on Learning Theory (COLT), 2006.
[11] D. Helmbold, R. Schapire, Y. Singer, and M. Warmuth. On-line portfolio selection using multiplicative updates. Mathematical Finance, 8(4):325-347, 1998.
[12] A. Kalai and S. Vempala. Efficient algorithms for universal portfolios. Journal of Machine Learning Research, 3:423-440, 2002.
[13] J. L. Kelly. A new interpretation of information rate. Bell System Technical Journal, 35:917-926, 1956.
[14] J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1-64, 1997.
[15] H. Markowitz. Portfolio selection. Journal of Finance, 7(1):77-91, 1952.
[16] T. M. Cover and E. Ordentlich. Universal portfolios with side information. IEEE Transactions on Information Theory, 42(2), 1996.
A Analysis of MA ONS
We present an analysis of the ONS update which leads to better bounds than those in the literature [2], in particular in terms of the dependence on the number of stocks $n$. Since $w_t$ is the distribution of wealth among the $m$ base algorithms, $w_t \in \Delta_m$, the $m$-simplex. If $w_t$ is the wealth distribution at time $t$, let
$$f_t(w) = -\log(w^T r_t) \qquad (27)$$
and
$$\nabla_t = \nabla f_t(w_t) = -\frac{1}{w_t^T r_t} r_t. \qquad (28)$$
Note that the second derivative
$$\nabla^2 f_t(w_t) = \frac{1}{(w_t^T r_t)^2} r_t r_t^T = \nabla_t \nabla_t^T. \qquad (29)$$
Consider the auxiliary function
$$h_t(w) = -\log(w_t^T r_t) + \nabla_t^T (w - w_t) + \frac{\beta}{2}\left(\nabla_t^T (w - w_t)\right)^2, \qquad (30)$$
which satisfies $h_t(w_t) = -\log(w_t^T r_t) = f_t(w_t)$.
Lemma 1 For any $\beta \le l/16 < 1/2$, $f_t(w) \ge h_t(w)$, i.e.,
$$-\log(w^T r_t) \ge -\log(w_t^T r_t) + \nabla_t^T (w - w_t) + \frac{\beta}{2} (w - w_t)^T \nabla_t \nabla_t^T (w - w_t). \qquad (31)$$
Proof: Let $g_t(w) = \exp(-2\beta f_t(w)) = \left(w^T r_t\right)^{2\beta}$. Note that $\nabla g_t(w) = -2\beta \nabla f_t(w) \exp(-2\beta f_t(w))$. Further, since $\nabla^2 f_t(w) = \nabla f_t(w) \nabla f_t(w)^T$ and $\beta < 1/2$,
$$\nabla^2 g_t(w) = 2\beta(2\beta - 1) \nabla f_t(w) \nabla f_t(w)^T \exp(-2\beta f_t(w)) \preceq 0,$$
implying $g_t(w)$ is concave. Hence,
$$g_t(y) \le g_t(w) + (y - w)^T \nabla g_t(w)$$
$$\exp(-2\beta f_t(y)) \le \exp(-2\beta f_t(w)) \left(1 - 2\beta (y - w)^T \nabla f_t(w)\right).$$
Taking log on both sides and rearranging, we have
$$f_t(y) \ge f_t(w) - \frac{1}{2\beta} \log\left(1 - 2\beta (y - w)^T \nabla f_t(w)\right).$$
From Holder's inequality, since $r_t(j) \in [l, 1]$ and $w \in \Delta_m$ imply $\|\nabla f_t(w)\|_\infty = \|r_t\|_\infty / (w^T r_t) \le 1/l$, we have
$$\left|2\beta (y - w)^T \nabla f_t(w)\right| \le 2\beta \|y - w\|_1 \|\nabla f_t(w)\|_\infty \le 2\beta \left(\|y\|_1 + \|w\|_1\right) \frac{1}{l} = \frac{4\beta}{l} \le \frac{1}{4}$$
since $\beta \le l/16$. For $|z| \le 1/4$, $\log(1 - z) \le -z - z^2/4$. Using this inequality with $z = 2\beta (y - w)^T \nabla f_t(w)$, we obtain
$$f_t(y) \ge f_t(w) + (y - w)^T \nabla f_t(w) + \frac{\beta}{2} \left((y - w)^T \nabla f_t(w)\right)^2.$$
Setting $w = w_t$ and $y = w$, we obtain
$$f_t(w) \ge f_t(w_t) + (w - w_t)^T \nabla_t + \frac{\beta}{2} (w - w_t)^T \nabla_t \nabla_t^T (w - w_t) = h_t(w).$$
That completes the proof.
Since
$$\log(w^T r_t) - \log(w_t^T r_t) = f_t(w_t) - f_t(w) \le h_t(w_t) - h_t(w) = -\nabla_t^T (w - w_t) - \frac{\beta}{2} (w - w_t)^T \nabla_t \nabla_t^T (w - w_t),$$
it is sufficient to upper bound the RHS. Following [9], we define
$$A_t = \sum_{\tau=1}^{t} \nabla_\tau \nabla_\tau^T + I = A_{t-1} + \nabla_t \nabla_t^T. \qquad (32)$$
Let $\phi_t(w) = \frac{1}{2} w^T A_t w$. Note that $\phi_t(w)$ is convex, and the corresponding Bregman divergence is given by
$$d_{\phi_t}(w, y) = \frac{1}{2} (w - y)^T A_t (w - y). \qquad (33)$$
The Online Newton Step (ONS) algorithm proceeds by taking a Newton step from the current solution $w_t$ and then projecting it back to the simplex $\Delta_m$. Let $y_{t+1}$ denote the result of an unconstrained Newton step:
$$y_{t+1} = w_t - \frac{1}{\beta} A_t^{-1} \nabla_t. \qquad (34)$$
Then the next wealth distribution $w_{t+1}$ is the Bregman projection of $y_{t+1}$ onto the simplex, i.e., $w_{t+1} = \arg\min_{w \in \Delta_m} d_{\phi_t}(w, y_{t+1})$. We are now ready to state the main regret result for ONS:
Theorem 3 For any sequence of wealth relatives $r_t$ with $r_t(j) \in [l, 1]$, for any $\beta \le l/16$, and under the no-junk-algorithm assumption, the ONS algorithm has the following regret:
$$\max_{w \in \Delta_m} \sum_t \log(w^T r_t) - \sum_t \log(w_t^T r_t) \le \frac{m}{\beta} \log\left(\frac{mT}{l^2} + 1\right) + 4\beta. \qquad (35)$$
In particular, for $\beta = l/16$, the regret is given by
$$\max_{w \in \Delta_m} \sum_t \log(w^T r_t) - \sum_t \log(w_t^T r_t) \le \frac{16m}{l} \log\left(\frac{mT}{l^2} + 1\right) + \frac{l}{4}. \qquad (36)$$
Note that this regret bound is stronger than the existing bound on ONS for the portfolio selection problem [2]. In particular, the regret bound in [2] has a dependency of $n^{3/2}$ on the number of stocks, whereas the bound above depends linearly on $n$. Thus, while the algorithm is still based on ONS, our analysis leads to a mildly better regret bound. We need the following linear algebraic result (Lemma 11 in [9]) for our proof:
Lemma 2 Let $\nabla_t = \nabla f_t(w_t) = -\frac{1}{w_t^T r_t} r_t$ with $r_t(i) \in [l, 1]$. Further, let $A_t = \sum_{\tau=1}^{t} \nabla_\tau \nabla_\tau^T + I$. Then
$$\sum_{t=1}^{T} \nabla_t^T A_t^{-1} \nabla_t \le m \log\left(\frac{mT}{l^2} + 1\right). \qquad (37)$$
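One step of the Newton update (34) followed by a projection can be sketched in NumPy as follows. This is our own illustrative sketch, not the paper's code, and it makes one simplifying substitution: the exact Bregman projection in the $A_t$-norm is a quadratic program, and we use a plain Euclidean projection onto the simplex in its place.

```python
import numpy as np

def euclidean_simplex_projection(v):
    """Project v onto the probability simplex (standard sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def ons_step(w, r, A, beta):
    """One Online Newton Step for f_t(w) = -log(w^T r).

    Returns the updated matrix A_t and the next portfolio. NOTE: the
    analysis calls for the Bregman projection in the A_t-norm; here we
    substitute a Euclidean projection onto the simplex for simplicity."""
    grad = -r / float(w @ r)                   # gradient of -log(w^T r) at w
    A = A + np.outer(grad, grad)               # A_t = A_{t-1} + grad grad^T  (32)
    y = w - np.linalg.solve(A, grad) / beta    # unconstrained Newton step    (34)
    return A, euclidean_simplex_projection(y)

# Two base algorithms; start uniform, A_0 = I, beta = l/16 with l = 1.
w, A, beta = np.array([0.5, 0.5]), np.eye(2), 1.0 / 16
A, w = ons_step(w, np.array([1.0, 1.1]), A, beta)
print(w.sum())  # the update keeps w on the simplex
```

As expected, the step shifts weight toward the base algorithm with the larger wealth relative.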
Proof of Theorem: Let $w^*$ be the best wealth distribution in hindsight. From Lemma 1, recall that
$$R_t := f_t(w_t) - f_t(w^*) \le -\nabla_t^T (w^* - w_t) - \frac{\beta}{2} (w^* - w_t)^T \nabla_t \nabla_t^T (w^* - w_t), \qquad (38)$$
where $R_t$ denotes the regret at the $t$-th step. From the unconstrained update (34) of the ONS algorithm, note that
$$A_t y_{t+1} = A_t w_t - \frac{1}{\beta} \nabla_t \quad \Longrightarrow \quad \nabla_t = \beta \left(\nabla \phi_t(w_t) - \nabla \phi_t(y_{t+1})\right),$$
since $\nabla \phi_t(w) = A_t w$. From the three-point property of Bregman divergences, we know for any vectors $x, y, z$,
$$(x - y)^T \left(\nabla \phi(z) - \nabla \phi(y)\right) = d_\phi(x, y) - d_\phi(x, z) + d_\phi(y, z). \qquad (39)$$
Hence, for any $w^*$,
$$-\nabla_t^T (w^* - w_t) = \beta (w^* - w_t)^T \left(\nabla \phi_t(y_{t+1}) - \nabla \phi_t(w_t)\right) = \beta \left[d_{\phi_t}(w^*, w_t) - d_{\phi_t}(w^*, y_{t+1}) + d_{\phi_t}(w_t, y_{t+1})\right]$$
$$\le \beta \left[d_{\phi_t}(w^*, w_t) - d_{\phi_t}(w^*, w_{t+1}) + d_{\phi_t}(w_t, y_{t+1})\right],$$
where the last inequality follows from the fact that $w_{t+1}$ is the Bregman projection of $y_{t+1}$ onto $\Delta_m$, so that $d_{\phi_t}(w^*, w_{t+1}) \le d_{\phi_t}(w^*, y_{t+1})$. Now, note that
$$d_{\phi_t}(w_t, y_{t+1}) = \frac{1}{2} (w_t - y_{t+1})^T A_t (w_t - y_{t+1}) = \frac{1}{2\beta^2} \nabla_t^T A_t^{-1} A_t A_t^{-1} \nabla_t = \frac{1}{2\beta^2} \nabla_t^T A_t^{-1} \nabla_t.$$
Summing over $t = 1, \ldots, T$, telescoping the Bregman divergences, and using $A_t - A_{t-1} = \nabla_t \nabla_t^T$ together with $d_{\phi_T}(w^*, w_{T+1}) \ge 0$, we obtain
$$\sum_{t=1}^{T} -\nabla_t^T (w^* - w_t) \le \frac{1}{2\beta} \sum_{t=1}^{T} \nabla_t^T A_t^{-1} \nabla_t + \frac{\beta}{2} (w^* - w_1)^T A_1 (w^* - w_1) + \frac{\beta}{2} \sum_{t=2}^{T} (w^* - w_t)^T \nabla_t \nabla_t^T (w^* - w_t).$$
Combining with (38), the quadratic terms for $t \ge 2$ cancel, leaving
$$\sum_{t=1}^{T} f_t(w_t) - f_t(w^*) \le \sum_{t=1}^{T} R_t \le \frac{1}{2\beta} \sum_{t=1}^{T} \nabla_t^T A_t^{-1} \nabla_t + \frac{\beta}{2} (w^* - w_1)^T \left(A_1 - \nabla_1 \nabla_1^T\right) (w^* - w_1).$$
Following Lemma 2, we have an upper bound on the first term. Focusing on the second term, we note that $A_1 - \nabla_1 \nabla_1^T = I$, and so
$$\|w^* - w_1\|^2 \le \left(\|w^* - w_1\|_1\right)^2 \le \left(\|w^*\|_1 + \|w_1\|_1\right)^2 = 4.$$
Plugging these back in, the regret is given by
$$\sum_{t=1}^{T} f_t(w_t) - f_t(w^*) \le \frac{m}{2\beta} \log\left(\frac{mT}{l^2} + 1\right) + 2\beta \le \frac{m}{\beta} \log\left(\frac{mT}{l^2} + 1\right) + 4\beta. \qquad (40)$$
That completes the proof.
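As a quick numerical sanity check (our own addition, with all names ours) that the lower bound of Lemma 1 holds at $\beta = l/16$ for random wealth relatives in $[l, 1]$ and random points on the simplex:

```python
import numpy as np

rng = np.random.default_rng(0)
m, l = 5, 0.3
beta = l / 16.0

for _ in range(1000):
    r = rng.uniform(l, 1.0, size=m)     # wealth relatives in [l, 1]
    w = rng.dirichlet(np.ones(m))       # random point on the simplex
    wt = rng.dirichlet(np.ones(m))      # current wealth distribution
    f = lambda v: -np.log(v @ r)
    grad = -r / (wt @ r)                # gradient of f at wt
    # h_t(w) from equation (30)
    h = f(wt) + grad @ (w - wt) + 0.5 * beta * (grad @ (w - wt)) ** 2
    assert f(w) >= h - 1e-12            # Lemma 1: f_t(w) >= h_t(w)
print("Lemma 1 lower bound held on all random trials")
```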
Figure 1: Wealth gathered by Anticor exceeds UP, EG, and ONS on the NYSE and S&P500 datasets. [(a) NYSE; (b) S&P500: logarithmic wealth growth by year for EG, UP, ONS, and Anticor.]
Figure 2: Monetary returns of the meta algorithms MA EG and MA ONS are competitive with the best base algorithm (Anticor 30) on the original datasets (best viewed in color). [(a) NYSE; (b) S&P500: logarithmic wealth growth by year for EG, UP, UCRP, AdaptiveFTL 5, Anticor 30, ONS, MA EG, and MA ONS.]
Figure 3: Monetary returns of the base algorithms and meta algorithms on the reverse datasets (best viewed in color). [(a) NYSE$^{-1}$; (b) S&P500$^{-1}$: wealth growth by year for EG, UP, UCRP, AdaptiveFTL 5, Anticor 30, ONS, MA EG, and MA ONS.]
Figure 4: Monetary returns of BAH(Anticor 30) for a $1 investment exceed UP, EG, ONS, and Anticor 30 (best viewed in color). [(a) NYSE; (b) S&P500: logarithmic wealth growth by year.]
Figure 5: Monetary returns of the meta algorithms (MA EG, MA ONS, MA Anticor, MA BAH) for a $1 investment, when BAH(Anticor 30) is added to the pool of base algorithms. The multiplicative wealth gain of MA EG, MA ONS, and MA BAH exceeds that of Anticor 30 (best viewed in color). [(a) NYSE; (b) S&P500: logarithmic wealth growth by year.]
More informationStructured Online Learning with Full and Bandit Information
Structured Online Learning with Full and Bandit Information A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Nicholas Johnson IN PARTIAL FULFILLMENT OF THE
More informationAnother Look at the Boom and Bust of Financial Bubbles
ANNALS OF ECONOMICS AND FINANCE 16-2, 417 423 (2015) Another Look at the Boom and Bust of Financial Bubbles Andrea Beccarini University of Münster, Department of Economics, Am Stadtgraben 9, 48143, Münster,
More informationMinimax Policies for Combinatorial Prediction Games
Minimax Policies for Combinatorial Prediction Games Jean-Yves Audibert Imagine, Univ. Paris Est, and Sierra, CNRS/ENS/INRIA, Paris, France audibert@imagine.enpc.fr Sébastien Bubeck Centre de Recerca Matemàtica
More informationBandit models: a tutorial
Gdt COS, December 3rd, 2015 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions) Bandit game: a each round t, an agent chooses
More informationAlgorithmic Stability and Generalization Christoph Lampert
Algorithmic Stability and Generalization Christoph Lampert November 28, 2018 1 / 32 IST Austria (Institute of Science and Technology Austria) institute for basic research opened in 2009 located in outskirts
More informationUNIVERSAL PORTFOLIO GENERATED BY IDEMPOTENT MATRIX AND SOME PROBABILITY DISTRIBUTION LIM KIAN HENG MASTER OF MATHEMATICAL SCIENCES
UNIVERSAL PORTFOLIO GENERATED BY IDEMPOTENT MATRIX AND SOME PROBABILITY DISTRIBUTION LIM KIAN HENG MASTER OF MATHEMATICAL SCIENCES FACULTY OF ENGINEERING AND SCIENCE UNIVERSITI TUNKU ABDUL RAHMAN APRIL
More informationOptimization, Learning, and Games with Predictable Sequences
Optimization, Learning, and Games with Predictable Sequences Alexander Rakhlin University of Pennsylvania Karthik Sridharan University of Pennsylvania Abstract We provide several applications of Optimistic
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course Notes for EE7C (Spring 08): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee7c@berkeley.edu October
More informationOnline Prediction: Bayes versus Experts
Marcus Hutter - 1 - Online Prediction Bayes versus Experts Online Prediction: Bayes versus Experts Marcus Hutter Istituto Dalle Molle di Studi sull Intelligenza Artificiale IDSIA, Galleria 2, CH-6928 Manno-Lugano,
More informationMaking Gradient Descent Optimal for Strongly Convex Stochastic Optimization
Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization Alexander Rakhlin University of Pennsylvania Ohad Shamir Microsoft Research New England Karthik Sridharan University of Pennsylvania
More informationOnline Convex Optimization Using Predictions
Online Convex Optimization Using Predictions Niangjun Chen Joint work with Anish Agarwal, Lachlan Andrew, Siddharth Barman, and Adam Wierman 1 c " c " (x " ) F x " 2 c ) c ) x ) F x " x ) β x ) x " 3 F
More informationOnline Bounds for Bayesian Algorithms
Online Bounds for Bayesian Algorithms Sham M. Kakade Computer and Information Science Department University of Pennsylvania Andrew Y. Ng Computer Science Department Stanford University Abstract We present
More informationIEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior
More informationAdaptive Gradient Methods AdaGrad / Adam. Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade
Adaptive Gradient Methods AdaGrad / Adam Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade 1 Announcements: HW3 posted Dual coordinate ascent (some review of SGD and random
More informationA Low Complexity Algorithm with O( T ) Regret and Finite Constraint Violations for Online Convex Optimization with Long Term Constraints
A Low Complexity Algorithm with O( T ) Regret and Finite Constraint Violations for Online Convex Optimization with Long Term Constraints Hao Yu and Michael J. Neely Department of Electrical Engineering
More informationOnline Aggregation of Unbounded Signed Losses Using Shifting Experts
Proceedings of Machine Learning Research 60: 5, 207 Conformal and Probabilistic Prediction and Applications Online Aggregation of Unbounded Signed Losses Using Shifting Experts Vladimir V. V yugin Institute
More informationOnline Optimization : Competing with Dynamic Comparators
Ali Jadbabaie Alexander Rakhlin Shahin Shahrampour Karthik Sridharan University of Pennsylvania University of Pennsylvania University of Pennsylvania Cornell University Abstract Recent literature on online
More informationFoundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research
Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.
More informationLecture 16: Perceptron and Exponential Weights Algorithm
EECS 598-005: Theoretical Foundations of Machine Learning Fall 2015 Lecture 16: Perceptron and Exponential Weights Algorithm Lecturer: Jacob Abernethy Scribes: Yue Wang, Editors: Weiqing Yu and Andrew
More informationOn the Worst-case Analysis of Temporal-difference Learning Algorithms
Machine Learning, 22(1/2/3:95-121, 1996. On the Worst-case Analysis of Temporal-difference Learning Algorithms schapire@research.att.com ATT Bell Laboratories, 600 Mountain Avenue, Room 2A-424, Murray
More informationPerceptron Mistake Bounds
Perceptron Mistake Bounds Mehryar Mohri, and Afshin Rostamizadeh Google Research Courant Institute of Mathematical Sciences Abstract. We present a brief survey of existing mistake bounds and introduce
More informationThe Free Matrix Lunch
The Free Matrix Lunch Wouter M. Koolen Wojciech Kot lowski Manfred K. Warmuth Tuesday 24 th April, 2012 Koolen, Kot lowski, Warmuth (RHUL) The Free Matrix Lunch Tuesday 24 th April, 2012 1 / 26 Introduction
More informationTrade-Offs in Distributed Learning and Optimization
Trade-Offs in Distributed Learning and Optimization Ohad Shamir Weizmann Institute of Science Includes joint works with Yossi Arjevani, Nathan Srebro and Tong Zhang IHES Workshop March 2016 Distributed
More informationWeek 2 Quantitative Analysis of Financial Markets Bayesian Analysis
Week 2 Quantitative Analysis of Financial Markets Bayesian Analysis Christopher Ting http://www.mysmu.edu/faculty/christophert/ Christopher Ting : christopherting@smu.edu.sg : 6828 0364 : LKCSB 5036 October
More informationVariable Metric Stochastic Approximation Theory
Variable Metric Stochastic Approximation Theory Abstract We provide a variable metric stochastic approximation theory. In doing so, we provide a convergence theory for a large class of online variable
More informationLearning theory. Ensemble methods. Boosting. Boosting: history
Learning theory Probability distribution P over X {0, 1}; let (X, Y ) P. We get S := {(x i, y i )} n i=1, an iid sample from P. Ensemble methods Goal: Fix ɛ, δ (0, 1). With probability at least 1 δ (over
More informationOLSO. Online Learning and Stochastic Optimization. Yoram Singer August 10, Google Research
OLSO Online Learning and Stochastic Optimization Yoram Singer August 10, 2016 Google Research References Introduction to Online Convex Optimization, Elad Hazan, Princeton University Online Learning and
More informationLecture 19: Follow The Regulerized Leader
COS-511: Learning heory Spring 2017 Lecturer: Roi Livni Lecture 19: Follow he Regulerized Leader Disclaimer: hese notes have not been subjected to the usual scrutiny reserved for formal publications. hey
More informationOnline Submodular Minimization
Online Submodular Minimization Elad Hazan IBM Almaden Research Center 650 Harry Rd, San Jose, CA 95120 hazan@us.ibm.com Satyen Kale Yahoo! Research 4301 Great America Parkway, Santa Clara, CA 95054 skale@yahoo-inc.com
More information