Meta Algorithms for Portfolio Selection. Technical Report

Meta Algorithms for Portfolio Selection
Technical Report
Department of Computer Science and Engineering, University of Minnesota
EECS Building, 200 Union Street SE, Minneapolis, MN, USA

Puja Das and Arindam Banerjee
September 20, 2010


Meta Algorithms for Portfolio Selection

Puja Das (Dept of Computer Science & Engg, University of Minnesota, Twin Cities, Minneapolis, MN) and Arindam Banerjee (Dept of Computer Science & Engg, University of Minnesota, Twin Cities, Minneapolis, MN)

Abstract

We consider the problem of sequential portfolio selection in the stock market. There are theoretically well grounded algorithms for the problem, such as Universal Portfolio (UP), Exponentiated Gradient (EG), and Online Newton Step (ONS). Such algorithms enjoy the property of being universal, i.e., having low regret with respect to the best constant rebalanced portfolio. However, the practical performance of such popular algorithms is sobering compared to heuristics such as Anticor, which have no theoretical guarantees but perform surprisingly well in practice. Motivated by such discrepancies, in this paper we focus on designing meta algorithms for portfolio selection which can leverage the best of both worlds. Such algorithms work with a pool of base algorithms and use online learning to redistribute wealth among the base algorithms. We develop two meta algorithms: MA_EG, which uses online gradient descent following EG, and MA_ONS, which uses the online Newton step following ONS. If one of the base algorithms is universal, it follows that the meta algorithm is universal. Through extensive experiments on two real stock market datasets, we show that the meta algorithms are competitive with and often better than the best base algorithm, including heuristics, while maintaining the guarantee of being a universal algorithm.

1 Introduction

Algorithms for automatically designing portfolios based on historical stock market data have been extensively investigated in the literature for the past five decades [15, 13]. Earlier literature on the topic advocated a statistical treatment of the problem, and important results were established [13, 8, 5]. With the realization that any statistical assumptions regarding the stock market may be inappropriate and eventually counterproductive, over the past two decades new methods of portfolio selection have been designed which make no statistical assumptions regarding the movement of stocks [6, 7, 11]. In a well-defined technical sense, such methods are guaranteed to perform competitively with certain families of adaptive portfolios even in an adversarial market. From the theoretical perspective, algorithm design for portfolio selection has been largely a success story [6, 11, 5].

The motivation of this paper comes from the sobering practical performance of theoretically well motivated algorithms. Sometimes even simple heuristic approaches can outperform these theoretically motivated strategies in terms of empirical performance. However, heuristics which can gather admirable amounts of wealth in one market run the risk of starkly poor performance in an adversarial situation. Ideally, we would want portfolio selection algorithms which have good empirical performance but at the same time come with theoretical guarantees on their performance. In this paper we bring both of these worlds together by designing meta algorithms for on-line portfolio selection.

The objective of any on-line portfolio selection algorithm is maximization of wealth. But absolute wealth maximization is not possible due to the temporal nature of the problem. Recent theoretically motivated online algorithms usually demonstrate competitiveness with a class of target strategies called Constant

Rebalanced Portfolios (CRPs). A CRP maintains a fixed proportion of wealth invested in each of the stocks by daily redistribution of wealth. The performance of a theoretically motivated online strategy is usually measured by regret, the difference in logarithmic wealth between the online strategy and the best CRP in hindsight. It is important to note that while the best CRP in hindsight takes advantage of knowing the market beforehand, and hence forms a strong baseline, it cannot be implemented in practice. The regret analysis provides a bound on the worst-case performance of an on-line algorithm w.r.t. the best CRP in hindsight. The theory-based approaches that require mention here are Universal Portfolios (UP) [6, 7], Exponentiated Gradient (EG) [11], and Online Newton Step (ONS) [3]. UP achieves $O(\log T)$ regret, EG achieves $O(\sqrt{T})$ regret under the no-junk-bond assumption (to be explained shortly), and ONS achieves $O(\log T)$ regret under the same assumption.

In recent work, [1] proposed a heuristic approach called Anticor which was shown to beat the theoretically motivated EG, UP, and the best CRP in practice on several datasets by an overwhelming margin. Therefore, it seems logical for an investor to use Anticor and maximize his wealth. The reason why one cannot use Anticor without any qualms is the absence of any worst-case performance guarantees. Although Anticor does well on certain datasets, a real market sequence can prove to be unsuitable and even adversarial for Anticor. While there is always the temptation to maximize wealth based on a heuristic that has worked well on certain markets, it is highly desirable to have worst-case performance guarantees.

In this paper, we introduce new meta algorithms for online portfolio selection which achieve the balance between good empirical performance and theoretical guarantees. In particular, we present two meta algorithms, their difference owing only to the weight update strategy: MA_EG, relying on gradient-based updates inspired by EG, and MA_ONS, relying on Newton-step based updates inspired by ONS. The meta algorithms maintain a pool of base algorithms and shift wealth to the better performing base algorithms by online learning over time. The novelty is their ability to be competitive with the best of heuristics in empirical performance while having theoretical guarantees, i.e., regret bounds w.r.t. the best CRP in hindsight. The meta algorithm achieves $O(\sqrt{T})$ regret with the EG-based update under the no-junk-algorithms assumption (to be explained shortly). With the ONS-based update, it can achieve $O(\log T)$ regret under the same assumption. Through comprehensive experiments on two historical stock market datasets, spanning 14 and 22 years at a daily resolution, we show that the two versions of the meta algorithm outperform the existing algorithms with theoretical guarantees by several orders of magnitude while maintaining the exact same guarantees.

The rest of the paper is organized as follows. In Section 2, we review the relevant literature on portfolio selection. Specifically we introduce UP, EG, ONS, and Anticor. We present the new meta algorithms (MA_EG and MA_ONS) and analyze their properties in Section 3. In Section 4 we demonstrate the effectiveness of the meta algorithms through empirical results on the widely used NYSE dataset and a new S&P500 dataset. We conclude in Section 5 with a discussion and directions for future work.
2 Related Work

We consider a stock market consisting of $n$ stocks $\{s_1, \ldots, s_n\}$ over $T$ periods. For ease of exposition, we will consider a period to be a day, but the analysis presented in the paper holds for any valid definition of a period, such as an hour or a month. Let $x_t(i)$ denote the price relative of stock $s_i$ in day $t$, i.e., the multiplicative factor by which the price of $s_i$ changes in day $t$. Hence, $x_t(i) > 1$ implies a gain, $x_t(i) < 1$ implies a loss, and $x_t(i) = 1$ implies the price remained unchanged. Further, $x_t(i) > 0$ for all $i, t$. Let $x_t = (x_t(1), \ldots, x_t(n))$ denote the vector of price relatives for day $t$, and let $x_{1:t}$ denote the collection of such price relative vectors up to and including day $t$. A portfolio $p_t = (p_t(1), \ldots, p_t(n))$ on day $t$ can be viewed as a probability distribution over the stocks that prescribes investing a $p_t(i)$ fraction of the current wealth in stock $s_i$. Note that the portfolio $p_t$ has to be decided before knowing $x_t$, which will be revealed only at the end of the day. The multiplicative gain in wealth at the end of day $t$ is then simply $p_t^T x_t = \sum_{i=1}^{n} p_t(i) x_t(i)$.
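To make the notation concrete, here is a minimal sketch of price relatives and the daily multiplicative gain (the prices are hypothetical):

```python
import numpy as np

# Hypothetical closing prices for n = 3 stocks on two consecutive days.
prices_yesterday = np.array([10.0, 25.0, 40.0])
prices_today = np.array([11.0, 25.0, 38.0])

# Price relatives x_t(i): the factor by which each price changed in day t.
x_t = prices_today / prices_yesterday   # [1.10, 1.00, 0.95]

# A portfolio p_t is a distribution over the stocks, fixed before x_t is seen.
p_t = np.array([0.5, 0.3, 0.2])

# Multiplicative gain in wealth at the end of day t: p_t^T x_t.
print(p_t @ x_t)   # 0.5*1.10 + 0.3*1.00 + 0.2*0.95 = 1.04
```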

For a sequence of price relatives $x_{1:t-1} = \{x_1, \ldots, x_{t-1}\}$ up to day $(t-1)$, the sequential portfolio selection problem in day $t$ is to determine a portfolio $p_t$ based on past performance of the stocks. At the end of day $t$, $x_t$ is revealed and the actual performance of $p_t$ gets determined by $p_t^T x_t$. Over a period of $T$ days, for a sequence of portfolios $p_{1:T} = \{p_1, \ldots, p_T\}$, the multiplicative gain in wealth is then

$S(p_{1:T}, x_{1:T}) = \prod_{t=1}^{T} (p_t^T x_t)$.  (1)

In the literature, one often looks at the logarithm of the multiplicative gain, given by

$LS(p_{1:T}, x_{1:T}) = \sum_{t=1}^{T} \log(p_t^T x_t)$.  (2)

Ideally, we would like to maximize $S(p_{1:T}, x_{1:T})$ over $p_{1:T}$. Unfortunately, portfolio selection cannot be posed as an offline optimization problem due to the temporal nature of the choices: $x_t$ is not available when one has to decide on $p_t$. Further, in a stock market, (statistical) assumptions regarding $x_t$ can be difficult to make. The existing literature on theoretically motivated portfolio selection [6, 11, 5] focuses on designing algorithms whose overall gain in wealth is guaranteed to be competitive with reasonable strategies for the problem.

Constant Rebalanced Portfolios: One goal could be to compete against the best Constant Rebalanced Portfolio (CRP). The CRP investment strategy maintains a fixed fraction of the total wealth in each of the stocks. So, a CRP has a fixed portfolio vector $p_{crp} = (p(1), \ldots, p(n))$ which is employed every day. Such a strategy requires vast amounts of trading every day to ensure that the investment proportions are rebalanced back to the vector $p_{crp}$. To understand the strength of this family of strategies, consider the example of a market consisting of two stocks. The first stock is a no-growth stock and has price relatives $1, 1, 1, 1, \ldots$ over time. The second stock doubles in value on even days and on odd days its value gets halved. So, its price relatives are $\frac{1}{2}, 2, \frac{1}{2}, 2, \ldots$. The sequence of market vectors in this case is $(1, \frac{1}{2}), (1, 2), (1, \frac{1}{2}), (1, 2), (1, \frac{1}{2}), \ldots$. A Buy-and-Hold strategy will not make any gains in this market. On the other hand, if we use a CRP of $p_{crp} = (\frac{1}{2}, \frac{1}{2})$, then the growth of wealth every 2 days will be $\frac{9}{8}$, so that after $T$ days the multiplicative gain in wealth will be $(\frac{9}{8})^{T/2}$. It is easy to see that the wealth accumulated by the best CRP will be at least as big as that accumulated by the best Buy-and-Hold strategy, and hence by the single best stock. CRP is also known to have certain optimality properties when certain statistical assumptions regarding the price relatives can be made [8].

Universal Algorithms: Theoretically motivated strategies are guaranteed to be competitive with the best CRP by achieving small regret. For any sequence $x_1, \ldots, x_T$ of price relatives, let $p_1, \ldots, p_T$ be the sequence of portfolios selected by the algorithm. The regret of a portfolio selection algorithm ALG is given by:

$\text{Regret(ALG)} \triangleq \max_{p} \sum_{t=1}^{T} \log(p^T x_t) - \sum_{t=1}^{T} \log(p_t^T x_t)$.  (3)

An investment strategy is deemed universal if it has sublinear regret, i.e., $\text{Regret(ALG)} = o(T)$. We now briefly review a set of important sequential portfolio selection strategies from the literature, both universal and non-universal.

Universal Portfolios (UP): The seminal work of Cover [6] introduced Universal Portfolios (UP), the first algorithm which can be shown to be competitive with the best CRP. The algorithm has since been extended to various practical scenarios, such as investment with side information [7].
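The two-stock example above is easy to verify numerically; a minimal sketch:

```python
import numpy as np

# Market vectors for the two-stock example: stock 1 is flat, stock 2
# alternates between halving and doubling.
T = 10
x = np.array([[1.0, 0.5] if t % 2 == 0 else [1.0, 2.0] for t in range(T)])

p_crp = np.array([0.5, 0.5])   # uniform CRP, rebalanced to (1/2, 1/2) daily

wealth = 1.0
for x_t in x:
    wealth *= p_crp @ x_t      # daily multiplicative gain p_crp^T x_t

print(wealth, (9.0 / 8.0) ** (T / 2))   # both print 1.802032...
```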
The key idea behind UP is to maintain a distribution over all CRPs and perform a Bayesian update after observing every $x_t$. Since each CRP $q$ is a distribution over $n$ stocks and hence lies in the $n$-simplex, one uses a distribution $\mu(q)$ over the $n$-simplex. A popular choice is the Dirichlet prior $\mu(q) = \text{Dir}(\frac{1}{2}, \ldots, \frac{1}{2})$ over the simplex. For any CRP $q$, let $S_{t-1}(q, x_{1:t-1}) = \prod_{t'=1}^{t-1} (q^T x_{t'})$ denote the wealth accumulated by $q$ over $(t-1)$ days. Then, the

universal portfolio $p_t$ is defined as the weighted average over all such $q$:

$p_t(i) = \frac{\int_q q(i)\, S_{t-1}(q, x_{1:t-1})\, \mu(q)\, dq}{\int_q S_{t-1}(q, x_{1:t-1})\, \mu(q)\, dq}$.  (4)

UP has a regret of $O(\log T)$ with respect to the best CRP in hindsight. However, the updates for UP are computationally prohibitive. Discrete approximation or recursive series expansion are used to evaluate the above integrals. However, in either case, the time and space complexity for finding the new universal portfolio vector grows exponentially in the dimensionality of the simplex, i.e., the number of stocks.

Exponentiated Gradient (EG) portfolios: The key motivation behind the Exponentiated Gradient (EG) strategy [11] was to design a computationally efficient portfolio selection algorithm which stays competitive with the best CRP. The EG algorithm scales linearly with the number of stocks and the portfolios are guaranteed to stay competitive with the best CRP. However, the regret is $O(\sqrt{T})$ under the no-junk-bond assumption and is weaker than that of UP. The no-junk-bond assumption states that all the $x_t(i)$ are bounded from below, i.e., $x_t(i) > \alpha > 0$ for all $i, t$.¹

The EG investment strategy was introduced and analyzed by [11]. Their framework for updating a portfolio vector is analogous to the framework developed by [14] for online regression. In the online learning framework, the portfolio vector itself encapsulates the necessary information from all previous price relatives. At the start of day $t$, the algorithm computes its new portfolio vector $p_t$ such that it stays close to $p_{t-1}$ and does well on the price relatives $x_{t-1}$ for the previous day. In particular, the new portfolio vector $p_t$ is chosen so as to maximize

$F(p_t) = \eta \log(p_t^T x_{t-1}) - KL(p_t \,\|\, p_{t-1})$,  (5)

where $\eta > 0$ is a parameter called the learning rate and $KL(\cdot, \cdot)$ is the KL-divergence ensuring $p_t$ stays close to $p_{t-1}$. Using an approximation of $F$ based on a Taylor expansion, the updated portfolio turns out to be

$p_t(i) = \frac{p_{t-1}(i) \exp\left(\eta \frac{x_{t-1}(i)}{p_{t-1}^T x_{t-1}}\right)}{\sum_{i'=1}^{n} p_{t-1}(i') \exp\left(\eta \frac{x_{t-1}(i')}{p_{t-1}^T x_{t-1}}\right)}$.  (6)

Note that $p_{t-1}^T x_{t-1}$ is the average price relative, and the wealth allocated to stock $i$ relies on the ratio $\frac{x_{t-1}(i)}{p_{t-1}^T x_{t-1}}$. In particular, if the price relative for a stock is greater than the average in a particular round, the investment in that stock is increased accordingly.

Online Newton Step Method (ONS): The Online Newton Step (ONS) method for portfolio selection is an application of the Newton step to the online setting. Recent work has shown that the ONS approach can be used in online convex optimization to achieve sharper regret bounds compared to online gradient descent based methods [10, 2, 9]. Let $p_t$ be a portfolio vector, such that $p_t \in \Delta_n$, the $n$-simplex. The ONS algorithm uses the following portfolio update for round $t > 1$:

$p_t = \Pi_{\Delta_n}^{A_{t-1}}\left(p_{t-1} - \frac{1}{\beta} A_{t-1}^{-1} \nabla_{t-1}\right)$,  (7)

where $\nabla_t = \nabla[-\log(p_t^T x_t)] = -\frac{1}{p_t^T x_t} x_t$, $A_t = \sum_{\tau=1}^{t} \nabla_\tau \nabla_\tau^T + I$, $\beta$ is a non-negative constant, and $\Pi_{\Delta_n}^{A_{t-1}}$ is the projection onto the $n$-simplex $\Delta_n$ according to the norm induced by $A_{t-1}$, i.e.,

$\Pi_{\Delta_n}^{A_{t-1}}(y) = \arg\min_{x \in \Delta_n} (y - x)^T A_{t-1} (y - x)$.  (8)

Under the no-junk-bond assumption, ONS achieves $O(\log T)$ regret.² ONS is computationally more efficient than UP. ONS has a better regret bound than EG, but is less efficient than EG.

¹ The no-junk-bond assumption can be removed with a more advanced analysis, yielding a regret of $O(T^{3/4})$ [11].
² Without the no-junk-bond assumption, the regret of ONS is $O(\sqrt{T})$.
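A minimal numpy sketch of the EG update (Eq. 6) and one ONS step (Eqs. 7-8); η = 0.05 follows the recommendation in [11], while β here is illustrative, and the simplex projection is written as a generic quadratic program, which is one of several ways to compute it:

```python
import numpy as np
from scipy.optimize import minimize

def eg_update(p_prev, x_prev, eta=0.05):
    """One EG step (Eq. 6); eta = 0.05 is the value recommended in [11]."""
    v = p_prev * np.exp(eta * x_prev / (p_prev @ x_prev))
    return v / v.sum()

def project_simplex(y, A):
    """Projection onto the simplex in the norm induced by A (Eq. 8)."""
    n = len(y)
    res = minimize(lambda p: (y - p) @ A @ (y - p), np.full(n, 1.0 / n),
                   method='SLSQP', bounds=[(0.0, 1.0)] * n,
                   constraints=[{'type': 'eq', 'fun': lambda p: p.sum() - 1.0}])
    return res.x

def ons_update(p_prev, x_prev, A, beta=0.125):
    """One ONS step (Eq. 7). A holds the running sum of grad grad^T plus I
    and is updated in place; the value of beta is illustrative."""
    grad = -x_prev / (p_prev @ x_prev)   # gradient of -log(p^T x) at p_prev
    A += np.outer(grad, grad)
    y = p_prev - np.linalg.solve(A, grad) / beta
    return project_simplex(y, A)
```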

Anticor: Anticor (AC) is a heuristic which does not conform to the universal property for portfolio selection algorithms [1]. In AC, learning the best stocks (to invest money in) is done by exploiting the volatility of the market and the statistical relationship between the stocks. It implements the reversal-to-the-mean market phenomenon rather aggressively. An important parameter for Anticor is the window length $w$. The version of Anticor implemented works with the two most recent windows of length $w$. The strategy is to move money from a stock $i$ to a stock $j$ if the growth rate of stock $i$ is greater than the growth rate of $j$ in the most recent window. An additional condition that needs to be satisfied is the existence of a positive correlation between stock $i$ in the second-last window and stock $j$ in the last window. The satisfiability of this condition is an indicator that stock $j$ will replicate stock $i$'s past behavior in the near future. The amount of money that is transferred from stock $i$ to stock $j$ depends on the strength of the correlation between the stocks and the strength of the self-anti-correlation of each stock over two consecutive windows.

For a window length of $w$, $LX_1$ and $LX_2$ are defined as $LX_1 = [\log(x_{t-2w+1}), \ldots, \log(x_{t-w})]^T$ and $LX_2 = [\log(x_{t-w+1}), \ldots, \log(x_t)]^T$. Thus $LX_1$ and $LX_2$ are two $w \times n$ matrices over two consecutive time windows. The $j$th column of $LX_k$ is denoted by $LX_k(j)$, which tracks the performance of stock $j$ in window $k$. Let $\mu_k(j)$ be the mean of $LX_k(j)$ and $\sigma_k(j)$ be the corresponding standard deviation. The cross-covariance matrix between the column vectors of $LX_1$ and $LX_2$ is defined as follows:

$M_{cov}(i, j) = \frac{1}{w-1} (LX_1(i) - \mu_1(i))^T (LX_2(j) - \mu_2(j))$.  (9)

The corresponding cross-correlation matrix is given by:

$M_{cor}(i, j) = \begin{cases} \frac{M_{cov}(i,j)}{\sigma_1(i)\sigma_2(j)} & \sigma_1(i), \sigma_2(j) \neq 0 \\ 0 & \text{otherwise.} \end{cases}$  (10)

Following the reversal-to-mean strategy, the claim of the proportion of wealth to be moved from stock $i$ to stock $j$ is defined as:

$C_{i \to j} = M_{cor}(i, j) + [-M_{cor}(i, i)]_+ + [-M_{cor}(j, j)]_+$,  (11)

where $[x]_+ = \max(0, x)$. The normalized transfer is defined as

$T_{i \to j} = p_t(i) \frac{C_{i \to j}}{\sum_{j'} C_{i \to j'}}$.  (12)

Using these transfer values, the portfolio is defined to be

$p_{t+1}(i) = p_t(i) + \sum_{j \neq i} (T_{j \to i} - T_{i \to j})$.  (13)

For more details on the Anticor algorithm please refer to [1].

Preliminary Experiments: Experiments with different variations of Anticor in [1] brought to the fore the exceptional empirical performance improvement that a suitable heuristic can achieve over theoretically well grounded approaches. We ran Anticor with a window size of 30 (ANTI_30) on two datasets along with EG, UP, and ONS. The two datasets used in Figure 1 are the historical NYSE dataset and the S&P500 dataset (description given in Section 4). For NYSE, Anticor's multiplicative gain is of the order of $10^6$, more than four orders of magnitude larger than the wealth gathered by the best performing universal algorithm, in this case ONS. For S&P500, the results are similar. These results highlight the key discrepancy motivating our current work: the performance of theoretically well grounded universal algorithms is substantially worse compared to suitable heuristics such as Anticor. The meta algorithms considered in the next section combine the good empirical performance of heuristics such as Anticor with the theoretical guarantees of universal algorithms.
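A minimal sketch of one Anticor rebalancing step, following Eqs. (9)-(13) and the transfer conditions described above; this is an illustration under the stated conventions, not the implementation from [1]:

```python
import numpy as np

def anticor_step(p, X, w):
    """One Anticor update. p: portfolio (n,); X: price relatives with at
    least 2w rows, shape (T, n); w: window length."""
    LX1, LX2 = np.log(X[-2 * w:-w]), np.log(X[-w:])        # two w x n windows
    mu1, mu2 = LX1.mean(axis=0), LX2.mean(axis=0)
    s1, s2 = LX1.std(axis=0, ddof=1), LX2.std(axis=0, ddof=1)
    Mcov = (LX1 - mu1).T @ (LX2 - mu2) / (w - 1)           # Eq. (9)
    denom = np.outer(s1, s2)
    Mcor = np.divide(Mcov, denom, out=np.zeros_like(Mcov), where=denom != 0)
    n = len(p)
    claim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # move wealth i -> j only if i outgrew j in the latest window
            # and i's past behavior is positively correlated with j's
            if mu2[i] > mu2[j] and Mcor[i, j] > 0:
                claim[i, j] = (Mcor[i, j] + max(0.0, -Mcor[i, i])
                               + max(0.0, -Mcor[j, j]))    # Eq. (11)
    transfer = np.zeros((n, n))
    for i in range(n):
        if claim[i].sum() > 0:
            transfer[i] = p[i] * claim[i] / claim[i].sum() # Eq. (12)
    return p + transfer.sum(axis=0) - transfer.sum(axis=1) # Eq. (13)
```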

3 Meta Algorithms

The meta algorithms we consider maintain a pool of baseline portfolio selection algorithms. The meta algorithms were designed keeping in view the original objective of the paper: good empirical performance along with a provably low regret bound with respect to a class of investment strategies, viz. constant rebalanced portfolios (CRPs). The meta algorithms (referred to as MAs in the sequel) are in spirit also online weight update algorithms, similar to the portfolio selection algorithms we came across earlier. We maintain a distribution over a set of base algorithms (BAs), each using its own approach to portfolio selection. In each step we observe the individual performance of the BAs and update our belief in them based on their performance. It is important to note that we can include any kind of portfolio selection strategy as a BA. In particular, the BAs need not be provably universal strategies [6]. But in order for an MA to have low regret w.r.t. the best CRP in hindsight, at least one of the BAs should be universal.

Let there be $m$ base algorithms. Let $r_t$ denote the vector of wealth relatives at time $t$ over all baseline algorithms, i.e., $r_t(j)$ is the multiplicative gain in wealth achieved by the $j$th BA in time step $t$. If BA $j$ maintained a portfolio $p_{t,j}$ over the stocks, then $r_t(j) = p_{t,j}^T x_t$, where $x_t$ is the vector of price relatives of the stocks under consideration. Without loss of generality, we assume $r_t(j) \leq 1$ for our analysis, since $r_t(j) \leq u$ for any constant $u$ will only add a constant additive term $\log u$ to the regret. Further, we work with a no-junk-algorithms assumption so that $r_t(j) \geq l$ for some $l > 0$.

Any MA maintains a distribution over the BAs. It adaptively moves weight to the algorithms that have shown good performance in the past, where performance here represents the wealth gathered by each of the BAs. In the context of the MAs, the BAs play the role of individual stocks, and the wealth relatives of the BAs correspond to the price relatives of stocks in the portfolio selection problem. As a result, the MAs can use one of the portfolio selection algorithms with regret guarantees to maintain a distribution over the BAs. Specifically, we discuss two versions of MA: MA_EG, based on gradient based updates, and MA_ONS, based on Newton updates. The total wealth accumulated by an MA using wealth distributions $w_1, \ldots, w_T$ in $T$ rounds over $m$ base algorithms is

$S(w_{1:T}, r_{1:T}) = \prod_{t=1}^{T} (w_t^T r_t)$,  (14)

and the logarithm of the wealth achieved is

$LS_T(w_{1:T}, r_{1:T}) = \sum_{t=1}^{T} \log(w_t^T r_t)$.  (15)
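In code, the meta layer is a thin loop around the base algorithms; a minimal sketch of the interface (the object method and names here are hypothetical, and the weight-update rule is supplied by MA_EG or MA_ONS, described next):

```python
import numpy as np

def run_meta(base_algos, X, update_weights):
    """base_algos: list of m objects, each with a portfolio(t, X_past) method
    returning p_{t,j}; X: (T, n) price relatives; update_weights: a rule
    mapping (w_t, r_t) to w_{t+1}. Returns the log-wealth LS_T of Eq. (15)."""
    T, m = len(X), len(base_algos)
    w = np.full(m, 1.0 / m)             # initial wealth split over the BAs
    log_wealth = 0.0
    for t in range(T):
        P = np.array([b.portfolio(t, X[:t]) for b in base_algos])   # m x n
        r = P @ X[t]                    # wealth relatives r_t(j) = p_{t,j}^T x_t
        log_wealth += np.log(w @ r)
        w = update_weights(w, r)        # move wealth among the BAs
    return log_wealth
```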

Analysis of MA_EG: Algorithm 1 describes the meta algorithm with EG based updates. Following the update property of EG [11], it can be shown that the difference in logarithmic wealth gathered by MA_EG and any of its base algorithms is $O(\sqrt{T})$. The following theorem formalizes this statement.

Theorem 1 Let $u \in \Delta_m$ be a distribution vector over the base algorithms, and let $r_1, \ldots, r_T$ be a sequence of wealth relatives with $r_t(j) \geq l > 0$ for all $j, t$ and $\max_j r_t(j) = 1$ for all $t$ (no-junk-algorithms assumption). For $\eta > 0$ the logarithmic wealth due to the distribution vectors over the base algorithms produced by MA_EG is bounded from below as follows:

$\sum_{t=1}^{T} \log(u^T r_t) - \sum_{t=1}^{T} \log(w_t^T r_t) \leq \frac{\log m}{\eta} + \frac{\eta T}{8 l^2}$.  (16)

Furthermore, if $w_1$ is chosen to be the uniform proportion vector, and we set $\eta = 2l\sqrt{2 \log m / T}$, then we have

$\sum_{t=1}^{T} \log(u^T r_t) - \sum_{t=1}^{T} \log(w_t^T r_t) \leq \frac{\sqrt{2 T \log m}}{2l}$.  (17)

Proof: The proof follows directly from the proof of Theorem 4.1 of [11] by replacing $x_t$ by $r_t$ and $c$ by $l$. Here $w_t$ is a distribution over the base algorithms instead of a portfolio vector.

What Theorem 1 implies is that we can include any number of heuristics or universal algorithms in the pool of BAs, and the wealth gathered by MA_EG will be competitive with the best strategy. So if we create a pool with EG, UP, ONS, and Anticor and run MA_EG on the NYSE and S&P500 datasets, we expect the wealth accumulated by MA_EG to be close to that of Anticor.

Corollary 1 MA_EG will have $O(\sqrt{T})$ regret with respect to any of the base algorithms. In particular, for the $j$th base algorithm,

$\sum_{t=1}^{T} \log(p_{t,j}^T x_t) - \sum_{t=1}^{T} \log(w_t^T r_t) \leq \frac{\sqrt{2 T \log m}}{2l}$.  (18)

Proof: The proof follows simply by using $u$ in Theorem 1 to pick the $j$th base algorithm only, and substituting the wealth relative $r_t(j)$ by $p_{t,j}^T x_t$.

Consider an application of MA_EG where one of the base algorithms is universal. In particular, assume that UP is used as a base algorithm. The above results readily imply that the corresponding MA_EG will be competitive with the best CRP in hindsight.

Corollary 2 Let $r_1, \ldots, r_T$ be a sequence of wealth relatives with $r_t(j) \geq l > 0$ for all $j, t$ and $\max_j r_t(j) = 1$ for all $t$. Let $p$ be any CRP over the corresponding sequence of price relatives $x_1, \ldots, x_T$. For $\eta = 2l\sqrt{2 \log m / T}$ and $w_1 = \text{uniform}(1/m)$, the regret in log-wealth of MA_EG w.r.t. the best CRP in hindsight is bounded as follows:

$\max_p \sum_{t=1}^{T} \log(p^T x_t) - \sum_{t=1}^{T} \log(w_t^T r_t) \leq \frac{\sqrt{2 T \log m}}{2l} + (n-1) \log(T+1)$.  (19)

Proof: Corollary 1 implies that if we include UP, which is competitive with the best CRP in hindsight, in the pool of BAs, then MA_EG in turn will also be competitive with the best CRP in hindsight. From [16] we have that if $p_{1:T,UP}$ denotes the sequence of portfolio vectors used by UP and $p$ is any CRP on a sequence of price relatives $x_1, \ldots, x_T$, then

$\max_p \sum_{t=1}^{T} \log(p^T x_t) - \sum_{t=1}^{T} \log(p_{t,UP}^T x_t) \leq (n-1) \log(T+1)$.  (20)

By adding (20) to (18) and considering the $j$th BA to be UP, we have

$\max_p \sum_{t=1}^{T} \log(p^T x_t) - \sum_{t=1}^{T} \log(w_t^T r_t) \leq \frac{\sqrt{2 T \log m}}{2l} + (n-1) \log(T+1)$.  (21)

Hence, we have shown that MA_EG has $O(\sqrt{T})$ regret w.r.t. the best CRP in hindsight if UP (or any other universal algorithm) is included as one of the base algorithms.

Analysis of MA_ONS: Like MA_EG, $w_t$ in MA_ONS is a distribution over the base algorithms in round $t$. At each step, $w_t$ is learnt using the ONS update [3]. Algorithm 2 describes the steps of the algorithm. An analysis of the ONS update yields the following regret bound for MA_ONS.

Algorithm 1 MA_EG: Meta Algorithm with EG-update
  Let $w_1 = (\frac{1}{m}, \ldots, \frac{1}{m})$ give the fraction of initial wealth invested in each of the base algorithms $b_1, \ldots, b_m$
  Let $p_{t,j}$ be the portfolio vector used by base algorithm $b_j$ in round $t$
  for $t = 1, 2, \ldots$
    Receive price relatives $x_t$
    Calculate $r_t(j) = p_{t,j}^T x_t$
    Update the weight on each base algorithm $b_j$ as follows:
      $w_{t+1}(j) = \frac{w_t(j) \exp(\eta\, r_t(j) / w_t^T r_t)}{\sum_{j'=1}^{m} w_t(j') \exp(\eta\, r_t(j') / w_t^T r_t)}$

Algorithm 2 MA_ONS: Meta Algorithm with ONS-update
  Let $w_1 = (\frac{1}{m}, \ldots, \frac{1}{m})$ give the fraction of initial wealth invested in each of the base algorithms $b_1, \ldots, b_m$
  Let $\beta = \frac{l}{16}$
  for $t = 1, 2, \ldots$
    Receive price relatives $x_t$
    Calculate $r_t(j) = p_{t,j}^T x_t$
    Calculate the new weight vector as follows:
      $w_{t+1} = \Pi_{\Delta_m}^{A_t}\left(w_t - \frac{1}{\beta} A_t^{-1} \nabla_t\right)$,  (22)
    where $\nabla_t = \nabla[-\log(w_t^T r_t)] = -\frac{1}{w_t^T r_t} r_t$, $A_t = \sum_{\tau=1}^{t} \nabla_\tau \nabla_\tau^T + I$, and $\Pi_{\Delta_m}^{A_t}$ is the projection onto $\Delta_m$ in the norm induced by $A_t$, i.e.,
      $\Pi_{\Delta_m}^{A_t}(y) = \arg\min_{x \in \Delta_m} (y - x)^T A_t (y - x)$.  (23)

Theorem 2 For any sequence of wealth relatives $r_t$ with $r_t(j) \in [l, 1]$, any $\beta \leq l/16$, and the no-junk-algorithms assumption, the MA_ONS algorithm has the following regret:

$\max_w \sum_{t=1}^{T} \log(w^T r_t) - \sum_{t=1}^{T} \log(w_t^T r_t) \leq \frac{m}{\beta} \log\left(\frac{mT}{l^2} + 1\right) + 4\beta$.  (24)

The result follows from the analysis presented in [9]. However, note that the bound is mildly better than that in the literature [2]. For the sake of completeness, we present a detailed analysis in Appendix A.
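A minimal sketch of the two weight updates of Algorithms 1 and 2; the projection is computed with a generic constrained solver, and the parameter defaults are illustrative rather than the tuned values of Section 4. Either update can be plugged into the meta loop sketched in Section 3:

```python
import numpy as np
from scipy.optimize import minimize

def ma_eg_update(w, r, eta=1.5):
    """Algorithm 1: EG-style reweighting of the base algorithms."""
    v = w * np.exp(eta * r / (w @ r))
    return v / v.sum()

class MAONS:
    """Algorithm 2: ONS-style reweighting; A_t is accumulated across rounds."""
    def __init__(self, m, l):
        self.A = np.eye(m)
        self.beta = l / 16.0
    def update(self, w, r):
        grad = -r / (w @ r)                 # gradient of -log(w^T r_t)
        self.A += np.outer(grad, grad)      # A_t = sum grad grad^T + I
        y = w - np.linalg.solve(self.A, grad) / self.beta
        m = len(w)
        res = minimize(lambda p: (y - p) @ self.A @ (y - p), w,
                       method='SLSQP', bounds=[(0.0, 1.0)] * m,
                       constraints=[{'type': 'eq',
                                     'fun': lambda p: p.sum() - 1.0}])
        return res.x                        # the projection of Eq. (23)
```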

Following the lines of Corollary 2 for MA_EG now leads us to the following result, which shows that MA_ONS will be competitive with the best CRP in hindsight.

Corollary 3 Let $r_1, \ldots, r_T$ be a sequence of wealth relatives with $r_t(j) \geq l > 0$ for all $j, t$ and $\max_j r_t(j) = 1$ for all $t$. Let $p$ be the best CRP in hindsight over the corresponding sequence of price relatives $x_1, \ldots, x_T$. For any positive $\beta \leq l/16$, the regret in logarithmic wealth of MA_ONS w.r.t. the best CRP in hindsight is bounded as follows:

$\sum_{t=1}^{T} \log(p^T x_t) - \sum_{t=1}^{T} \log(w_t^T r_t) \leq \frac{m}{\beta} \log\left(\frac{T}{l^2} + 1\right) + n \log(T+1) + \frac{m}{\beta} \log m + 4\beta$.  (25)

Proof: The proof is similar to that of Corollary 2, with some additional calculations combining terms involving $m$ and $n$.

Thus, the regret of MA_ONS is $O(\log T)$ w.r.t. the best CRP in hindsight, whereas the regret of MA_EG is $O(\sqrt{T})$.

4 Experiments and Results

In this section, we discuss experimental results obtained with different sets of base algorithms and different strategies for the meta algorithms.

Datasets: The experiments were conducted on two main datasets: the New York Stock Exchange dataset (NYSE), widely used in the literature [6, 11, 2], and a new Standard & Poor's 500 (S&P500) dataset we created. The NYSE dataset consists of 36 stocks with data at daily resolution accumulated over a period of 22 years, from July 3rd, 1962 to Dec 31st, 1984. The dataset captures the bear market that lasted between January 1973 and December 1974. All of the 36 stocks increase in value over the 22-year run. S&P500 is a capitalization-weighted index of 500 large-cap common stocks actively traded in the United States. The index is used to measure the changes in the US economy through the aggregate market value of these 500 stocks representing all major industries. The S&P500 dataset that we used for our experiments consists of 385 stocks which were persistent in the S&P500 from January 1995 to November 2008. The time frame includes two major financial meltdowns, viz. the dot-com bubble burst in March 2000 and the housing bubble burst in October 2008. In order to better understand the performance of the algorithms, we also ran experiments on these datasets in reverse. In particular, following [1], we reverse the day ordering and consider the reciprocals of the price relatives. The reverse datasets are denoted by a superscript -1, i.e., NYSE⁻¹ and S&P500⁻¹.

Methodology: We ran a set of base algorithms and different meta algorithms on the four datasets: NYSE, S&P500, NYSE⁻¹, and S&P500⁻¹. The meta algorithms were run with a pool of base algorithms. The pool included UP,³ EG, and ONS among the universal algorithms, and Anticor, AdaptiveFTL_r, and UCRP among the non-universal algorithms. AdaptiveFTL_r is a variant of the follow-the-leader strategy which uniformly distributes the wealth among the top-r stocks at any time point. UCRP is the uniform constant rebalanced portfolio, which maintains an equal proportion of wealth in all stocks. Most of the portfolio selection algorithms, universal or otherwise, required parameter choices to be made. For EG, the value of η was taken to be 0.05. We implemented ONS from [2] with the following parameter settings: η = 0, β = 1, and δ = 1/8. Anticor was used with a window length of w = 30 and is hence referred to as Anticor_30. We also experimented with a Buy-and-Hold strategy over Anticor with different window lengths from 2 to 30, referred to as BAH(Anticor_30). AdaptiveFTL_r was used with r = 5. For MA_EG, the value of η used was taken to be greater than 1. MA_ONS was run with β = 1/16.

Results: Table 1 presents the monetary returns in dollars of the universal and non-universal algorithms. Amongst the non-universal algorithms, it includes results for UCRP, Anticor_30, and AdaptiveFTL_5. The winner and runner-up for each market dataset appear in bold-face. The initial investment for all the algorithms is $1. MA_EG and Anticor_30 are the clear winners for all four datasets. In all cases, Anticor_30 is the best performing method, but it is not a universal algorithm.
MA_EG and MA_ONS are both universal algorithms whose empirical performance is close to that of Anticor, and is always orders of magnitude better than that of the existing universal algorithms, such as UP, EG, and ONS. An interesting observation is that even on the NYSE⁻¹ dataset, where UP, EG, and ONS actually end up losing money (due to the intrinsic nature of the dataset), the meta algorithms and Anticor still manage to make profits.

³ A direct implementation of the UP algorithm is exponential in the number of stocks [12]. Blum and Kalai [4] proposed an approximate implementation based on uniform random sampling of the portfolio simplex which, in the worst case, is also exponential in the number of stocks. The UP algorithm code used for the experiments is designed based on [4].
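A minimal Monte Carlo sketch of the sampling-based approximation mentioned in the footnote: sample CRPs from the Dir(1/2, ..., 1/2) prior and weight each by its accumulated wealth, per Eq. (4). This illustrates the idea and is not the experimental code:

```python
import numpy as np

def up_portfolio(X_past, num_samples=10000, seed=0):
    """Approximate the universal portfolio p_t by sampling CRPs q ~ Dir(1/2)."""
    rng = np.random.default_rng(seed)
    n = X_past.shape[1]
    Q = rng.dirichlet(np.full(n, 0.5), size=num_samples)   # sampled CRPs
    S = np.prod(Q @ X_past.T, axis=1)       # wealth S_{t-1}(q) of each CRP
    return (S[:, None] * Q).sum(axis=0) / S.sum()          # Eq. (4)
```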

Table 1: Monetary returns in dollars (per $1 investment) of universal and non-universal algorithms

Algorithm       NYSE          SP500       NYSE⁻¹      SP500⁻¹
UP              ...           ...         ...         ...
EG              ...           ...         ...         ...
ONS             ...           ...         ...         ...
UCRP            ...           ...         ...         ...
Anticor_30      ...,799,...   ...         ...         ...
AdaptiveFTL_5   ...           ...         ...         ...
MA_EG           482,...       ...         ...         ...
MA_ONS          424,...       ...         ...         ...

Table 2 presents the Annual Percentage Yields (APY) of the algorithms on the 4 datasets. The APY percentages were calculated using the standard asymptotic formula (for a large number of years)

$\text{APY} = \exp\left(\frac{1}{T_{\text{years}}} \log S_{T_{\text{years}}}\right) - 1$,  (26)

where $T_{\text{years}}$ is the total number of years of investment, which is roughly 22 years for the NYSE and 13 years for the S&P500 datasets respectively, and $S_{T_{\text{years}}}$ is the wealth gathered by an algorithm at the end of $T_{\text{years}}$. As expected, ANTI_30 has the highest APY for all the datasets, with MA_EG and MA_ONS following close behind. On the S&P500 dataset, while ANTI_30 achieves an APY of almost 221%, MA_EG's APY is 176%, which is far above ONS's APY of 57%, the best amongst the universal algorithms.

Table 2: Annual Percentage Yield

Algorithm       NYSE        SP500       NYSE⁻¹      SP500⁻¹
UP              ...         ...         ...         ...
EG              ...         ...         ...         ...
ONS             ...         ...         ...         ...
UCRP            ...         ...         ...         ...
Anticor_30      ...         ...         ...         ...
AdaptiveFTL_5   ...         ...         ...         ...
MA_EG           ...         ...         ...         ...
MA_ONS          ...         ...         ...         ...

Figure 2 and Figure 3 visually depict the performance of the algorithms in terms of wealth growth. Due to the exceptionally large wealth growth factor of Anticor_30 and the meta algorithms, Figures 2(a) and 2(b) and Figure 3(b) show the logarithmic wealth growth.
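In code, the APY computation of Eq. (26) is one line; a minimal sketch:

```python
import math

def apy(final_wealth, years):
    """Annual Percentage Yield from terminal wealth per $1 (Eq. 26)."""
    return math.exp(math.log(final_wealth) / years) - 1.0

# e.g., a 10^6-fold return over the 22-year NYSE run, roughly Anticor_30's
# order of magnitude, corresponds to an APY of about 87%:
print(apy(1e6, 22))   # 0.8738...
```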

Additional Experiments: Based on the success of the meta algorithms built on good heuristics, we ran additional experiments to investigate two aspects: (i) Can we design even better heuristics as base algorithms and consequently improve the performance of the meta algorithms? and (ii) Are there heuristic meta algorithms which have no guarantees but outperform MA_EG and MA_ONS empirically?

The experiments reported above worked with a fixed window size $w$ for the Anticor algorithm, which was taken to be 30. Since it is not possible to ascertain beforehand what window size of the Anticor algorithm will work best for a dataset, we introduce a variant of Anticor called BAH(Anticor_W), which uses a Buy-and-Hold strategy to combine multiple Anticor algorithms with different window sizes $w \leq W$. BAH(Anticor_W) was found to perform better than the original Anticor_w. Figure 4 shows that BAH(Anticor_30) beats Anticor_30, and of course EG, UP, and ONS, by a good margin on the NYSE as well as the S&P500 dataset. BAH(Anticor_W) was added to the pool of existing base algorithms for the meta algorithms. We ran the meta algorithms MA_EG and MA_ONS with BAH(Anticor_30) in the pool. Figure 5 shows that the wealth achieved by MA_EG and MA_ONS with BAH(Anticor_30) is almost as much as that of BAH(Anticor_30) itself. Further, both meta algorithms now outperform Anticor_30.

Since Anticor performed well as a base algorithm, we also ran experiments using Anticor as a meta algorithm. MA_Anticor simply redistributes wealth among the pool of base algorithms following the Anticor updates based on the wealth relatives of the individual algorithms. We experimented with different window lengths, and report results with a window length of 5, as this version was observed to perform reasonably well. The performance of MA_Anticor was seen to decrease as the window length was increased beyond 10 days. MA_Anticor showed no performance improvement over MA_EG and MA_ONS on either the NYSE or the S&P500 dataset. Figure 5 shows that the performance of MA_Anticor is actually inferior to both MA_EG and MA_ONS. Interestingly, the reversal-to-mean strategy does not seem as effective when run on base algorithms.

The marked performance improvement of the Buy-and-Hold strategy over the Anticor algorithm with different window lengths inspired the final set of experiments, where a Buy-and-Hold strategy was used as a meta algorithm. We ran a Buy-and-Hold version called MA_BAH with UP, EG, ONS, AdaptiveFTL_5, Anticor_30, and BAH(Anticor_30) as base algorithms. The results are plotted in Figure 5. We observe that although the multiplicative wealth gain of MA_BAH is more than that of Anticor_30, it is less than that of BAH(Anticor_30). More interestingly, MA_EG and MA_ONS both outperform MA_BAH. Moving money to well performing algorithms explicitly, as done by MA_EG and MA_ONS, seems to be more effective than simply holding, as done by MA_BAH, or moving money out assuming reversal to the mean, as done by MA_Anticor. The experiments reveal interesting differences in the effectiveness of strategies at the level of base algorithms, where one works with stocks, and at the level of meta algorithms, where one works with algorithms.

5 Conclusions

In this paper, we have presented new meta algorithms for portfolio selection. The key motivation behind our work was the poor practical performance of universal algorithms such as UP, EG, and ONS. We wanted to take advantage of non-universal heuristics but at the same time hold on to the theoretical guarantees of the universal algorithms. We presented new meta algorithms which maintain a pool of base algorithms and move money around using the EG-update and the ONS-update respectively. With the help of theoretical analysis we were able to show that the meta algorithms will be competitive with the best base algorithm. Also, if we include a universal algorithm in the pool, the meta algorithms can be shown to be competitive with the best CRP in hindsight, i.e., universal. With a universal algorithm in the pool, MA_EG achieved $O(\sqrt{T})$ regret, while MA_ONS achieved $O(\log T)$ regret w.r.t. the best CRP.
This opens the door to including any number of good heuristics in the pool of base algorithms while still guaranteeing low regret. Through comprehensive experiments on the NYSE and S&P500 datasets and their variations, we were able to exhibit the overwhelming performance improvement of the meta algorithms over the existing universal algorithms such as UP, EG, and ONS. The meta algorithms presented in this paper are the first of their kind in combining exceptional empirical performance with strong theoretical bounds, truly bringing the best of both worlds together.

The existing literature on online portfolio selection algorithms [6, 11, 2, 1] does not take into account the commission one has to pay while trading. Most of these algorithms trade every stock every day, which is not practical, as one can incur huge commission costs. As part of our future work, we would like to investigate whether a sparse version of the meta algorithm can take care of commissions and yet achieve

good empirical performance. Further, the current models for on-line portfolio selection do not model risk. Modeling risk and taking account of the volatility of stocks is another interesting direction for our future work.

Acknowledgements: This research was supported by NSF CAREER grant IIS, and NSF grants IIS and IIS.

References

[1] A. Borodin, R. El-Yaniv, and V. Gogan. Can we learn to beat the best stock. Journal of Artificial Intelligence Research, 21:579-594, 2004.
[2] A. Agarwal, E. Hazan, S. Kale, and R. Schapire. Algorithms for portfolio management based on the Newton method. In Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 9-16, 2006.
[3] A. Blum and A. Kalai. Universal portfolios with and without transaction costs. In Proceedings of the 10th Annual Conference on Computational Learning Theory (COLT), 1997.
[4] A. Blum and A. Kalai. Universal portfolios with and without transaction costs. Machine Learning, 35(3):193-205, 1999.
[5] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
[6] T. Cover. Universal portfolios. Mathematical Finance, 1(1):1-29, 1991.
[7] T. Cover and E. Ordentlich. Universal portfolios with side information. IEEE Transactions on Information Theory, 42(2):348-363, 1996.
[8] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, 1991.
[9] E. Hazan, A. Agarwal, and S. Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169-192, 2007.
[10] E. Hazan, A. Kalai, S. Kale, and A. Agarwal. Logarithmic regret algorithms for online convex optimization. In Proceedings of the 19th Annual Conference on Learning Theory (COLT), pages 499-513, 2006.
[11] D. Helmbold, R. Schapire, Y. Singer, and M. Warmuth. On-line portfolio selection using multiplicative updates. Mathematical Finance, 8(4):325-347, 1998.
[12] A. Kalai and S. Vempala. Efficient algorithms for universal portfolios. Journal of Machine Learning Research, 3:423-440, 2002.
[13] J. L. Kelly. A new interpretation of information rate. Bell System Technical Journal, 35:917-926, 1956.
[14] J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1-64, 1997.
[15] H. Markowitz. Portfolio selection. Journal of Finance, 7(1):77-91, 1952.
[16] T. M. Cover and E. Ordentlich. Universal portfolios with side information. IEEE Transactions on Information Theory, 42(2), 1996.

A Analysis of MA_ONS

We present an analysis of the ONS update which leads to better bounds compared to those in the literature [2], in particular in terms of the dependence on the number of stocks $n$. Since $w_t$ is the distribution of wealth among the $m$ base algorithms, $w_t \in \Delta_m$, the $m$-simplex. Let

$f_t(w) = -\log(w^T r_t)$.  (27)

If $w_t$ is the wealth distribution at time $t$, then let

$\nabla_t = \nabla f_t(w_t) = -\frac{1}{w_t^T r_t} r_t$.  (28)

Note that the second derivative is

$\nabla^2 f_t(w_t) = \frac{1}{(w_t^T r_t)^2} r_t r_t^T = \nabla_t \nabla_t^T$.  (29)

Consider the auxiliary function

$h_t(w) = -\log(w_t^T r_t) + \nabla_t^T (w - w_t) + \frac{\beta}{2} (\nabla_t^T (w - w_t))^2$,  (30)

which satisfies $h_t(w_t) = -\log(w_t^T r_t) = f_t(w_t)$.

Lemma 1 For any $\beta \leq \frac{l}{16} < 1/2$, $f_t(w) \geq h_t(w)$, i.e.,

$-\log(w^T r_t) \geq -\log(w_t^T r_t) + \nabla_t^T (w - w_t) + \frac{\beta}{2} (w - w_t)^T \nabla_t \nabla_t^T (w - w_t)$.  (31)

Proof: Let $g_t(w) = \exp(-2\beta f_t(w)) = (w^T r_t)^{2\beta}$. Note that $\nabla g_t(w) = -2\beta \nabla f_t(w) \exp(-2\beta f_t(w))$. Further, since $\beta < 1/2$,

$\nabla^2 g_t(w) = 2\beta(2\beta - 1) \nabla f_t(w) \nabla f_t(w)^T \exp(-2\beta f_t(w)) \preceq 0$,

implying $g_t(w)$ is concave. Hence,

$g_t(y) \leq g_t(w) + (y - w)^T \nabla g_t(w)$
$\Rightarrow \exp(-2\beta f_t(y)) \leq \exp(-2\beta f_t(w)) (1 - 2\beta (y - w)^T \nabla f_t(w))$.

Taking logs on both sides and rearranging, we have

$f_t(y) \geq f_t(w) - \frac{1}{2\beta} \log(1 - 2\beta (y - w)^T \nabla f_t(w))$.

From Holder's inequality, we have

$|2\beta (y - w)^T \nabla f_t(w)| \leq 2\beta \|y - w\|_1 \|\nabla f_t(w)\|_\infty \leq 2\beta (\|y\|_1 + \|w\|_1) \frac{\|r_t\|_\infty}{w^T r_t} \leq 2\beta \cdot 2 \cdot \frac{1}{l} \leq \frac{1}{4}$,

since $\beta \leq \frac{l}{16}$. For $|z| \leq 1/4$, $\log(1 - z) \leq -z - \frac{z^2}{4}$. Using the above inequality with $z = 2\beta (y - w)^T \nabla f_t(w)$, we obtain

$f_t(y) \geq f_t(w) + (y - w)^T \nabla f_t(w) + \frac{\beta}{2} ((y - w)^T \nabla f_t(w))^2$.

Setting $y = w$ and $w = w_t$ and rearranging terms, we obtain

$f_t(w) \geq f_t(w_t) + (w - w_t)^T \nabla_t + \frac{\beta}{2} (w - w_t)^T \nabla_t \nabla_t^T (w - w_t)$.

That completes the proof.

Since

$\log(w^T r_t) - \log(w_t^T r_t) = f_t(w_t) - f_t(w) \leq -\nabla_t^T (w - w_t) - \frac{\beta}{2} (w - w_t)^T \nabla_t \nabla_t^T (w - w_t)$,

it is sufficient to upper bound the sum of the terms on the RHS. Following [3], we define

$A_t = \sum_{\tau=1}^{t} \nabla_\tau \nabla_\tau^T + I = A_{t-1} + \nabla_t \nabla_t^T$.  (32)

Let $\phi_t(x) = \frac{1}{2} x^T A_t x$. Note that $\phi_t(x)$ is convex, and the corresponding Bregman divergence is given by

$d_{\phi_t}(w, y) = \frac{1}{2} (w - y)^T A_t (w - y)$.  (33)

The Online Newton Step (ONS) algorithm proceeds by taking a Newton step from the current solution $w_t$ and then projecting it back to the simplex $\Delta_m$. Let $y_{t+1}$ denote the result of an unconstrained Newton step:

$y_{t+1} = w_t - \frac{1}{\beta} A_t^{-1} \nabla_t$.  (34)

Then, the next wealth distribution $w_{t+1}$ is the Bregman projection of $y_{t+1}$ onto the simplex, i.e., $w_{t+1} = \arg\min_{w \in \Delta_m} d_{\phi_t}(w, y_{t+1})$. We are now ready to state the main regret result for ONS:

Theorem 3 For any sequence of wealth relatives $r_t$ with $r_t(j) \in [l, 1]$, any $\beta \leq l/16$, and the no-junk-algorithms assumption, the ONS algorithm has the following regret:

$\max_w \sum_{t=1}^{T} \log(w^T r_t) - \sum_{t=1}^{T} \log(w_t^T r_t) \leq \frac{m}{\beta} \log\left(\frac{mT}{l^2} + 1\right) + 4\beta$.  (35)

In particular, for $\beta = l/16$, the regret is given by

$\max_w \sum_{t=1}^{T} \log(w^T r_t) - \sum_{t=1}^{T} \log(w_t^T r_t) \leq \frac{16 m}{l} \log\left(\frac{mT}{l^2} + 1\right) + \frac{l}{4}$.  (36)

Note that the regret bound is stronger than the existing bound on ONS for the portfolio selection problem [2]. In particular, the regret bound in [2] has a dependency on $n^{3/2}$, whereas the bound above has a dependency on $n$. Thus, while the algorithm is still based on ONS, our analysis leads to a mildly better regret bound.

We need the following linear algebraic result (Lemma 11 in [3]) for our proof:

Lemma 2 Let $\nabla_t = \nabla f_t(w_t) = -\frac{1}{w_t^T r_t} r_t$ with $r_t(i) \in [l, 1]$. Further, let $A_t = \sum_{\tau=1}^{t} \nabla_\tau \nabla_\tau^T + I$. Then,

$\sum_{t=1}^{T} \nabla_t^T A_t^{-1} \nabla_t \leq m \log\left(\frac{mT}{l^2} + 1\right)$.  (37)
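Lemma 2 is easy to sanity-check numerically on random wealth relatives; a minimal sketch (the sequence of wealth distributions here is arbitrary, since the lemma holds for any $w_t$):

```python
import numpy as np

rng = np.random.default_rng(0)
m, T, l = 5, 200, 0.5
lhs, A = 0.0, np.eye(m)
for _ in range(T):
    r = rng.uniform(l, 1.0, size=m)      # wealth relatives in [l, 1]
    w = rng.dirichlet(np.ones(m))        # an arbitrary wealth distribution
    g = -r / (w @ r)                     # gradient of -log(w^T r)
    A += np.outer(g, g)
    lhs += g @ np.linalg.solve(A, g)
print(lhs, m * np.log(m * T / l ** 2 + 1))   # lhs stays below the bound
```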

Proof of Theorem: Let $w^*$ be the best wealth distribution in hindsight. Recall that

$f_t(w^*) - f_t(w_t) \geq \nabla_t^T (w^* - w_t) + \frac{\beta}{2} (w^* - w_t)^T \nabla_t \nabla_t^T (w^* - w_t)$,  (38)

so that $f_t(w_t) - f_t(w^*) \leq R_t$, where we denote by $R_t = -\nabla_t^T (w^* - w_t) - \frac{\beta}{2} (w^* - w_t)^T \nabla_t \nabla_t^T (w^* - w_t)$ the regret at the $t$th step. From the unconstrained update of the ONS algorithm, note that

$A_t y_{t+1} = A_t w_t - \frac{1}{\beta} \nabla_t \;\Rightarrow\; \nabla_t = -\beta (\nabla \phi_t(y_{t+1}) - \nabla \phi_t(w_t))$.

From the three point property of Bregman divergences, we know that for any vectors $x, y, z$,

$(x - y)^T (\nabla \phi(z) - \nabla \phi(y)) = d_\phi(x, y) - d_\phi(x, z) + d_\phi(y, z)$.  (39)

Hence, for any $w^*$,

$(w^* - w_t)^T \nabla_t = -\beta (w^* - w_t)^T (\nabla \phi_t(y_{t+1}) - \nabla \phi_t(w_t))$
$= -\beta [d_{\phi_t}(w^*, w_t) - d_{\phi_t}(w^*, y_{t+1}) + d_{\phi_t}(w_t, y_{t+1})]$
$\geq -\beta [d_{\phi_t}(w^*, w_t) - d_{\phi_t}(w^*, w_{t+1}) + d_{\phi_t}(w_t, y_{t+1})]$,

where the last inequality follows from the fact that $w_{t+1}$ is the Bregman projection of $y_{t+1}$ onto $\Delta_m$, so that $d_{\phi_t}(w^*, w_{t+1}) \leq d_{\phi_t}(w^*, y_{t+1})$. Now, note that

$d_{\phi_t}(w_t, y_{t+1}) = \frac{1}{2} (w_t - y_{t+1})^T A_t (w_t - y_{t+1}) = \frac{1}{2\beta^2} \nabla_t^T A_t^{-1} A_t A_t^{-1} \nabla_t = \frac{1}{2\beta^2} \nabla_t^T A_t^{-1} \nabla_t$.

Plugging the bound on $(w^* - w_t)^T \nabla_t$ into $R_t$, summing over $t = 1, \ldots, T$, telescoping the $d_{\phi_t}(w^*, w_t) - d_{\phi_t}(w^*, w_{t+1})$ terms, and using $A_t - A_{t-1} = \nabla_t \nabla_t^T$ along with $d_{\phi_T}(w^*, w_{T+1}) \geq 0$, we obtain

$\sum_{t=1}^{T} R_t \leq \frac{1}{2\beta} \sum_{t=1}^{T} \nabla_t^T A_t^{-1} \nabla_t + \frac{\beta}{2} \sum_{t=2}^{T} (w^* - w_t)^T (A_t - A_{t-1})(w^* - w_t) + \frac{\beta}{2} (w^* - w_1)^T A_1 (w^* - w_1) - \frac{\beta}{2} \sum_{t=1}^{T} (w^* - w_t)^T \nabla_t \nabla_t^T (w^* - w_t)$
$\leq \frac{1}{2\beta} \sum_{t=1}^{T} \nabla_t^T A_t^{-1} \nabla_t + \frac{\beta}{2} (w^* - w_1)^T (A_1 - \nabla_1 \nabla_1^T)(w^* - w_1)$.

Following Lemma 2, we have an upper bound on the first term. Focusing on the second term, we note that $A_1 - \nabla_1 \nabla_1^T = I$, and so

$\|w^* - w_1\|^2 \leq (\|w^* - w_1\|_1)^2 \leq (\|w^*\|_1 + \|w_1\|_1)^2 = 4$.

Plugging these back in, the regret is given by

$\sum_{t=1}^{T} [f_t(w_t) - f_t(w^*)] \leq \frac{m}{2\beta} \log\left(\frac{mT}{l^2} + 1\right) + 2\beta \leq \frac{m}{\beta} \log\left(\frac{mT}{l^2} + 1\right) + 4\beta$.  (40)

That completes the proof.
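The exp-concavity bound of Lemma 1 can be checked numerically in the same way; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
m, l = 4, 0.5
beta = l / 16.0
for _ in range(1000):
    r = rng.uniform(l, 1.0, size=m)      # wealth relatives in [l, 1]
    w_t = rng.dirichlet(np.ones(m))      # current iterate
    w = rng.dirichlet(np.ones(m))        # arbitrary comparison point
    g = -r / (w_t @ r)                   # gradient of f_t at w_t
    h = (-np.log(w_t @ r) + g @ (w - w_t)
         + 0.5 * beta * (g @ (w - w_t)) ** 2)
    assert -np.log(w @ r) >= h - 1e-12   # f_t(w) >= h_t(w), Eq. (31)
```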

[Figure 1: Wealth gathered by Anticor exceeds UP, EG, and ONS on the NYSE and S&P500 datasets. Panels: (a) NYSE, (b) S&P500. Each panel plots logarithmic wealth growth against year for EG, UP, ONS, and Anticor.]

[Figure 2: Monetary returns with the meta algorithms; MA_EG and MA_ONS are competitive with the best base algorithm (Anticor_30) on the original datasets (best viewed in color). Panels: (a) monetary returns on NYSE, (b) monetary returns on S&P500. Each panel plots logarithmic wealth growth against year for EG, UP, UCRP, AdaptiveFTL_5, Anticor_30, ONS, MA_EG, and MA_ONS.]

[Figure 3: Monetary returns of the base algorithms and meta algorithms on the reverse datasets (best viewed in color). Panels: (a) monetary returns on NYSE⁻¹, (b) monetary returns on S&P500⁻¹.]

[Figure 4: Monetary returns of BAH(Anticor_30) for a $1 investment exceed UP, EG, ONS, and Anticor_30 (best viewed in color). Panels: (a) monetary returns on NYSE, (b) monetary returns on S&P500.]

[Figure 5: Monetary returns of the meta algorithms (MA_EG, MA_ONS, MA_Anticor, MA_BAH) for a $1 investment when BAH(Anticor_30) is added to the pool of base algorithms. The multiplicative wealth gain of MA_EG, MA_ONS, and MA_BAH exceeds that of Anticor_30 (best viewed in color). Panels: (a) monetary returns on NYSE, (b) monetary returns on S&P500.]


More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Online Convex Optimization MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Outline Online projected sub-gradient descent. Exponentiated Gradient (EG). Mirror descent.

More information

Dynamic Regret of Strongly Adaptive Methods

Dynamic Regret of Strongly Adaptive Methods Lijun Zhang 1 ianbao Yang 2 Rong Jin 3 Zhi-Hua Zhou 1 Abstract o cope with changing environments, recent developments in online learning have introduced the concepts of adaptive regret and dynamic regret

More information

Stochastic and Adversarial Online Learning without Hyperparameters

Stochastic and Adversarial Online Learning without Hyperparameters Stochastic and Adversarial Online Learning without Hyperparameters Ashok Cutkosky Department of Computer Science Stanford University ashokc@cs.stanford.edu Kwabena Boahen Department of Bioengineering Stanford

More information

Online Learning and Sequential Decision Making

Online Learning and Sequential Decision Making Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Online Learning

More information

Logarithmic Regret Algorithms for Online Convex Optimization

Logarithmic Regret Algorithms for Online Convex Optimization Logarithmic Regret Algorithms for Online Convex Optimization Elad Hazan 1, Adam Kalai 2, Satyen Kale 1, and Amit Agarwal 1 1 Princeton University {ehazan,satyen,aagarwal}@princeton.edu 2 TTI-Chicago kalai@tti-c.org

More information

arxiv: v4 [math.oc] 5 Jan 2016

arxiv: v4 [math.oc] 5 Jan 2016 Restarted SGD: Beating SGD without Smoothness and/or Strong Convexity arxiv:151.03107v4 [math.oc] 5 Jan 016 Tianbao Yang, Qihang Lin Department of Computer Science Department of Management Sciences The

More information

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Elad Hazan IBM Almaden 650 Harry Rd, San Jose, CA 95120 hazan@us.ibm.com Satyen Kale Microsoft Research 1 Microsoft Way, Redmond,

More information

The Online Approach to Machine Learning

The Online Approach to Machine Learning The Online Approach to Machine Learning Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Approach to ML 1 / 53 Summary 1 My beautiful regret 2 A supposedly fun game I

More information

A Greedy Framework for First-Order Optimization

A Greedy Framework for First-Order Optimization A Greedy Framework for First-Order Optimization Jacob Steinhardt Department of Computer Science Stanford University Stanford, CA 94305 jsteinhardt@cs.stanford.edu Jonathan Huggins Department of EECS Massachusetts

More information

On-line Variance Minimization

On-line Variance Minimization On-line Variance Minimization Manfred Warmuth Dima Kuzmin University of California - Santa Cruz 19th Annual Conference on Learning Theory M. Warmuth, D. Kuzmin (UCSC) On-line Variance Minimization COLT06

More information

Regret Bounds for Online Portfolio Selection with a Cardinality Constraint

Regret Bounds for Online Portfolio Selection with a Cardinality Constraint Regret Bounds for Online Portfolio Selection with a Cardinality Constraint Shinji Ito NEC Corporation Daisuke Hatano RIKEN AIP Hanna Sumita okyo Metropolitan University Akihiro Yabe NEC Corporation akuro

More information

CS261: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm

CS261: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm CS61: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm Tim Roughgarden February 9, 016 1 Online Algorithms This lecture begins the third module of the

More information

Worst-Case Analysis of the Perceptron and Exponentiated Update Algorithms

Worst-Case Analysis of the Perceptron and Exponentiated Update Algorithms Worst-Case Analysis of the Perceptron and Exponentiated Update Algorithms Tom Bylander Division of Computer Science The University of Texas at San Antonio San Antonio, Texas 7849 bylander@cs.utsa.edu April

More information

Optimization for Machine Learning

Optimization for Machine Learning Optimization for Machine Learning Editors: Suvrit Sra suvrit@gmail.com Max Planck Insitute for Biological Cybernetics 72076 Tübingen, Germany Sebastian Nowozin Microsoft Research Cambridge, CB3 0FB, United

More information

1 Review and Overview

1 Review and Overview DRAFT a final version will be posted shortly CS229T/STATS231: Statistical Learning Theory Lecturer: Tengyu Ma Lecture # 16 Scribe: Chris Cundy, Ananya Kumar November 14, 2018 1 Review and Overview Last

More information

Online Optimization in Dynamic Environments: Improved Regret Rates for Strongly Convex Problems

Online Optimization in Dynamic Environments: Improved Regret Rates for Strongly Convex Problems 216 IEEE 55th Conference on Decision and Control (CDC) ARIA Resort & Casino December 12-14, 216, Las Vegas, USA Online Optimization in Dynamic Environments: Improved Regret Rates for Strongly Convex Problems

More information

Bregman Divergences for Data Mining Meta-Algorithms

Bregman Divergences for Data Mining Meta-Algorithms p.1/?? Bregman Divergences for Data Mining Meta-Algorithms Joydeep Ghosh University of Texas at Austin ghosh@ece.utexas.edu Reflects joint work with Arindam Banerjee, Srujana Merugu, Inderjit Dhillon,

More information

The No-Regret Framework for Online Learning

The No-Regret Framework for Online Learning The No-Regret Framework for Online Learning A Tutorial Introduction Nahum Shimkin Technion Israel Institute of Technology Haifa, Israel Stochastic Processes in Engineering IIT Mumbai, March 2013 N. Shimkin,

More information

Online prediction with expert advise

Online prediction with expert advise Online prediction with expert advise Jyrki Kivinen Australian National University http://axiom.anu.edu.au/~kivinen Contents 1. Online prediction: introductory example, basic setting 2. Classification with

More information

The convex optimization approach to regret minimization

The convex optimization approach to regret minimization The convex optimization approach to regret minimization Elad Hazan Technion - Israel Institute of Technology ehazan@ie.technion.ac.il Abstract A well studied and general setting for prediction and decision

More information

Online Learning with Experts & Multiplicative Weights Algorithms

Online Learning with Experts & Multiplicative Weights Algorithms Online Learning with Experts & Multiplicative Weights Algorithms CS 159 lecture #2 Stephan Zheng April 1, 2016 Caltech Table of contents 1. Online Learning with Experts With a perfect expert Without perfect

More information

Online Kernel PCA with Entropic Matrix Updates

Online Kernel PCA with Entropic Matrix Updates Dima Kuzmin Manfred K. Warmuth Computer Science Department, University of California - Santa Cruz dima@cse.ucsc.edu manfred@cse.ucsc.edu Abstract A number of updates for density matrices have been developed

More information

Agnostic Online learnability

Agnostic Online learnability Technical Report TTIC-TR-2008-2 October 2008 Agnostic Online learnability Shai Shalev-Shwartz Toyota Technological Institute Chicago shai@tti-c.org ABSTRACT We study a fundamental question. What classes

More information

Learning with Large Number of Experts: Component Hedge Algorithm

Learning with Large Number of Experts: Component Hedge Algorithm Learning with Large Number of Experts: Component Hedge Algorithm Giulia DeSalvo and Vitaly Kuznetsov Courant Institute March 24th, 215 1 / 3 Learning with Large Number of Experts Regret of RWM is O( T

More information

Ad Placement Strategies

Ad Placement Strategies Case Study : Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD AdaGrad Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 7 th, 04 Ad

More information

Worst-Case Bounds for Gaussian Process Models

Worst-Case Bounds for Gaussian Process Models Worst-Case Bounds for Gaussian Process Models Sham M. Kakade University of Pennsylvania Matthias W. Seeger UC Berkeley Abstract Dean P. Foster University of Pennsylvania We present a competitive analysis

More information

Information Choice in Macroeconomics and Finance.

Information Choice in Macroeconomics and Finance. Information Choice in Macroeconomics and Finance. Laura Veldkamp New York University, Stern School of Business, CEPR and NBER Spring 2009 1 Veldkamp What information consumes is rather obvious: It consumes

More information

New bounds on the price of bandit feedback for mistake-bounded online multiclass learning

New bounds on the price of bandit feedback for mistake-bounded online multiclass learning Journal of Machine Learning Research 1 8, 2017 Algorithmic Learning Theory 2017 New bounds on the price of bandit feedback for mistake-bounded online multiclass learning Philip M. Long Google, 1600 Amphitheatre

More information

Multi-armed bandit models: a tutorial

Multi-armed bandit models: a tutorial Multi-armed bandit models: a tutorial CERMICS seminar, March 30th, 2016 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions)

More information

Delay-Tolerant Online Convex Optimization: Unified Analysis and Adaptive-Gradient Algorithms

Delay-Tolerant Online Convex Optimization: Unified Analysis and Adaptive-Gradient Algorithms Delay-Tolerant Online Convex Optimization: Unified Analysis and Adaptive-Gradient Algorithms Pooria Joulani 1 András György 2 Csaba Szepesvári 1 1 Department of Computing Science, University of Alberta,

More information

Lecture 23: Online convex optimization Online convex optimization: generalization of several algorithms

Lecture 23: Online convex optimization Online convex optimization: generalization of several algorithms EECS 598-005: heoretical Foundations of Machine Learning Fall 2015 Lecture 23: Online convex optimization Lecturer: Jacob Abernethy Scribes: Vikas Dhiman Disclaimer: hese notes have not been subjected

More information

Theory and Applications of A Repeated Game Playing Algorithm. Rob Schapire Princeton University [currently visiting Yahoo!

Theory and Applications of A Repeated Game Playing Algorithm. Rob Schapire Princeton University [currently visiting Yahoo! Theory and Applications of A Repeated Game Playing Algorithm Rob Schapire Princeton University [currently visiting Yahoo! Research] Learning Is (Often) Just a Game some learning problems: learn from training

More information

LEM. New Results on Betting Strategies, Market Selection, and the Role of Luck. Giulio Bottazzi Daniele Giachini

LEM. New Results on Betting Strategies, Market Selection, and the Role of Luck. Giulio Bottazzi Daniele Giachini LEM WORKING PAPER SERIES New Results on Betting Strategies, Market Selection, and the Role of Luck Giulio Bottazzi Daniele Giachini Institute of Economics, Scuola Sant'Anna, Pisa, Italy 2018/08 February

More information

Geometry of functionally generated portfolios

Geometry of functionally generated portfolios Geometry of functionally generated portfolios Soumik Pal University of Washington Rutgers MF-PDE May 18, 2017 Multiplicative Cyclical Monotonicity Portfolio as a function on the unit simplex -unitsimplexindimensionn

More information

Structured Online Learning with Full and Bandit Information

Structured Online Learning with Full and Bandit Information Structured Online Learning with Full and Bandit Information A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Nicholas Johnson IN PARTIAL FULFILLMENT OF THE

More information

Another Look at the Boom and Bust of Financial Bubbles

Another Look at the Boom and Bust of Financial Bubbles ANNALS OF ECONOMICS AND FINANCE 16-2, 417 423 (2015) Another Look at the Boom and Bust of Financial Bubbles Andrea Beccarini University of Münster, Department of Economics, Am Stadtgraben 9, 48143, Münster,

More information

Minimax Policies for Combinatorial Prediction Games

Minimax Policies for Combinatorial Prediction Games Minimax Policies for Combinatorial Prediction Games Jean-Yves Audibert Imagine, Univ. Paris Est, and Sierra, CNRS/ENS/INRIA, Paris, France audibert@imagine.enpc.fr Sébastien Bubeck Centre de Recerca Matemàtica

More information

Bandit models: a tutorial

Bandit models: a tutorial Gdt COS, December 3rd, 2015 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions) Bandit game: a each round t, an agent chooses

More information

Algorithmic Stability and Generalization Christoph Lampert

Algorithmic Stability and Generalization Christoph Lampert Algorithmic Stability and Generalization Christoph Lampert November 28, 2018 1 / 32 IST Austria (Institute of Science and Technology Austria) institute for basic research opened in 2009 located in outskirts

More information

UNIVERSAL PORTFOLIO GENERATED BY IDEMPOTENT MATRIX AND SOME PROBABILITY DISTRIBUTION LIM KIAN HENG MASTER OF MATHEMATICAL SCIENCES

UNIVERSAL PORTFOLIO GENERATED BY IDEMPOTENT MATRIX AND SOME PROBABILITY DISTRIBUTION LIM KIAN HENG MASTER OF MATHEMATICAL SCIENCES UNIVERSAL PORTFOLIO GENERATED BY IDEMPOTENT MATRIX AND SOME PROBABILITY DISTRIBUTION LIM KIAN HENG MASTER OF MATHEMATICAL SCIENCES FACULTY OF ENGINEERING AND SCIENCE UNIVERSITI TUNKU ABDUL RAHMAN APRIL

More information

Optimization, Learning, and Games with Predictable Sequences

Optimization, Learning, and Games with Predictable Sequences Optimization, Learning, and Games with Predictable Sequences Alexander Rakhlin University of Pennsylvania Karthik Sridharan University of Pennsylvania Abstract We provide several applications of Optimistic

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE7C (Spring 08): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee7c@berkeley.edu October

More information

Online Prediction: Bayes versus Experts

Online Prediction: Bayes versus Experts Marcus Hutter - 1 - Online Prediction Bayes versus Experts Online Prediction: Bayes versus Experts Marcus Hutter Istituto Dalle Molle di Studi sull Intelligenza Artificiale IDSIA, Galleria 2, CH-6928 Manno-Lugano,

More information

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization Alexander Rakhlin University of Pennsylvania Ohad Shamir Microsoft Research New England Karthik Sridharan University of Pennsylvania

More information

Online Convex Optimization Using Predictions

Online Convex Optimization Using Predictions Online Convex Optimization Using Predictions Niangjun Chen Joint work with Anish Agarwal, Lachlan Andrew, Siddharth Barman, and Adam Wierman 1 c " c " (x " ) F x " 2 c ) c ) x ) F x " x ) β x ) x " 3 F

More information

Online Bounds for Bayesian Algorithms

Online Bounds for Bayesian Algorithms Online Bounds for Bayesian Algorithms Sham M. Kakade Computer and Information Science Department University of Pennsylvania Andrew Y. Ng Computer Science Department Stanford University Abstract We present

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior

More information

Adaptive Gradient Methods AdaGrad / Adam. Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade

Adaptive Gradient Methods AdaGrad / Adam. Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade Adaptive Gradient Methods AdaGrad / Adam Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade 1 Announcements: HW3 posted Dual coordinate ascent (some review of SGD and random

More information

A Low Complexity Algorithm with O( T ) Regret and Finite Constraint Violations for Online Convex Optimization with Long Term Constraints

A Low Complexity Algorithm with O( T ) Regret and Finite Constraint Violations for Online Convex Optimization with Long Term Constraints A Low Complexity Algorithm with O( T ) Regret and Finite Constraint Violations for Online Convex Optimization with Long Term Constraints Hao Yu and Michael J. Neely Department of Electrical Engineering

More information

Online Aggregation of Unbounded Signed Losses Using Shifting Experts

Online Aggregation of Unbounded Signed Losses Using Shifting Experts Proceedings of Machine Learning Research 60: 5, 207 Conformal and Probabilistic Prediction and Applications Online Aggregation of Unbounded Signed Losses Using Shifting Experts Vladimir V. V yugin Institute

More information

Online Optimization : Competing with Dynamic Comparators

Online Optimization : Competing with Dynamic Comparators Ali Jadbabaie Alexander Rakhlin Shahin Shahrampour Karthik Sridharan University of Pennsylvania University of Pennsylvania University of Pennsylvania Cornell University Abstract Recent literature on online

More information

Foundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research

Foundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.

More information

Lecture 16: Perceptron and Exponential Weights Algorithm

Lecture 16: Perceptron and Exponential Weights Algorithm EECS 598-005: Theoretical Foundations of Machine Learning Fall 2015 Lecture 16: Perceptron and Exponential Weights Algorithm Lecturer: Jacob Abernethy Scribes: Yue Wang, Editors: Weiqing Yu and Andrew

More information

On the Worst-case Analysis of Temporal-difference Learning Algorithms

On the Worst-case Analysis of Temporal-difference Learning Algorithms Machine Learning, 22(1/2/3:95-121, 1996. On the Worst-case Analysis of Temporal-difference Learning Algorithms schapire@research.att.com ATT Bell Laboratories, 600 Mountain Avenue, Room 2A-424, Murray

More information

Perceptron Mistake Bounds

Perceptron Mistake Bounds Perceptron Mistake Bounds Mehryar Mohri, and Afshin Rostamizadeh Google Research Courant Institute of Mathematical Sciences Abstract. We present a brief survey of existing mistake bounds and introduce

More information

The Free Matrix Lunch

The Free Matrix Lunch The Free Matrix Lunch Wouter M. Koolen Wojciech Kot lowski Manfred K. Warmuth Tuesday 24 th April, 2012 Koolen, Kot lowski, Warmuth (RHUL) The Free Matrix Lunch Tuesday 24 th April, 2012 1 / 26 Introduction

More information

Trade-Offs in Distributed Learning and Optimization

Trade-Offs in Distributed Learning and Optimization Trade-Offs in Distributed Learning and Optimization Ohad Shamir Weizmann Institute of Science Includes joint works with Yossi Arjevani, Nathan Srebro and Tong Zhang IHES Workshop March 2016 Distributed

More information

Week 2 Quantitative Analysis of Financial Markets Bayesian Analysis

Week 2 Quantitative Analysis of Financial Markets Bayesian Analysis Week 2 Quantitative Analysis of Financial Markets Bayesian Analysis Christopher Ting http://www.mysmu.edu/faculty/christophert/ Christopher Ting : christopherting@smu.edu.sg : 6828 0364 : LKCSB 5036 October

More information

Variable Metric Stochastic Approximation Theory

Variable Metric Stochastic Approximation Theory Variable Metric Stochastic Approximation Theory Abstract We provide a variable metric stochastic approximation theory. In doing so, we provide a convergence theory for a large class of online variable

More information

Learning theory. Ensemble methods. Boosting. Boosting: history

Learning theory. Ensemble methods. Boosting. Boosting: history Learning theory Probability distribution P over X {0, 1}; let (X, Y ) P. We get S := {(x i, y i )} n i=1, an iid sample from P. Ensemble methods Goal: Fix ɛ, δ (0, 1). With probability at least 1 δ (over

More information

OLSO. Online Learning and Stochastic Optimization. Yoram Singer August 10, Google Research

OLSO. Online Learning and Stochastic Optimization. Yoram Singer August 10, Google Research OLSO Online Learning and Stochastic Optimization Yoram Singer August 10, 2016 Google Research References Introduction to Online Convex Optimization, Elad Hazan, Princeton University Online Learning and

More information

Lecture 19: Follow The Regulerized Leader

Lecture 19: Follow The Regulerized Leader COS-511: Learning heory Spring 2017 Lecturer: Roi Livni Lecture 19: Follow he Regulerized Leader Disclaimer: hese notes have not been subjected to the usual scrutiny reserved for formal publications. hey

More information

Online Submodular Minimization

Online Submodular Minimization Online Submodular Minimization Elad Hazan IBM Almaden Research Center 650 Harry Rd, San Jose, CA 95120 hazan@us.ibm.com Satyen Kale Yahoo! Research 4301 Great America Parkway, Santa Clara, CA 95054 skale@yahoo-inc.com

More information