arxiv: v2 [cs.lg] 19 Oct 2018

Size: px
Start display at page:

Download "arxiv: v2 [cs.lg] 19 Oct 2018"

Transcription

1 Learning in Non-convex Games with an Optimization Oracle arxiv: v2 [cs.lg] 19 Oct 2018 Alon Gonen Elad Hazan Abstract We consider adversarial online learning in a non-convex setting under the assumption that the learner has an access to an offline optimization oracle. In the most general unstructured setting of prediction with expert advice, [5] established an exponential gap demonstrating that online learning can be significantly harder. Interestingly, this gap is eliminated once we assume a convex structure. A natural question which arises is whether the convexity assumption can be dropped. In this work we answer this question in the affirmative. Namely, we show that online learning is computationally equivalent to statistical learning in the Lipschitz-bounded setting. Notably, most deep neural networks satisfy these assumptions. We prove this result by adapting the ubiquitous Follow-he-Perturbed- Leader paradigm [7]. As an application we demonstrate how the offline oracle enables efficient computation of an equilibrium in non-convex games, that include GAN (generative adversarial networks) as a special case. 1 Introduction Online learning is a fundamental and expressive model which allows formulation of important and challenging learning tasks such as spam detection, online routing, etc ([2, 4, 10]). A key feature of this model is the ability of the environments to evolve over time, possibly in an adversarial manner. Consequently, this framework can be used to produce much more robust and powerful learners relative to the classic statistical and stationary learning framework. One would expect that such nice properties should come with a price. While it is well-known that any efficient online learner can be transformed into an efficient batch learner [1], it is important to understand to what extent is the online model harder. In this work we focus on the computational perspective while adopting the setting suggested in [5]. hat is, we assume that the learner has an access to the following two oracles: 1. Value oracle: the learner submits a pair of Department of Computer Science, Princeton University Google Brain and Princeton University 1

2 the form (hypothesis, loss function) and the oracle outputs the loss incurred by the hypothesis. 2. Optimization oracle: the learner submits a sequence of loss functions and receives a hypothesis that (approximately) minimize the cumulative loss w.r.t. this sequence. he second oracle might seem very powerful. However, as we know, many (offline) problems that are known to be hard in the worst case admit efficient solvers in practice. Moreover, it enables a systematic comparison between the online and the statistical models. Equipped with these oracles, we would like to compare between the computational complexities of online and statistical learning. he computational complexity is defined as the number of operations required to ensure expected excess risk of at most ɛ in the statistical model and expected average regret of at most ɛ in the online model. he required number of samples (or rounds in the online model) are implicit in the above definitions of the computational complexities. 1.1 Background [5] study our question in the general context of prediction with expert advice. In this setting, on each round t, the learner chooses (possibly in a random fashion) which expert to follow from a list of N candidates. hereafter, the environment assigns losses l t,j, j P rns to the experts. We measure the success of the learner using the notion of regret, which is the difference between the cumulative loss of the learner and that of the best single expert. In the statistical setting, we usually call experts hypotheses. A well-known fact, that follows from Hoeffding s inequality (and the union bound), is that statistical and the computational complexity of statistical learning is polynomial in log N and ɛ 1. In contrast, [5] showed a lower bound on the computational complexity of online learning (even for achieving a constant accuracy) of polypn q. his implies an exponential gap between the online and the statistical complexities. Interestingly, the above gap is completely eliminated if the problem is convex. o illustrate this idea, consider the online shortest path problem. Let G pv, Eq be a directed graph and fix some vertices s, t P V. On each round t, the learner chooses a path from s to t. hereafter, the adversary assigns nonnegative weights the edges. he loss associated with any path is the sum of edges along the path. his problem can be expressed in the framework of learning with expert advice, where each expert corresponds to a path between s and t. However, the number of such paths is typically exponential in E, rendering this reduction inefficient. In their seminal paper, [7] suggested the following algorithm: 1. At each time t and for each edge e in the graph, draw a random noise ν t res according to a suitable distribution with mean 0 and standard deviation of order?. Add this quantity to the cumulative weight of edge e during rounds 1,..., t Let w t be shortest path solution corresponding to the perturbed cumulative weights. It was shown in [7] that the regret of this algorithm is polynomial in E and sublinear in, so the overall complexity is 2

3 polynomial in E and ɛ 1, as in the batch setting. More generally, this approach yields a polypd, ɛ 1 q online algorithm for any family of linear loss functions defined over compact decision set W Ď R d. Even more generally, this approach can be extended to any family of convex loss functions by using a standard linearization technique (e.g., see Section 2.4 in [10]). 1.2 Main Result Naturally, the next obvious question is whether the convexity assumption can be dropped. Our paper answers this question in the affirmative. Before stating our result we specify the modest geomtric assumptions we make throughout the paper. Assumption 1. We assume that the following quantities are polynomial in the dimension d: a) he l 2 -Lipschitz parameter of any loss function. b) he magnitude of any loss function. c) he diameter of the domain W. heorem 1. (Main heorem) Let L be a class of loss functions satisfying Assumption 1. hen Algorithm 1 transforms any polypd, ɛ 1 q batch learner into a polypd, ɛ 1 q online learner. We deduce the following game theoretic result: Corollary 1. (informal) Convergence to equilibrium in two player zero-sum non-convex games is as hard as the corresponding offline best-response optimization problem. We elaborate on this implication and specify it to GANs in Section Overview and echniques Our result is proved by extending the Follow-he-Perturbed-Leader algorithm to the non-convex setting. As detailed below, online learnability requires algorithmic stability between consecutive rounds. In the linear case, FPL ensures stability by linearly perturbing the cumulative loss function. As the analysis of FPL in [7] relies heavily on the fact that the perturbation and the loss function are of the same type, it was not clear whether FPL can be extended to the non-convex case. herefore, a key challenge is to understand how randomness encourages Euclidean proximity between consecutive iterates of FPL in the non-convex setting. We illustrate the main ideas by restricting our attention to the relatively easy 1-dimensional non-convex objective w ÞÑ pσpwxq yq 2, where σpxq maxtx, 0u is the ReLU function. Note that this objective becomes convex when restricted either to the positive or the negative orthant. We show that FPL prevents frequent transitions between positive and negative predictions and consequently stabilize the predictions of the learner. We then proceed and extend this intuition to the full generality of online learning. 3

4 1.4 Additional Related Work Another interesting attempt to investigate settings that admit some structure but are non-convex has been conducted by [3]. hey revisited the experts setting and formulated abstract conditions under which the randomness can be shared between the experts. hey also demonstrated an application to online auctions. Several works have studied GANs in the regret minimization framework (e.g. [9, 8, 6]). We provide the first evidence that achieving equilibrium in GANs can be reduced to the offline problems associated with the players. Since its invention in [7], an extensive study of FPL has yielded efficient variants (e.g. [12]) and new insights. However, its extension to the non-convex setting has not been explored. 2 Preliminaries 2.1 Problem setting We denote the action set (a.k.a. domain or hypothesis class) by W Ď R d and assume that it is compact with diameter D. Let L Ď R W be a class of B- bounded and G-Lipschitz functions. For concreteness, we assume that B, D, G P polypdq. he online model can be described as a repeated game between the learner and a (possibly adversarial) environment. On each round t P r s, the learner decides on its next action w t P W. hereafter, the environment decides on a loss function l t P L. he regret of the learner is defined as ÿ ÿ Regret fi l t pw t q min l t pwq. wpw i 1 t 1 looomooon L pwq he goal of the learner is to attain a vanishing average regret (i.e., Regret op1q). Remark 1. o simplify the presentation, we assume in the sequel that the environment is oblivious, i.e., it chooses its strategy before the game begins. A standard modification enables us to cope with adaptive environments (coping with such environments is crucial for our application to non-convex games). We detail this extension in Section Optimization and value oracles he input to the optimization oracle is a pair pl 1:k, nq and the output is a predictor ŵ satisfying # + tÿ ÿ k l i pŵq n J ŵ ď min l i pwq n J w, wpw i 1 4 i 1

5 Note that we allow a linear perturbation of the standard offline oracle. As we deal with non-convex optimization, adding a linear term usually does not increase the computational complexity of the offline problem. We also emphasize that it is straightforward to extend our development to the case where the offline oracle is only assumed to return an ɛ 1 -approximate solution (where ɛ 1 ď ɛ) Statistical and online complexities For both the statistical and the batch models, we define the overall complexity as: sample complexity ` #of calls to the oracles he (statistical) sample complexity is defined as follows. Let P be an unknown distribution over L. he learner observes a finite sample of size n according to P n and outputs some ŵ P W. Given an accuracy parameter ɛ P p0, 1q, we are interested in bounds on the sample size n for which ErLpŵqs ď min wpw Lpwq`ɛ, wherelpwq E l P lpwq. Given the above oracles, it is easy to verify that the sample complexity is polypd, ɛ 1 q. 2 he sample complexity in the online model is defined as the number of rounds for which the expected average regret of the learner is at most ɛ. herefore, the main question we ask is: is there exists an online learner (in the above oracle model) whose overall complexity is polypd, ɛ 1 q. 2.4 Online to batch conversion he following well-known result due to [1] tells us that the online sample complexity dominates the batch sample complexity. he intuition that online learning is at least as hard as batch learning is formalized by the following online-tobatch conversion. heorem 2. [1] Suppose that A is an online learner with ErRegret s ď ɛp q for any. Consider the following algorithm for the batch setting: given a sample pl 1,..., l n q P n, the algorithm applies A to the sample in an online manner. hereafter, it draws a random round j P rns uniformly at random and returns ŵ w j. hen the expected excess risk of the algorithm, ErLpŵqs Lpw q, is at most ɛp q. 2.5 Online learning via stability he main challenge in online learning stems from the fact that the learner has to make decision before observing the adversarial action. Intuitively, we expect that the performance after shifting the actions of the learner by one step 1 We actually do not use the value oracle assumed in [5]. Formally, the input of this oracle is a pair pw, lq P W ˆ L and its output is lpwq. 2 his can be done various methods but perhaps the simplest is to consider the finite class obtained by covering the domain with l 1 -balls of radius G{2 (whose size is exponential in d and polynomial in the rest of the parameters), and applying the standard Oplnp H {ɛ 2 q bound for any finite class H. 5

6 (i.e. considering the loss l t pw t`1 q rather than l t pw t q) to be optimal. his view suggests that online learning is all about balancing between optimal performance w.r.t. previous rounds and ensuring stability between consecutive rounds. As we shall see, this is exactly the role of the random linear perturbation. In a sense n should be thought as a random regularizer. he next lemma provides a systematic approach for analyzing Follow-the-Regularized-Leader-type algorithms. Lemma 1. (FL-BL) he regret of the online learner is at most ErRegret s ď ÿ Erl t pw t q l t pw t`1 qs ` Ern J pw 1 w qs, i 1 where w argmint ř t 1 l ipwq : w P Wu. 2.6 he exponential distribution We use the following properties of the exponential distribution. Lemma 2. Let X be an exponential random variable with parameter η. 3 he following properties hold: a) for any s P R, P px ě sq expp ηsq. b) Memorylessness: for any s, q P R, P px ě s ` q X ě qq P px ě qq. c) if X 1,..., X d are i.i.d. with X i Exppηq, then Er}pX 1,..., X d q} 8 s ď η 1 plogpdq ` 1q. 3 FPL for Non-convex Losses: Warmup o illustrate the main ideas and provide some intuition, we consider the following 1-dimensional problem corresponding to a neural network with one hidden layer. Let W r 1, 1s be the decision set. Letting X r 1, 1s, Y r0, 1s, each l P L is of the form lpwq : l x,y ppwxq` yq 2, (where pzq` maxtz, 0u). he non-convexity of this objective is apparent in Figure 1 (where we set x, y 1). It can be verified that the minimizer of the cumulative loss after t rounds is either w t,` or w t,, where w ` argmint ÿ t:x tě0 l t pwqu, w argmint ÿ t:x tă0 l t pwqu For instance, if y t 1 for all t and x t alternates between `1 and 1, then w also alternates between w ` and w. As we explained above, our challenge now is to stabilize the output of the algorithm. We next explore several candidate approaches. 3 hat is, X has density ppxq η expp ηxq. 6

7 Figure 1: Square loss for one dimensional ReLU. Here we set x y Why standard approaches do not work? A common approach which works well in the convex setting is to apply l 2 - regularization, i.e., to let w t be the minimizer of ř iăt l ipwq ` η}w} 2 for some parameter η. However, the regularization has the effect of pushing the solution towards zero, which does not make sense in our case (for the alternating instance discussed above, any w with small magnitude misclassifies both the negative and the positive examples). Noting that stabilizing the algorithm amounts to deciding whether w t is positive or negative, we can reduce the problem to the experts setting by considering a positive and negative expert. his approach works well on our case but does not seem to extend to higher dimensions. 3.2 One-dimensional Non-convex FPL Although the loss function l x,y is not linear, we can still analyze the effect of adding a linear perturbation to the cumulative loss. Concretely, let ν be a d- dimensional i.i.d. random vector, whose coordinates are drawn according to the exponential distribution with parameter η α ą 0. he input to the offline oracle consists of a pair pl 1:t, νq and the output w t is assumed to satisfy # + tÿ w t P argmin l i pwq νw : w P W. iăt As we show below, this random modification reduces the probability of switching from positive predictors to negative predictors. Suppose that w t ě β ą 0, for some β P p0, 1q to be defined later. We next upper bound the probability 7

8 that w t`1 ă 0. By definition of w t, p@w ă 0q L t 1 pw t q νw t ď L t 1 pwq νw ñ p@w ă 0q νpw t wq ě L t 1 pw t q L t 1 pwq ñ ν ě L t 1 pw t q L t 1 pwq max : Z t 1 wpr 1,0q w t w Similar argument (using the definition of w t`1 ) implies that w t`1 remains nonnegative if 4 L t pw t q L t pwq ν ě max wpr 1,0q w t w Since the loss is B-bounded and w t w ě β for all w ă 0, it suffices to require that ν ě Z t 1 ` 2B β Using the memorylessness of the exponential distribution, we have that P rrν ě Z t 1 ` 2B β ν ě Z t 1 s P rrν ě 2B β s expp 2B β α q ñ P rrw t`1 ě 0 w t ě β s ě 1 expp 2B β α q ě 2B β α Similar argument can be applied for the bounding the probability that w t`1 ą 0 given that w t ď β. We can proceed to analyze the regret of the algorithm using the FL-BL lemma (a more rigorous proof is given in the next section). Note that B, G and D are constants. Let α 2{3, β 1{3. he above analysis shows that the probability for shifting between positive and negative prediction is at most Op 1{3 q. For consecutive rounds in which there is no transition between positive and negative prediction, we can use the Lipschitzenss assumption to deduce that l t pw t q l t pw t`1 q ď Op β q Op 1{3 q. herefore, the total instability (as defined in the FL-BL lemma) is Op 1{3 q Op 2{3 q. Since Erνs α, we obtain that the noise term in the FL-BL lemma is Op 2{3 q. Overall, the regret is Op 2{3 q which translates into a sample complexity bound of Op1{ɛ 3 q. In the next section we consider the general d-dimensional case while making the above arguments more formal. Note that in the d-dimensional case we might be worried about exponentially many transitions. he key idea will be to apply the above argument for each dimension separately. Finally, a more careful analysis allows us to improve the sample complexity bound from Op1{ɛ 3 q to Op1{ɛ 2 q. 4 General Analysis of Non-convex FPL We now consider the general Lipschitz-bounded case and analyze the variant of FPL presented in Algorithm 1. Our analysis below completes the proof of our main theorem. Fix the predictor w t chosen by the algorithm at time t and also 4 Simply since w t forms a better candidate. 8

9 Algorithm 1 Non-convex FPL Parameter: η ą 0 Draw a random vector ν pexppηqq d Prediction at time t: # + tÿ w t P argmin l i pwq ν J w : w P W iăt, fix an axis k P rds. Let U t,k tw P W : w t,k ě w k u. Our starting point is a simple extension of the analysis from the previous part; by using the definition of w t we obtain that L t 1 pw t q L t 1 puq ` řj k ν k ě max ν jpu j w t,j q fi z t,k. upu w t,k t,k u k We let E t,k be the conditional expectation given the identity of w t and z t,k. Lemma 3. We have E t,k rw t,k w t`1,k s ď Opη lnpη 1 qq. Proof. Based on the definition of w t and z t,k, for any u P U t,k, we have that w t`1 u provided that ν k ě z t,k ` ltpw t q l t puq w t,k u k (simply note that according to the definition of w t`1, w t forms a better candidate than u ). Using that the loss is B-bounded, we deduce that if ν k ě z t,k ` 2B{ for some ą 0, then w t,k w t`1,k ď. Equivalently, given that ν k q ě z t,k, we have that " * 2B w t,k w t`1,k ď min D, fi pz t,k q q z t,k 9

10 herefore, E t rw t,k w t`1,k s ď ż 8 qą0 ż 8 qąz tk ż 8 qą0 ż 8 ď ηd qą0 ż 1 η expp ηqq pz t,k q1 qązt,k expp ηz t,k q η expp ηqq pz t,kq expp ηz t,k q dq dq η expp ηpz t,k ` qqq mint2b{q, Du expp ηz t,k q η expp ηqq mint2b{q, Du dq qą0 ď ηd ` 2ηB ż η 1 ż expp ηqq 8 expp ηqq 1 dq ` 2ηB dq ` 2ηB dq q 1 q q q η 1 ż η 1 q 1 1 dt ` 2ηB q ż 8 q η 1 ď ηd ` 2ηB lnpη 1 q ` 2ηB expp ηqq ď ηd ` 2ηB lnpη 1 q ` 2ηB. dq expp ηqq η 1 dq Similar argument (this time using the optimality of w t at time t) can be used to derive the analogous inequality. Lemma 4. For any k P rds, E t,k rw t`1,k w t,k s ď Opη lnpη 1 qq. Summing over all coordinates k P rds yields the bound. Er}w t w t`1 } 1 s Oppolypdqη logpη 1 qq lomon η«p q 1{2 Oppolypdq 1{2 logp qq We are ready to conclude our main result. Proof. (of heorem 1) Using that Er}ν} 8 s ď η 1 plog d ` 1q and }w} 1 ď D, we can apply the FL-BL lemma (Lemma 1) to deduce the desired regret bound. Online-to-batch conversion yields the following corollary. Corollary 2. he sample complexity of non-convex online learning as defined above is O polypdq ɛ Oblivious vs. adaptive environments Our current analysis assumes that the environment is oblivious in the sense that its actions are chosen in advance. o cope with adaptive environments we apply a simple (standard) modification to Algorithm 1; instead of drawing a 10

11 single noise vector at the beginning of the game, the algorithm draws a fresh i.i.d. random vector on every round. he algorithm is detailed in Algorithm (2). 5 Algorithm 2 Non-convex FPL for adaptive adversaries Parameter: η ą 0 for t 1 to t do Draw a random vector ν t pexppηqq d Prediction at time t: # + tÿ w t P argmin l i pwq νt J w : w P W iăt, end for heorem 3. Algorithm 2 enjoys the same regret bound as Algorithm 2. Consequently, our main result (heorem 1) holds also in the non-oblivious setting. Proof. Clearly, the two algorithms suffer the same expected loss in the oblivious setting. Hence, Algorithm 2 attains the same regret bound as Algorithm 1. Since the distribution over the actions of Algorithm 2 is completely determined by the the loss sequence pf 1,..., f t 1 q, Lemma 4.1 in [2] implies the same regret bound against adaptive environments. 5 Implications to Non-convex Games Let F : X ˆ Y Ñ R, where X, Y Ď R d are compact with diameter at most D. he x-th player wishes to minimize F and whereas the y-th player wishes to maximize F. We assume that for all x P X and y P Y, both F p, yq and F px, q are G-Lipschitz and B-bounded. A known approach for achieving equilibrium is to apply (for each of the players) an online method with vanishing average regret. Precisely, on each round t both players choose a pair px t, y t q which induces the losses F px t, y t q and F px t, y t q, respectively. Finally, we draw a random index rjs P r s and output the pair pˆx, ŷq fi px j, y j q. By endowing the players with access to an offline oracle and playing according to non-convex FPL we can reach approximate equilibrium. heorem 4. Suppose that both the x-player and the y-player have an access to an offline oracle and play according to non-convex FPL (Algorithm 2). Given ɛ ą 0, let P polypdq{ɛ 2 such that the expected average regret of non-convex FPL is at most ɛ. hen, pˆx, ŷq forms an ɛ-approximated equilibrium, i.e., for any x P X and y P Y, ErF pˆx, ŷqs ď ErF px, ŷqs ` ɛ, ErF pˆx, ŷqs ě ErF pˆx, yqs ɛ. 5 In fact, the vanilla FPL also draws a fresh random noise vector on every round. 11

12 Note that the players can use their offline oracle to amplify their confidence and achieve an equilibrium with high probability. Proof. For all y P Y, 1 ErF pˆx, yqs E ñ ErF pˆx, ŷqs ď 1 Similarly, for all x P X, 1 ErF px, ŷqs E ñ ErF pˆx, ŷqs ě 1 ÿ j F pxt, yq ÿ F pxt, y t q ÿ j F px, yt q ÿ F pxt, y t q 1 ď E j ` ɛ 1 ě E j ɛ. ÿ j F pxt, y t q ` ɛ ÿ j F pxt, y t q ɛ 5.1 Implication to GANs In particular, we consider the case where the x-th player is a generator, who produces synthetic samples (e.g. images), whereas the y-th player acts as a discriminator by assigning scores to samples reflecting the probability of being generated from the true distribution. Formally, by choosing a parameter x P X and drawing a random noise z, the x-th player produces a sample denote G x pzq. Conversely, the y-th player chooses a parameter y P Y and assign the score D y pg x pzqq P r0, 1s to the sample G x pzq. he function F usually corresponds to the log-likelihood of mistakenly assigning an high score to a synthetic example and vice versa. It is reasonable to assume that F is Lipschitz and bounded w.r.t. the network parameters. As a result, efficient convergence to GANs is established by assuming an access to an offline oracle. 6 Discussion Our work establishes a computational equivalence between online and statistical learning in the non-convex setting. In a sense, we shed a light on the hardness result of [5] by demonstrating that online learning is significantly more difficult than statistical learning only when no structure is assumed. One interesting direction for further investigation is to refine the comparison model and study the polynomial dependencies more carefully. For instance, in the statistical setting, one can obtain dimension-independent bounds on the sample complexity under certain assumptions on the Lipschitz and boundedness parameters (e.g. Rademacher complexity bounds for SVM [11]). It is natural to ask whether one can achieve dimension-independent regret bounds in such settings. 12

13 References [1] Nicolò Cesa-Bianchi, Alex Conconi, and Claudio Gentile. On the Generalization Ability of Online Learning Algorithms for Pairwise Loss Functions. IEEE ransactions on Information heory, 50: , [2] Nicolo Cesa-Bianchi and Gabor Lugosi. Prediction, Learning, and Games [3] Miroslav Dudik, Nika Haghtalab, Haipeng Luo, Robert E. Schapire, Vasilis Syrgkanis, and Jennifer Wortman Vaughan. Oracle-efficient online learning and auction design. In Annual Symposium on Foundations of Computer Science - Proceedings, [4] Elad Hazan. Introduction to Online Convex Optimization. Foundations and rends R in Optimization, 2(3-4): , [5] Elad Hazan and omer Koren. he Computational Power of Optimization in Online Learning. [6] Elad Hazan, Karan Singh, and Cyril Zhang. Efficient regret minimization in non-convex games. In International Conference on Machine Learning, pages , [7] Adam Kalai and Santosh Vempala. Efficient Algorithms for Online Decision Problems [8] Naveen Kodali, Jacob Abernethy, James Hays, and Zsolt Kira. On Convergence and Stability of GANs [9] Dale Schuurmans and Martin A Zinkevich. Deep learning games. In Advances in Neural Information Processing Systems, pages , [10] Shai Shalev-Shwartz. Online Learning and Online Convex Optimization. Foundations and rends R in Machine Learning, 4(2): , [11] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning. Cambridge university press, [12] im Van Erven, Wojciech Kot lowski, and Manfred K Warmuth. Follow the leader with dropout perturbations. In Conference on Learning heory, pages ,

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Follow-he-Perturbed Leader MEHRYAR MOHRI MOHRI@ COURAN INSIUE & GOOGLE RESEARCH. General Ideas Linear loss: decomposition as a sum along substructures. sum of edge losses in a

More information

On the Generalization Ability of Online Strongly Convex Programming Algorithms

On the Generalization Ability of Online Strongly Convex Programming Algorithms On the Generalization Ability of Online Strongly Convex Programming Algorithms Sham M. Kakade I Chicago Chicago, IL 60637 sham@tti-c.org Ambuj ewari I Chicago Chicago, IL 60637 tewari@tti-c.org Abstract

More information

Adaptive Online Gradient Descent

Adaptive Online Gradient Descent University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 6-4-2007 Adaptive Online Gradient Descent Peter Bartlett Elad Hazan Alexander Rakhlin University of Pennsylvania Follow

More information

Online Learning and Online Convex Optimization

Online Learning and Online Convex Optimization Online Learning and Online Convex Optimization Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Learning 1 / 49 Summary 1 My beautiful regret 2 A supposedly fun game

More information

Generalization bounds

Generalization bounds Advanced Course in Machine Learning pring 200 Generalization bounds Handouts are jointly prepared by hie Mannor and hai halev-hwartz he problem of characterizing learnability is the most basic question

More information

Learning, Games, and Networks

Learning, Games, and Networks Learning, Games, and Networks Abhishek Sinha Laboratory for Information and Decision Systems MIT ML Talk Series @CNRG December 12, 2016 1 / 44 Outline 1 Prediction With Experts Advice 2 Application to

More information

Perceptron Mistake Bounds

Perceptron Mistake Bounds Perceptron Mistake Bounds Mehryar Mohri, and Afshin Rostamizadeh Google Research Courant Institute of Mathematical Sciences Abstract. We present a brief survey of existing mistake bounds and introduce

More information

Full-information Online Learning

Full-information Online Learning Introduction Expert Advice OCO LM A DA NANJING UNIVERSITY Full-information Lijun Zhang Nanjing University, China June 2, 2017 Outline Introduction Expert Advice OCO 1 Introduction Definitions Regret 2

More information

1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016

1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016 AM 1: Advanced Optimization Spring 016 Prof. Yaron Singer Lecture 11 March 3rd 1 Overview In this lecture we will introduce the notion of online convex optimization. This is an extremely useful framework

More information

A Drifting-Games Analysis for Online Learning and Applications to Boosting

A Drifting-Games Analysis for Online Learning and Applications to Boosting A Drifting-Games Analysis for Online Learning and Applications to Boosting Haipeng Luo Department of Computer Science Princeton University Princeton, NJ 08540 haipengl@cs.princeton.edu Robert E. Schapire

More information

A survey: The convex optimization approach to regret minimization

A survey: The convex optimization approach to regret minimization A survey: The convex optimization approach to regret minimization Elad Hazan September 10, 2009 WORKING DRAFT Abstract A well studied and general setting for prediction and decision making is regret minimization

More information

Learning with Large Number of Experts: Component Hedge Algorithm

Learning with Large Number of Experts: Component Hedge Algorithm Learning with Large Number of Experts: Component Hedge Algorithm Giulia DeSalvo and Vitaly Kuznetsov Courant Institute March 24th, 215 1 / 3 Learning with Large Number of Experts Regret of RWM is O( T

More information

Agnostic Online learnability

Agnostic Online learnability Technical Report TTIC-TR-2008-2 October 2008 Agnostic Online learnability Shai Shalev-Shwartz Toyota Technological Institute Chicago shai@tti-c.org ABSTRACT We study a fundamental question. What classes

More information

Lecture 16: Perceptron and Exponential Weights Algorithm

Lecture 16: Perceptron and Exponential Weights Algorithm EECS 598-005: Theoretical Foundations of Machine Learning Fall 2015 Lecture 16: Perceptron and Exponential Weights Algorithm Lecturer: Jacob Abernethy Scribes: Yue Wang, Editors: Weiqing Yu and Andrew

More information

The No-Regret Framework for Online Learning

The No-Regret Framework for Online Learning The No-Regret Framework for Online Learning A Tutorial Introduction Nahum Shimkin Technion Israel Institute of Technology Haifa, Israel Stochastic Processes in Engineering IIT Mumbai, March 2013 N. Shimkin,

More information

Minimax strategy for prediction with expert advice under stochastic assumptions

Minimax strategy for prediction with expert advice under stochastic assumptions Minimax strategy for prediction ith expert advice under stochastic assumptions Wojciech Kotłosi Poznań University of Technology, Poland otlosi@cs.put.poznan.pl Abstract We consider the setting of prediction

More information

Exponential Weights on the Hypercube in Polynomial Time

Exponential Weights on the Hypercube in Polynomial Time European Workshop on Reinforcement Learning 14 (2018) October 2018, Lille, France. Exponential Weights on the Hypercube in Polynomial Time College of Information and Computer Sciences University of Massachusetts

More information

The Online Approach to Machine Learning

The Online Approach to Machine Learning The Online Approach to Machine Learning Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Approach to ML 1 / 53 Summary 1 My beautiful regret 2 A supposedly fun game I

More information

Lecture 8. Instructor: Haipeng Luo

Lecture 8. Instructor: Haipeng Luo Lecture 8 Instructor: Haipeng Luo Boosting and AdaBoost In this lecture we discuss the connection between boosting and online learning. Boosting is not only one of the most fundamental theories in machine

More information

arxiv: v4 [cs.lg] 27 Jan 2016

arxiv: v4 [cs.lg] 27 Jan 2016 The Computational Power of Optimization in Online Learning Elad Hazan Princeton University ehazan@cs.princeton.edu Tomer Koren Technion tomerk@technion.ac.il arxiv:1504.02089v4 [cs.lg] 27 Jan 2016 Abstract

More information

Tutorial: PART 1. Online Convex Optimization, A Game- Theoretic Approach to Learning.

Tutorial: PART 1. Online Convex Optimization, A Game- Theoretic Approach to Learning. Tutorial: PART 1 Online Convex Optimization, A Game- Theoretic Approach to Learning http://www.cs.princeton.edu/~ehazan/tutorial/tutorial.htm Elad Hazan Princeton University Satyen Kale Yahoo Research

More information

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Elad Hazan IBM Almaden 650 Harry Rd, San Jose, CA 95120 hazan@us.ibm.com Satyen Kale Microsoft Research 1 Microsoft Way, Redmond,

More information

Adaptive Online Learning in Dynamic Environments

Adaptive Online Learning in Dynamic Environments Adaptive Online Learning in Dynamic Environments Lijun Zhang, Shiyin Lu, Zhi-Hua Zhou National Key Laboratory for Novel Software Technology Nanjing University, Nanjing 210023, China {zhanglj, lusy, zhouzh}@lamda.nju.edu.cn

More information

The FTRL Algorithm with Strongly Convex Regularizers

The FTRL Algorithm with Strongly Convex Regularizers CSE599s, Spring 202, Online Learning Lecture 8-04/9/202 The FTRL Algorithm with Strongly Convex Regularizers Lecturer: Brandan McMahan Scribe: Tamara Bonaci Introduction In the last lecture, we talked

More information

Online Passive-Aggressive Algorithms

Online Passive-Aggressive Algorithms Online Passive-Aggressive Algorithms Koby Crammer Ofer Dekel Shai Shalev-Shwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Online Convex Optimization MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Outline Online projected sub-gradient descent. Exponentiated Gradient (EG). Mirror descent.

More information

Online Submodular Minimization

Online Submodular Minimization Online Submodular Minimization Elad Hazan IBM Almaden Research Center 650 Harry Rd, San Jose, CA 95120 hazan@us.ibm.com Satyen Kale Yahoo! Research 4301 Great America Parkway, Santa Clara, CA 95054 skale@yahoo-inc.com

More information

Online Convex Optimization

Online Convex Optimization Advanced Course in Machine Learning Spring 2010 Online Convex Optimization Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz A convex repeated game is a two players game that is performed

More information

Lecture 23: Online convex optimization Online convex optimization: generalization of several algorithms

Lecture 23: Online convex optimization Online convex optimization: generalization of several algorithms EECS 598-005: heoretical Foundations of Machine Learning Fall 2015 Lecture 23: Online convex optimization Lecturer: Jacob Abernethy Scribes: Vikas Dhiman Disclaimer: hese notes have not been subjected

More information

Dynamic Regret of Strongly Adaptive Methods

Dynamic Regret of Strongly Adaptive Methods Lijun Zhang 1 ianbao Yang 2 Rong Jin 3 Zhi-Hua Zhou 1 Abstract o cope with changing environments, recent developments in online learning have introduced the concepts of adaptive regret and dynamic regret

More information

Foundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research

Foundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.

More information

Blackwell s Approachability Theorem: A Generalization in a Special Case. Amy Greenwald, Amir Jafari and Casey Marks

Blackwell s Approachability Theorem: A Generalization in a Special Case. Amy Greenwald, Amir Jafari and Casey Marks Blackwell s Approachability Theorem: A Generalization in a Special Case Amy Greenwald, Amir Jafari and Casey Marks Department of Computer Science Brown University Providence, Rhode Island 02912 CS-06-01

More information

No-Regret Algorithms for Unconstrained Online Convex Optimization

No-Regret Algorithms for Unconstrained Online Convex Optimization No-Regret Algorithms for Unconstrained Online Convex Optimization Matthew Streeter Duolingo, Inc. Pittsburgh, PA 153 matt@duolingo.com H. Brendan McMahan Google, Inc. Seattle, WA 98103 mcmahan@google.com

More information

Tutorial: PART 2. Online Convex Optimization, A Game- Theoretic Approach to Learning

Tutorial: PART 2. Online Convex Optimization, A Game- Theoretic Approach to Learning Tutorial: PART 2 Online Convex Optimization, A Game- Theoretic Approach to Learning Elad Hazan Princeton University Satyen Kale Yahoo Research Exploiting curvature: logarithmic regret Logarithmic regret

More information

From Bandits to Experts: A Tale of Domination and Independence

From Bandits to Experts: A Tale of Domination and Independence From Bandits to Experts: A Tale of Domination and Independence Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Domination and Independence 1 / 1 From Bandits to Experts: A

More information

On Minimaxity of Follow the Leader Strategy in the Stochastic Setting

On Minimaxity of Follow the Leader Strategy in the Stochastic Setting On Minimaxity of Follow the Leader Strategy in the Stochastic Setting Wojciech Kot lowsi Poznań University of Technology, Poland wotlowsi@cs.put.poznan.pl Abstract. We consider the setting of prediction

More information

Convex Repeated Games and Fenchel Duality

Convex Repeated Games and Fenchel Duality Convex Repeated Games and Fenchel Duality Shai Shalev-Shwartz 1 and Yoram Singer 1,2 1 School of Computer Sci. & Eng., he Hebrew University, Jerusalem 91904, Israel 2 Google Inc. 1600 Amphitheater Parkway,

More information

Online Learning: Random Averages, Combinatorial Parameters, and Learnability

Online Learning: Random Averages, Combinatorial Parameters, and Learnability Online Learning: Random Averages, Combinatorial Parameters, and Learnability Alexander Rakhlin Department of Statistics University of Pennsylvania Karthik Sridharan Toyota Technological Institute at Chicago

More information

Move from Perturbed scheme to exponential weighting average

Move from Perturbed scheme to exponential weighting average Move from Perturbed scheme to exponential weighting average Chunyang Xiao Abstract In an online decision problem, one makes decisions often with a pool of decisions sequence called experts but without

More information

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #24 Scribe: Jad Bechara May 2, 2018

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #24 Scribe: Jad Bechara May 2, 2018 COS 5: heoretical Machine Learning Lecturer: Rob Schapire Lecture #24 Scribe: Jad Bechara May 2, 208 Review of Game heory he games we will discuss are two-player games that can be modeled by a game matrix

More information

An Online Convex Optimization Approach to Blackwell s Approachability

An Online Convex Optimization Approach to Blackwell s Approachability Journal of Machine Learning Research 17 (2016) 1-23 Submitted 7/15; Revised 6/16; Published 8/16 An Online Convex Optimization Approach to Blackwell s Approachability Nahum Shimkin Faculty of Electrical

More information

Bandits for Online Optimization

Bandits for Online Optimization Bandits for Online Optimization Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Bandits for Online Optimization 1 / 16 The multiarmed bandit problem... K slot machines Each

More information

Online Optimization in Dynamic Environments: Improved Regret Rates for Strongly Convex Problems

Online Optimization in Dynamic Environments: Improved Regret Rates for Strongly Convex Problems 216 IEEE 55th Conference on Decision and Control (CDC) ARIA Resort & Casino December 12-14, 216, Las Vegas, USA Online Optimization in Dynamic Environments: Improved Regret Rates for Strongly Convex Problems

More information

Optimal and Adaptive Online Learning

Optimal and Adaptive Online Learning Optimal and Adaptive Online Learning Haipeng Luo Advisor: Robert Schapire Computer Science Department Princeton University Examples of Online Learning (a) Spam detection 2 / 34 Examples of Online Learning

More information

Online Learning and Sequential Decision Making

Online Learning and Sequential Decision Making Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Online Learning

More information

CS261: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm

CS261: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm CS61: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm Tim Roughgarden February 9, 016 1 Online Algorithms This lecture begins the third module of the

More information

Efficient learning by implicit exploration in bandit problems with side observations

Efficient learning by implicit exploration in bandit problems with side observations Efficient learning by implicit exploration in bandit problems with side observations omáš Kocák Gergely Neu Michal Valko Rémi Munos SequeL team, INRIA Lille Nord Europe, France {tomas.kocak,gergely.neu,michal.valko,remi.munos}@inria.fr

More information

Convex Repeated Games and Fenchel Duality

Convex Repeated Games and Fenchel Duality Convex Repeated Games and Fenchel Duality Shai Shalev-Shwartz 1 and Yoram Singer 1,2 1 School of Computer Sci. & Eng., he Hebrew University, Jerusalem 91904, Israel 2 Google Inc. 1600 Amphitheater Parkway,

More information

Near-Optimal Algorithms for Online Matrix Prediction

Near-Optimal Algorithms for Online Matrix Prediction JMLR: Workshop and Conference Proceedings vol 23 (2012) 38.1 38.13 25th Annual Conference on Learning Theory Near-Optimal Algorithms for Online Matrix Prediction Elad Hazan Technion - Israel Inst. of Tech.

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Learning with Large Expert Spaces MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Problem Learning guarantees: R T = O( p T log N). informative even for N very large.

More information

Machine Learning. Lecture 9: Learning Theory. Feng Li.

Machine Learning. Lecture 9: Learning Theory. Feng Li. Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell

More information

Logarithmic Regret Algorithms for Strongly Convex Repeated Games

Logarithmic Regret Algorithms for Strongly Convex Repeated Games Logarithmic Regret Algorithms for Strongly Convex Repeated Games Shai Shalev-Shwartz 1 and Yoram Singer 1,2 1 School of Computer Sci & Eng, The Hebrew University, Jerusalem 91904, Israel 2 Google Inc 1600

More information

Online Learning with Experts & Multiplicative Weights Algorithms

Online Learning with Experts & Multiplicative Weights Algorithms Online Learning with Experts & Multiplicative Weights Algorithms CS 159 lecture #2 Stephan Zheng April 1, 2016 Caltech Table of contents 1. Online Learning with Experts With a perfect expert Without perfect

More information

Stochastic and Adversarial Online Learning without Hyperparameters

Stochastic and Adversarial Online Learning without Hyperparameters Stochastic and Adversarial Online Learning without Hyperparameters Ashok Cutkosky Department of Computer Science Stanford University ashokc@cs.stanford.edu Kwabena Boahen Department of Bioengineering Stanford

More information

From Batch to Transductive Online Learning

From Batch to Transductive Online Learning From Batch to Transductive Online Learning Sham Kakade Toyota Technological Institute Chicago, IL 60637 sham@tti-c.org Adam Tauman Kalai Toyota Technological Institute Chicago, IL 60637 kalai@tti-c.org

More information

Online Passive-Aggressive Algorithms

Online Passive-Aggressive Algorithms Online Passive-Aggressive Algorithms Koby Crammer Ofer Dekel Shai Shalev-Shwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il

More information

Introduction to Machine Learning (67577) Lecture 3

Introduction to Machine Learning (67577) Lecture 3 Introduction to Machine Learning (67577) Lecture 3 Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem General Learning Model and Bias-Complexity tradeoff Shai Shalev-Shwartz

More information

New bounds on the price of bandit feedback for mistake-bounded online multiclass learning

New bounds on the price of bandit feedback for mistake-bounded online multiclass learning Journal of Machine Learning Research 1 8, 2017 Algorithmic Learning Theory 2017 New bounds on the price of bandit feedback for mistake-bounded online multiclass learning Philip M. Long Google, 1600 Amphitheatre

More information

Littlestone s Dimension and Online Learnability

Littlestone s Dimension and Online Learnability Littlestone s Dimension and Online Learnability Shai Shalev-Shwartz Toyota Technological Institute at Chicago The Hebrew University Talk at UCSD workshop, February, 2009 Joint work with Shai Ben-David

More information

4.1 Online Convex Optimization

4.1 Online Convex Optimization CS/CNS/EE 53: Advanced Topics in Machine Learning Topic: Online Convex Optimization and Online SVM Lecturer: Daniel Golovin Scribe: Xiaodi Hou Date: Jan 3, 4. Online Convex Optimization Definition 4..

More information

Explore no more: Improved high-probability regret bounds for non-stochastic bandits

Explore no more: Improved high-probability regret bounds for non-stochastic bandits Explore no more: Improved high-probability regret bounds for non-stochastic bandits Gergely Neu SequeL team INRIA Lille Nord Europe gergely.neu@gmail.com Abstract This work addresses the problem of regret

More information

OLSO. Online Learning and Stochastic Optimization. Yoram Singer August 10, Google Research

OLSO. Online Learning and Stochastic Optimization. Yoram Singer August 10, Google Research OLSO Online Learning and Stochastic Optimization Yoram Singer August 10, 2016 Google Research References Introduction to Online Convex Optimization, Elad Hazan, Princeton University Online Learning and

More information

Lecture 4: Lower Bounds (ending); Thompson Sampling

Lecture 4: Lower Bounds (ending); Thompson Sampling CMSC 858G: Bandits, Experts and Games 09/12/16 Lecture 4: Lower Bounds (ending); Thompson Sampling Instructor: Alex Slivkins Scribed by: Guowei Sun,Cheng Jie 1 Lower bounds on regret (ending) Recap from

More information

A Second-order Bound with Excess Losses

A Second-order Bound with Excess Losses A Second-order Bound with Excess Losses Pierre Gaillard 12 Gilles Stoltz 2 Tim van Erven 3 1 EDF R&D, Clamart, France 2 GREGHEC: HEC Paris CNRS, Jouy-en-Josas, France 3 Leiden University, the Netherlands

More information

Online Learning, Mistake Bounds, Perceptron Algorithm

Online Learning, Mistake Bounds, Perceptron Algorithm Online Learning, Mistake Bounds, Perceptron Algorithm 1 Online Learning So far the focus of the course has been on batch learning, where algorithms are presented with a sample of training data, from which

More information

Bandit Online Convex Optimization

Bandit Online Convex Optimization March 31, 2015 Outline 1 OCO vs Bandit OCO 2 Gradient Estimates 3 Oblivious Adversary 4 Reshaping for Improved Rates 5 Adaptive Adversary 6 Concluding Remarks Review of (Online) Convex Optimization Set-up

More information

Efficiently Training Sum-Product Neural Networks using Forward Greedy Selection

Efficiently Training Sum-Product Neural Networks using Forward Greedy Selection Efficiently Training Sum-Product Neural Networks using Forward Greedy Selection Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem Greedy Algorithms, Frank-Wolfe and Friends

More information

The Computational Power of Optimization in Online Learning

The Computational Power of Optimization in Online Learning he Computational Power of Optimization in Online Learning Elad Hazan Princeton University 35 Olden St, Princeton NJ 08540 ehazan@cs.princeton.edu omer Koren echnion Israel Institute of echnology echnion

More information

Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization

Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization JMLR: Workshop and Conference Proceedings vol (2010) 1 16 24th Annual Conference on Learning heory Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization

More information

Online Submodular Minimization

Online Submodular Minimization Journal of Machine Learning Research 13 (2012) 2903-2922 Submitted 12/11; Revised 7/12; Published 10/12 Online Submodular Minimization Elad Hazan echnion - Israel Inst. of ech. echnion City Haifa, 32000,

More information

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Elad Hazan IBM Almaden Research Center 650 Harry Rd San Jose, CA 95120 ehazan@cs.princeton.edu Satyen Kale Yahoo! Research 4301

More information

Bregman Divergence and Mirror Descent

Bregman Divergence and Mirror Descent Bregman Divergence and Mirror Descent Bregman Divergence Motivation Generalize squared Euclidean distance to a class of distances that all share similar properties Lots of applications in machine learning,

More information

Experts in a Markov Decision Process

Experts in a Markov Decision Process University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 2004 Experts in a Markov Decision Process Eyal Even-Dar Sham Kakade University of Pennsylvania Yishay Mansour Follow

More information

Optimization, Learning, and Games with Predictable Sequences

Optimization, Learning, and Games with Predictable Sequences Optimization, Learning, and Games with Predictable Sequences Alexander Rakhlin University of Pennsylvania Karthik Sridharan University of Pennsylvania Abstract We provide several applications of Optimistic

More information

The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees

The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees Ofer Dekel Shai Shalev-Shwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904,

More information

Online Bounds for Bayesian Algorithms

Online Bounds for Bayesian Algorithms Online Bounds for Bayesian Algorithms Sham M. Kakade Computer and Information Science Department University of Pennsylvania Andrew Y. Ng Computer Science Department Stanford University Abstract We present

More information

EASINESS IN BANDITS. Gergely Neu. Pompeu Fabra University

EASINESS IN BANDITS. Gergely Neu. Pompeu Fabra University EASINESS IN BANDITS Gergely Neu Pompeu Fabra University EASINESS IN BANDITS Gergely Neu Pompeu Fabra University THE BANDIT PROBLEM Play for T rounds attempting to maximize rewards THE BANDIT PROBLEM Play

More information

Learnability, Stability, Regularization and Strong Convexity

Learnability, Stability, Regularization and Strong Convexity Learnability, Stability, Regularization and Strong Convexity Nati Srebro Shai Shalev-Shwartz HUJI Ohad Shamir Weizmann Karthik Sridharan Cornell Ambuj Tewari Michigan Toyota Technological Institute Chicago

More information

Adaptive Sampling Under Low Noise Conditions 1

Adaptive Sampling Under Low Noise Conditions 1 Manuscrit auteur, publié dans "41èmes Journées de Statistique, SFdS, Bordeaux (2009)" Adaptive Sampling Under Low Noise Conditions 1 Nicolò Cesa-Bianchi Dipartimento di Scienze dell Informazione Università

More information

A New Perspective on an Old Perceptron Algorithm

A New Perspective on an Old Perceptron Algorithm A New Perspective on an Old Perceptron Algorithm Shai Shalev-Shwartz 1,2 and Yoram Singer 1,2 1 School of Computer Sci & Eng, The Hebrew University, Jerusalem 91904, Israel 2 Google Inc, 1600 Amphitheater

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Bandit Problems MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Multi-Armed Bandit Problem Problem: which arm of a K-slot machine should a gambler pull to maximize his

More information

Extensive Form Abstract Economies and Generalized Perfect Recall

Extensive Form Abstract Economies and Generalized Perfect Recall Extensive Form Abstract Economies and Generalized Perfect Recall Nicholas Butler Princeton University July 30, 2015 Nicholas Butler (Princeton) EFAE and Generalized Perfect Recall July 30, 2015 1 / 1 Motivation

More information

Optimal and Adaptive Online Learning

Optimal and Adaptive Online Learning Optimal and Adaptive Online Learning Haipeng Luo A Dissertation Presented to the Faculty of Princeton University in Candidacy for the Degree of Doctor of Philosophy Recommended for Acceptance by the Department

More information

Lecture 19: UCB Algorithm and Adversarial Bandit Problem. Announcements Review on stochastic multi-armed bandit problem

Lecture 19: UCB Algorithm and Adversarial Bandit Problem. Announcements Review on stochastic multi-armed bandit problem Lecture 9: UCB Algorithm and Adversarial Bandit Problem EECS598: Prediction and Learning: It s Only a Game Fall 03 Lecture 9: UCB Algorithm and Adversarial Bandit Problem Prof. Jacob Abernethy Scribe:

More information

Regret Minimization With Concept Drift

Regret Minimization With Concept Drift Regret Minimization With Concept Drift Koby Crammer he echnion koby@ee.technion.ac.il Yishay Mansour el Aviv University mansour@cs.tau.ac.il Eyal Even-Dar Google Research evendar@google.com Jennifer Wortman

More information

Online Convex Optimization. Gautam Goel, Milan Cvitkovic, and Ellen Feldman CS 159 4/5/2016

Online Convex Optimization. Gautam Goel, Milan Cvitkovic, and Ellen Feldman CS 159 4/5/2016 Online Convex Optimization Gautam Goel, Milan Cvitkovic, and Ellen Feldman CS 159 4/5/2016 The General Setting The General Setting (Cover) Given only the above, learning isn't always possible Some Natural

More information

Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm

Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm Jacob Steinhardt Percy Liang Stanford University {jsteinhardt,pliang}@cs.stanford.edu Jun 11, 2013 J. Steinhardt & P. Liang (Stanford)

More information

Learning Hurdles for Sleeping Experts

Learning Hurdles for Sleeping Experts Learning Hurdles for Sleeping Experts Varun Kanade EECS, University of California Berkeley, CA, USA vkanade@eecs.berkeley.edu Thomas Steinke SEAS, Harvard University Cambridge, MA, USA tsteinke@fas.harvard.edu

More information

Online Optimization : Competing with Dynamic Comparators

Online Optimization : Competing with Dynamic Comparators Ali Jadbabaie Alexander Rakhlin Shahin Shahrampour Karthik Sridharan University of Pennsylvania University of Pennsylvania University of Pennsylvania Cornell University Abstract Recent literature on online

More information

Machine Learning Theory Lecture 4

Machine Learning Theory Lecture 4 Machine Learning Theory Lecture 4 Nicholas Harvey October 9, 018 1 Basic Probability One of the first concentration bounds that you learn in probability theory is Markov s inequality. It bounds the right-tail

More information

Generative adversarial networks

Generative adversarial networks 14-1: Generative adversarial networks Prof. J.C. Kao, UCLA Generative adversarial networks Why GANs? GAN intuition GAN equilibrium GAN implementation Practical considerations Much of these notes are based

More information

Online Learning for Non-Stationary A/B Tests

Online Learning for Non-Stationary A/B Tests CIKM 18, October 22-26, 218, Torino, Italy Online Learning for Non-Stationary A/B Tests Andrés Muñoz Medina Google AI New York, NY ammedina@google.com Sergei Vassilvitiskii Google AI New York, NY sergeiv@google.com

More information

Online Sparse Linear Regression

Online Sparse Linear Regression JMLR: Workshop and Conference Proceedings vol 49:1 11, 2016 Online Sparse Linear Regression Dean Foster Amazon DEAN@FOSTER.NET Satyen Kale Yahoo Research SATYEN@YAHOO-INC.COM Howard Karloff HOWARD@CC.GATECH.EDU

More information

Machine Learning in the Data Revolution Era

Machine Learning in the Data Revolution Era Machine Learning in the Data Revolution Era Shai Shalev-Shwartz School of Computer Science and Engineering The Hebrew University of Jerusalem Machine Learning Seminar Series, Google & University of Waterloo,

More information

Active Learning: Disagreement Coefficient

Active Learning: Disagreement Coefficient Advanced Course in Machine Learning Spring 2010 Active Learning: Disagreement Coefficient Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz In previous lectures we saw examples in which

More information

Optimization for Machine Learning

Optimization for Machine Learning Optimization for Machine Learning Editors: Suvrit Sra suvrit@gmail.com Max Planck Insitute for Biological Cybernetics 72076 Tübingen, Germany Sebastian Nowozin Microsoft Research Cambridge, CB3 0FB, United

More information

CS264: Beyond Worst-Case Analysis Lecture #20: From Unknown Input Distributions to Instance Optimality

CS264: Beyond Worst-Case Analysis Lecture #20: From Unknown Input Distributions to Instance Optimality CS264: Beyond Worst-Case Analysis Lecture #20: From Unknown Input Distributions to Instance Optimality Tim Roughgarden December 3, 2014 1 Preamble This lecture closes the loop on the course s circle of

More information

The convex optimization approach to regret minimization

The convex optimization approach to regret minimization The convex optimization approach to regret minimization Elad Hazan Technion - Israel Institute of Technology ehazan@ie.technion.ac.il Abstract A well studied and general setting for prediction and decision

More information

Bandit Convex Optimization: T Regret in One Dimension

Bandit Convex Optimization: T Regret in One Dimension Bandit Convex Optimization: T Regret in One Dimension arxiv:1502.06398v1 [cs.lg 23 Feb 2015 Sébastien Bubeck Microsoft Research sebubeck@microsoft.com Tomer Koren Technion tomerk@technion.ac.il February

More information

Worst-Case Bounds for Gaussian Process Models

Worst-Case Bounds for Gaussian Process Models Worst-Case Bounds for Gaussian Process Models Sham M. Kakade University of Pennsylvania Matthias W. Seeger UC Berkeley Abstract Dean P. Foster University of Pennsylvania We present a competitive analysis

More information