The No-Regret Framework for Online Learning


The No-Regret Framework for Online Learning
A Tutorial Introduction
Nahum Shimkin
Technion, Israel Institute of Technology, Haifa, Israel
Stochastic Processes in Engineering, IIT Mumbai, March 2013
N. Shimkin, Technion, 1 / 28

Outline
1. The online decision problem
2. Blackwell's approachability
3. The multiplicative weights algorithm
4. Online convex programming
5. Recent topics

1. The Online Decision Problem

1.1 Preliminaries: Supervised Learning
The basic problem of supervised learning can be roughly formulated in the following terms. Given a sequence of examples (x_n, y_n), n = 1, ..., N, where x_n ∈ X is the input pattern (or feature vector) and y_n ∈ Y the corresponding label, find a map h : x ∈ X ↦ ŷ ∈ Y that will correctly predict the true label y of other (yet unseen) input patterns x.
When Y is a discrete (finite) set, this formulation corresponds to a classification problem. If Y = {0, 1} this is a binary classification problem, and for Y a continuous set we obtain a regression problem. Statistical learning theory often assumes that the samples (x_n, y_n) are i.i.d. draws from a fixed, yet unknown, probability distribution.

The quality of prediction is measured by a cost (or loss) function c(ŷ, y). For classification this may be the 0-1 cost, c(ŷ, y) = 1{ŷ ≠ y}; in regression it may be the quadratic cost, c(ŷ, y) = (ŷ − y)².
It is sometimes convenient to allow the predictions ŷ to take values in a larger set Ŷ than the label set Y. For example, in the binary classification problem we may want to allow probabilistic predictions ("80% chance of rain tomorrow"). We therefore consider prediction functions h : X → Ŷ and costs c : Ŷ × Y → R₊.
The prediction function h is often restricted to some predefined class H. For example, in linear regression h(x) = ⟨w, x⟩, where w is a vector of weights to be tuned.
Below we will variously denote the Euclidean inner product as ⟨w, x⟩, w · x, or wᵀx (for column vectors), as convenient.

1.2 Online Learning and Regret
In online learning, examples are presented sequentially, and learning takes place simultaneously with prediction. A generic template for this process is as follows. For t = 1, 2, ...:
- observe input x_t ∈ X
- predict ŷ_t ∈ Ŷ
- observe the true answer y_t ∈ Y
- suffer cost (or loss) c(ŷ_t, y_t)
The cumulative cost over T periods is therefore C_T = Σ_{t=1}^T c(ŷ_t, y_t). It is generally required to make this cumulative loss as small as possible, in some appropriate sense.
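The protocol above can be sketched in a few lines of code. The learner, the data stream, and the cost function below are placeholders invented for illustration (a running-mean predictor under quadratic cost on a toy scalar stream), not part of the tutorial itself.

```python
# A minimal sketch of the generic online-learning protocol:
# observe x_t, predict, observe y_t, suffer the loss, update.

def online_loop(predict, update, stream, cost):
    """Run the online protocol and return the cumulative cost C_T."""
    total = 0.0
    for x_t, y_t in stream:
        y_hat = predict(x_t)        # predict before seeing the label
        total += cost(y_hat, y_t)   # suffer the loss
        update(x_t, y_t)            # learn from the revealed label
    return total

# Toy instance: running-mean predictor under quadratic cost.
state = {"n": 0, "mean": 0.0}

def predict(x):                     # ignores x in this toy example
    return state["mean"]

def update(x, y):
    state["n"] += 1
    state["mean"] += (y - state["mean"]) / state["n"]

stream = [(None, y) for y in [1.0, 1.0, 0.0, 1.0]]
C_T = online_loop(predict, update, stream, lambda a, b: (a - b) ** 2)
```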

Online learning examples
For concreteness, consider the following familiar examples:
1. Weather prediction: Suppose we wish to predict tomorrow's weather. This may involve a classification problem (rain/no rain) or a regression problem (max/min temperature prediction). In any case, while we may be improving our prediction capability over time, the goal is clearly to provide accurate predictions throughout this period.
2. Spam filter: Here we wish to classify incoming mail as spam/not spam. Some messages x_t are displayed to the user to obtain their true label y_t. (The process of choosing which messages to display falls in the area of active learning, which we do not address here. It is also interesting that valuable information can be gained from unlabeled examples; this is addressed within semi-supervised learning.) Here, again, learning takes place online.

The Arbitrary Opponent
In these lectures we do not impose statistical assumptions on the example sequence. Rather, we refer to this sequence as arbitrary. It is convenient to think of the sequence as chosen by an opponent (often an imaginary one). We may further distinguish between the following cases:
1. An oblivious opponent: Here the example sequence (x_t, y_t) is preset, in the sense that it does not depend on the outputs (ŷ_t) of our prediction algorithm. This would be the case in the weather prediction problem.
2. An adaptive, or adversarial, opponent: Here the opponent may choose future examples based on past predictions of our algorithm. This might be the case in the spam filter example.
We will make these assumptions more concrete in the problems discussed below. In either case, we refer to the choice of samples by the opponent as the opponent's strategy.

Either way, it is evident that the cumulative cost will depend on the example sequence. For instance, in areas where it rains every day (or never rains), we expect to approach a 100% success rate, while if rain follows an i.i.d. Bernoulli sequence with parameter q ∈ [0, 1], the best we can hope for is an asymptotic error rate of min{q, 1 − q}. Thus, it is advisable to compare the performance of our learning algorithm to an ideal, baseline performance. This is where the concept of regret comes in.
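The Bernoulli baseline can be checked numerically. The sketch below (toy parameters q = 0.3, T = 100,000, chosen here for illustration) shows that even the best fixed prediction in hindsight incurs an error rate close to min{q, 1 − q}.

```python
# Empirical check: on an i.i.d. Bernoulli(q) sequence, the best fixed
# prediction in hindsight (the majority symbol) errs at rate min{q, 1-q}.
import random

random.seed(0)
q, T = 0.3, 100_000
seq = [1 if random.random() < q else 0 for _ in range(T)]

ones = sum(seq)
best_fixed_errors = min(ones, T - ones)   # errors of the better constant
err_rate = best_fixed_errors / T          # close to min(q, 1 - q) = 0.3
```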

Regret
The (cumulative, T-step) regret relative to some fixed predictor h is defined as
Regret_T(h) = Σ_{t=1}^T c(ŷ_t, y_t) − Σ_{t=1}^T c(h(x_t), y_t).
The regret relative to a predictor class H (e.g., the set of all linear predictors) is then defined as
Regret_T(H) = max_{h∈H} Regret_T(h) = Σ_{t=1}^T c(ŷ_t, y_t) − min_{h∈H} Σ_{t=1}^T c(h(x_t), y_t).
In the last term, the best fixed predictor h ∈ H is chosen with the benefit of hindsight, i.e., given the entire sample sequence (x_t, y_t) up to time T.
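The definition transcribes directly into code. The sketch below is a hedged illustration for a finite comparison class H; the toy data, predictors, and 0-1 cost are invented for the example.

```python
# Regret_T(H) on a logged run: learner's cumulative cost minus the cost
# of the best fixed predictor in H, chosen in hindsight.

def regret_vs_class(predictions, data, H, cost):
    """predictions[t] is the learner's y_hat at round t; data = [(x_t, y_t)]."""
    learner = sum(cost(p, y) for p, (_, y) in zip(predictions, data))
    hindsight = min(sum(cost(h(x), y) for x, y in data) for h in H)
    return learner - hindsight

# Toy check: 0-1 cost, the two constant predictors, labels 1, 1, 0, 1.
data = [(None, 1), (None, 1), (None, 0), (None, 1)]
H = [lambda x: 0, lambda x: 1]
cost01 = lambda y_hat, y: int(y_hat != y)
r = regret_vs_class([0, 0, 0, 0], data, H, cost01)   # learner always says 0
```

Here the learner errs 3 times, while the constant predictor 1 errs once, so the regret is 2.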

The No-Regret Property
A learning algorithm is said to have the no-regret property (w.r.t. H) if the regret is guaranteed to grow sublinearly in T, namely Regret_T(H) = o(T) (in some appropriate sense), for any strategy of the opponent.

1.3 Prediction
An important special case is the problem of sequential prediction. Here the challenge is to predict the next element y_t based on the previous elements (y_1, ..., y_{t−1}) of the sequence, for t = 1, 2, .... This may be viewed as a special case of the previous model, with the patterns x_t absent. If we assume a statistical structure (e.g., Markovian) on the sequence, the problem becomes one of statistical model estimation and prediction. Prediction of arbitrary sequences with regret bounds has been treated extensively in the information theory literature, under the names universal prediction or individual-sequence prediction (see Merhav and Feder, 1998).

Prediction with Expert Advice
A related important model is that of prediction with expert advice (Littlestone and Warmuth, 1994). Here we are assisted in our prediction task by a set E of experts (which may themselves be prediction algorithms), among which we wish to follow the best one.
The problem is formulated as follows. For t = 1, 2, ...:
1. The environment chooses the outcome y_t.
2. Simultaneously, each expert e chooses an advice ŷ_{e,t}, which is revealed to the forecaster.
3. The forecaster chooses a prediction ŷ_t, after which he observes the true answer y_t.
4. The forecaster suffers a loss c(ŷ_t, y_t).
The goal of the forecaster is to minimize his regret with respect to the best expert, defined here as
Regret_T = Σ_{t=1}^T c(ŷ_t, y_t) − min_{e∈E} Σ_{t=1}^T c(ŷ_{e,t}, y_t).
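The four-step protocol can be sketched as follows. The experts, outcome sequence, and naive forecaster (following a uniformly random expert each round) are toy placeholders for illustration, not a recommended algorithm.

```python
# Sketch of the expert-advice protocol and the regret vs. the best expert.
import random

random.seed(1)

experts = [lambda t: 0, lambda t: 1, lambda t: t % 2]   # toy experts
outcomes = [1, 1, 0, 1, 1, 0]                           # environment's choices
cost = lambda y_hat, y: int(y_hat != y)                 # 0-1 loss

forecaster_loss, expert_loss = 0, [0] * len(experts)
for t, y in enumerate(outcomes):
    advice = [e(t) for e in experts]     # each expert reveals its advice
    y_hat = random.choice(advice)        # the forecaster predicts
    forecaster_loss += cost(y_hat, y)    # the true outcome y is revealed
    for i, a in enumerate(advice):
        expert_loss[i] += cost(a, y)

regret = forecaster_loss - min(expert_loss)
```

On this sequence the best expert (the constant 1) suffers loss 2, so the forecaster's regret is its own loss minus 2.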

1.4 Repeated Games
The notion of no-regret strategies, or no-regret play, was introduced in a seminal paper by Hannan (1957) in the context of repeated matrix games.

Matrix Games (reminder)
Recall that a zero-sum matrix game is defined through a payoff (or cost) matrix Γ = {c(i, j) : i ∈ I, j ∈ J}, where I is the action set of player 1 (PL1, the learner) and J is the action set of player 2 (PL2, the opponent).
Let p ∈ Δ(I) denote a mixed action of PL1, and q ∈ Δ(J) a mixed action of PL2. The expected cost (to PL1) under mixed actions p and q is
c(p, q) = Σ_{i,j} p_i c(i, j) q_j.
The minimax value of the game (with PL1 the minimizer) is given by
v(Γ) = min_{p∈Δ(I)} max_{q∈Δ(J)} c(p, q) = max_{q∈Δ(J)} min_{p∈Δ(I)} c(p, q).
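The value can be checked numerically for a small game. The sketch below uses matching pennies as a toy example (PL1 pays 1 when the actions match) and the fact that the inner maximization over q ∈ Δ(J) is attained at a pure action, so a grid search over PL1's mixed action p suffices; the grid resolution is an arbitrary choice for illustration.

```python
# Numerical check of the minimax value for a 2x2 matching-pennies game.

C = [[1, 0],   # c(i, j): PL1 pays 1 when the actions match
     [0, 1]]

def mixed_cost(p, j):
    """c(p, e_j) = sum_i p_i c(i, j)."""
    return sum(p_i * C[i][j] for i, p_i in enumerate(p))

# min over p of max over PL2's pure actions (inner max is attained there).
grid = [k / 1000 for k in range(1001)]
value = min(max(mixed_cost((p, 1 - p), j) for j in range(2)) for p in grid)
```

The minimum of max(p, 1 − p) is attained at p = 1/2, giving v(Γ) = 1/2, as expected for matching pennies.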

The Repeated Game
The repeated game proceeds in stages t = 1, 2, ..., where at stage t PL1 and PL2 simultaneously choose actions i_t and j_t, respectively, which are then observed (and recalled) by both players.
A strategy π₁ for PL1 is a map that assigns a mixed action p_t ∈ Δ(I) to each possible history h_{t−1} = (i_1, j_1, ..., i_{t−1}, j_{t−1}) and time t ≥ 1; similarly, a strategy π₂ for PL2 assigns a mixed action q_t ∈ Δ(J) to each possible history. The actions i_t and j_t are chosen randomly according to p_t and q_t. Any pair (π₁, π₂) induces a probability measure on the space of infinite histories, which we denote by P_{π₁,π₂}.
We shall be interested in the (long-term) cumulative cost (or loss)
C_T = Σ_{t=1}^T c(i_t, j_t).

Regret (again)
The minimax value of this game (for any fixed T, and for T → ∞) is easily seen to equal v(Γ), the value of the single-shot game. However, if PL2 is not necessarily adversarial (and, indeed, not necessarily rational in the game-theoretic sense), the question arises whether PL1 can gain more than the value by adapting to the observed action history of PL2.
To this end, define the following (cumulative) regret for PL1:
R_T = Σ_{t=1}^T c(i_t, j_t) − min_{i∈I} Σ_{t=1}^T c(i, j_t).   (1)
The second term on the RHS serves here as our reference level, to which the actual cost is compared. Naturally, PL1 would like the regret to be as small as possible.

No-Regret Strategies
Define the average regret R̄_T = R_T / T.
Definition. A strategy π₁ of PL1 is said to have the no-regret property (or to be Hannan-consistent) if
lim sup_{T→∞} R̄_T ≤ 0,  P_{π₁,π₂}-a.s.,   (2)
for any strategy π₂ of PL2. More succinctly, we may write this property as R̄_T ≤ o(1) (a.s.), or R_T ≤ o(T) (a.s.).

Some More Notation
The RHS of (1) can be written in another convenient form. Let q̄_T = (1/T) Σ_{t=1}^T e_{j_t} denote the empirical frequency vector of PL2's actions (here e_j ∈ Δ(J) is the mixed action concentrated on action j). Recalling our convention c(i, q) = Σ_j c(i, j) q_j, it follows that
min_{i∈I} (1/T) Σ_{t=1}^T c(i, j_t) = min_{i∈I} c(i, q̄_T) = min_{p∈Δ(I)} c(p, q̄_T) = c*(q̄_T),
where c*(q) = min_p c(p, q) is the best-response cost, also known as the Bayes risk of the game. The average regret may now be written more succinctly as
R̄_T = (1/T) C_T − c*(q̄_T).
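The identity above can be verified on a toy example. The binary cost matrix and PL2's action sequence below are invented for illustration; both sides of the identity are computed and compared.

```python
# Check: min_i (1/T) sum_t c(i, j_t) equals the Bayes risk c*(q̄_T)
# evaluated at the empirical frequency of PL2's actions.

C = [[0, 1],          # c(i, j) = 1{i != j}: binary prediction cost
     [1, 0]]

js = [1, 1, 0, 1]     # PL2's observed actions
T = len(js)
qbar = [js.count(j) / T for j in (0, 1)]      # empirical frequencies

def c_mixed(i, q):
    """c(i, q) = sum_j c(i, j) q_j."""
    return sum(C[i][j] * qj for j, qj in enumerate(q))

bayes_risk = min(c_mixed(i, qbar) for i in (0, 1))
direct = min(sum(C[i][j] for j in js) / T for i in (0, 1))
```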

The no-regret property R̄_T ≤ o(1) (a.s.) may now be written as
(1/T) C_T ≤ c*(q̄_T) + o(1)  (a.s.).
Accordingly, a no-regret strategy is sometimes said to attain the Bayes risk of the game.

Notes on Randomization
1. Necessity of randomization: It is easily seen that no deterministic strategy, i.e., a map i_t = f_t(h_{t−1}), can satisfy the no-regret property. Indeed, an (adaptive) opponent has access to h_{t−1}, and can choose j_t to maximize the loss against i_t. For example, in the binary prediction problem he might choose j_t = 1 − i_t, which yields a cumulative loss of C_T = T and a cumulative regret of T/2 or more. Hence, we use strictly mixed actions p_t as part of the learning policy.
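This argument can be simulated directly. The deterministic learner below (a hypothetical FTL-style rule with ties broken toward action 0, chosen only for concreteness) is exploited by the adaptive opponent j_t = 1 − i_t under the mismatch cost.

```python
# Any deterministic rule is exploited: the opponent reacts to i_t,
# the learner mismatches every round, and the regret grows linearly.

cost = lambda i, j: int(i != j)

def deterministic_learner(history):
    """Predict the majority of past opponent actions (ties -> 0)."""
    if not history:
        return 0
    ones = sum(j for _, j in history)
    return int(ones > len(history) - ones)

history, total = [], 0
T = 100
for t in range(T):
    i_t = deterministic_learner(history)
    j_t = 1 - i_t                      # adversary reacts to i_t
    total += cost(i_t, j_t)
    history.append((i_t, j_t))

best_fixed = min(sum(cost(i, j) for _, j in history) for i in (0, 1))
regret = total - best_fixed            # = T/2 here
```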

2. Smoothed regret: It is easily seen that the difference d_t = c(p_t, j_t) − c(i_t, j_t) is a bounded martingale difference sequence with respect to F_t = σ{h_{t−1}, j_t}. Hence the sum Σ_{t=1}^T d_t is of order O(√T) (e.g., by Azuma's inequality), and in particular the average (1/T) Σ_{t=1}^T d_t converges to 0 (a.s.). We can therefore define the regret in terms of c(p_t, j_t) in place of c(i_t, j_t), namely
R_T = Σ_{t=1}^T c(p_t, j_t) − min_{i∈I} Σ_{t=1}^T c(i, j_t).
This definition allows us to establish sample-path (rather than a.s.) bounds on the regret. Henceforth we will use the latter definition.

Hannan's No-Regret Theorem
Theorem (Hannan '57). There exists a strategy π for the learner such that R_T ≤ c₀ √T for any strategy of the opponent and all T ≥ 1. Here c₀ = ((3/2) n_I)^{1/2} n_J span(c).
The constant c₀ in Hannan's result was subsequently improved, but not the rate √T, which is optimal. The proposed strategy was a perturbed FTL (follow-the-leader) scheme, which we briefly describe next.

FTL-Type Strategies
The FTL (follow-the-leader) strategy is given by
i_{t+1} = argmin_{i∈I} Σ_{s=1}^t c(i, j_s) = argmin_{i∈I} c(i, q̄_t) = BR(q̄_t)
(with ties broken arbitrarily). Here the learner uses a best-response action against the empirical frequency of the opponent's actions. This simple rule is also known as fictitious play in the game-theory literature.
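A minimal implementation of this rule follows, with the binary mismatch cost as a toy example; the function names are ours, and ties are broken toward the lowest-index action for concreteness.

```python
# FTL / fictitious play: best response to the empirical frequency
# of the opponent's past actions.

def best_response(C, q):
    """argmin_i sum_j c(i, j) q_j (ties broken toward lower i)."""
    costs = [sum(C[i][j] * qj for j, qj in enumerate(q)) for i in range(len(C))]
    return costs.index(min(costs))

def ftl_next(C, past_js, n_j):
    """The FTL action i_{t+1} after observing past_js from PL2."""
    T = len(past_js)
    q = [past_js.count(j) / T for j in range(n_j)]
    return best_response(C, q)

C = [[0, 1], [1, 0]]          # binary prediction, c(i, j) = 1{i != j}
a = ftl_next(C, [1, 1, 0], 2)  # majority so far is 1 -> play 1
```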

It is easily seen that FTL does not have the no-regret property, even against an oblivious opponent. Consider the binary prediction problem, namely i, j ∈ {0, 1} with c(i, j) = 1{i ≠ j}. Suppose PL2 chooses the sequence (j*, 1, 0, 1, 0, ...), where j* is some auxiliary action with c(0, j*) = 0 and c(1, j*) = 0.5. In that case FTL yields the sequence (i_t) = (?, 0, 1, 0, 1, ...), which oscillates opposite to PL2's actions, leading to R_T ≈ T/2.
Still, FTL can be modified by essentially smoothing the best-response map, so that the oscillation observed above is prevented.
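The oscillation can be reproduced numerically. In the sketch below each round is encoded by its cost vector over the learner's two actions; the auxiliary action j* is taken (as an assumption for this demonstration) to have the cost vector (0, 0.5), so that FTL, breaking ties toward action 0, locks into the losing oscillation.

```python
# FTL on the counterexample sequence: linear regret.

T = 101
loss_vecs = [(0.0, 0.5)]                   # round 1: the auxiliary action j*
for t in range(2, T + 1):                  # then PL2 plays 1, 0, 1, 0, ...
    loss_vecs.append((1.0, 0.0) if t % 2 == 0 else (0.0, 1.0))

cum = [0.0, 0.0]                           # cumulative cost of each fixed action
ftl_cost = 0.0
plays = []
for ell in loss_vecs:
    i = 0 if cum[0] <= cum[1] else 1       # FTL, ties broken toward action 0
    plays.append(i)
    ftl_cost += ell[i]
    cum[0] += ell[0]
    cum[1] += ell[1]

best_fixed = min(cum)
regret = ftl_cost - best_fixed             # grows like T/2
```

FTL pays 1 at every round after the first, while each fixed action pays only about half the time, so the regret is about T/2.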

Perturbed FTL
Let z_t = (z_{j,t})_{j∈J}, t ≥ 1, be a collection of i.i.d. random variables, with z_{j,t} ~ U[0, 1] uniformly distributed on [0, 1], and let
i_{t+1} = BR(q̄_t + λ_t z_t).
Hannan's result holds with λ_t = c₁/√t, where c₁ = (3 n_J² / (2 n_I))^{1/2}.
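A hedged sketch of one step of this rule follows; the function name and the default constant c1 = 1 are ours (the tutorial's c₁ depends on the action-set sizes), and the toy counts are invented for the usage example.

```python
# Perturbed FTL: best response to the perturbed empirical frequency
# q̄_t + λ_t z_t, with λ_t = c1 / sqrt(t) and z_{j,t} ~ U[0, 1].
import random

random.seed(0)

def perturbed_ftl_next(C, counts, t, c1=1.0):
    """Choose i_{t+1} = BR(q̄_t + λ_t z_t) after t observed rounds."""
    lam = c1 / t ** 0.5
    q = [counts[j] / t + lam * random.random() for j in range(len(counts))]
    costs = [sum(C[i][j] * qj for j, qj in enumerate(q)) for i in range(len(C))]
    return costs.index(min(costs))

C = [[0, 1], [1, 0]]               # mismatch cost, as before
counts = [30, 70]                  # PL2 played action 1 in 70 of t = 100 rounds
action = perturbed_ftl_next(C, counts, t=100)
```

Unlike plain FTL, the random perturbation makes the played action a nondegenerate function of the noise near ties, which is what breaks the oscillation of the previous slide.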

Smooth Fictitious Play
In terms of mixed actions, Perturbed FTL effectively leads to p_{t+1} = BR_{λ_t}(q̄_t), where p = BR_λ(q) is a smooth version of BR(·) (for each λ > 0). In the next variant, introduced by Fudenberg and Levine (1995) and others, smoothing of the best-response map is implemented directly through function minimization. Here
BR_λ(q) = argmin_{p∈Δ(I)} {c(p, q) + λ v(p)},
where v : Δ(I) → R is a smooth, strictly convex function with derivatives that are steep at the vertices of Δ(I). In particular, choosing v(p) = Σ_i p_i log p_i yields the logistic map
BR_λ(q)_i = exp(−λ⁻¹ c(i, q)) / Σ_k exp(−λ⁻¹ c(k, q)).
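The logistic map is a softmax over the negated costs, and can be sketched in a few lines (the max-shift inside the exponentials is a standard numerical-stability trick, not part of the formula):

```python
# The logistic smoothed best response: p_i ∝ exp(-c(i, q) / λ).
import math

def smooth_br(C, q, lam):
    """Return the mixed action BR_λ(q) over PL1's actions."""
    costs = [sum(C[i][j] * qj for j, qj in enumerate(q)) for i in range(len(C))]
    m = min(costs)                           # shift for numerical stability
    w = [math.exp(-(c - m) / lam) for c in costs]
    Z = sum(w)
    return [wi / Z for wi in w]

C = [[0, 1], [1, 0]]                         # mismatch cost
p_uniform = smooth_br(C, [0.5, 0.5], 1.0)    # equal costs -> uniform play
p_sharp = smooth_br(C, [0.2, 0.8], 0.01)     # small λ -> near-pure best response
```

As λ → 0 the map concentrates on the exact best response, while for larger λ it stays strictly mixed, which is exactly the smoothing that restores the no-regret behavior.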

Pointers to the Literature
There are a number of monographs and surveys that encompass a variety of problems within the no-regret framework. The textbook by Cesa-Bianchi and Lugosi (2006) surveys and unifies the different approaches developed within the game theory, information theory, statistical decision theory, and machine learning communities. The monographs by Fudenberg and Levine (1998) and by Young (2004) provide a game-theoretic viewpoint. A recent survey by Shalev-Shwartz (2011) considers the general problem of online convex optimization, while Bubeck and Cesa-Bianchi (2012) provide an overview of the related (stochastic and nonstochastic) multi-armed bandit problem.


More information

Bandit Algorithms. Zhifeng Wang ... Department of Statistics Florida State University

Bandit Algorithms. Zhifeng Wang ... Department of Statistics Florida State University Bandit Algorithms Zhifeng Wang Department of Statistics Florida State University Outline Multi-Armed Bandits (MAB) Exploration-First Epsilon-Greedy Softmax UCB Thompson Sampling Adversarial Bandits Exp3

More information

IFT Lecture 7 Elements of statistical learning theory

IFT Lecture 7 Elements of statistical learning theory IFT 6085 - Lecture 7 Elements of statistical learning theory This version of the notes has not yet been thoroughly checked. Please report any bugs to the scribes or instructor. Scribe(s): Brady Neal and

More information

Online Prediction: Bayes versus Experts

Online Prediction: Bayes versus Experts Marcus Hutter - 1 - Online Prediction Bayes versus Experts Online Prediction: Bayes versus Experts Marcus Hutter Istituto Dalle Molle di Studi sull Intelligenza Artificiale IDSIA, Galleria 2, CH-6928 Manno-Lugano,

More information

An Algorithms-based Intro to Machine Learning

An Algorithms-based Intro to Machine Learning CMU 15451 lecture 12/08/11 An Algorithmsbased Intro to Machine Learning Plan for today Machine Learning intro: models and basic issues An interesting algorithm for combining expert advice Avrim Blum [Based

More information

CS261: Problem Set #3

CS261: Problem Set #3 CS261: Problem Set #3 Due by 11:59 PM on Tuesday, February 23, 2016 Instructions: (1) Form a group of 1-3 students. You should turn in only one write-up for your entire group. (2) Submission instructions:

More information

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Course Synopsis A finite comparison class: A = {1,..., m}. Converting online to batch. Online convex optimization.

More information

New bounds on the price of bandit feedback for mistake-bounded online multiclass learning

New bounds on the price of bandit feedback for mistake-bounded online multiclass learning Journal of Machine Learning Research 1 8, 2017 Algorithmic Learning Theory 2017 New bounds on the price of bandit feedback for mistake-bounded online multiclass learning Philip M. Long Google, 1600 Amphitheatre

More information

Online Learning, Mistake Bounds, Perceptron Algorithm

Online Learning, Mistake Bounds, Perceptron Algorithm Online Learning, Mistake Bounds, Perceptron Algorithm 1 Online Learning So far the focus of the course has been on batch learning, where algorithms are presented with a sample of training data, from which

More information

Decision trees COMS 4771

Decision trees COMS 4771 Decision trees COMS 4771 1. Prediction functions (again) Learning prediction functions IID model for supervised learning: (X 1, Y 1),..., (X n, Y n), (X, Y ) are iid random pairs (i.e., labeled examples).

More information

Perceptron Mistake Bounds

Perceptron Mistake Bounds Perceptron Mistake Bounds Mehryar Mohri, and Afshin Rostamizadeh Google Research Courant Institute of Mathematical Sciences Abstract. We present a brief survey of existing mistake bounds and introduce

More information

NOTE. A 2 2 Game without the Fictitious Play Property

NOTE. A 2 2 Game without the Fictitious Play Property GAMES AND ECONOMIC BEHAVIOR 14, 144 148 1996 ARTICLE NO. 0045 NOTE A Game without the Fictitious Play Property Dov Monderer and Aner Sela Faculty of Industrial Engineering and Management, The Technion,

More information

Time Series Prediction & Online Learning

Time Series Prediction & Online Learning Time Series Prediction & Online Learning Joint work with Vitaly Kuznetsov (Google Research) MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Motivation Time series prediction: stock values. earthquakes.

More information

Explore no more: Improved high-probability regret bounds for non-stochastic bandits

Explore no more: Improved high-probability regret bounds for non-stochastic bandits Explore no more: Improved high-probability regret bounds for non-stochastic bandits Gergely Neu SequeL team INRIA Lille Nord Europe gergely.neu@gmail.com Abstract This work addresses the problem of regret

More information

Adaptive Game Playing Using Multiplicative Weights

Adaptive Game Playing Using Multiplicative Weights Games and Economic Behavior 29, 79 03 (999 Article ID game.999.0738, available online at http://www.idealibrary.com on Adaptive Game Playing Using Multiplicative Weights Yoav Freund and Robert E. Schapire

More information

THE first formalization of the multi-armed bandit problem

THE first formalization of the multi-armed bandit problem EDIC RESEARCH PROPOSAL 1 Multi-armed Bandits in a Network Farnood Salehi I&C, EPFL Abstract The multi-armed bandit problem is a sequential decision problem in which we have several options (arms). We can

More information

arxiv: v1 [cs.lg] 8 Feb 2018

arxiv: v1 [cs.lg] 8 Feb 2018 Online Learning: A Comprehensive Survey Steven C.H. Hoi, Doyen Sahoo, Jing Lu, Peilin Zhao School of Information Systems, Singapore Management University, Singapore School of Software Engineering, South

More information

Optimal and Adaptive Online Learning

Optimal and Adaptive Online Learning Optimal and Adaptive Online Learning Haipeng Luo Advisor: Robert Schapire Computer Science Department Princeton University Examples of Online Learning (a) Spam detection 2 / 34 Examples of Online Learning

More information

arxiv: v1 [cs.lg] 8 Nov 2010

arxiv: v1 [cs.lg] 8 Nov 2010 Blackwell Approachability and Low-Regret Learning are Equivalent arxiv:0.936v [cs.lg] 8 Nov 200 Jacob Abernethy Computer Science Division University of California, Berkeley jake@cs.berkeley.edu Peter L.

More information

Convex Repeated Games and Fenchel Duality

Convex Repeated Games and Fenchel Duality Convex Repeated Games and Fenchel Duality Shai Shalev-Shwartz 1 and Yoram Singer 1,2 1 School of Computer Sci. & Eng., he Hebrew University, Jerusalem 91904, Israel 2 Google Inc. 1600 Amphitheater Parkway,

More information

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden 1 Selecting Efficient Correlated Equilibria Through Distributed Learning Jason R. Marden Abstract A learning rule is completely uncoupled if each player s behavior is conditioned only on his own realized

More information

Online Learning: Random Averages, Combinatorial Parameters, and Learnability

Online Learning: Random Averages, Combinatorial Parameters, and Learnability Online Learning: Random Averages, Combinatorial Parameters, and Learnability Alexander Rakhlin Department of Statistics University of Pennsylvania Karthik Sridharan Toyota Technological Institute at Chicago

More information

Machine Learning and Data Mining. Linear classification. Kalev Kask

Machine Learning and Data Mining. Linear classification. Kalev Kask Machine Learning and Data Mining Linear classification Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ = f(x ; q) Parameters q Program ( Learner ) Learning algorithm Change q

More information

Online Learning. Jordan Boyd-Graber. University of Colorado Boulder LECTURE 21. Slides adapted from Mohri

Online Learning. Jordan Boyd-Graber. University of Colorado Boulder LECTURE 21. Slides adapted from Mohri Online Learning Jordan Boyd-Graber University of Colorado Boulder LECTURE 21 Slides adapted from Mohri Jordan Boyd-Graber Boulder Online Learning 1 of 31 Motivation PAC learning: distribution fixed over

More information

Algorithms, Games, and Networks January 17, Lecture 2

Algorithms, Games, and Networks January 17, Lecture 2 Algorithms, Games, and Networks January 17, 2013 Lecturer: Avrim Blum Lecture 2 Scribe: Aleksandr Kazachkov 1 Readings for today s lecture Today s topic is online learning, regret minimization, and minimax

More information

The sample complexity of agnostic learning with deterministic labels

The sample complexity of agnostic learning with deterministic labels The sample complexity of agnostic learning with deterministic labels Shai Ben-David Cheriton School of Computer Science University of Waterloo Waterloo, ON, N2L 3G CANADA shai@uwaterloo.ca Ruth Urner College

More information

arxiv: v4 [cs.lg] 27 Jan 2016

arxiv: v4 [cs.lg] 27 Jan 2016 The Computational Power of Optimization in Online Learning Elad Hazan Princeton University ehazan@cs.princeton.edu Tomer Koren Technion tomerk@technion.ac.il arxiv:1504.02089v4 [cs.lg] 27 Jan 2016 Abstract

More information

CS261: A Second Course in Algorithms Lecture #12: Applications of Multiplicative Weights to Games and Linear Programs

CS261: A Second Course in Algorithms Lecture #12: Applications of Multiplicative Weights to Games and Linear Programs CS26: A Second Course in Algorithms Lecture #2: Applications of Multiplicative Weights to Games and Linear Programs Tim Roughgarden February, 206 Extensions of the Multiplicative Weights Guarantee Last

More information

Adaptive Online Learning in Dynamic Environments

Adaptive Online Learning in Dynamic Environments Adaptive Online Learning in Dynamic Environments Lijun Zhang, Shiyin Lu, Zhi-Hua Zhou National Key Laboratory for Novel Software Technology Nanjing University, Nanjing 210023, China {zhanglj, lusy, zhouzh}@lamda.nju.edu.cn

More information

On Competitive Prediction and Its Relation to Rate-Distortion Theory

On Competitive Prediction and Its Relation to Rate-Distortion Theory IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 49, NO. 12, DECEMBER 2003 3185 On Competitive Prediction and Its Relation to Rate-Distortion Theory Tsachy Weissman, Member, IEEE, and Neri Merhav, Fellow,

More information

Piecewise-stationary Bandit Problems with Side Observations

Piecewise-stationary Bandit Problems with Side Observations Jia Yuan Yu jia.yu@mcgill.ca Department Electrical and Computer Engineering, McGill University, Montréal, Québec, Canada. Shie Mannor shie.mannor@mcgill.ca; shie@ee.technion.ac.il Department Electrical

More information

EASINESS IN BANDITS. Gergely Neu. Pompeu Fabra University

EASINESS IN BANDITS. Gergely Neu. Pompeu Fabra University EASINESS IN BANDITS Gergely Neu Pompeu Fabra University EASINESS IN BANDITS Gergely Neu Pompeu Fabra University THE BANDIT PROBLEM Play for T rounds attempting to maximize rewards THE BANDIT PROBLEM Play

More information

Multi-Armed Bandit Formulations for Identification and Control

Multi-Armed Bandit Formulations for Identification and Control Multi-Armed Bandit Formulations for Identification and Control Cristian R. Rojas Joint work with Matías I. Müller and Alexandre Proutiere KTH Royal Institute of Technology, Sweden ERNSI, September 24-27,

More information

Robustness and duality of maximum entropy and exponential family distributions

Robustness and duality of maximum entropy and exponential family distributions Chapter 7 Robustness and duality of maximum entropy and exponential family distributions In this lecture, we continue our study of exponential families, but now we investigate their properties in somewhat

More information

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I. Sébastien Bubeck Theory Group

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I. Sébastien Bubeck Theory Group Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I Sébastien Bubeck Theory Group i.i.d. multi-armed bandit, Robbins [1952] i.i.d. multi-armed bandit, Robbins [1952] Known

More information

0.1 Motivating example: weighted majority algorithm

0.1 Motivating example: weighted majority algorithm princeton univ. F 16 cos 521: Advanced Algorithm Design Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm Lecturer: Sanjeev Arora Scribe: Sanjeev Arora (Today s notes

More information

Convex Repeated Games and Fenchel Duality

Convex Repeated Games and Fenchel Duality Convex Repeated Games and Fenchel Duality Shai Shalev-Shwartz 1 and Yoram Singer 1,2 1 School of Computer Sci. & Eng., he Hebrew University, Jerusalem 91904, Israel 2 Google Inc. 1600 Amphitheater Parkway,

More information

Online Bounds for Bayesian Algorithms

Online Bounds for Bayesian Algorithms Online Bounds for Bayesian Algorithms Sham M. Kakade Computer and Information Science Department University of Pennsylvania Andrew Y. Ng Computer Science Department Stanford University Abstract We present

More information

Agnostic Online Learning

Agnostic Online Learning Agnostic Online Learning Shai Ben-David and Dávid Pál David R. Cheriton School of Computer Science University of Waterloo Waterloo, ON, Canada {shai,dpal}@cs.uwaterloo.ca Shai Shalev-Shwartz Toyota Technological

More information

Classification objectives COMS 4771

Classification objectives COMS 4771 Classification objectives COMS 4771 1. Recap: binary classification Scoring functions Consider binary classification problems with Y = { 1, +1}. 1 / 22 Scoring functions Consider binary classification

More information

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Elad Hazan IBM Almaden Research Center 650 Harry Rd San Jose, CA 95120 ehazan@cs.princeton.edu Satyen Kale Yahoo! Research 4301

More information

1 Review of Winnow Algorithm

1 Review of Winnow Algorithm COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture # 17 Scribe: Xingyuan Fang, Ethan April 9th, 2013 1 Review of Winnow Algorithm We have studied Winnow algorithm in Algorithm 1. Algorithm

More information

Explore no more: Improved high-probability regret bounds for non-stochastic bandits

Explore no more: Improved high-probability regret bounds for non-stochastic bandits Explore no more: Improved high-probability regret bounds for non-stochastic bandits Gergely Neu SequeL team INRIA Lille Nord Europe gergely.neu@gmail.com Abstract This work addresses the problem of regret

More information

A Second-order Bound with Excess Losses

A Second-order Bound with Excess Losses A Second-order Bound with Excess Losses Pierre Gaillard 12 Gilles Stoltz 2 Tim van Erven 3 1 EDF R&D, Clamart, France 2 GREGHEC: HEC Paris CNRS, Jouy-en-Josas, France 3 Leiden University, the Netherlands

More information

Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm. Lecturer: Sanjeev Arora

Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm. Lecturer: Sanjeev Arora princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm Lecturer: Sanjeev Arora Scribe: (Today s notes below are

More information

Algorithmic Stability and Generalization Christoph Lampert

Algorithmic Stability and Generalization Christoph Lampert Algorithmic Stability and Generalization Christoph Lampert November 28, 2018 1 / 32 IST Austria (Institute of Science and Technology Austria) institute for basic research opened in 2009 located in outskirts

More information

University of Alberta. The Role of Information in Online Learning

University of Alberta. The Role of Information in Online Learning University of Alberta The Role of Information in Online Learning by Gábor Bartók A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree

More information

Brown s Original Fictitious Play

Brown s Original Fictitious Play manuscript No. Brown s Original Fictitious Play Ulrich Berger Vienna University of Economics, Department VW5 Augasse 2-6, A-1090 Vienna, Austria e-mail: ulrich.berger@wu-wien.ac.at March 2005 Abstract

More information

Learning to play partially-specified equilibrium

Learning to play partially-specified equilibrium Learning to play partially-specified equilibrium Ehud Lehrer and Eilon Solan August 29, 2007 August 29, 2007 First draft: June 2007 Abstract: In a partially-specified correlated equilibrium (PSCE ) the

More information

The FTRL Algorithm with Strongly Convex Regularizers

The FTRL Algorithm with Strongly Convex Regularizers CSE599s, Spring 202, Online Learning Lecture 8-04/9/202 The FTRL Algorithm with Strongly Convex Regularizers Lecturer: Brandan McMahan Scribe: Tamara Bonaci Introduction In the last lecture, we talked

More information

Online Learning Summer School Copenhagen 2015 Lecture 1

Online Learning Summer School Copenhagen 2015 Lecture 1 Online Learning Summer School Copenhagen 2015 Lecture 1 Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem Online Learning Shai Shalev-Shwartz (Hebrew U) OLSS Lecture

More information

On Minimaxity of Follow the Leader Strategy in the Stochastic Setting

On Minimaxity of Follow the Leader Strategy in the Stochastic Setting On Minimaxity of Follow the Leader Strategy in the Stochastic Setting Wojciech Kot lowsi Poznań University of Technology, Poland wotlowsi@cs.put.poznan.pl Abstract. We consider the setting of prediction

More information

Adaptive Sampling Under Low Noise Conditions 1

Adaptive Sampling Under Low Noise Conditions 1 Manuscrit auteur, publié dans "41èmes Journées de Statistique, SFdS, Bordeaux (2009)" Adaptive Sampling Under Low Noise Conditions 1 Nicolò Cesa-Bianchi Dipartimento di Scienze dell Informazione Università

More information

Least Squares Regression

Least Squares Regression E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute

More information

Littlestone s Dimension and Online Learnability

Littlestone s Dimension and Online Learnability Littlestone s Dimension and Online Learnability Shai Shalev-Shwartz Toyota Technological Institute at Chicago The Hebrew University Talk at UCSD workshop, February, 2009 Joint work with Shai Ben-David

More information

Lecture 19: UCB Algorithm and Adversarial Bandit Problem. Announcements Review on stochastic multi-armed bandit problem

Lecture 19: UCB Algorithm and Adversarial Bandit Problem. Announcements Review on stochastic multi-armed bandit problem Lecture 9: UCB Algorithm and Adversarial Bandit Problem EECS598: Prediction and Learning: It s Only a Game Fall 03 Lecture 9: UCB Algorithm and Adversarial Bandit Problem Prof. Jacob Abernethy Scribe:

More information

Least Squares Regression

Least Squares Regression CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the

More information

Blackwell s Approachability Theorem: A Generalization in a Special Case. Amy Greenwald, Amir Jafari and Casey Marks

Blackwell s Approachability Theorem: A Generalization in a Special Case. Amy Greenwald, Amir Jafari and Casey Marks Blackwell s Approachability Theorem: A Generalization in a Special Case Amy Greenwald, Amir Jafari and Casey Marks Department of Computer Science Brown University Providence, Rhode Island 02912 CS-06-01

More information

Lecture 16: Perceptron and Exponential Weights Algorithm

Lecture 16: Perceptron and Exponential Weights Algorithm EECS 598-005: Theoretical Foundations of Machine Learning Fall 2015 Lecture 16: Perceptron and Exponential Weights Algorithm Lecturer: Jacob Abernethy Scribes: Yue Wang, Editors: Weiqing Yu and Andrew

More information

Sequential Decision Making in Non-stochastic Environments

Sequential Decision Making in Non-stochastic Environments Sequential Decision Making in Non-stochastic Environments Jacob Abernethy Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2012-25 http://www.eecs.berkeley.edu/pubs/techrpts/2012/eecs-2012-25.html

More information