The No-Regret Framework for Online Learning
A Tutorial Introduction
Nahum Shimkin, Technion, Israel Institute of Technology, Haifa, Israel
Stochastic Processes in Engineering, IIT Mumbai, March 2013
Outline
1. The online decision problem
2. Blackwell's Approachability
3. The Multiplicative Weight Algorithm
4. Online convex programming
5. Recent topics
1. The Online Decision Problem
1.1 Preliminaries: Supervised Learning
The basic problem of supervised learning can be roughly formulated in the following terms. Given a sequence of examples (x_n, y_n), n = 1, ..., N, where x_n ∈ X is the input pattern (or feature vector) and y_n ∈ Y the corresponding label, find a map h : x ∈ X → ŷ ∈ Y that will correctly predict the true label y of other (yet unseen) input patterns x.
When Y is a discrete (finite) set, this formulation corresponds to a classification problem; if Y = {0, 1} this is a binary classification problem; and for Y a continuous set we obtain a regression problem. Statistical learning theory often assumes that the samples (x_n, y_n) are i.i.d. samples from a fixed, yet unknown, probability distribution.
The quality of prediction is measured by a cost (or loss) function c(ŷ, y). For classification this may be the 0-1 cost function c(ŷ, y) = 1_{ŷ≠y}; in regression, this may be the quadratic cost c(ŷ, y) = |ŷ − y|².
It is sometimes convenient to allow the predictions ŷ to take values in a larger set Ŷ than the label set Y. For example, in binary classification problems we may want to allow probabilistic predictions ("80% chance of rain tomorrow"). We therefore consider prediction functions h : X → Ŷ and a cost c : Ŷ × Y → R₊.
The prediction function h is often restricted to some predefined class H. For example, in linear regression, h(x) = ⟨w, x⟩, where w is a vector of weights to be tuned.
Below we will variably denote the Euclidean inner product as ⟨w, x⟩, w · x, or wᵀx (for column vectors), as convenient.
1.2 Online Learning and Regret
In online learning, examples are presented sequentially, and learning takes place simultaneously with prediction. A generic template for this process is as follows. For t = 1, 2, ...:
- observe input x_t ∈ X
- predict ŷ_t ∈ Ŷ
- observe the true answer y_t ∈ Y
- suffer cost (or loss) c(ŷ_t, y_t)
The cumulative cost over T periods is therefore C_T = Σ_{t=1}^T c(ŷ_t, y_t). It is generally required to make this cumulative loss as small as possible, in some appropriate sense.
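The protocol above can be sketched as a short loop. This is a minimal illustration, not part of the slides: the majority-vote learner and the names (`run_online`, `counts`) are hypothetical stand-ins.

```python
def run_online(predict, update, examples, cost):
    """Generic online protocol: observe x_t, predict, observe y_t,
    suffer the cost, then let the learner update. Returns C_T."""
    total = 0.0
    for x, y in examples:
        y_hat = predict(x)       # predict before the true label is revealed
        total += cost(y_hat, y)  # suffer c(y_hat_t, y_t)
        update(x, y)             # learn from the revealed label
    return total

# Toy learner (illustrative): always predict the majority label seen so far.
counts = {0: 0, 1: 0}
predict = lambda x: max(counts, key=counts.get)

def update(x, y):
    counts[y] += 1

zero_one = lambda y_hat, y: 1.0 if y_hat != y else 0.0  # 0-1 cost

C_T = run_online(predict, update, [(None, 1)] * 10, zero_one)
```

On this all-ones label sequence the learner errs only on the first round, so C_T = 1.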
Online learning examples
For concreteness, consider the following familiar examples:
1. Weather prediction: Suppose we wish to predict tomorrow's weather. This may involve a classification problem (rain/no-rain), or a regression problem (max/min temperature prediction). In any case, while we may be improving our prediction capability over time, the goal is clearly to provide accurate predictions throughout this period.
2. Spam filter: Here we wish to classify incoming mail as spam / not spam. Some messages x_t are displayed to the user to obtain their true label y_t. (The process of choosing which messages to display falls in the area of active learning, which we do not address here. It is also interesting that valuable information can be gained from unlabeled examples; this is addressed within semi-supervised learning.) Here, again, learning takes place online.
The Arbitrary Opponent
In these lectures we do not impose statistical assumptions on the examples sequence. Rather, we refer to this sequence as arbitrary. It is convenient to think of this sequence as chosen by an opponent (often an imaginary one). We may further distinguish between the following cases:
1. An oblivious opponent: Here the example sequence (x_t, y_t) is preset, in the sense that it does not depend on the results (ŷ_t) of our prediction algorithm. This would be the case in the weather prediction problem.
2. An adaptive, or adversarial, opponent: Here the opponent may choose future examples based on past predictions of our algorithm. This might be the case in the spam filter example.
We will make these assumptions more concrete in the problems discussed below. In either case, we refer to the choice of samples by the opponent as the opponent's strategy.
Either way, it is evident that the cumulative cost will depend on the examples sequence. For example, in areas where it rains every day (or never rains), we expect to approach a 100% success rate. If, on the other hand, rain happens to follow an i.i.d. Bernoulli sequence with parameter q ∈ [0, 1], the best we can hope for is an asymptotic error rate of min{q, 1 − q}. Thus, it would be advisable to compare the performance of our learning prediction algorithm to an ideal, baseline performance. This is where the concept of regret comes in.
Regret
The (cumulative, T-step) regret relative to some fixed predictor h is defined as follows:
  Regret_T(h) = Σ_{t=1}^T c(ŷ_t, y_t) − Σ_{t=1}^T c(h(x_t), y_t)
The regret relative to a predictor class H (e.g., the set of all linear predictors) is then defined as
  Regret_T(H) = max_{h∈H} Regret_T(h) = Σ_{t=1}^T c(ŷ_t, y_t) − min_{h∈H} Σ_{t=1}^T c(h(x_t), y_t)
In the last term, the best fixed predictor h ∈ H is chosen with the benefit of hindsight, i.e., given the entire sample sequence (x_t, y_t) up to T.
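The two sums in the definition of Regret_T(H) can be computed directly when H is finite. A minimal sketch; the helper name `regret_vs_class` and the toy instance are illustrative assumptions, not from the slides.

```python
def regret_vs_class(predictions, examples, cost, H):
    """Regret_T(H): online cumulative cost minus the cost of the best
    fixed predictor h in H, chosen in hindsight."""
    online = sum(cost(y_hat, y) for y_hat, (x, y) in zip(predictions, examples))
    best_fixed = min(sum(cost(h(x), y) for x, y in examples) for h in H)
    return online - best_fixed

zero_one = lambda y_hat, y: 1 if y_hat != y else 0
H = [lambda x: 0, lambda x: 1]                  # two constant predictors
examples = [(None, y) for y in (1, 1, 0, 1, 1)]
preds = [0, 1, 1, 1, 0]                         # some online prediction sequence
R = regret_vs_class(preds, examples, zero_one, H)
```

Here the online sequence suffers cost 3 while the best constant predictor (always 1) suffers cost 1, so R = 2.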
The No-Regret Property
A learning algorithm is said to have the no-regret property (w.r.t. H) if the regret is guaranteed to grow sub-linearly in T, namely Regret_T(H) = o(T) (in some appropriate sense), for any strategy of the opponent.
1.3 Prediction
An important special case is the problem of sequential prediction. Here the challenge is to predict the next element y_t based on the previous elements (y_1, ..., y_{t−1}) of the sequence, for t = 1, 2, .... This may be viewed as a special case of the previous model, with the patterns x_t absent. If we assume a statistical structure (e.g., Markovian) on the sequence, the problem becomes that of statistical model estimation and prediction. Prediction of arbitrary sequences with regret bounds has been treated extensively in the Information Theory literature (under the names universal prediction or individual sequence prediction; see Merhav and Feder, 1998).
Prediction with Expert Advice
A related important model is that of prediction with expert advice (Littlestone and Warmuth, 1994). Here we are assisted in our prediction task by a set E of experts (which may themselves be prediction algorithms), among which we wish to follow the best one.
The problem is formulated as follows. For t = 1, 2, ...:
1. The environment chooses the outcome y_t.
2. Simultaneously, each expert e chooses an advice ŷ_{e,t}, which is revealed to the forecaster.
3. The forecaster chooses a prediction ŷ_t, after which he observes the true answer y_t.
4. The forecaster suffers a loss c(ŷ_t, y_t).
The goal of the forecaster here is to minimize his regret with respect to the best expert. Here the regret is defined as
  Regret_T = Σ_{t=1}^T c(ŷ_t, y_t) − min_{e∈E} Σ_{t=1}^T c(ŷ_{e,t}, y_t)
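The four-step protocol above can be made concrete with a short loop. This sketch is illustrative only: the naive "follow the currently best expert" rule is just one possible forecaster (better algorithms appear later in the tutorial), and the names are assumptions.

```python
def run_experts(advice_seq, outcomes, cost, choose):
    """Prediction with expert advice: each round the forecaster sees all
    experts' advice, predicts, then all cumulative losses are updated.
    Returns (forecaster_cost, per_expert_costs)."""
    n = len(advice_seq[0])
    fc, ec = 0.0, [0.0] * n
    for advice, y in zip(advice_seq, outcomes):
        y_hat = choose(advice, ec)       # forecaster's prediction
        fc += cost(y_hat, y)
        for e in range(n):               # every expert's loss is revealed too
            ec[e] += cost(advice[e], y)
    return fc, ec

# Naive forecaster: copy the expert with the lowest cumulative loss so far.
choose = lambda advice, ec: advice[ec.index(min(ec))]
zero_one = lambda y_hat, y: 1 if y_hat != y else 0
advice_seq = [(0, 1)] * 3        # expert 0 always says 0, expert 1 always says 1
fc, ec = run_experts(advice_seq, [1, 1, 1], zero_one, choose)
regret = fc - min(ec)            # regret w.r.t. the best expert
```

On this instance the forecaster errs once (the first round, before any losses are observed) and the best expert never errs, so the regret is 1.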
1.4 Repeated Games
The notion of no-regret strategies, or no-regret play, was introduced in a seminal paper by Hannan (1957) in the context of repeated matrix games.
Matrix Games (reminder)
Recall that a zero-sum matrix game is defined through a payoff (or cost) matrix Γ = {c(i, j) : i ∈ I, j ∈ J}, where I is the set of actions of player 1 (PL1, the learner), and J is the set of actions of player 2 (PL2, the opponent). Let p ∈ Δ(I) denote a mixed action of PL1, and q ∈ Δ(J) a mixed action of PL2. The expected cost (to PL1) under mixed actions p and q is
  c(p, q) = Σ_{i,j} p_i c(i, j) q_j
The minimax value of the game (with PL1 the minimizer) is given by
  v(Γ) = min_{p∈Δ(I)} max_{q∈Δ(J)} c(p, q) = max_{q∈Δ(J)} min_{p∈Δ(I)} c(p, q)
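The bilinear form c(p, q) is easy to evaluate directly. A minimal sketch, using matching pennies (in cost form) as an assumed example; the fact that the uniform mixed action guarantees cost 1/2 against every pure action is the standard minimax certificate for this game.

```python
def expected_cost(C, p, q):
    """c(p, q) = sum over i, j of p_i * c(i, j) * q_j for mixed actions p, q."""
    return sum(p[i] * C[i][j] * q[j]
               for i in range(len(p)) for j in range(len(q)))

# Matching pennies as a cost matrix for PL1: pay 1 on a match, 0 otherwise.
C = [[1, 0],
     [0, 1]]
p_star = [0.5, 0.5]   # PL1's minimax mixed action for this game

# p_star yields cost exactly 1/2 against every pure action of PL2,
# which certifies v(Gamma) = 1/2 here.
against_pure = [expected_cost(C, p_star, [1.0 if j == k else 0.0 for j in range(2)])
                for k in range(2)]
```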
The Repeated Game
The repeated game Γ proceeds in stages t = 1, 2, ..., where at stage t, PL1 and PL2 simultaneously choose actions i_t and j_t, respectively, which are then observed (and recalled) by both players. A strategy π1 for PL1 is a map that assigns a mixed action p_t ∈ Δ(I) to each possible history h_{t−1} = (i_1, j_1, ..., i_{t−1}, j_{t−1}) and time t ≥ 1, and similarly a strategy π2 for PL2 assigns a mixed action q_t ∈ Δ(J) to each possible history. The actions i_t and j_t are chosen randomly according to p_t and q_t. Any pair (π1, π2) induces a probability measure on the space of infinite histories, which we denote by P^{π1,π2}. We shall be interested in the (long-term) cumulative cost (or loss):
  C_T = Σ_{t=1}^T c(i_t, j_t)
Regret (again)
The minimax value of this game (for any fixed T, and for T → ∞) is easily seen to equal v(Γ), the value of the single-shot game. However, since PL2 is not necessarily adversarial (and, indeed, not necessarily rational in the game-theoretic sense), this raises the question of whether PL1 can gain more than the value by adapting to the observed action history of PL2.
To this end, define the following (cumulative) regret for PL1:
  R_T = Σ_{t=1}^T c(i_t, j_t) − min_{i∈I} Σ_{t=1}^T c(i, j_t)   (1)
The second term on the RHS serves here as our reference level, to which the actual cost is compared. Naturally, PL1 would like the regret to be as small as possible.
No-regret Strategies
Define the average regret R̄_T = R_T / T.
Definition. A strategy π1 of PL1 is said to have the no-regret property (or to be Hannan-consistent) if
  limsup_{T→∞} R̄_T ≤ 0, P^{π1,π2}-a.s.   (2)
for any strategy π2 of PL2. More succinctly, we may write this property as R̄_T ≤ o(1) (a.s.), or R_T ≤ o(T) (a.s.).
Some More Notation
The RHS of (1) can be written in another convenient form. Let q̄_T = (1/T) Σ_{t=1}^T e_{j_t} denote the empirical frequency vector of PL2's actions (here e_j ∈ Δ(J) is the mixed action concentrated on action j). Recalling our convention c(i, q) = Σ_j c(i, j) q_j, it follows that
  min_{i∈I} (1/T) Σ_{t=1}^T c(i, j_t) = min_{i∈I} c(i, q̄_T) = min_{p∈Δ(I)} c(p, q̄_T) = c*(q̄_T)
where c*(q) = min_p c(p, q) is the best-response cost, also known as the Bayes risk of the game. The average regret may now be written more succinctly as
  R̄_T = (1/T) C_T − c*(q̄_T)
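The Bayes-risk form of the average regret translates directly into code. A minimal sketch under assumed names (`average_regret`, the matching-pennies matrix); the identity min_i Σ_t c(i, j_t)/T = c*(q̄_T) is what the function exploits.

```python
def average_regret(C, i_seq, j_seq):
    """Average regret in Bayes-risk form: (1/T) C_T - c*(q_bar_T), where
    q_bar_T is PL2's empirical action frequency and c*(q) = min_i c(i, q)."""
    T = len(i_seq)
    avg_cost = sum(C[i][j] for i, j in zip(i_seq, j_seq)) / T   # (1/T) C_T
    n_j = len(C[0])
    q_bar = [j_seq.count(j) / T for j in range(n_j)]            # empirical freq.
    bayes_risk = min(sum(row[j] * q_bar[j] for j in range(n_j)) for row in C)
    return avg_cost - bayes_risk

C = [[1, 0], [0, 1]]      # matching-pennies costs
R_bar = average_regret(C, [0, 0, 1, 1], [0, 1, 0, 1])
```

On this short run PL1's average cost equals the Bayes risk (both are 1/2), so the average regret is 0.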
The no-regret property R̄_T ≤ o(1) (a.s.) may now be written as
  (1/T) C_T ≤ c*(q̄_T) + o(1) (a.s.)
Accordingly, a no-regret strategy is sometimes said to attain the Bayes risk of the game.
Notes on Randomization
1. Necessity of randomization: It is easily seen that no deterministic strategy, i.e., a map i_t = f_t(h_{t−1}), can satisfy the no-regret property. Indeed, an (adaptive) opponent has access to h_{t−1}, and can choose j_t to maximize the loss against i_t. For example, in the binary prediction problem he might choose j_t = 1 − i_t, which yields a cumulative loss of C_T = T and a cumulative regret of T/2 or more. Hence, we use strictly mixed actions p_t as part of the learning policy.
2. Smoothed regret: It is easily seen that the difference d_t = c(p_t, j_t) − c(i_t, j_t) is a bounded martingale difference sequence with respect to F_t = σ{h_{t−1}, j_t}. Hence, by standard martingale arguments, the sum Σ_{t=1}^T d_t is of order O(√T), and in particular the average (1/T) Σ_{t=1}^T d_t converges to 0 (a.s.). We can therefore define the regret in terms of c(p_t, j_t) in place of c(i_t, j_t), namely
  R_T = Σ_{t=1}^T c(p_t, j_t) − min_{i∈I} Σ_{t=1}^T c(i, j_t)
This definition allows us to establish sample-path (rather than a.s.) bounds on the regret. Henceforth we will use the latter definition.
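The vanishing gap between the realized cost c(i_t, j_t) and the smoothed cost c(p_t, j_t) can be checked numerically. An illustrative simulation only (the two-action game, the uniform p_t, and the seed are all assumptions); it averages the martingale differences d_t over a long run.

```python
import random

def avg_smoothing_gap(C, p_seq, j_seq, rng):
    """Average of d_t = c(p_t, j_t) - c(i_t, j_t), with i_t sampled from p_t;
    a bounded martingale-difference average, so it shrinks as T grows."""
    total = 0.0
    for p, j in zip(p_seq, j_seq):
        i = 0 if rng.random() < p[0] else 1          # sample i_t ~ p_t (2 actions)
        smoothed = p[0] * C[0][j] + p[1] * C[1][j]   # c(p_t, j_t)
        total += smoothed - C[i][j]                  # d_t
    return total / len(p_seq)

C = [[0, 1], [1, 0]]
T = 2000
rng = random.Random(0)
p_seq = [(0.5, 0.5)] * T
j_seq = [rng.randrange(2) for _ in range(T)]
gap = avg_smoothing_gap(C, p_seq, j_seq, rng)   # small for large T
```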
Hannan's No-Regret Theorem
Theorem (Hannan '57). There exists a strategy π for the learner so that R_T ≤ c₀ √T for any strategy of the opponent and all T ≥ 1. Here c₀ = ((3/2) n_I)^{1/2} n_J span(c), where n_I = |I|, n_J = |J|, and span(c) = max c − min c.
The constant c₀ in Hannan's result was subsequently improved, but not the rate √T, which is optimal. The proposed strategy was a perturbed FTL (follow the leader) scheme, which we briefly describe next.
FTL-Type Strategies
The FTL (follow the leader) strategy is given by:
  i_{t+1} = argmin_{i∈I} Σ_{s=1}^t c(i, j_s) = argmin_{i∈I} c(i, q̄_t) = BR(q̄_t)
(with ties arbitrarily broken). Here the learner uses a best-response action against the empirical frequency of the opponent's actions. This simple rule is also known as fictitious play in the game-theory literature.
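FTL is a one-liner over cumulative costs. A minimal sketch under assumed names; ties are broken toward the lowest-index action, one of the arbitrary tie-breaking rules allowed above.

```python
def ftl(C, j_history):
    """Follow the leader: best response to the opponent's empirical play so
    far (fictitious play); ties broken toward the lowest index."""
    cum = [sum(C[i][j] for j in j_history) for i in range(len(C))]
    return min(range(len(C)), key=lambda i: cum[i])

C = [[0, 1],
     [1, 0]]            # binary prediction: cost 1 on a mismatch
a = ftl(C, [1, 1, 0])   # majority of past outcomes is 1 -> predict 1
b = ftl(C, [0, 0, 1])   # majority of past outcomes is 0 -> predict 0
```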
It is easily seen that FTL does not have the no-regret property, even against an oblivious opponent. Consider the binary prediction problem, namely i, j ∈ {0, 1}, with c(i, j) = 1_{i≠j}. Suppose PL2 chooses the sequence (j′, 1, 0, 1, 0, ...), where j′ is some auxiliary action with c(0, j′) = 0 and c(1, j′) = 0.5. In that case FTL yields the sequence (i_t) = (?, 0, 1, 0, 1, ...), which oscillates opposite to PL2's actions, leading to R_T ≈ T/2.
Still, FTL can be modified by essentially smoothing the best-response map, so that the oscillation observed above is prevented.
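The oscillation can be reproduced in a few lines. An illustrative simulation under assumptions: each round is represented by its cost vector over the two actions, with the auxiliary first round encoded as a 0.5 handicap on one action so that FTL starts out of phase with the alternating outcomes.

```python
def simulate_ftl(cost_vectors):
    """Run FTL when round t reveals the cost of every action (a cost vector).
    Returns (ftl_cost, best_fixed_cost). Ties broken toward action 0."""
    n = len(cost_vectors[0])
    cum = [0.0] * n
    total = 0.0
    for vec in cost_vectors:
        i = min(range(n), key=lambda k: cum[k])   # current leader
        total += vec[i]
        cum = [c + v for c, v in zip(cum, vec)]
    return total, min(cum)

# Auxiliary first round: action 1 gets a 0.5 handicap. Afterwards the
# outcomes alternate 1, 0, 1, 0, ..., i.e. cost vectors [1,0], [0,1], ...
T = 100
vecs = [[0.0, 0.5]] + [[1.0, 0.0] if t % 2 == 0 else [0.0, 1.0] for t in range(T)]
ftl_cost, best_fixed = simulate_ftl(vecs)   # FTL errs on every regular round
```

FTL pays 1 on each of the T alternating rounds while the best fixed action pays about T/2, so the regret grows linearly, as claimed.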
Perturbed FTL
Let z_t = (z_{j,t}), j ∈ J, t ≥ 1, be a collection of i.i.d. random variables, with z_{j,t} ~ U[0, 1] uniformly distributed on [0, 1], and let
  i_{t+1} = BR(q̄_t + λ_t z_t)
Hannan's result holds with λ_t = c₁/√t, where c₁ = (3 n_J² / 2 n_I)^{1/2}.
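A direct sketch of the perturbed rule; the function name and the small sanity checks are assumptions, and `c1` is taken as a plain parameter rather than the exact Hannan constant.

```python
import math
import random

def perturbed_ftl(C, j_history, t, c1, rng):
    """Perturbed FTL: best response to the empirical frequency of the
    opponent's actions plus a vanishing uniform perturbation lambda_t * z_t,
    with lambda_t = c1 / sqrt(t)."""
    n_i, n_j = len(C), len(C[0])
    q_bar = [j_history.count(j) / max(len(j_history), 1) for j in range(n_j)]
    lam = c1 / math.sqrt(t)
    z = [rng.random() for _ in range(n_j)]              # z_{j,t} ~ U[0, 1]
    q_pert = [q + lam * zj for q, zj in zip(q_bar, z)]  # q_bar_t + lambda_t z_t
    return min(range(n_i),
               key=lambda i: sum(C[i][j] * q_pert[j] for j in range(n_j)))

C = [[0, 1], [1, 0]]
a = perturbed_ftl(C, [0, 1, 1], t=4, c1=1.0, rng=random.Random(1))
```

As t grows, λ_t shrinks and the perturbed rule coincides with plain FTL whenever the leader is not nearly tied.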
Smooth Fictitious Play
In terms of mixed actions, Perturbed FTL effectively leads to p_{t+1} = BR_{λ_t}(q̄_t), where p = BR_λ(q) is a smooth version of BR(·) (for each λ > 0). In the next variant, introduced by Fudenberg and Levine (1995) and others, smoothing of the best-response map is implemented directly through function minimization:
  BR_λ(q) = argmin_{p∈Δ(I)} { c(p, q) + λ v(p) }
where v : Δ(I) → R is a smooth, strictly convex function, with derivatives that are steep at the vertices of Δ(I). In particular, choosing v(p) = Σ_i p_i log p_i yields the logistic map
  BR_λ(q)_i = exp(−c(i, q)/λ) / Σ_{i′} exp(−c(i′, q)/λ)
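The logistic map is a softmin over the action costs. A minimal sketch under assumed names; the minus sign in the exponent reflects that PL1 is minimizing cost.

```python
import math

def smooth_br(C, q, lam):
    """Logistic smooth best response: with the entropy regularizer
    v(p) = sum_i p_i log p_i, the minimizer of c(p, q) + lam * v(p) puts
    weight proportional to exp(-c(i, q) / lam) on each action i."""
    costs = [sum(C[i][j] * q[j] for j in range(len(q))) for i in range(len(C))]
    w = [math.exp(-c / lam) for c in costs]
    z = sum(w)
    return [wi / z for wi in w]

C = [[1, 0], [0, 1]]
p_uniform = smooth_br(C, [0.5, 0.5], lam=0.1)   # equal costs -> uniform mix
p_sharp = smooth_br(C, [1.0, 0.0], lam=0.01)    # lam -> 0 recovers BR
```

Large λ pushes the output toward the uniform mixed action, while λ → 0 concentrates it on the exact best response, which is precisely the smoothing trade-off described above.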
Pointers to the Literature
There are a number of monographs and surveys that encompass a variety of problems within the no-regret framework. The textbook by Cesa-Bianchi and Lugosi (2006) surveys and unifies the different approaches developed within the game theory, information theory, statistical decision theory, and machine learning communities. The monographs by Fudenberg and Levine (1998) and by Young (2004) provide a game-theoretic viewpoint. A recent survey by Shalev-Shwartz (2011) considers the general problem of Online Convex Optimization, while Bubeck and Cesa-Bianchi (2012) provide an overview of the related (stochastic and nonstochastic) multi-armed bandit problem.
More informationNOTE. A 2 2 Game without the Fictitious Play Property
GAMES AND ECONOMIC BEHAVIOR 14, 144 148 1996 ARTICLE NO. 0045 NOTE A Game without the Fictitious Play Property Dov Monderer and Aner Sela Faculty of Industrial Engineering and Management, The Technion,
More informationTime Series Prediction & Online Learning
Time Series Prediction & Online Learning Joint work with Vitaly Kuznetsov (Google Research) MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Motivation Time series prediction: stock values. earthquakes.
More informationExplore no more: Improved high-probability regret bounds for non-stochastic bandits
Explore no more: Improved high-probability regret bounds for non-stochastic bandits Gergely Neu SequeL team INRIA Lille Nord Europe gergely.neu@gmail.com Abstract This work addresses the problem of regret
More informationAdaptive Game Playing Using Multiplicative Weights
Games and Economic Behavior 29, 79 03 (999 Article ID game.999.0738, available online at http://www.idealibrary.com on Adaptive Game Playing Using Multiplicative Weights Yoav Freund and Robert E. Schapire
More informationTHE first formalization of the multi-armed bandit problem
EDIC RESEARCH PROPOSAL 1 Multi-armed Bandits in a Network Farnood Salehi I&C, EPFL Abstract The multi-armed bandit problem is a sequential decision problem in which we have several options (arms). We can
More informationarxiv: v1 [cs.lg] 8 Feb 2018
Online Learning: A Comprehensive Survey Steven C.H. Hoi, Doyen Sahoo, Jing Lu, Peilin Zhao School of Information Systems, Singapore Management University, Singapore School of Software Engineering, South
More informationOptimal and Adaptive Online Learning
Optimal and Adaptive Online Learning Haipeng Luo Advisor: Robert Schapire Computer Science Department Princeton University Examples of Online Learning (a) Spam detection 2 / 34 Examples of Online Learning
More informationarxiv: v1 [cs.lg] 8 Nov 2010
Blackwell Approachability and Low-Regret Learning are Equivalent arxiv:0.936v [cs.lg] 8 Nov 200 Jacob Abernethy Computer Science Division University of California, Berkeley jake@cs.berkeley.edu Peter L.
More informationConvex Repeated Games and Fenchel Duality
Convex Repeated Games and Fenchel Duality Shai Shalev-Shwartz 1 and Yoram Singer 1,2 1 School of Computer Sci. & Eng., he Hebrew University, Jerusalem 91904, Israel 2 Google Inc. 1600 Amphitheater Parkway,
More informationSelecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden
1 Selecting Efficient Correlated Equilibria Through Distributed Learning Jason R. Marden Abstract A learning rule is completely uncoupled if each player s behavior is conditioned only on his own realized
More informationOnline Learning: Random Averages, Combinatorial Parameters, and Learnability
Online Learning: Random Averages, Combinatorial Parameters, and Learnability Alexander Rakhlin Department of Statistics University of Pennsylvania Karthik Sridharan Toyota Technological Institute at Chicago
More informationMachine Learning and Data Mining. Linear classification. Kalev Kask
Machine Learning and Data Mining Linear classification Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ = f(x ; q) Parameters q Program ( Learner ) Learning algorithm Change q
More informationOnline Learning. Jordan Boyd-Graber. University of Colorado Boulder LECTURE 21. Slides adapted from Mohri
Online Learning Jordan Boyd-Graber University of Colorado Boulder LECTURE 21 Slides adapted from Mohri Jordan Boyd-Graber Boulder Online Learning 1 of 31 Motivation PAC learning: distribution fixed over
More informationAlgorithms, Games, and Networks January 17, Lecture 2
Algorithms, Games, and Networks January 17, 2013 Lecturer: Avrim Blum Lecture 2 Scribe: Aleksandr Kazachkov 1 Readings for today s lecture Today s topic is online learning, regret minimization, and minimax
More informationThe sample complexity of agnostic learning with deterministic labels
The sample complexity of agnostic learning with deterministic labels Shai Ben-David Cheriton School of Computer Science University of Waterloo Waterloo, ON, N2L 3G CANADA shai@uwaterloo.ca Ruth Urner College
More informationarxiv: v4 [cs.lg] 27 Jan 2016
The Computational Power of Optimization in Online Learning Elad Hazan Princeton University ehazan@cs.princeton.edu Tomer Koren Technion tomerk@technion.ac.il arxiv:1504.02089v4 [cs.lg] 27 Jan 2016 Abstract
More informationCS261: A Second Course in Algorithms Lecture #12: Applications of Multiplicative Weights to Games and Linear Programs
CS26: A Second Course in Algorithms Lecture #2: Applications of Multiplicative Weights to Games and Linear Programs Tim Roughgarden February, 206 Extensions of the Multiplicative Weights Guarantee Last
More informationAdaptive Online Learning in Dynamic Environments
Adaptive Online Learning in Dynamic Environments Lijun Zhang, Shiyin Lu, Zhi-Hua Zhou National Key Laboratory for Novel Software Technology Nanjing University, Nanjing 210023, China {zhanglj, lusy, zhouzh}@lamda.nju.edu.cn
More informationOn Competitive Prediction and Its Relation to Rate-Distortion Theory
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 49, NO. 12, DECEMBER 2003 3185 On Competitive Prediction and Its Relation to Rate-Distortion Theory Tsachy Weissman, Member, IEEE, and Neri Merhav, Fellow,
More informationPiecewise-stationary Bandit Problems with Side Observations
Jia Yuan Yu jia.yu@mcgill.ca Department Electrical and Computer Engineering, McGill University, Montréal, Québec, Canada. Shie Mannor shie.mannor@mcgill.ca; shie@ee.technion.ac.il Department Electrical
More informationEASINESS IN BANDITS. Gergely Neu. Pompeu Fabra University
EASINESS IN BANDITS Gergely Neu Pompeu Fabra University EASINESS IN BANDITS Gergely Neu Pompeu Fabra University THE BANDIT PROBLEM Play for T rounds attempting to maximize rewards THE BANDIT PROBLEM Play
More informationMulti-Armed Bandit Formulations for Identification and Control
Multi-Armed Bandit Formulations for Identification and Control Cristian R. Rojas Joint work with Matías I. Müller and Alexandre Proutiere KTH Royal Institute of Technology, Sweden ERNSI, September 24-27,
More informationRobustness and duality of maximum entropy and exponential family distributions
Chapter 7 Robustness and duality of maximum entropy and exponential family distributions In this lecture, we continue our study of exponential families, but now we investigate their properties in somewhat
More informationRegret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I. Sébastien Bubeck Theory Group
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I Sébastien Bubeck Theory Group i.i.d. multi-armed bandit, Robbins [1952] i.i.d. multi-armed bandit, Robbins [1952] Known
More information0.1 Motivating example: weighted majority algorithm
princeton univ. F 16 cos 521: Advanced Algorithm Design Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm Lecturer: Sanjeev Arora Scribe: Sanjeev Arora (Today s notes
More informationConvex Repeated Games and Fenchel Duality
Convex Repeated Games and Fenchel Duality Shai Shalev-Shwartz 1 and Yoram Singer 1,2 1 School of Computer Sci. & Eng., he Hebrew University, Jerusalem 91904, Israel 2 Google Inc. 1600 Amphitheater Parkway,
More informationOnline Bounds for Bayesian Algorithms
Online Bounds for Bayesian Algorithms Sham M. Kakade Computer and Information Science Department University of Pennsylvania Andrew Y. Ng Computer Science Department Stanford University Abstract We present
More informationAgnostic Online Learning
Agnostic Online Learning Shai Ben-David and Dávid Pál David R. Cheriton School of Computer Science University of Waterloo Waterloo, ON, Canada {shai,dpal}@cs.uwaterloo.ca Shai Shalev-Shwartz Toyota Technological
More informationClassification objectives COMS 4771
Classification objectives COMS 4771 1. Recap: binary classification Scoring functions Consider binary classification problems with Y = { 1, +1}. 1 / 22 Scoring functions Consider binary classification
More informationExtracting Certainty from Uncertainty: Regret Bounded by Variation in Costs
Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Elad Hazan IBM Almaden Research Center 650 Harry Rd San Jose, CA 95120 ehazan@cs.princeton.edu Satyen Kale Yahoo! Research 4301
More information1 Review of Winnow Algorithm
COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture # 17 Scribe: Xingyuan Fang, Ethan April 9th, 2013 1 Review of Winnow Algorithm We have studied Winnow algorithm in Algorithm 1. Algorithm
More informationExplore no more: Improved high-probability regret bounds for non-stochastic bandits
Explore no more: Improved high-probability regret bounds for non-stochastic bandits Gergely Neu SequeL team INRIA Lille Nord Europe gergely.neu@gmail.com Abstract This work addresses the problem of regret
More informationA Second-order Bound with Excess Losses
A Second-order Bound with Excess Losses Pierre Gaillard 12 Gilles Stoltz 2 Tim van Erven 3 1 EDF R&D, Clamart, France 2 GREGHEC: HEC Paris CNRS, Jouy-en-Josas, France 3 Leiden University, the Netherlands
More informationLecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm. Lecturer: Sanjeev Arora
princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm Lecturer: Sanjeev Arora Scribe: (Today s notes below are
More informationAlgorithmic Stability and Generalization Christoph Lampert
Algorithmic Stability and Generalization Christoph Lampert November 28, 2018 1 / 32 IST Austria (Institute of Science and Technology Austria) institute for basic research opened in 2009 located in outskirts
More informationUniversity of Alberta. The Role of Information in Online Learning
University of Alberta The Role of Information in Online Learning by Gábor Bartók A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree
More informationBrown s Original Fictitious Play
manuscript No. Brown s Original Fictitious Play Ulrich Berger Vienna University of Economics, Department VW5 Augasse 2-6, A-1090 Vienna, Austria e-mail: ulrich.berger@wu-wien.ac.at March 2005 Abstract
More informationLearning to play partially-specified equilibrium
Learning to play partially-specified equilibrium Ehud Lehrer and Eilon Solan August 29, 2007 August 29, 2007 First draft: June 2007 Abstract: In a partially-specified correlated equilibrium (PSCE ) the
More informationThe FTRL Algorithm with Strongly Convex Regularizers
CSE599s, Spring 202, Online Learning Lecture 8-04/9/202 The FTRL Algorithm with Strongly Convex Regularizers Lecturer: Brandan McMahan Scribe: Tamara Bonaci Introduction In the last lecture, we talked
More informationOnline Learning Summer School Copenhagen 2015 Lecture 1
Online Learning Summer School Copenhagen 2015 Lecture 1 Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem Online Learning Shai Shalev-Shwartz (Hebrew U) OLSS Lecture
More informationOn Minimaxity of Follow the Leader Strategy in the Stochastic Setting
On Minimaxity of Follow the Leader Strategy in the Stochastic Setting Wojciech Kot lowsi Poznań University of Technology, Poland wotlowsi@cs.put.poznan.pl Abstract. We consider the setting of prediction
More informationAdaptive Sampling Under Low Noise Conditions 1
Manuscrit auteur, publié dans "41èmes Journées de Statistique, SFdS, Bordeaux (2009)" Adaptive Sampling Under Low Noise Conditions 1 Nicolò Cesa-Bianchi Dipartimento di Scienze dell Informazione Università
More informationLeast Squares Regression
E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute
More informationLittlestone s Dimension and Online Learnability
Littlestone s Dimension and Online Learnability Shai Shalev-Shwartz Toyota Technological Institute at Chicago The Hebrew University Talk at UCSD workshop, February, 2009 Joint work with Shai Ben-David
More informationLecture 19: UCB Algorithm and Adversarial Bandit Problem. Announcements Review on stochastic multi-armed bandit problem
Lecture 9: UCB Algorithm and Adversarial Bandit Problem EECS598: Prediction and Learning: It s Only a Game Fall 03 Lecture 9: UCB Algorithm and Adversarial Bandit Problem Prof. Jacob Abernethy Scribe:
More informationLeast Squares Regression
CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the
More informationBlackwell s Approachability Theorem: A Generalization in a Special Case. Amy Greenwald, Amir Jafari and Casey Marks
Blackwell s Approachability Theorem: A Generalization in a Special Case Amy Greenwald, Amir Jafari and Casey Marks Department of Computer Science Brown University Providence, Rhode Island 02912 CS-06-01
More informationLecture 16: Perceptron and Exponential Weights Algorithm
EECS 598-005: Theoretical Foundations of Machine Learning Fall 2015 Lecture 16: Perceptron and Exponential Weights Algorithm Lecturer: Jacob Abernethy Scribes: Yue Wang, Editors: Weiqing Yu and Andrew
More informationSequential Decision Making in Non-stochastic Environments
Sequential Decision Making in Non-stochastic Environments Jacob Abernethy Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2012-25 http://www.eecs.berkeley.edu/pubs/techrpts/2012/eecs-2012-25.html
More information