Online Learning with Experts & Multiplicative Weights Algorithms


1 Online Learning with Experts & Multiplicative Weights Algorithms CS 159 lecture #2 Stephan Zheng April 1, 2016 Caltech

2 Table of contents 1. Online Learning with Experts With a perfect expert Without perfect experts 2. Multiplicative Weights algorithms Regret of Multiplicative Weights 3. Learning with Experts revisited Continuous vector predictions Subtlety in the discrete case 2

3 Today What should you take away from this lecture? How should you base your prediction on expert predictions? What are the characteristics of multiplicative weights algorithms? 3

4 References Online Learning [Ch. 2-4], Gabor Bartok, David Pal, Csaba Szepesvari, and Istvan Szita. The Multiplicative Weights Update Method: A Meta-Algorithm and Applications, Sanjeev Arora, Elad Hazan, and Satyen Kale. Theory of Computing, 8(1):121-164, 2012.

5 Online Learning with Experts

6 Leo at the Oscars: an online binary prediction problem Will Leo win an Oscar this year? (a long-running question) 6

7 Leo at the Oscars: an online binary prediction problem ... 2005: No, 2006: No, 2007: No, 2008: No, 2009: No, 2010: No 7

8 Leo at the Oscars: an online binary prediction problem 2011: No, 2012: No, 2013: No, 2014: No, 2015: No, 2016: Yes! 8

9 Leo at the Oscars: an online binary prediction problem A frivolous extrapolation: 2016: Yes!, 2017: No, 2018: Yes!, 2019: No, 2020: Yes, 2021: Yes 9

10 Leo at the Oscars: an online binary prediction problem Imagine you're a showbiz fan and want to predict the answer every year t. But you don't know all the ins and outs of Hollywood. Instead, you are lucky enough to have access to experts $\{i\}_{i \in I}$, who each make a (possibly wrong!) prediction $p_{i,t}$. How do you use the experts for your own prediction? How do you incorporate the feedback from Nature? How do we achieve sublinear regret? 10

11 Formal description 11

12 Online Learning with Experts Formal setup: there are T rounds in which we predict a binary label $\hat p_t \in \mathcal{Y} = \{0, 1\}$. At every timestep t: 1. Each expert $i = 1 \dots n$ predicts $p_{i,t} \in \mathcal{Y}$ 2. You make a prediction $\hat p_t \in \mathcal{Y}$ 3. Nature reveals $y_t \in \mathcal{Y}$ 4. We suffer a loss $\ell(\hat p_t, y_t) = \mathbb{1}[\hat p_t \neq y_t]$ 5. Each expert suffers a loss $\ell(p_{i,t}, y_t) = \mathbb{1}[p_{i,t} \neq y_t]$ Loss: $\hat L_T = \sum_{t=1}^T \ell(\hat p_t, y_t)$, $L_{i,T} = \sum_{t=1}^T \ell(p_{i,t}, y_t)$ (the number of mistakes made) Regret: $R_T = \hat L_T - \min_i L_{i,T}$ How do you decide what $\hat p_t$ to predict? How do you incorporate the feedback from Nature? How do we achieve sublinear regret? 12

13 Online Learning with Experts Experts are a way to abstract a hypothesis class. For the most part, we'll deal with a finite, discrete number of experts, because that's easier to analyze. In general, there can be a continuous space of experts, which corresponds to using a standard hypothesis class. Boosting uses a collection of n weak classifiers as experts. At every time t it adds 1 weak classifier with weight $\alpha_t$ to the ensemble classifier $h_{1:t}$. The prediction $\hat p_t$ is based on the ensemble $h_{1:t}$ that we have collected so far. 13

14 Weighted Majority: the halving algorithm Let's assume that there is a perfect expert $i^*$ that is guaranteed to know the right answer (say, a mind-reader that can read the minds of the voting Academy members). That is, $\forall t: p_{i^*,t} = y_t$ and $\ell_{i^*,t} = 0$. Keeping the regret (= # of mistakes) small then means finding the perfect expert quickly, with few mistakes along the way: Eliminate experts that make a mistake. Take a majority vote of the alive experts. Let's define some groups of experts: The alive set: $E_t = \{i : i \text{ did not make a mistake until time } t\}$, $E_0 = \{1, \dots, n\}$ The nay-sayers: $E^0_{t-1} = \{i \in E_{t-1} : p_{i,t} = 0\}$ The yay-sayers: $E^1_{t-1} = \{i \in E_{t-1} : p_{i,t} = 1\}$ 14

15 Weighted Majority: the halving algorithm [Worked example (table): rounds t with the incurred loss $\hat\ell_t$, Nature's answer $y_t$, the prediction $\hat p_t$, and the predictions $p_{1,t}, \dots, p_{12,t}$ of 12 experts.] 15

16 Weighted Majority: the halving algorithm The halving algorithm 1: for $t = 1 \dots T$ do 2: Receive expert predictions $(p_{1,t}, \dots, p_{n,t})$ 3: Split $E_{t-1}$ into $E^0_{t-1} \cup E^1_{t-1}$ 4: if $|E^1_{t-1}| > |E^0_{t-1}|$ then (follow the majority) 5: Predict $\hat p_t = 1$ 6: else 7: Predict $\hat p_t = 0$ 8: end if 9: Receive Nature's answer $y_t$ (and incur loss $\ell(\hat p_t, y_t)$) 10: Update $E_t$ with experts that continue to be right 11: end for 16
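As a minimal sketch (not the lecture's own code; function and variable names are illustrative), the halving algorithm can be implemented as:

```python
def halving(expert_predictions, outcomes):
    """Run the halving algorithm.

    expert_predictions: list over rounds; each entry is a list of n binary predictions.
    outcomes: list of Nature's binary answers y_t.
    Returns (our predictions, number of mistakes we made).
    """
    n = len(expert_predictions[0])
    alive = set(range(n))                              # E_0 = {1, ..., n}
    predictions, mistakes = [], 0
    for preds, y in zip(expert_predictions, outcomes):
        yays = sum(1 for i in alive if preds[i] == 1)
        p_hat = 1 if yays > len(alive) - yays else 0   # majority vote of alive experts
        predictions.append(p_hat)
        mistakes += int(p_hat != y)
        alive = {i for i in alive if preds[i] == y}    # eliminate experts that erred
    return predictions, mistakes
```

With a perfect expert among the n, the mistake count is at most $\log_2 n$, as shown on the next slide.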

17 Weighted Majority: the halving algorithm Notice that if halving makes a mistake, then at least half of the experts in $E_{t-1}$ were wrong: $W_t = |E_t| = |E^{y_t}_{t-1}| \leq |E_{t-1}|/2$. Since there is always a perfect expert, the algorithm makes no more than $\log_2 |E_0| = \log_2 n$ mistakes, hence sublinear regret. Qualitatively speaking: $W_t$ multiplicatively decreases when halving makes a mistake. If $W_t$ doesn't shrink too much, then halving can't make too many mistakes. There is a lower bound on $W_t$ for all t, since there is a perfect expert. So $W_t$ can't shrink too much starting from its initial value. We'll see similar behavior later. 17

18 Weighted Majority So what should we do if we don't know anything about the experts? The majority vote might be always wrong! Give each of them a weight $w_{i,t}$, initialized as $w_{i,1} = 1$. Decide based on a weighted sum of experts: $\hat p_t = \mathbb{1}[\text{weighted sum of yay-sayers} > \text{weighted sum of nay-sayers}]$, i.e. $\hat p_t = \mathbb{1}\left[\sum_{i=1}^n w_{i,t-1} p_{i,t} > \sum_{i=1}^n w_{i,t-1}(1 - p_{i,t})\right]$ 18

19 Weighted Majority Recall: with a perfect expert we had The halving algorithm 1: for $t = 1 \dots T$ do 2: Receive expert predictions $(p_{1,t}, \dots, p_{n,t})$ 3: Split $E_{t-1}$ into $E^0_{t-1} \cup E^1_{t-1}$ 4: if $|E^1_{t-1}| > |E^0_{t-1}|$ then (follow the majority) 5: Predict $\hat p_t = 1$ 6: else 7: Predict $\hat p_t = 0$ 8: end if 9: Receive Nature's answer $y_t$ (and incur loss $\ell(\hat p_t, y_t)$) 10: Update $E_t$ with experts that continue to be right 11: end for 19

20 Weighted Majority Without knowledge about the experts: choose a decay factor $\beta \in [0, 1)$ The Weighted Majority algorithm 1: Initialize $w_{i,1} = 1$, $W_1 = n$. 2: for $t = 1 \dots T$ do 3: Receive expert predictions $(p_{1,t}, \dots, p_{n,t})$ 4: if $\sum_{i=1}^n w_{i,t-1} p_{i,t} > \sum_{i=1}^n w_{i,t-1}(1 - p_{i,t})$ then 5: Predict $\hat p_t = 1$ 6: else 7: Predict $\hat p_t = 0$ 8: end if 9: Receive Nature's answer $y_t$ (and incur loss $\ell(\hat p_t, y_t)$) 10: $w_{i,t} \leftarrow w_{i,t-1} \, \beta^{\mathbb{1}[p_{i,t} \neq y_t]}$ 11: end for 20
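A sketch of Weighted Majority along the same lines (illustrative names; the decay factor `beta` plays the role of $\beta$ above):

```python
def weighted_majority(expert_predictions, outcomes, beta=0.5):
    """Weighted Majority with decay factor beta in [0, 1).

    expert_predictions: list over rounds; each entry is a list of n binary predictions.
    outcomes: list of Nature's binary answers y_t.
    Returns (our predictions, number of mistakes we made).
    """
    n = len(expert_predictions[0])
    w = [1.0] * n                                      # w_{i,1} = 1
    predictions, mistakes = [], 0
    for preds, y in zip(expert_predictions, outcomes):
        yay = sum(w[i] for i in range(n) if preds[i] == 1)
        nay = sum(w[i] for i in range(n) if preds[i] == 0)
        p_hat = 1 if yay > nay else 0                  # weighted majority vote
        predictions.append(p_hat)
        mistakes += int(p_hat != y)
        for i in range(n):
            if preds[i] != y:
                w[i] *= beta                           # w_{i,t} <- w_{i,t-1} * beta^{1[p_{i,t} != y_t]}
    return predictions, mistakes
```

Setting `beta=0` recovers halving: a mistaken expert's weight drops to zero and it never votes again.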

21 Weighted Majority Halving is recovered by choosing $\beta = 0$: then $w_{i,t} = 1$ if and only if i is alive. What happens as $\beta \to 1$? Faulty experts are not punished that much. 21

22 Weighted Majority Define the potential $W_t = \sum_{i=1}^n w_{i,t}$ (the weighted size of the set of experts). Theorem 1: $W_t \leq W_{t-1}$. If $\hat p_t \neq y_t$, then $W_t \leq \frac{1+\beta}{2} W_{t-1}$. Compare with before: $W_t$ multiplicatively decreases when the algorithm makes a mistake. Is there always a non-zero lower bound on $W_t$ for all t? Yes, with finite T. No, with infinite time: all weights could decay towards 0. Note that we didn't normalize the $w_{i,t}$ - we'll fix this in the next part. What about the regret behavior? Is it sublinear? We'll leave this for now. 22

23 Multiplicative Weights algorithms

24 Multiplicative Weights High-level intuition: a more general case: choose from n options. Stochastic strategies are allowed. Experts need not be present. 24

25 Multiplicative Weights Setup: at each timestep $t = 1 \dots T$: 1. you want to take a decision $d \in D = \{1, \dots, n\}$ 2. you choose a distribution $p_t$ over D and sample randomly from it. 3. Nature reveals the cost vector $c_t$, where each cost $c_{i,t}$ is bounded in $[-1, 1]$ 4. the expected cost is then $\mathbb{E}_{i \sim p_t}[c_{i,t}] = c_t \cdot p_t$. Goal: minimize regret with respect to the best decision in hindsight, $\min_i \sum_{t=1}^T c_{i,t}$. In halving, decision = choose expert, cost = mistake, and $p_t$ put all its mass on the majority vote of the alive experts (note that p is not a prediction here). 25

26 The Multiplicative Weights algorithm Comes in many forms! The multiplicative weights algorithm 1: Fix $\eta \leq \frac{1}{2}$ 2: Initialize $w_{i,1} = 1$ 3: Initialize $W_1 = n$ 4: for $t = 1 \dots T$ do 5: $W_t = \sum_i w_{i,t}$ 6: Choose a decision $i \sim p_t$, with $p_{i,t} = \frac{w_{i,t}}{W_t}$ 7: Nature reveals $c_t$ 8: $w_{i,t+1} \leftarrow w_{i,t}(1 - \eta c_{i,t})$ 9: end for 26
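A self-contained sketch of this algorithm (names are illustrative, not from the lecture; the sampled decision is drawn but the analysis below tracks the expected cost $c_t \cdot p_t$):

```python
import random

def multiplicative_weights(cost_rounds, eta=0.1, rng=None):
    """Multiplicative Weights over n decisions.

    cost_rounds: list over rounds of cost vectors c_t, entries in [-1, 1].
    Returns (total expected cost sum_t c_t . p_t, final weights).
    """
    rng = rng or random.Random(0)
    n = len(cost_rounds[0])
    w = [1.0] * n                                        # w_{i,1} = 1
    expected_cost = 0.0
    for c in cost_rounds:
        W = sum(w)
        p = [wi / W for wi in w]                         # distribution p_t over decisions
        _decision = rng.choices(range(n), weights=p)[0]  # sampled play; regret uses p_t
        expected_cost += sum(ci * pi for ci, pi in zip(c, p))
        w = [wi * (1 - eta * ci) for wi, ci in zip(w, c)]  # w_{i,t+1} = w_{i,t}(1 - eta c_{i,t})
    return expected_cost, w
```

Running this on any bounded cost sequence, the accumulated expected cost satisfies the regret theorem on the next slides.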

27 Regret of Multiplicative Weights Regret of multiplicative weights Assume each cost $c_{i,t}$ is bounded in $[-1, 1]$ and $\eta \leq \frac{1}{2}$. Then after T rounds, for any decision i: $\sum_t c_t \cdot p_t \leq \sum_t c_{i,t} + \eta \sum_t |c_{i,t}| + \frac{\log n}{\eta}$ 27

28 Regret of Multiplicative Weights Regret of multiplicative weights Assume each cost $c_{i,t}$ is bounded in $[-1, 1]$ and $\eta \leq \frac{1}{2}$. Then after T rounds, for any decision i: $\sum_t c_t \cdot p_t \leq \sum_t c_{i,t} + \eta \sum_t |c_{i,t}| + \frac{\log n}{\eta}$ The first term is the cost of any fixed decision i. The second relates to how quickly the MWA updates itself after observing the costs at time t. The third captures how conservative the MWA is: if $\eta$ is too small, this term blows up and we learn too slowly. 28

29 Regret of Multiplicative Weights Regret of multiplicative weights Assume each cost $c_{i,t}$ is bounded in $[-1, 1]$ and $\eta \leq \frac{1}{2}$. Then after T rounds, for any decision i: $\sum_t c_t \cdot p_t \leq \sum_t c_{i,t} + \eta \sum_t |c_{i,t}| + \frac{\log n}{\eta}$ $\eta \leq \frac{1}{2}$ is needed to make some inequalities work. Generality: no assumptions about the sequence of events (they may be correlated, costs may be adversarial)! The bound holds for all i: taking a min over decisions, we see: $R_T \leq \frac{\log n}{\eta} + \eta \min_i \sum_t |c_{i,t}|$ 29

30 Regret of Multiplicative Weights What is the regret behavior? $R_T \leq \frac{\log n}{\eta} + \eta \min_i \sum_t |c_{i,t}|$ Since the sum is O(T), the regret is sub-linear with the appropriate choice $\eta = \sqrt{\frac{\log n}{T}}$. Then $R_T \leq O\left(\sqrt{T \log n}\right)$ What is the issue with this? We need to know the horizon T up front! If T is unknown, use $\eta_t = \min\left\{\sqrt{\frac{\log n}{t}}, \frac{1}{2}\right\}$ or the doubling trick. 30
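The choice of $\eta$ can be checked numerically: in the worst case $\sum_t |c_{i,t}| = T$, the bound becomes $\frac{\log n}{\eta} + \eta T$, which is minimized at $\eta = \sqrt{\log n / T}$ with value $2\sqrt{T \log n}$. A quick sketch (the values of T and n are illustrative):

```python
import math

def regret_bound(eta, T, n):
    # log n / eta + eta * T: the MW bound in the worst case sum_t |c_{i,t}| = T
    return math.log(n) / eta + eta * T

T, n = 10_000, 16
eta_star = math.sqrt(math.log(n) / T)
# the optimized eta attains 2 * sqrt(T log n) ...
assert math.isclose(regret_bound(eta_star, T, n), 2 * math.sqrt(T * math.log(n)))
# ... and beats nearby fixed choices of eta
assert regret_bound(eta_star, T, n) <= min(regret_bound(0.5 * eta_star, T, n),
                                           regret_bound(2.0 * eta_star, T, n))
```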

31 Regret of Multiplicative Weights What if we don't choose $\eta \sim \frac{1}{\sqrt{T}}$? What if $\eta$ is constant w.r.t. T? What if $\eta < \frac{1}{\sqrt{T}}$? $R_T \leq \frac{\log n}{\eta} + \eta \min_i \sum_t |c_{i,t}|$ There is a fundamental tension between fitting to experts and learning from Nature. Optimality of MW: in the online setting you can't do better: the regret is lower bounded as $R_T \geq \Omega\left(\sqrt{T \log n}\right)$. See Theorem 4.1 in Arora. 31

32 Regret of Multiplicative Weights $R_T \leq \frac{\log n}{\eta} + \eta \min_i \sum_t |c_{i,t}|$ Why can't you achieve O(log T) regret as with FTL in this case? Recall that for Follow The Leader with strongly convex loss, the difference between two consecutive decisions and losses scaled as $\|p_t - p_{t-1}\| = O(\frac{1}{t})$. In MW, the loss-differential scales with $\|p_t - p_{t-1}\| = O(\eta)$, which is $O(\frac{1}{\sqrt{T}})$, so the MW algorithm needs to be prepared to make much bigger changes than FTL over time. 32

33 Proof of MW The steps in the proof are: 1. Upper bound $W_t$ in terms of cumulative decay factors 2. Lower bound $W_t$ by using convexity arguments 3. Combine the upper and lower bounds on $W_t$ to get the answer Let's go through the proof carefully. 33

34 Proof of MW: Getting the upper bound on W t
$W_{t+1} = \sum_i w_{i,t+1} = \sum_i w_{i,t}(1 - \eta c_{i,t}) = W_t - \eta W_t \sum_i c_{i,t} p_{i,t}$ (using $w_{i,t} = W_t p_{i,t}$)
$= W_t(1 - \eta\, c_t \cdot p_t) \leq W_t \exp(-\eta\, c_t \cdot p_t)$ (convexity: $\forall x \in \mathbb{R}: 1 - x \leq e^{-x}$)
Recursively using this inequality: $W_{T+1} \leq W_1 \exp\left(-\eta \sum_t c_t \cdot p_t\right) = n \exp\left(-\eta \sum_t c_t \cdot p_t\right)$ 34

35 Proof of MW: Getting the lower bound on W t
$W_{T+1} \geq w_{i,T+1} = \prod_{t \leq T}(1 - \eta c_{i,t})$ (multiplicative updates and $w_{i,1} = 1$)
$= \prod_{t: c_{i,t} \geq 0}(1 - \eta c_{i,t}) \prod_{t: c_{i,t} < 0}(1 - \eta c_{i,t})$ (split positive and negative costs)
Now we use the facts that: $\forall x \in [0, 1]: (1 - \eta)^x \leq 1 - \eta x$ and $\forall x \in [-1, 0]: (1 + \eta)^{-x} \leq 1 - \eta x$.
Since $c_{i,t} \in [-1, 1]$, we can apply this to each factor in the product above: $W_{T+1} \geq (1 - \eta)^{\sum_{\geq 0} c_{i,t}} (1 + \eta)^{-\sum_{<0} c_{i,t}}$ This is the lower bound we want. 35

45 Proof of MW: Combine the upper and lower bounds We get: $(1 - \eta)^{\sum_{\geq 0} c_{i,t}} (1 + \eta)^{-\sum_{<0} c_{i,t}} \leq W_{T+1} \leq n \exp\left(-\eta \sum_t c_t \cdot p_t\right)$ Take logs: $\sum_{\geq 0} c_{i,t} \log(1 - \eta) - \sum_{<0} c_{i,t} \log(1 + \eta) \leq \log n - \eta \sum_t c_t \cdot p_t$ 36

46 Proof of MW: Combine the upper and lower bounds $\sum_{\geq 0} c_{i,t} \log(1 - \eta) - \sum_{<0} c_{i,t} \log(1 + \eta) \leq \log n - \eta \sum_t c_t \cdot p_t$ Now we'll massage this into the form we want:
$\eta \sum_t c_t \cdot p_t \leq \log n - \sum_{\geq 0} c_{i,t} \log(1 - \eta) + \sum_{<0} c_{i,t} \log(1 + \eta)$
$= \log n + \sum_{\geq 0} c_{i,t} \log\left(\frac{1}{1 - \eta}\right) + \sum_{<0} c_{i,t} \log(1 + \eta)$ 37

49 Proof of MW: Combine the upper and lower bounds Since $\eta \leq \frac{1}{2}$, we can use $\log\left(\frac{1}{1 - \eta}\right) \leq \eta + \eta^2$ and $\log(1 + \eta) \geq \eta - \eta^2$:
$\eta \sum_t c_t \cdot p_t \leq \log n + \sum_{\geq 0} c_{i,t}\left(\eta + \eta^2\right) + \sum_{<0} c_{i,t}\left(\eta - \eta^2\right)$ (use the inequalities)
$= \log n + \eta \sum_t c_{i,t} + \eta^2 \sum_{\geq 0} c_{i,t} - \eta^2 \sum_{<0} c_{i,t}$
$= \log n + \eta \sum_t c_{i,t} + \eta^2 \sum_t |c_{i,t}|$ (combine sums) 38

54 Proof of MW: Combine the upper and lower bounds Dividing by $\eta$, we get what we want: $\sum_t c_t \cdot p_t \leq \frac{\log n}{\eta} + \sum_t c_{i,t} + \eta \sum_t |c_{i,t}|$ 39

55 Some remarks There is a matrix form of MW. One can work with gains instead of losses. See Arora's paper for more details. 40

56 Learning with Experts revisited

57 Multiplicative Weights with continuous label spaces. 42

58 Exponential Weighted Average: continuous prediction case Formal setup: there are T rounds in which you predict a label $\hat p_t \in C$. $C$ is a convex subset of some vector space. At every timestep t: 1. Each expert $i = 1 \dots n$ predicts $p_{i,t} \in C$ 2. You make a prediction $\hat p_t \in C$ 3. Nature reveals $y_t \in Y$ 4. We suffer a loss $\ell(\hat p_t, y_t)$ 5. Each expert suffers a loss $\ell(p_{i,t}, y_t)$ Loss: $\hat L_T = \sum_{t=1}^T \ell(\hat p_t, y_t)$, $L_{i,T} = \sum_{t=1}^T \ell(p_{i,t}, y_t)$ Regret: $R_T = \hat L_T - \min_i L_{i,T}$ 43

59 Exponential Weighted Average: continuous prediction case We assume that the loss $\ell(\hat p_t, y_t)$, as a function $\ell : C \times Y \to \mathbb{R}$: is bounded: $\forall p \in C, y \in Y: \ell(p, y) \in [0, 1]$; and $\ell(\cdot, y)$ is convex for any fixed $y \in Y$. 44

60 Exponential Weighted Average This brings us to a familiar algorithm. Choose an $\eta > 0$ (we'll make this choice more precise later). Exponential Weighted Average algorithm 1: Initialize $w_{i,0} = 1$, $W_0 = n$. 2: for $t = 1 \dots T$ do 3: Receive expert predictions $(p_{1,t}, \dots, p_{n,t})$ 4: Predict $\hat p_t = \frac{\sum_{i=1}^n w_{i,t-1} p_{i,t}}{W_{t-1}}$ 5: Receive Nature's answer $y_t$ (and incur loss $\ell(\hat p_t, y_t)$) 6: $w_{i,t} \leftarrow w_{i,t-1} \, e^{-\eta \ell(p_{i,t}, y_t)}$ 7: $W_t \leftarrow \sum_{i=1}^n w_{i,t}$ 8: end for 45
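The algorithm above can be sketched as follows (illustrative names; `loss` is any bounded convex loss, not a specific choice from the lecture):

```python
import math

def exponential_weighted_average(expert_predictions, outcomes, loss, eta):
    """EWA: predict the weighted average of expert predictions (valid since C is convex).

    expert_predictions: list over rounds; each entry is a list of n predictions in C.
    outcomes: list of Nature's answers y_t.
    loss: function l(p, y) -> [0, 1], convex in p for fixed y.
    Returns (our predictions, our cumulative loss).
    """
    n = len(expert_predictions[0])
    w = [1.0] * n                                            # w_{i,0} = 1
    predictions, total_loss = [], 0.0
    for preds, y in zip(expert_predictions, outcomes):
        W = sum(w)
        p_hat = sum(wi * pi for wi, pi in zip(w, preds)) / W  # weighted average stays in C
        predictions.append(p_hat)
        total_loss += loss(p_hat, y)
        w = [wi * math.exp(-eta * loss(pi, y))                # w_{i,t} = w_{i,t-1} e^{-eta l}
             for wi, pi in zip(w, preds)]
    return predictions, total_loss
```

Note the contrast with Weighted Majority: here the prediction is the weighted average itself (possible because $C$ is convex), not a thresholded vote.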

61 Exponential Weighted Average Regret of EWA Assume that the loss $\ell$ is a function $\ell : C \times Y \to [0, 1]$, where $C$ is convex and $\ell(\cdot, y)$ is convex for any fixed $y \in Y$. Then $R_T = \hat L_T - \min_i L_{i,T} \leq \frac{\log n}{\eta} + \frac{\eta T}{8}$ Hence, if we choose $\eta = \sqrt{\frac{8 \log n}{T}}$, the regret is $R_T \leq \sqrt{\frac{T \log n}{2}} = O(\sqrt{T})$. The proof follows from an application of Jensen's inequality and Hoeffding's lemma. See chapter 3 of Bartok for details. Again note that we need to know what T is in advance. 46

62 So how about the regret in the discrete case? 47

63 Weighted Majority Regret of Weighted Majority Let $C = Y$, let $Y$ have at least 2 elements, and let $\ell(p, y) = \mathbb{1}[p \neq y]$. Let $L^*_T = \min_i L_{i,T}$. Then $R_T = \hat L_T - \min_i L_{i,T} \leq \frac{\left(\log_2 \frac{1}{\beta} - \log_2 \frac{2}{1+\beta}\right) L^*_T + \log_2 N}{\log_2 \frac{2}{1+\beta}}$ This bound is of the form $R_T \leq a L^*_T + b = O(T)$! The discrete case is harder than the continuous case if we stick to deterministic algorithms. The worst-case regret is achieved by a set of 2 experts and an outcome sequence $y_t$ such that $\hat L_T = T$. See chapter 4 of Bartok for details. 48

64 Today What should you (at least) take away from this lecture? How should you base your prediction on expert predictions? You can use a weighted majority algorithm, which is a type of MWA. This is a general framework for online learning with experts (e.g. boosting). What are the characteristics of multiplicative weights algorithms? Few assumptions on costs / Nature (they can be adversarial), therefore broadly applicable. There is a fundamental tension between fitting to a decision and learning from Nature. MW tends to have an instantaneous loss-differential of $O(\frac{1}{\sqrt{t}})$, worse than the $O(\frac{1}{t})$ of the version of FTL that we saw in lecture 1. 49

65 Questions? 50


More information

CS260: Machine Learning Theory Lecture 12: No Regret and the Minimax Theorem of Game Theory November 2, 2011

CS260: Machine Learning Theory Lecture 12: No Regret and the Minimax Theorem of Game Theory November 2, 2011 CS260: Machine Learning heory Lecture 2: No Regret and the Minimax heorem of Game heory November 2, 20 Lecturer: Jennifer Wortman Vaughan Regret Bound for Randomized Weighted Majority In the last class,

More information

Active Learning: Disagreement Coefficient

Active Learning: Disagreement Coefficient Advanced Course in Machine Learning Spring 2010 Active Learning: Disagreement Coefficient Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz In previous lectures we saw examples in which

More information

MIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE

MIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE MIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE March 28, 2012 The exam is closed book. You are allowed a double sided one page cheat sheet. Answer the questions in the spaces provided on

More information

Adaptive Online Gradient Descent

Adaptive Online Gradient Descent University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 6-4-2007 Adaptive Online Gradient Descent Peter Bartlett Elad Hazan Alexander Rakhlin University of Pennsylvania Follow

More information

Theory and Applications of A Repeated Game Playing Algorithm. Rob Schapire Princeton University [currently visiting Yahoo!

Theory and Applications of A Repeated Game Playing Algorithm. Rob Schapire Princeton University [currently visiting Yahoo! Theory and Applications of A Repeated Game Playing Algorithm Rob Schapire Princeton University [currently visiting Yahoo! Research] Learning Is (Often) Just a Game some learning problems: learn from training

More information

b + O(n d ) where a 1, b > 1, then O(n d log n) if a = b d d ) if a < b d O(n log b a ) if a > b d

b + O(n d ) where a 1, b > 1, then O(n d log n) if a = b d d ) if a < b d O(n log b a ) if a > b d CS161, Lecture 4 Median, Selection, and the Substitution Method Scribe: Albert Chen and Juliana Cook (2015), Sam Kim (2016), Gregory Valiant (2017) Date: January 23, 2017 1 Introduction Last lecture, we

More information

CSC 5170: Theory of Computational Complexity Lecture 4 The Chinese University of Hong Kong 1 February 2010

CSC 5170: Theory of Computational Complexity Lecture 4 The Chinese University of Hong Kong 1 February 2010 CSC 5170: Theory of Computational Complexity Lecture 4 The Chinese University of Hong Kong 1 February 2010 Computational complexity studies the amount of resources necessary to perform given computations.

More information

Agnostic Online learnability

Agnostic Online learnability Technical Report TTIC-TR-2008-2 October 2008 Agnostic Online learnability Shai Shalev-Shwartz Toyota Technological Institute Chicago shai@tti-c.org ABSTRACT We study a fundamental question. What classes

More information

Regret bounded by gradual variation for online convex optimization

Regret bounded by gradual variation for online convex optimization Noname manuscript No. will be inserted by the editor Regret bounded by gradual variation for online convex optimization Tianbao Yang Mehrdad Mahdavi Rong Jin Shenghuo Zhu Received: date / Accepted: date

More information

CS264: Beyond Worst-Case Analysis Lecture #20: From Unknown Input Distributions to Instance Optimality

CS264: Beyond Worst-Case Analysis Lecture #20: From Unknown Input Distributions to Instance Optimality CS264: Beyond Worst-Case Analysis Lecture #20: From Unknown Input Distributions to Instance Optimality Tim Roughgarden December 3, 2014 1 Preamble This lecture closes the loop on the course s circle of

More information

Online Convex Optimization Using Predictions

Online Convex Optimization Using Predictions Online Convex Optimization Using Predictions Niangjun Chen Joint work with Anish Agarwal, Lachlan Andrew, Siddharth Barman, and Adam Wierman 1 c " c " (x " ) F x " 2 c ) c ) x ) F x " x ) β x ) x " 3 F

More information

Generalization bounds

Generalization bounds Advanced Course in Machine Learning pring 200 Generalization bounds Handouts are jointly prepared by hie Mannor and hai halev-hwartz he problem of characterizing learnability is the most basic question

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Vapnik Chervonenkis Theory Barnabás Póczos Empirical Risk and True Risk 2 Empirical Risk Shorthand: True risk of f (deterministic): Bayes risk: Let us use the empirical

More information

Hybrid Machine Learning Algorithms

Hybrid Machine Learning Algorithms Hybrid Machine Learning Algorithms Umar Syed Princeton University Includes joint work with: Rob Schapire (Princeton) Nina Mishra, Alex Slivkins (Microsoft) Common Approaches to Machine Learning!! Supervised

More information

TDT4173 Machine Learning

TDT4173 Machine Learning TDT4173 Machine Learning Lecture 9 Learning Classifiers: Bagging & Boosting Norwegian University of Science and Technology Helge Langseth IT-VEST 310 helgel@idi.ntnu.no 1 TDT4173 Machine Learning Outline

More information

Lecture Learning infinite hypothesis class via VC-dimension and Rademacher complexity;

Lecture Learning infinite hypothesis class via VC-dimension and Rademacher complexity; CSCI699: Topics in Learning and Game Theory Lecture 2 Lecturer: Ilias Diakonikolas Scribes: Li Han Today we will cover the following 2 topics: 1. Learning infinite hypothesis class via VC-dimension and

More information

Online Convex Optimization

Online Convex Optimization Advanced Course in Machine Learning Spring 2010 Online Convex Optimization Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz A convex repeated game is a two players game that is performed

More information

Bagging. Ryan Tibshirani Data Mining: / April Optional reading: ISL 8.2, ESL 8.7

Bagging. Ryan Tibshirani Data Mining: / April Optional reading: ISL 8.2, ESL 8.7 Bagging Ryan Tibshirani Data Mining: 36-462/36-662 April 23 2013 Optional reading: ISL 8.2, ESL 8.7 1 Reminder: classification trees Our task is to predict the class label y {1,... K} given a feature vector

More information

Foundations of Machine Learning and Data Science. Lecturer: Avrim Blum Lecture 9: October 7, 2015

Foundations of Machine Learning and Data Science. Lecturer: Avrim Blum Lecture 9: October 7, 2015 10-806 Foundations of Machine Learning and Data Science Lecturer: Avrim Blum Lecture 9: October 7, 2015 1 Computational Hardness of Learning Today we will talk about some computational hardness results

More information

Online Sparse Linear Regression

Online Sparse Linear Regression JMLR: Workshop and Conference Proceedings vol 49:1 11, 2016 Online Sparse Linear Regression Dean Foster Amazon DEAN@FOSTER.NET Satyen Kale Yahoo Research SATYEN@YAHOO-INC.COM Howard Karloff HOWARD@CC.GATECH.EDU

More information

Game Theory, On-line prediction and Boosting (Freund, Schapire)

Game Theory, On-line prediction and Boosting (Freund, Schapire) Game heory, On-line prediction and Boosting (Freund, Schapire) Idan Attias /4/208 INRODUCION he purpose of this paper is to bring out the close connection between game theory, on-line prediction and boosting,

More information

Learning theory. Ensemble methods. Boosting. Boosting: history

Learning theory. Ensemble methods. Boosting. Boosting: history Learning theory Probability distribution P over X {0, 1}; let (X, Y ) P. We get S := {(x i, y i )} n i=1, an iid sample from P. Ensemble methods Goal: Fix ɛ, δ (0, 1). With probability at least 1 δ (over

More information

Lecture 19: UCB Algorithm and Adversarial Bandit Problem. Announcements Review on stochastic multi-armed bandit problem

Lecture 19: UCB Algorithm and Adversarial Bandit Problem. Announcements Review on stochastic multi-armed bandit problem Lecture 9: UCB Algorithm and Adversarial Bandit Problem EECS598: Prediction and Learning: It s Only a Game Fall 03 Lecture 9: UCB Algorithm and Adversarial Bandit Problem Prof. Jacob Abernethy Scribe:

More information

ECE 5424: Introduction to Machine Learning

ECE 5424: Introduction to Machine Learning ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple

More information

Lecture 3: Lower Bounds for Bandit Algorithms

Lecture 3: Lower Bounds for Bandit Algorithms CMSC 858G: Bandits, Experts and Games 09/19/16 Lecture 3: Lower Bounds for Bandit Algorithms Instructor: Alex Slivkins Scribed by: Soham De & Karthik A Sankararaman 1 Lower Bounds In this lecture (and

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

TDT4173 Machine Learning

TDT4173 Machine Learning TDT4173 Machine Learning Lecture 3 Bagging & Boosting + SVMs Norwegian University of Science and Technology Helge Langseth IT-VEST 310 helgel@idi.ntnu.no 1 TDT4173 Machine Learning Outline 1 Ensemble-methods

More information

Midterm, Fall 2003

Midterm, Fall 2003 5-78 Midterm, Fall 2003 YOUR ANDREW USERID IN CAPITAL LETTERS: YOUR NAME: There are 9 questions. The ninth may be more time-consuming and is worth only three points, so do not attempt 9 unless you are

More information

An Algorithms-based Intro to Machine Learning

An Algorithms-based Intro to Machine Learning CMU 15451 lecture 12/08/11 An Algorithmsbased Intro to Machine Learning Plan for today Machine Learning intro: models and basic issues An interesting algorithm for combining expert advice Avrim Blum [Based

More information

2 Upper-bound of Generalization Error of AdaBoost

2 Upper-bound of Generalization Error of AdaBoost COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #10 Scribe: Haipeng Zheng March 5, 2008 1 Review of AdaBoost Algorithm Here is the AdaBoost Algorithm: input: (x 1,y 1 ),...,(x m,y

More information

Turing Machine Recap

Turing Machine Recap Turing Machine Recap DFA with (infinite) tape. One move: read, write, move, change state. High-level Points Church-Turing thesis: TMs are the most general computing devices. So far no counter example Every

More information

15-859E: Advanced Algorithms CMU, Spring 2015 Lecture #16: Gradient Descent February 18, 2015

15-859E: Advanced Algorithms CMU, Spring 2015 Lecture #16: Gradient Descent February 18, 2015 5-859E: Advanced Algorithms CMU, Spring 205 Lecture #6: Gradient Descent February 8, 205 Lecturer: Anupam Gupta Scribe: Guru Guruganesh In this lecture, we will study the gradient descent algorithm and

More information

Empirical Risk Minimization Algorithms

Empirical Risk Minimization Algorithms Empirical Risk Minimization Algorithms Tirgul 2 Part I November 2016 Reminder Domain set, X : the set of objects that we wish to label. Label set, Y : the set of possible labels. A prediction rule, h:

More information

Nondeterministic finite automata

Nondeterministic finite automata Lecture 3 Nondeterministic finite automata This lecture is focused on the nondeterministic finite automata (NFA) model and its relationship to the DFA model. Nondeterminism is an important concept in the

More information

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Elad Hazan IBM Almaden Research Center 650 Harry Rd San Jose, CA 95120 ehazan@cs.princeton.edu Satyen Kale Yahoo! Research 4301

More information

PAC Learning. prof. dr Arno Siebes. Algorithmic Data Analysis Group Department of Information and Computing Sciences Universiteit Utrecht

PAC Learning. prof. dr Arno Siebes. Algorithmic Data Analysis Group Department of Information and Computing Sciences Universiteit Utrecht PAC Learning prof. dr Arno Siebes Algorithmic Data Analysis Group Department of Information and Computing Sciences Universiteit Utrecht Recall: PAC Learning (Version 1) A hypothesis class H is PAC learnable

More information

arxiv: v4 [cs.lg] 27 Jan 2016

arxiv: v4 [cs.lg] 27 Jan 2016 The Computational Power of Optimization in Online Learning Elad Hazan Princeton University ehazan@cs.princeton.edu Tomer Koren Technion tomerk@technion.ac.il arxiv:1504.02089v4 [cs.lg] 27 Jan 2016 Abstract

More information

No-Regret Algorithms for Unconstrained Online Convex Optimization

No-Regret Algorithms for Unconstrained Online Convex Optimization No-Regret Algorithms for Unconstrained Online Convex Optimization Matthew Streeter Duolingo, Inc. Pittsburgh, PA 153 matt@duolingo.com H. Brendan McMahan Google, Inc. Seattle, WA 98103 mcmahan@google.com

More information

Stat 260/CS Learning in Sequential Decision Problems. Peter Bartlett

Stat 260/CS Learning in Sequential Decision Problems. Peter Bartlett Stat 260/CS 294-102. Learning in Sequential Decision Problems. Peter Bartlett 1. Adversarial bandits Definition: sequential game. Lower bounds on regret from the stochastic case. Exp3: exponential weights

More information

From Binary to Multiclass Classification. CS 6961: Structured Prediction Spring 2018

From Binary to Multiclass Classification. CS 6961: Structured Prediction Spring 2018 From Binary to Multiclass Classification CS 6961: Structured Prediction Spring 2018 1 So far: Binary Classification We have seen linear models Learning algorithms Perceptron SVM Logistic Regression Prediction

More information

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Course Synopsis A finite comparison class: A = {1,..., m}. Converting online to batch. Online convex optimization.

More information

Lecture 9: Generalization

Lecture 9: Generalization Lecture 9: Generalization Roger Grosse 1 Introduction When we train a machine learning model, we don t just want it to learn to model the training data. We want it to generalize to data it hasn t seen

More information

The No-Regret Framework for Online Learning

The No-Regret Framework for Online Learning The No-Regret Framework for Online Learning A Tutorial Introduction Nahum Shimkin Technion Israel Institute of Technology Haifa, Israel Stochastic Processes in Engineering IIT Mumbai, March 2013 N. Shimkin,

More information

Advanced Topics in Machine Learning and Algorithmic Game Theory Fall semester, 2011/12

Advanced Topics in Machine Learning and Algorithmic Game Theory Fall semester, 2011/12 Advanced Topics in Machine Learning and Algorithmic Game Theory Fall semester, 2011/12 Lecture 4: Multiarmed Bandit in the Adversarial Model Lecturer: Yishay Mansour Scribe: Shai Vardi 4.1 Lecture Overview

More information

Boosting: Foundations and Algorithms. Rob Schapire

Boosting: Foundations and Algorithms. Rob Schapire Boosting: Foundations and Algorithms Rob Schapire Example: Spam Filtering problem: filter out spam (junk email) gather large collection of examples of spam and non-spam: From: yoav@ucsd.edu Rob, can you

More information

Decision Trees: Overfitting

Decision Trees: Overfitting Decision Trees: Overfitting Emily Fox University of Washington January 30, 2017 Decision tree recap Loan status: Root 22 18 poor 4 14 Credit? Income? excellent 9 0 3 years 0 4 Fair 9 4 Term? 5 years 9

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Bandit Problems MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Multi-Armed Bandit Problem Problem: which arm of a K-slot machine should a gambler pull to maximize his

More information

An analogy from Calculus: limits

An analogy from Calculus: limits COMP 250 Fall 2018 35 - big O Nov. 30, 2018 We have seen several algorithms in the course, and we have loosely characterized their runtimes in terms of the size n of the input. We say that the algorithm

More information

Exponentiated Gradient Descent

Exponentiated Gradient Descent CSE599s, Spring 01, Online Learning Lecture 10-04/6/01 Lecturer: Ofer Dekel Exponentiated Gradient Descent Scribe: Albert Yu 1 Introduction In this lecture we review norms, dual norms, strong convexity,

More information

Lecture 17: Trees and Merge Sort 10:00 AM, Oct 15, 2018

Lecture 17: Trees and Merge Sort 10:00 AM, Oct 15, 2018 CS17 Integrated Introduction to Computer Science Klein Contents Lecture 17: Trees and Merge Sort 10:00 AM, Oct 15, 2018 1 Tree definitions 1 2 Analysis of mergesort using a binary tree 1 3 Analysis of

More information

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #24 Scribe: Jad Bechara May 2, 2018

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #24 Scribe: Jad Bechara May 2, 2018 COS 5: heoretical Machine Learning Lecturer: Rob Schapire Lecture #24 Scribe: Jad Bechara May 2, 208 Review of Game heory he games we will discuss are two-player games that can be modeled by a game matrix

More information

The Perceptron algorithm

The Perceptron algorithm The Perceptron algorithm Tirgul 3 November 2016 Agnostic PAC Learnability A hypothesis class H is agnostic PAC learnable if there exists a function m H : 0,1 2 N and a learning algorithm with the following

More information