Online Learning under Full and Bandit Information


Online Learning under Full and Bandit Information
Artem Sokolov, Computerlinguistik, Universität Heidelberg

Outline:
1. Motivation
2. Adversarial Online Learning: Hedge, EXP3
3. Stochastic Bandits: ε-greedy, UCB

Real-world online learning tasks

- advertising (which ad to display)
- medical treatment (which drug to prescribe)
- design/functionality rollouts (works or not)
- spam/malware filtering (filter or keep)
- stock market (sell or acquire bonds)
- network routing (which path to take)
- compression, weather, etc.

In every task there is a decision to be made under missing information.

Goals

1. introduce online learning
2. explain the distinction between full- and partial-information tasks
3. introduce the notion of regret
4. present basic algorithms for these cases

Relation to batch learning

Batch learning:
- many i.i.d. examples D = {(x_i, y_i)}_{i=1}^N
- define some loss l(D) (e.g. negative log-likelihood, squared error)
- learn a model by minimizing l(D)
- deploy on a test set

Online learning:
- receive one example x_t
- predict ŷ_t
- get feedback, suffer some penalty l_t(x_t, ŷ_t)
- improve the model
- repeat

Note: no training/testing set distinction.

Examples

                          input x_t ∈ X   truth y_t ∈ Y   prediction ŷ_t ∈ P   penalty/loss (X × Y × P → R)
  online regression       R^d             R               R                    |y_t − ŷ_t|
  online classification   R^d             {1,...,K}       {1,...,K}            [y_t ≠ ŷ_t]
  expert advice           R^N             R^d             {1,...,N}            y_t[ŷ_t]
  structured prediction   K^m             K^m             K^m                  \sum_{i=1}^m [y_i^t ≠ ŷ_i^t]

Why online learning?

- early days, 50s-70s: online learning was a requirement
  - first computers, very low memory, very slow CPUs
  - the perceptron from 1957 is originally an online algorithm!
- later, 70s-90s: batch learning became possible
  - reasonable CPU power, reasonable memory
  - great convergence guarantees!
- 2000s-now: computers are very powerful, memory is cheap
  - batch algorithms explode in memory and time
  - easy access to data made datasets practically infinite
  - discarding data is a bad idea, we want it all!
  - some people say that data acquisition outpaced Moore's law
  - effectively we are back in the 50s

Why online learning?

Not only a question of resources:
- the larger the data, the harder it is to guarantee stationarity
- the data cannot be cramped into a fixed-size dataset
- hence algorithms need to be adaptive, and frequent re-training is not an option (because resources...)

Advantages

Online learning:
- receive one example x_t
- predict ŷ_t
- get feedback, suffer some penalty l_t(x_t, ŷ_t)
- improve the model
- repeat

Advantages: small memory footprint, faster updates, faster adaptation, better test performance (in a certain sense).

Domain of online learning

Environment:
- the i.i.d. assumption is convenient, but often cannot be guaranteed or is obviously violated
- sometimes we assume nothing about the distribution: the adversarial case

Feedback:
- full information is best, but correct labels are expensive and slow to get
- often partial feedback is all you have: the bandit case

Structure:
- no state (important but rare case)
- usually there is some state or context
- structured spaces (actions change the environment)

Resources:
- batch learning is costly, saving everything is impractical
- learn from one x and discard it

The space of online learning algorithms

[Diagram after [Seldin 15]: the space is spanned by environment (adversarial vs. i.i.d.), feedback (full vs. bandit), and structure (no state, context, structured); expert advice, adversarial bandits, stochastic bandits, and reinforcement learning occupy different corners of it.]

Adversarial Environment with Full Information
(just means there are no statistical assumptions)

Measure of success

Online learning protocol
1: for t = 0, 1, ... do
2:   observe x_t (if available)
3:   predict ŷ_t
4:   suffer loss l_t(ŷ_t)
5:   update

- the l_t are arbitrary (e.g., it does not mean they are uniformly distributed)
- they could be random or non-random, and may depend on the previous history
- we want algorithms that work in any case

What about the goal?
- there is no training set, so we cannot minimize loss over a training set
- even if we could, it does not always make sense, as l_t can be anything
- the measure of success has to be calculated w.r.t. the whole interaction, not just some end objective

Regret

What do we want to achieve?
- in principle we want to minimize our total loss
- still not ideal, because l_t can scale arbitrarily, so we need a relative measure
- e.g., relative to some fixed (but unknown) arm-pulling strategy h = (h_t), or relative to the best arm-pulling strategy from a set H
- note: the larger H is, the harder the task

We measure a cost of ignorance, or regret, for not following that strategy:

    R_T = \sum_{t=1}^T l_t(ŷ_t) − \sum_{t=1}^T l_t(h_t)    or    R_T = \sum_{t=1}^T l_t(ŷ_t) − \min_{h ∈ H} \sum_{t=1}^T l_t(h)

Our ultimate goal: the average regret R_T / T → 0 as fast as possible; as the learning goes on, our loss differs less and less from the alternative one.

Adversary restriction

Why do we need different definitions of regret?
- on the one hand, it is a tool to analyze a problem, to test it under different assumptions
- on the other, without any restrictions online learning is too hard (or impossible)
- we need to restrict the power of the adversary and vary R_T accordingly

Different definitions then reflect our knowledge about the environment:

1. if we believe that the true data is generated by some fixed function h*, i.e. y_t = h*(x_t), it is reasonable to minimize R_T w.r.t. that function:

       R_T(h*) = \sum_{t=1}^T l_t(ŷ_t) − \sum_{t=1}^T l_t(h*)

2. if not, the adversary must at least not change his mind at will, i.e. he has to commit to some y_t before seeing ŷ_t; then it makes sense to optimize R_T w.r.t. the best function from some set H:

       R_T(H) = \sum_{t=1}^T l_t(ŷ_t) − \min_{h ∈ H} \sum_{t=1}^T l_t(h)

Adversary restriction

What happens if we do not have the commitment requirement?

Example: online classification
1: for t = 0, 1, ... do
2:   observe x_t
3:   predict ŷ_t ∈ {0, 1}
4:   receive true y_t
5:   suffer loss l_t(ŷ_t) = |y_t − ŷ_t|
6:   update w_{t+1}

    R_T = \sum_{t=1}^T l_t(ŷ_t) − \min_{h ∈ H} \sum_{t=1}^T l_t(h(x_t)) = \sum_{t=1}^T |y_t − ŷ_t| − \min_{h ∈ H} \sum_{t=1}^T |y_t − h(x_t)|

Take the simplest H = {h_0, h_1}, where h_a ≡ a (constant functions).

Exercise 1: can you make the learner always lose? [Shalev-Shwartz 12]
Solution: wait until ŷ_t is announced and set y_t = 1 − ŷ_t.

Algorithm for the realizable case

Realizability assumption: ∃ h* ∈ H s.t. ∀t: y_t = h*(x_t). Also |H| < ∞.

Consistent
1: initialize V_0 = H
2: for t = 0, 1, ... do
3:   observe x_t
4:   choose any h ∈ V_t
5:   predict ŷ_t = h(x_t)
6:   receive true y_t = h*(x_t)
7:   update V_{t+1} = {h ∈ V_t : h(x_t) = y_t}

Analysis: at every t, at least one h is removed if there was an error (and none if not), so 1 ≤ |V_t| ≤ |H| − #errors and

    R_T = #errors − 0 = #errors ≤ |H| − 1

Can we do better? Hint: purge hypotheses faster.
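A minimal runnable sketch of Consistent in Python; the toy hypothesis class of thresholds and the data stream are illustrative assumptions, not part of the slides.

```python
# Sketch of the Consistent algorithm on a finite, realizable hypothesis class.
def consistent(hypotheses, stream):
    """hypotheses: list of callables h(x) -> label; stream: iterable of (x, y) with y = h_star(x)."""
    V = list(hypotheses)                         # version space V_0 = H
    errors = 0
    for x, y in stream:
        h = V[0]                                 # choose any hypothesis still in the version space
        errors += int(h(x) != y)
        V = [g for g in V if g(x) == y]          # keep only hypotheses consistent with the new example
    return errors

# toy example: threshold functions on a grid, labels produced by a threshold at 0.35
H = [lambda x, t=t: int(x >= t) for t in [0.1 * i for i in range(11)]]
h_star = lambda x: int(x >= 0.35)
xs = [0.05, 0.5, 0.3, 0.8, 0.2, 0.4]
print(consistent(H, [(x, h_star(x)) for x in xs]))   # number of mistakes, at most |H| - 1
```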

Algorithm for the realizable case

Realizability assumption: ∃ h* ∈ H s.t. ∀t: y_t = h*(x_t). Also |H| < ∞.

Halving
1: initialize V_0 = H
2: for t = 0, 1, ... do
3:   observe x_t
4:   choose by majority vote: ŷ_t = arg max_{r ∈ {0,1}} |{h ∈ V_t : h(x_t) = r}|
5:   predict ŷ_t
6:   receive true y_t = h*(x_t)
7:   update V_{t+1} = {h ∈ V_t : h(x_t) = y_t}

Analysis: at every t, at least one half of V_t is removed if there was an error, so 1 ≤ |V_t| ≤ |H| / 2^{#errors} and

    R_T(h*) = #errors ≤ log_2 |H|
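The same sketch with the only change Halving requires (a majority-vote prediction); it can be run on the toy H and stream from the previous snippet.

```python
# Sketch of the Halving algorithm: predict with the majority of the version space.
from collections import Counter

def halving(hypotheses, stream):
    V = list(hypotheses)
    errors = 0
    for x, y in stream:
        y_hat = Counter(h(x) for h in V).most_common(1)[0][0]   # majority vote over V_t
        errors += int(y_hat != y)
        V = [h for h in V if h(x) == y]                          # a mistake removes at least half of V
    return errors
```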

Failure of realizability for infinite H

Finiteness of H is crucial.

Example: take the real line X = (0, 1) and thresholds H = {h_θ : (0,1) → {0,1}}, h_θ(x) = sign(θ − x). There is a sequence of (x_t, y_t) generated by some θ on which Halving will have R_T = T.

Exercise 2: construct such a sequence. [Shalev-Shwartz 12]

Solution: maintain endpoints L_t (left) and R_t (right), with L_0 = 0, R_0 = 1:
- pick a random x_t ∈ (L_t, R_t)
- receive ŷ_t and report y_t = 1 − ŷ_t
- set R_{t+1} = x_t y_t + R_t ŷ_t and L_{t+1} = x_t ŷ_t + L_t y_t
- then R_t − L_t > 0 for all t, so some threshold remains consistent with all reported labels
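An illustrative sketch of this adversary, assuming the labeling convention h_θ(x) = 1 if x < θ and 0 otherwise (the slide leaves the sign convention implicit); it forces a mistake on every round while keeping a non-empty interval of consistent thresholds.

```python
# Adversary for Exercise 2: always answer the opposite of the learner's prediction,
# shrinking (L, R) so that some threshold theta in (L, R) explains all reported labels.
import random

def run_adversary(learner_predict, T=20, seed=0):
    rng = random.Random(seed)
    L, R, mistakes = 0.0, 1.0, 0
    for _ in range(T):
        x = rng.uniform(L, R)
        y = 1 - learner_predict(x)     # force a mistake on this round
        mistakes += 1
        if y == 1:
            L = x                      # label 1 means x < theta, so theta must lie in (x, R)
        else:
            R = x                      # label 0 means x >= theta, so theta must lie in (L, x]
    return mistakes                    # = T, while the best h_theta makes zero mistakes

print(run_adversary(lambda x: int(x < 0.5)))   # any learner; here a fixed threshold at 0.5
```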

Randomization

- the realizability assumption may be too harsh for our application
- add an element of surprise to our predictions!
- remember to require the adversary to commit to y_t before seeing ŷ_t
- this changes lines 4 and 5 of Consistent

Randomized
1: initialize V_0 = H
2: for t = 0, 1, ... do
3:   observe x_t
4:   choose a probability p_t
5:   predict ŷ_t = 1 with probability p_t
6:   receive true y_t = h*(x_t)
7:   update V_{t+1} = {h ∈ V_t : h(x_t) = y_t}

    R_T = \sum_{t=1}^T E_{p_t}[|ŷ_t − y_t|] − \min_{h ∈ H} \sum_{t=1}^T |h(x_t) − y_t|
        = \sum_{t=1}^T |p_t − y_t| − \min_{h ∈ H} \sum_{t=1}^T |h(x_t) − y_t| ≤ \sqrt{2T ln |H|}

Note: the regret definition changed again; the proof comes later.

Summary so far

- so far we had full information (i.e., we received the true y_t)
- different adversary restrictions help to get regret bounds:

  realizability + finiteness   Consistent    R_T ≤ |H| − 1
                               Halving       R_T ≤ log_2 |H|
  commitment                   Randomized    R_T ≤ \sqrt{2T ln |H|}

Learning with Expert Advice

Learning with Expert Advice

- imagine horse races
- you know nothing about horses
- luckily you have knowledgeable friends willing to give you advice
- you apportion a fixed sum of money between them
- goal: minimize losses / maximize profit

Expert Advice

- stateless case (you have the friend's identity, but not the horse's breakfast menu or the expert's history)
- N friends
- loss vector l_t ∈ [0, 1]^N, e.g. l_t[i] = 0.3 if the i-th friend lost 30 cents
- prediction p_t ∈ [0, 1]^N with \sum_{i=1}^N p_t[i] = 1: your distribution of money
- loss \sum_{i=1}^N p_t[i] l_t[i] = ⟨p_t, l_t⟩
- goal: R_T = \sum_{t=1}^T ⟨p_t, l_t⟩ − \min_{i=1,...,N} \sum_{t=1}^T l_t[i]   (the second term is the loss of the best friend)

Note: you do not know how good your friends are, and horses/friends can conspire against you. But in the limit you can do as well as the best friend in hindsight (in terms of average loss per race)!

Hedge algorithm

- if one of the friends is perfect, we can get at most log_2 N mistakes with Halving
- but making a mistake does not necessarily mean we should disqualify a friend

Hedge
1: init a vector w_1 ∈ R_+^N s.t. \sum_{i=1}^N w_1[i] = 1, learning rate µ > 0
2: for t = 1, 2, ... do
3:   compute p_t = w_t / \sum_{i=1}^N w_t[i]
4:   receive loss l_t
5:   update w_{t+1}[i] = w_t[i] e^{−µ l_t[i]}   (soft disqualification)

Theorem. For any l_1, ..., l_T and any i ∈ {1, ..., N}:

    R_T = \sum_{t=1}^T ⟨p_t, l_t⟩ − \min_j \sum_{t=1}^T l_t[j] ≤ \sqrt{2T ln N} + ln N
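A minimal runnable sketch of Hedge in Python; the synthetic loss matrix is an illustrative assumption, and the learning rate follows the slides' choice with the trivial bound L = T.

```python
# Sketch of Hedge: maintain weights, play the normalized weights, exponentially downweight losses.
import numpy as np

def hedge(losses, mu):
    """losses: (T, N) array with entries in [0, 1]; returns the algorithm's total loss."""
    T, N = losses.shape
    w = np.ones(N) / N                       # w_1
    total = 0.0
    for t in range(T):
        p = w / w.sum()                      # p_t
        total += p @ losses[t]               # suffer <p_t, l_t>
        w = w * np.exp(-mu * losses[t])      # soft disqualification
    return total

rng = np.random.default_rng(0)
T, N = 1000, 5
L = rng.uniform(size=(T, N)); L[:, 2] *= 0.3                 # expert 2 is better on average
mu = np.log(1 + np.sqrt(2 * np.log(N) / T))                  # slides' choice of mu with L = T
print(hedge(L, mu) - L.sum(axis=0).min())                    # regret vs. the best expert
```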

Hedge Analysis

We will upper- and lower-bound the quantity \sum_{i=1}^N w_{t+1}[i].

Upper bound:

    \sum_{i=1}^N w_{t+1}[i] = \sum_{i=1}^N w_t[i] e^{−µ l_t[i]}
                            ≤ \sum_{i=1}^N w_t[i] (1 − (1 − e^{−µ}) l_t[i])                 (use e^{−αx} ≤ 1 − (1 − e^{−α}) x)
                            = (\sum_{i=1}^N w_t[i]) (1 − (1 − e^{−µ}) ⟨p_t, l_t⟩)

    ln \sum_{i=1}^N w_{t+1}[i] ≤ ln (\sum_{i=1}^N w_t[i]) + ln(1 − (1 − e^{−µ}) ⟨p_t, l_t⟩)
                               ≤ ln (\sum_{i=1}^N w_t[i]) − (1 − e^{−µ}) ⟨p_t, l_t⟩          (use ln(1 − x) ≤ −x)

    ln \sum_{i=1}^N w_{T+1}[i] ≤ ln (\sum_{i=1}^N w_1[i]) − (1 − e^{−µ}) \sum_{t=1}^T ⟨p_t, l_t⟩    (telescope)

Hedge Analysis

Lower bound: for any j ∈ {1, ..., N},

    \sum_{i=1}^N w_{T+1}[i] ≥ w_{T+1}[j] = w_1[j] e^{−µ \sum_{s=1}^T l_s[j]}

Combining (and using ln \sum_{i=1}^N w_1[i] = ln 1 = 0):

    ln w_1[j] − µ \sum_{t=1}^T l_t[j] ≤ ln \sum_{i=1}^N w_{T+1}[i] ≤ − (1 − e^{−µ}) \sum_{t=1}^T ⟨p_t, l_t⟩

Rearranging:

    \sum_{t=1}^T ⟨p_t, l_t⟩ ≤ (−ln w_1[j] + µ \sum_{t=1}^T l_t[j]) / (1 − e^{−µ})

Remember that j was arbitrary, and let w_1[i] = 1/N:

    \sum_{t=1}^T ⟨p_t, l_t⟩ ≤ (ln N + µ \min_j \sum_{t=1}^T l_t[j]) / (1 − e^{−µ})

Choice of µ

Suppose we can upper bound the minimal loss (we are sure about at least one of our friends):

    \min_j \sum_{t=1}^T l_t[j] ≤ L

If we set µ = ln(1 + \sqrt{2 ln N / L}), then from

    \sum_{t=1}^T ⟨p_t, l_t⟩ ≤ (ln N + µ \min_j \sum_{t=1}^T l_t[j]) / (1 − e^{−µ})

after rearranging we get a regret bound:

    \sum_{t=1}^T ⟨p_t, l_t⟩ ≤ \min_j \sum_{t=1}^T l_t[j] + \sqrt{2 L ln N} + ln N

Regret bound

Trivially, L ≤ T (yes, a very loose bound):

    \sum_{t=1}^T ⟨p_t, l_t⟩ − \min_j \sum_{t=1}^T l_t[j] ≤ \sqrt{2T ln N} + ln N

Much better if L ≪ T (e.g., there is a friend that almost never errs, L ≈ 0):

    \sum_{t=1}^T ⟨p_t, l_t⟩ − \min_j \sum_{t=1}^T l_t[j] ≈ ln N

Not surprisingly, this looks similar to the Halving bound (realizability and finiteness hold).

Exercise

Hedge
1: init a vector w_1 ∈ R_+^N s.t. \sum_{i=1}^N w_1[i] = 1, learning rate µ > 0
2: for t = 1, 2, ... do
3:   compute p_t = w_t / \sum_{i=1}^N w_t[i]
4:   receive loss l_t
5:   update w_{t+1}[i] = w_t[i] e^{−µ l_t[i]}

Exercise 3: [Marchetti-Spaccamela 11]
- 3 experts: the 1st always plays Rock, the 2nd Scissors, and the 3rd Paper
- your opponent plays Rock for the first T/3 rounds, then Scissors for T/3 rounds, and then Paper for the last T/3 rounds
- loss: −1 if you won, +1 if you lost, 0 if tie
- describe roughly 1) the most probable strategies played by Hedge, 2) when they switch, and 3) the final distribution
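An illustrative simulation of this exercise; rescaling the {−1, 0, +1} losses to [0, 1] before the exponential update is an assumption (the exercise does not fix this detail), and the printout lets you watch the weight mass move between the three experts.

```python
# Hedge with three constant experts (Rock, Scissors, Paper) against the scheduled opponent.
import numpy as np

BEATS = {"Rock": "Scissors", "Scissors": "Paper", "Paper": "Rock"}
EXPERTS = ["Rock", "Scissors", "Paper"]

def loss(my_move, opp_move):
    if my_move == opp_move:
        return 0.0
    return -1.0 if BEATS[my_move] == opp_move else 1.0

T, mu = 3000, 0.05
w = np.ones(3)
for t in range(T):
    opp = EXPERTS[0] if t < T // 3 else (EXPERTS[1] if t < 2 * T // 3 else EXPERTS[2])
    l = np.array([loss(e, opp) for e in EXPERTS])
    w *= np.exp(-mu * (l + 1) / 2)              # shift losses from [-1, 1] to [0, 1]
    if t in (0, T // 3, 2 * T // 3, T - 1):
        print(t, np.round(w / w.sum(), 3))      # the dominant expert switches with the opponent
```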

Relation to batch learning

- Hedge inspired Boosting, a powerful concept of combining weak algorithms into a strong one
- idea: treat your training examples as experts
- changing weights focuses attention on difficult examples
- Gödel Prize 2003

Adversarial Multi-Armed Bandits
(how to use only one friend at a time in horse races)

Why the name?

One-armed bandits:
- you are in a casino with slot machines ("one-armed bandits")
- you have to find the machine that gives you the most money
- you can try one machine at a time
- on the one hand, you should play the best machine so far: exploitation
- on the other hand, there might be better machines, not yet tried: exploration

"Considered by Allied scientists in WW2, it proved so intractable that the problem was proposed to be dropped over Germany so that German scientists could also waste their time on it." [Peter 79]

Adversarial Multi-Armed Bandits

- similar setup to expert advice
- but you can bid on only one of the friends (experts) at a time
- consequently, no full loss vector l_t is received (you do not know how much your other friends lost)
- you get only the loss l_t[i_t] of the chosen friend i_t

Hedge vs. Exp3

A slight change to the Hedge algorithm:

Hedge
1: init w_1 s.t. \sum_{i=1}^N w_1[i] = 1, µ > 0
2: for t = 1, 2, ... do
3:   play all friends: p_t = w_t / \sum_{i=1}^N w_t[i]
4:   receive l_t
5:   w_{t+1}[i] = w_t[i] e^{−µ l_t[i]}

Exp3
1: init w_1[i] = 1, γ ∈ (0, 1]
2: for t = 1, 2, ... do
3:   draw a friend i_t according to p_t[i] = (1 − γ) w_t[i] / \sum_{j=1}^N w_t[j] + γ/N   (random "surprise" actions added)
4:   receive l_t[i_t]
5:   w_{t+1}[i] = w_t[i] e^{−(γ/N) l_t[i] / p_t[i]} if i = i_t, and w_{t+1}[i] = w_t[i] otherwise

Exp3 = Exponential-weight algorithm for Exploration and Exploitation.
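A minimal runnable sketch of Exp3 in Python; the Bernoulli bandit environment is an illustrative assumption, and the exponent follows the loss formulation reconstructed above.

```python
# Sketch of Exp3: explore with probability gamma, update only the played arm with an
# importance-weighted loss estimate.
import numpy as np

def exp3(loss_fn, N, T, gamma, rng=np.random.default_rng(0)):
    w = np.ones(N)
    total = 0.0
    for t in range(T):
        p = (1 - gamma) * w / w.sum() + gamma / N       # mix in uniform exploration
        i = rng.choice(N, p=p)
        loss = loss_fn(t, i)                             # only the chosen arm's loss is observed
        total += loss
        lhat = loss / p[i]                               # importance-weighted (unbiased) estimate
        w[i] *= np.exp(-gamma * lhat / N)                # update only the played arm
    return total

means = np.array([0.9, 0.8, 0.7, 0.2, 0.6])              # arm 3 has the smallest expected loss
env = lambda t, i, rng=np.random.default_rng(1): float(rng.random() < means[i])
print(exp3(env, N=5, T=5000, gamma=0.1))
```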

Connection to importance sampling

In Exp3, denote

    l̂_t[i] = l_t[i] / p_t[i] if i = i_t, and 0 otherwise.

Then

    E[ l̂_t[i] | i_1, ..., i_{t−1} ] = p_t[i] · l_t[i] / p_t[i] + (1 − p_t[i]) · 0 = l_t[i]

This is a recurring theme in online learning: unbiased estimates of losses.
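A quick numerical check (illustrative values) that the importance-weighted estimator is unbiased: averaging l̂ over many draws of i_t recovers the full loss vector even though only one entry is observed per draw.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])        # p_t
l = np.array([0.7, 0.1, 0.4])        # true (mostly unobserved) losses l_t
est, M = np.zeros(3), 100_000
for _ in range(M):
    i = rng.choice(3, p=p)
    lhat = np.zeros(3)
    lhat[i] = l[i] / p[i]            # importance-weighted estimate of the whole vector
    est += lhat
print(est / M)                        # approximately [0.7, 0.1, 0.4]
```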

Exp3 Regret

Theorem. For any γ ∈ (0, 1], N > 0 and any sequence l_1, ..., l_T:

    E[ \sum_{t=1}^T l_t[i_t] ] − \min_i \sum_{t=1}^T l_t[i] ≤ 2 \sqrt{(e − 1) T N ln N}

Comparison to the full-information Hedge: O(\sqrt{T ln N}); Exp3: O(\sqrt{T N ln N}) — the extra \sqrt{N} is the price of bandit information.

Proof

- very similar to the Hedge proof; see the appendix
- Exercise 4 (optional): explain why the exploration is necessary; point to where the proof would fail if p_t[i] = w_t[i] / \sum_{j=1}^N w_t[j] (i.e., γ = 0)

Stochastic Bandits

Stochastic Bandits

- restrict the adversarial setting somewhat; as we have seen, restrictions lead to nicer regret (possibly under a different definition)
- N arms as before
- this time arm i's loss l_t[i] is sampled i.i.d. from D_i (unknown and fixed)
- l_t[i] and l_t[j] are independent for i ≠ j
- mean loss µ_i = E[l_t[i]], µ* = min_i µ_i, i* = arg min_i µ_i
- suffered loss: l_t[i_t]
- note: the distributions D_{i,t} may change over time as long as the means µ_i stay fixed

Stochastic Bandits

- mean loss µ_i = E[l_t[i]], µ* = min_i µ_i
- n_i(T): number of pulls of the i-th arm over the first T plays

Goal: now it makes sense to speak of expected regret:

    R_T = E[ \sum_{t=1}^T l_t[i_t] ] − \min_i E[ \sum_{t=1}^T l_t[i] ]
        = E[ \sum_{t=1}^T \sum_{i=1}^N l_t[i] [i = i_t] ] − T µ*
        = \sum_{i=1}^N µ_i E[ \sum_{t=1}^T [i = i_t] ] − T µ*
        = \sum_{i=1}^N µ_i E[n_i(T)] − T µ*

Stochastic Bandits

Gaps: ∆_i = µ_i − µ* ≥ 0, so ∆_{i*} = 0, and 0 ≤ ∆_j ≤ ∆_i whenever µ* ≤ µ_j ≤ µ_i.

    R_T = \sum_{i=1}^N µ_i E[n_i(T)] − T µ* = \sum_{i=1}^N ∆_i E[n_i(T)]

Next theorem(s):

    R_T ≤ C \sum_{i: ∆_i > 0} (ln T) / ∆_i + o(ln T)

- smaller gaps mean bigger regret
- intuition: it takes more time to distinguish between close arms
- compare to Exp3: O(\sqrt{N T ln N})

Regret bound

    R_T ≤ C \sum_{i: ∆_i > 0} (ln T) / ∆_i + o(ln T)

Why 1/∆_i? An intuitive example:
- imagine Bernoulli losses with parameters p_i
- the mean estimates µ̂_i = (1/T) \sum_{t=1}^T l_t[i] have variance σ² = p_i(1 − p_i)/T
- if T ≈ 1/∆_i^α then σ² ≈ ∆_i^α p_i(1 − p_i)
- if α ≈ 1, it is hard to say whether there is a real difference between µ* and µ_i or whether it is just variance
- so we rather need α ≈ 2
- since on every pull of arm i we lose about ∆_i, for α = 2 we pay about ∆_i · (1/∆_i²) = 1/∆_i

ε-greedy

The simplest strategy:

ε-greedy
1: ε > 0; µ̂_i = 0 (empirical means of the losses)
2: for t = 1, 2, ... do
3:   with prob. 1 − ε play the current best arm i_t = arg min_i µ̂_i
4:   with prob. ε play a random arm
5:   receive l_t[i_t]
6:   update the empirical mean: µ̂_{i_t} ← (µ̂_{i_t} n_{i_t} + l_t[i_t]) / (n_{i_t} + 1)

Regret: because of the constant ε, R_T ≈ εT; we need to let ε go to zero at a certain rate.
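A minimal runnable sketch of ε-greedy; the Bernoulli loss environment is an illustrative assumption.

```python
# Sketch of epsilon-greedy on Bernoulli losses: exploit the empirically best arm, explore at rate eps.
import numpy as np

def eps_greedy(means, T, eps, rng=np.random.default_rng(0)):
    N = len(means)
    mu_hat, n, total = np.zeros(N), np.zeros(N), 0.0
    for t in range(T):
        if rng.random() < eps:
            i = int(rng.integers(N))                   # explore a random arm
        else:
            i = int(np.argmin(mu_hat))                 # exploit the current best (smallest loss)
        loss = float(rng.random() < means[i])          # Bernoulli loss with mean means[i]
        total += loss
        mu_hat[i] = (mu_hat[i] * n[i] + loss) / (n[i] + 1)
        n[i] += 1
    return total

means = [0.9, 0.8, 0.7, 0.2, 0.6]
print(eps_greedy(means, T=10_000, eps=0.1) - 10_000 * min(means))   # regret grows roughly like eps*T
```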

ε_t-decreasing

The modified simplest strategy:

ε_t-decreasing
1: c > 0, δ > 0; µ̂_i = 0
2: for t = 1, 2, ... do
3:   set ε_t = min{1, cN / (δ² t)}
4:   with prob. 1 − ε_t play the current best arm i_t = arg min_i µ̂_i
5:   with prob. ε_t play a random arm
6:   receive l_t[i_t]
7:   update the means

Theorem. If 0 < δ ≤ \min_{i: ∆_i > 0} ∆_i < 1 and T ≥ cN/δ, then

    P{i_t ≠ i*} ≤ c/(δ² t) + o(1/t)   and hence   R_T = O(ln T)
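An illustrative variant of the ε-greedy sketch above with the decreasing exploration rate ε_t = min{1, cN/(δ²t)}; c and δ are assumed tuning constants.

```python
import numpy as np

def eps_decreasing(means, T, c=5.0, delta=0.3, rng=np.random.default_rng(0)):
    N = len(means)
    mu_hat, n, total = np.zeros(N), np.zeros(N), 0.0
    for t in range(1, T + 1):
        eps_t = min(1.0, c * N / (delta ** 2 * t))                 # exploration decays like 1/t
        i = int(rng.integers(N)) if rng.random() < eps_t else int(np.argmin(mu_hat))
        loss = float(rng.random() < means[i])
        total += loss
        mu_hat[i] = (mu_hat[i] * n[i] + loss) / (n[i] + 1)
        n[i] += 1
    return total
```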

UCB (upper confidence bound)

UCB-style strategies:
- pull arms and maintain empirical averages of their losses
- also compute confidence intervals [µ̂ − U, µ̂ + U] (a region around our estimate such that the true value lies inside with high probability)
- repeated plays shrink the confidence bound; the average becomes more reliable
- now stick to the principle of optimism in the face of uncertainty: play the arm whose mean loss combined with its confidence bound promises the least loss
- eventually the most optimistic arm will change, either because another arm really is better or because it was not sampled often enough
- a deterministic algorithm, unlike ε-greedy

UCB

UCB
1: play each arm once
2: for t = 1, 2, ... do
3:   play the arm i_t = arg min_i ( µ̂_i − \sqrt{2 ln t / n_i(t)} )
4:   receive l_t[i_t]
5:   update the averages

Theorem. For N > 0, T > 0 and arbitrary distributions D_{i,t} with fixed means µ_i:

    R_T ≤ 8 \sum_{i: µ_i > µ*} (ln T) / ∆_i + (1 + π²/3) \sum_{i=1}^N ∆_i
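A minimal runnable sketch of UCB adapted to losses (play the arm with the smallest optimistic estimate); the Bernoulli environment is an illustrative assumption.

```python
# Sketch of UCB on losses: subtract the confidence bonus from the empirical mean and take the argmin.
import numpy as np

def ucb(means, T, rng=np.random.default_rng(0)):
    N = len(means)
    mu_hat = np.array([float(rng.random() < m) for m in means])   # play each arm once
    n = np.ones(N)
    total = mu_hat.sum()
    for t in range(N, T):
        bonus = np.sqrt(2 * np.log(t + 1) / n)
        i = int(np.argmin(mu_hat - bonus))                         # optimism in the face of uncertainty
        loss = float(rng.random() < means[i])
        total += loss
        mu_hat[i] += (loss - mu_hat[i]) / (n[i] + 1)
        n[i] += 1
    return total

means = [0.9, 0.8, 0.7, 0.2, 0.6]
print(ucb(means, T=10_000) - 10_000 * min(means))    # regret grows only logarithmically in T
```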

Lessons learned

- lots of applications where traditional batch learning may fail
- goals are formulated in terms of various regrets
- the slower the upper bound on regret grows with T, the better

  full information      realizability            Consistent      R_T ≤ |H| − 1
                                                 Halving         R_T ≤ log_2 |H|
                        commitment + surprise    Randomized      R_T ≤ \sqrt{2T ln |H|}
                                                 Hedge           R_T = O(\sqrt{T ln N})
  adversarial bandits                            EXP3            R_T = O(\sqrt{N T ln N})
  stochastic bandits                             ε-decreasing    R_T = O(ln T / δ)
                                                 UCB             R_T = O(ln T / ∆)

Conclusion

- we have barely scratched the surface, covering only the most important algorithms
- advanced algorithms use EXP3 and UCB as building blocks

Recommended Reading

Material for this lecture:
1. S. Shalev-Shwartz. Online Learning and Online Convex Optimization, 2012
2. Y. Seldin. The Space of Online Learning Algorithms, 2015
3. P. Auer et al. The nonstochastic multiarmed bandit problem, 2002
4. P. Auer et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002

Advanced contextual algorithms:
1. EXP4: P. Auer et al. The nonstochastic multiarmed bandit problem, 2002
2. LinUCB: Li et al. A contextual-bandit approach to personalized news article recommendation, 2010
3. Epoch-greedy: J. Langford and T. Zhang. The epoch-greedy algorithm for multi-armed bandits with side information, 2008

Appendix

Analysis of EXP3

- similar to the Hedge analysis
- idea: lower- and upper-bound \sum_{i=1}^N w_{t+1}[i] / \sum_{i=1}^N w_t[i]
- denote l̂_t[i] = l_t[i] / p_t[i] if i = i_t, and 0 otherwise

Analysis of EXP3

Upper bound (use the definition of w_{t+1}[i]; recall w_t[i] / \sum_j w_t[j] = (p_t[i] − γ/N)/(1 − γ) and note the bound (γ/N) l̂_t[i] ≤ 1, since p_t[i] ≥ γ/N):

    \frac{\sum_{i=1}^N w_{t+1}[i]}{\sum_{i=1}^N w_t[i]}
      = \sum_{i=1}^N \frac{p_t[i] − γ/N}{1 − γ} e^{−(γ/N) l̂_t[i]}
      ≤ \sum_{i=1}^N \frac{p_t[i] − γ/N}{1 − γ} [ 1 − (γ/N) l̂_t[i] + (e − 2)((γ/N) l̂_t[i])² ]      (using e^{−x} ≤ 1 − x + (e − 2)x² for x ≤ 1)
      ≤ 1 − \frac{γ/N}{1 − γ} \sum_{i=1}^N p_t[i] l̂_t[i] + \frac{(e − 2)(γ/N)²}{1 − γ} \sum_{i=1}^N p_t[i] l̂_t[i]²
      ≤ 1 − \frac{γ/N}{1 − γ} l_t[i_t] + \frac{(e − 2)(γ/N)²}{1 − γ} \sum_{i=1}^N l̂_t[i]

and therefore, using ln(1 + x) ≤ x,

    ln \frac{\sum_{i=1}^N w_{t+1}[i]}{\sum_{i=1}^N w_t[i]} ≤ − \frac{γ/N}{1 − γ} l_t[i_t] + \frac{(e − 2)(γ/N)²}{1 − γ} \sum_{i=1}^N l̂_t[i]

Analysis of EXP3

Upper bound (cont.): summing over t = 1, ..., T and telescoping,

    ln \frac{\sum_{i=1}^N w_{T+1}[i]}{\sum_{i=1}^N w_1[i]} ≤ − \frac{γ/N}{1 − γ} \sum_{t=1}^T l_t[i_t] + \frac{(e − 2)(γ/N)²}{1 − γ} \sum_{t=1}^T \sum_{i=1}^N l̂_t[i]

Lower bound: for any j = 1, ..., N,

    ln \frac{\sum_{i=1}^N w_{T+1}[i]}{\sum_{i=1}^N w_1[i]} ≥ ln \frac{w_{T+1}[j]}{\sum_{i=1}^N w_1[i]} = − \frac{γ}{N} \sum_{t=1}^T l̂_t[j] − ln N

Combining (and simplifying):

    \sum_{t=1}^T l_t[i_t] ≤ (1 − γ) \sum_{t=1}^T l̂_t[j] + \frac{N ln N}{γ} + (e − 2) \frac{γ}{N} \sum_{t=1}^T \sum_{i=1}^N l̂_t[i]

Analysis of EXP3

Expectation to the rescue:

    E[ l̂_t[i] | i_1, ..., i_{t−1} ] = p_t[i] · l_t[i] / p_t[i] + (1 − p_t[i]) · 0 = l_t[i]

Taking expectations:

    E[ \sum_{t=1}^T l_t[i_t] ] ≤ (1 − γ) \sum_{t=1}^T l_t[j] + \frac{N ln N}{γ} + (e − 2) \frac{γ}{N} \sum_{t=1}^T \sum_{i=1}^N l_t[i]

Since j was arbitrary, and by assumption \sum_{t=1}^T \sum_{i=1}^N l_t[i] ≤ N L:

    E[ \sum_{t=1}^T l_t[i_t] ] ≤ (1 − γ) \min_j \sum_{t=1}^T l_t[j] + \frac{N ln N}{γ} + (e − 2) γ L

Choosing γ = min{ 1, \sqrt{ N ln N / ((e − 1) L) } } gives the regret bound 2 \sqrt{(e − 1) L N ln N}, which for L ≤ T yields the theorem.
