Online Learning under Full and Bandit Information
1 Online Learning under Full and Bandit Information
Artem Sokolov, Computerlinguistik, Universität Heidelberg
1 Motivation
2 Adversarial Online Learning: Hedge, EXP3
3 Stochastic Bandits: ε-greedy, UCB
2 Real world online learning tasks advertising (which ad to display) medical treatment (which drug to prescribe) design/functionality rollouts (works or not) spam/malware filtering (filter or keep) stock market (sell or acquire bonds) network routing (which path to take) compression, weather, etc. in every task there is a decision to be made under missing information Online Learning under Weak (Bandit) Information 1 / 48
3 Goals 1 introduce online learning 2 explain distinction between full and partial information tasks 3 introduce the notion of regret 4 present basic algorithms for those cases Online Learning under Weak (Bandit) Information 2 / 48
4 Relation to batch learning
Batch learning
many i.i.d. examples D = {x_i, y_i}_{i=1}^N
define some loss l(D) (e.g. negative log-likelihood, squared error)
learn a model by minimizing l(D)
deploy on a test set
Online learning
one example x_t; predict ŷ_t; get feedback; suffer some penalty l_t(x_t, ŷ_t); improve the model; repeat
note: no training/testing set distinction
Online Learning under Weak (Bandit) Information 3 / 48
5 Examples
input x_t ∈ X (input space), truth y_t ∈ Y (truth space), prediction ŷ_t ∈ P (decision space)
                        X      Y            P            penalty/loss
online regression       R^d    R            R            |y_t − ŷ_t|
online classification   R^d    {1,...,K}    {1,...,K}    [y_t ≠ ŷ_t]
expert advice           R^N    R^d          {1,...,N}    y_t[ŷ_t]
structured prediction   K^m    K^m          K^m          Σ_{i=1}^m [y_i^t ≠ ŷ_i^t]
Online Learning under Weak (Bandit) Information 4 / 48
6 Why online learning?
early days 50-70s: online learning is a requirement
first computers, very low memory, very slow CPUs
the perceptron from 1957 is originally an online algorithm!
later 70-90s: batch learning became possible
reasonable CPU power, reasonable memory
great convergence guarantees!
2000s-now: computers are very powerful, memory is cheap
batch algorithms explode memory and time
easy access to data made datasets practically infinite
discarding data is a bad idea, we want it all!
some people say that data acquisition outpaced Moore's law
effectively we are back in the 50s
Online Learning under Weak (Bandit) Information 5 / 48
7 Why online learning? not only a question of resources: the larger the data, the harder it is to guarantee stationarity; data cannot be crammed into a fixed-size dataset; hence algorithms need to be adaptive, and frequent re-training is not an option (because resources...) Online Learning under Weak (Bandit) Information 6 / 48
8 Advantages Online learning one example x t predict ŷ t get feedback suffer some penalty l t (x t, ŷ t ) improve the model repeat small memory footprint faster updates faster adaptation better test performance (in a certain sense) Online Learning under Weak (Bandit) Information 7 / 48
9 Domain of online learning environment i.i.d assumption is convenient often cannot be guaranteed or is obviously violated sometimes we assume nothing about distribution: adversarial case feedback full information is best but correct labels are expensive and slow to get often partial feedback is all you have: bandit case structure no state (important but rare case) usually there is some state or context structured spaces (actions change the environment) resources batch learning is costly saving everything is impractical learn from one x and discard Online Learning under Weak (Bandit) Information 8 / 48
10 The space of online learning algorithms
(figure: algorithms arranged along the axes environment (adversarial vs. i.i.d.), feedback (full vs. bandit) and structure (no state vs. context): expert advice, adversarial bandits, stochastic bandits, reinforcement learning) [Seldin 15]
Online Learning under Weak (Bandit) Information 9 / 48
11 Adversarial Environment with Full Information (just means there are no statistical assumptions) Online Learning under Weak (Bandit) Information 10 / 48
12 Measure of success
Online learning protocol
1: for t = 0,... do
2:   observe x_t (if available)
3:   predict ŷ_t
4:   suffer loss l_t(ŷ_t)
5:   update
l_t are arbitrary (e.g., it does not mean they are uniformly distributed)
could be random or non-random, depend on previous history
we want algorithms that work in any case
What about the goal?
no training set, so cannot minimize loss over a training set
even if we could, it does not always make sense, as l_t can be anything
measure of success has to be calculated w.r.t. the whole interaction, not just some end objective
Online Learning under Weak (Bandit) Information 11 / 48
13 Regret
What do we want to achieve?
in principle we want to minimize our total loss
still not ideal, because l_t can scale arbitrarily, so we need a relative measure
e.g., with respect to some fixed (but unknown) arm-pulling strategy h = (h_t), or with respect to the best arm-pulling strategy from a set H
note: the larger H is, the harder the task
we measure the cost of ignorance, or regret, for not following that strategy:
R_T = Σ_{t=1}^T l_t(ŷ_t) − Σ_{t=1}^T l_t(h_t)   or   R_T = Σ_{t=1}^T l_t(ŷ_t) − min_{h∈H} Σ_{t=1}^T l_t(h)
Our ultimate goal: average regret R_T/T → 0 as fast as possible
as the learning goes on, our loss is less and less different from the alternative one
Online Learning under Weak (Bandit) Information 12 / 48
14 Adversary restriction
Why do we need different definitions of regret?
on the one hand, it's a tool to analyze a problem, to test it under different assumptions
on the other, w/o any restrictions online learning is too hard (or impossible)
need to restrict the power of the adversary and vary R_T accordingly
different definitions then reflect our knowledge about the environment:
1 if we believe that true data is generated by some fixed function h*, y_t = h*(x_t), it's reasonable to minimize R_T w.r.t. that function:
R_T(h*) = Σ_{t=1}^T l_t(ŷ_t) − Σ_{t=1}^T l_t(h*)
2 if not, the adversary must at least not change his mind at will, i.e. has to commit to some y_t before seeing ŷ_t; then it makes sense to optimize R_T w.r.t. the best function from some set H:
R_T(H) = Σ_{t=1}^T l_t(ŷ_t) − min_{h∈H} Σ_{t=1}^T l_t(h)
Online Learning under Weak (Bandit) Information 13 / 48
15 Adversary restriction
What happens if we don't have the commitment requirement?
Example: online classification
1: for t = 0,... do
2:   observe x_t
3:   predict ŷ_t ∈ {0, 1}
4:   receive true y_t
5:   suffer loss l_t(ŷ_t) = |y_t − ŷ_t|
6:   update w_{t+1}
R_T = Σ_t l_t(ŷ_t) − min_{h∈H} Σ_t l_t(h(x_t)) = Σ_t |y_t − ŷ_t| − min_{h∈H} Σ_t |y_t − h(x_t)|
take the simplest H = {h_0, h_1}, where h_a ≡ a (constant functions)
Exercise 1: can you make the learner always lose? [Shalev-Shwartz 12]
wait until ŷ_t and set y_t = 1 − ŷ_t
Online Learning under Weak (Bandit) Information 14 / 48
16 Algorithm for the realizability case
Realizability assumption: ∃h* ∈ H s.t. ∀t y_t = h*(x_t). Also |H| < ∞
Consistent
1: Initialize V_0 = H
2: for t = 0,... do
3:   observe x_t
4:   choose any h ∈ V_t
5:   predict ŷ_t = h(x_t)
6:   receive true y_t = h*(x_t)
7:   update V_{t+1} = {h ∈ V_t : h(x_t) = y_t}
Analysis: at every t, at least one h is removed if there was an error (and none if not)
1 ≤ |V_t| ≤ |H| − #errors
R_T = #errors − 0 = #errors ≤ |H| − 1
can we do better? hint: purge hypotheses faster
Online Learning under Weak (Bandit) Information 15 / 48
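The Consistent rule above can be sketched in a few lines of Python. This is a minimal sketch: the function name and the representation of H as a list of callables are illustrative assumptions, not part of the lecture.

```python
def consistent(hypotheses, stream):
    """Online prediction under realizability: predict with any hypothesis
    that is still consistent with all labels seen so far.

    hypotheses: list of callables h(x) -> label, the finite class H
    stream: iterable of (x_t, y_t) pairs with y_t = h_star(x_t) for some h_star in H
    Returns the number of mistakes, which equals the regret under realizability.
    """
    version_space = list(hypotheses)       # V_0 = H
    mistakes = 0
    for x, y in stream:
        h = version_space[0]               # choose any h in V_t
        if h(x) != y:
            mistakes += 1
        # V_{t+1}: keep only hypotheses that agree with the revealed label
        version_space = [g for g in version_space if g(x) == y]
    return mistakes
```

Each mistake removes at least the hypothesis we predicted with, so the mistake count can never exceed |H| − 1.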
17 Algorithm for the realizability case
Realizability assumption: ∃h* ∈ H s.t. ∀t y_t = h*(x_t). Also |H| < ∞
Halving
1: Initialize V_0 = H
2: for t = 0,... do
3:   observe x_t
4:   choose by majority vote: ŷ_t = arg max_{r∈{0,1}} |{h ∈ V_t : h(x_t) = r}|
5:   predict ŷ_t
6:   receive true y_t = h*(x_t)
7:   update V_{t+1} = {h ∈ V_t : h(x_t) = y_t}
Analysis: at every t, at least one half of V_t is removed if there was an error
1 ≤ |V_t| ≤ |H| / 2^{#errors}
R_T(h*) = #errors ≤ log_2 |H|
Online Learning under Weak (Bandit) Information 16 / 48
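Halving differs from Consistent only in how it predicts. A minimal Python sketch (same illustrative list-of-callables representation of H as before):

```python
from collections import Counter

def halving(hypotheses, stream):
    """Halving algorithm: predict the majority vote of the version space.
    Under realizability, every mistake removes at least half of V_t,
    so the number of mistakes is at most log2(|H|)."""
    version_space = list(hypotheses)
    mistakes = 0
    for x, y in stream:
        votes = Counter(h(x) for h in version_space)
        y_hat = votes.most_common(1)[0][0]   # majority prediction
        if y_hat != y:
            mistakes += 1                    # >= half of V_t voted wrong
        version_space = [h for h in version_space if h(x) == y]
    return mistakes
```

If the majority label is wrong, at least half of the hypotheses voted for it and are purged, which gives the logarithmic mistake bound.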
18 Failure of realizability for infinite H
Finiteness of H is crucial
Example: real line X = (0, 1), thresholds H = {h_θ : (0, 1) → {0, 1}}, h_θ(x) = sign(θ − x)
there is a sequence of (x_t, y_t) generated by some θ on which Halving will have R_T = T
Exercise 2: construct such a sequence [Shalev-Shwartz 12]
Solution: maintain L_t (left) and R_t (right)
L_0 = 0, R_0 = 1
pick a random x_t ∈ (L_t, R_t)
receive ŷ_t
report y_t = 1 − ŷ_t
R_{t+1} = x_t·y_t + R_t·ŷ_t
L_{t+1} = x_t·ŷ_t + L_t·y_t
∀t R_t − L_t > 0
Online Learning under Weak (Bandit) Information 17 / 48
19 Randomization
the realizability assumption may be too harsh for our application
add an element of surprise to our predictions!
remember to require the adversary to commit to y_t before seeing ŷ_t
will change lines 4 and 5 in Consistent
Randomized
1: Initialize V_0 = H
2: for t = 0,... do
3:   observe x_t
4:   choose probability p_t
5:   predict ŷ_t = 1 with prob. p_t
6:   receive true y_t = h*(x_t)
7:   update V_{t+1} = {h ∈ V_t : h(x_t) = y_t}
R_T = Σ_t E[|ŷ_t − y_t|] − min_{h∈H} Σ_t |h(x_t) − y_t| = Σ_t |p_t − y_t| − min_{h∈H} Σ_t |h(x_t) − y_t| ≤ √(2T ln |H|)
note the regret changed again
proof later
Online Learning under Weak (Bandit) Information 18 / 48
20 Summary so far
so far we had full information (i.e., we received the true y_t)
different adversary restrictions help to get regret bounds
realizability + finiteness: Consistent R_T ≤ |H| − 1; Halving R_T ≤ log_2 |H|
commitment: Randomized R_T ≤ √(2T ln |H|)
Online Learning under Weak (Bandit) Information 19 / 48
21 Learning with Experts Advice Online Learning under Weak (Bandit) Information 20 / 48
22 Learning with Experts Advice imagine horse-races you know nothing about horses luckily you have knowledgeable friends willing to give you advice apportion a fixed sum of money between them goal: minimize losses / maximize profit Online Learning under Weak (Bandit) Information 21 / 48
23 Experts Advice
stateless case (you have a friend's identity, but not the horse's breakfast menu or the expert's history)
N friends
loss vector l_t ∈ [0, 1]^N, e.g., l_t[i] = 0.3 if the ith friend lost 30 cents
prediction p_t ∈ [0, 1]^N with Σ_{i=1}^N p_t[i] = 1: your distribution of money
loss Σ_{i=1}^N p_t[i] l_t[i] = ⟨p_t, l_t⟩
goal: R_T = Σ_t ⟨p_t, l_t⟩ − min_{i=1,...,N} Σ_t l_t[i]  (the subtracted term is the loss of the best friend)
note: you don't know how good your friends are
note: horses/friends can conspire against you
but in the limit you can do as well as the best friend in hindsight! (in terms of average loss per race)
Online Learning under Weak (Bandit) Information 22 / 48
24 Hedge algorithm
if one of the friends is perfect, we can get ≤ log_2 N mistakes with Halving
but making a mistake does not necessarily mean we should disqualify a friend
Hedge
1: init vector w_1 ∈ R_+^N s.t. Σ_{i=1}^N w_1[i] = 1, learning rate µ > 0
2: for t = 1,... do
3:   compute p_t = w_t / Σ_{i=1}^N w_t[i]
4:   receive loss l_t
5:   update w_{t+1}[i] = w_t[i] e^{−µ l_t[i]}   (soft disqualification)
Theorem
For any l_1,..., l_T and any i ∈ {1,..., N}:
R_T = Σ_t ⟨p_t, l_t⟩ − min_j Σ_t l_t[j] ≤ √(2T ln N) + ln N
Online Learning under Weak (Bandit) Information 23 / 48
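The five lines of Hedge translate almost directly to Python. A minimal sketch, assuming losses in [0, 1]; the function name and the list-of-rows input format are illustrative:

```python
import math

def hedge(loss_rows, mu):
    """Hedge / exponential weights over N experts under full information.

    loss_rows: list of loss vectors l_t in [0,1]^N, one per round
    mu: learning rate > 0
    Returns the total expected loss sum_t <p_t, l_t>.
    """
    n = len(loss_rows[0])
    w = [1.0 / n] * n                        # uniform initial weights w_1
    total = 0.0
    for losses in loss_rows:
        z = sum(w)
        p = [wi / z for wi in w]             # p_t = w_t / sum_i w_t[i]
        total += sum(pi * li for pi, li in zip(p, losses))
        # soft disqualification: exponential down-weighting by incurred loss
        w = [wi * math.exp(-mu * li) for wi, li in zip(w, losses)]
    return total
```

With the tuned µ from the next slides, the returned total exceeds the best expert's cumulative loss by at most √(2T ln N) + ln N.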
25 Hedge Analysis
We will upper- and lower-bound the quantity Σ_{i=1}^N w_{t+1}[i]
Upper bound
Σ_{i=1}^N w_{t+1}[i] = Σ_{i=1}^N w_t[i] e^{−µ l_t[i]}
  ≤ Σ_{i=1}^N w_t[i] (1 − (1 − e^{−µ}) l_t[i])   (use e^{−αx} ≤ 1 − (1 − e^{−α})x for x ∈ [0, 1])
  = (Σ_{i=1}^N w_t[i]) (1 − (1 − e^{−µ}) ⟨p_t, l_t⟩)
ln Σ_{i=1}^N w_{t+1}[i] ≤ ln (Σ_{i=1}^N w_t[i]) + ln(1 − (1 − e^{−µ}) ⟨p_t, l_t⟩)
  ≤ ln (Σ_{i=1}^N w_t[i]) − (1 − e^{−µ}) ⟨p_t, l_t⟩   (use ln(1 − x) ≤ −x)
telescoping: ln Σ_{i=1}^N w_{T+1}[i] ≤ ln (Σ_{i=1}^N w_1[i]) − (1 − e^{−µ}) Σ_{t=1}^T ⟨p_t, l_t⟩
Online Learning under Weak (Bandit) Information 24 / 48
26 Hedge Analysis
Lower bound: for any j ∈ {1,..., N}
Σ_{i=1}^N w_{T+1}[i] ≥ w_{T+1}[j] = w_1[j] e^{−µ Σ_{s=1}^T l_s[j]}
Combining (note ln Σ_{i=1}^N w_1[i] = ln 1 = 0):
ln w_1[j] − µ Σ_{t=1}^T l_t[j] ≤ ln Σ_{i=1}^N w_{T+1}[i] ≤ −(1 − e^{−µ}) Σ_{t=1}^T ⟨p_t, l_t⟩
rearranging: Σ_{t=1}^T ⟨p_t, l_t⟩ ≤ (−ln w_1[j] + µ Σ_{t=1}^T l_t[j]) / (1 − e^{−µ})
remember j was arbitrary, and let w_1[i] = 1/N:
Σ_{t=1}^T ⟨p_t, l_t⟩ ≤ (ln N + µ min_j Σ_{t=1}^T l_t[j]) / (1 − e^{−µ})
Online Learning under Weak (Bandit) Information 25 / 48
27 Choice of µ
suppose we can upper bound the minimal loss (we are sure about at least one of our friends): min_j Σ_t l_t[j] ≤ L̃
if we set µ = ln(1 + √(2 ln N / L̃)) then, starting from
Σ_t ⟨p_t, l_t⟩ ≤ (ln N + µ min_j Σ_t l_t[j]) / (1 − e^{−µ}),
after rearranging we get a regret bound:
Σ_t ⟨p_t, l_t⟩ ≤ min_j Σ_t l_t[j] + √(2 L̃ ln N) + ln N
i.e., Σ_t ⟨p_t, l_t⟩ − min_j Σ_t l_t[j] ≤ √(2 L̃ ln N) + ln N
Online Learning under Weak (Bandit) Information 26 / 48
28 Regret bound
trivially L̃ ≤ T (yes, a very loose bound):
Σ_t ⟨p_t, l_t⟩ − min_j Σ_t l_t[j] ≤ √(2T ln N) + ln N
much better if L̃ ≪ T (e.g., there is a friend that almost never errs, L̃ ≈ 0):
Σ_t ⟨p_t, l_t⟩ − min_j Σ_t l_t[j] ≲ ln N
not surprisingly this looks similar to the Halving bound (realizability & finiteness hold)
Online Learning under Weak (Bandit) Information 27 / 48
29 Exercise
Hedge
1: init vector w_1 ∈ R_+^N s.t. Σ_{i=1}^N w_1[i] = 1, learning rate µ > 0
2: for t = 1,... do
3:   compute p_t = w_t / Σ_{i=1}^N w_t[i]
4:   receive loss l_t
5:   update w_{t+1}[i] = w_t[i] e^{−µ l_t[i]}
Exercise 3: [Marchetti-Spaccamela 11]
3 experts: 1st playing always Rock, 2nd Scissors, and 3rd Paper
your opponent plays Rock T/3 times, then Scissors T/3 times, then Paper T/3 times (Rock on [0, T/3), Scissors on [T/3, 2T/3), Paper on [2T/3, T])
loss: −1 if won, +1 if lost, 0 if tie
describe roughly 1) the most probable strategies played by Hedge, 2) when they switch and 3) the final distribution
Online Learning under Weak (Bandit) Information 28 / 48
30 Relation to batch learning Hedge inspired Boosting a powerful concept of combining weak algorithms into a strong one idea: treat your training examples as experts changing weights focuses attention on difficult examples Gödel Prize 2003 Online Learning under Weak (Bandit) Information 29 / 48
31 Adversarial Multi-Armed Bandits (how to use only one friend at a time in horse races) Online Learning under Weak (Bandit) Information 30 / 48
32 Why the name? One-armed bandits you are in a casino with slot-machines ( one-armed bandits ) you have to find the machine that gives you the most money you can try one machine at a time on the one hand, you should play the best machine so far exploitation on the other hand, there might be better machines, not yet tried exploration Considered by Allied scientists in WW2, it proved so intractable that the problem was proposed to be dropped over Germany so that German scientists could also waste their time on it [Peter 79] Online Learning under Weak (Bandit) Information 31 / 48
33 Adversarial Multi-Armed Bandits similar setup to experts advice you can bid on only one of the friends (experts) at a time consequently, no full loss vector l t is received (you don't know how much your other friends lost) you get only the loss l t [i t ] of the chosen friend i t Online Learning under Weak (Bandit) Information 32 / 48
34 Hedge vs Exp3
Slight change to the Hedge algorithm:
Hedge
1: init w_1 s.t. Σ_{i=1}^N w_1[i] = 1, µ > 0
2: for t = 1,... do
3:   play all friends p_t = w_t / Σ_{i=1}^N w_t[i]
4:   receive l_t
5:   w_{t+1}[i] = w_t[i] e^{−µ l_t[i]}
Exp3
1: init w_1[i] = 1, γ ∈ (0, 1]
2: for t = 1,... do
3:   draw a friend i_t acc. to p_t[i] = (1 − γ) w_t[i] / Σ_{j=1}^N w_t[j] + γ/N   (random surprise actions added)
4:   receive l_t[i_t]
5:   w_{t+1}[i] = w_t[i] e^{−(γ/N) l_t[i]/p_t[i]} if i = i_t, else w_t[i]
Exponential-weight algorithm for Exploration and Exploitation: Exp3
Online Learning under Weak (Bandit) Information 33 / 48
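The Exp3 side of the comparison can be sketched in Python. A minimal sketch under the assumptions that losses lie in [0, 1] and that `bandit_loss(t, i)` (an illustrative callback, not from the slides) returns the loss the adversary assigned to arm i at round t:

```python
import math, random

def exp3(n_arms, bandit_loss, T, gamma):
    """Exp3 for adversarial bandits: only the pulled arm's loss is observed."""
    w = [1.0] * n_arms
    total = 0.0
    for t in range(T):
        z = sum(w)
        # mix exponential weights with gamma-uniform exploration
        p = [(1 - gamma) * wi / z + gamma / n_arms for wi in w]
        i = random.choices(range(n_arms), weights=p)[0]
        loss = bandit_loss(t, i)
        total += loss
        l_hat = loss / p[i]                    # importance-weighted estimate
        w[i] *= math.exp(-gamma * l_hat / n_arms)   # update only the pulled arm
    return total
```

Note the two uses of γ: it guarantees p_t[i] ≥ γ/N (so the loss estimate stays bounded) and it scales the update.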
35 Connection to importance sampling
Exp3
1: init w_1[i] = 1, γ ∈ (0, 1]
2: for t = 1,... do
3:   draw a friend i_t acc. to p_t[i] = (1 − γ) w_t[i] / Σ_{j=1}^N w_t[j] + γ/N
4:   receive l_t[i_t]
5:   w_{t+1}[i] = w_t[i] e^{−(γ/N) l̂_t[i]} if i = i_t, else w_t[i]
denote l̂_t[i] = l_t[i]/p_t[i] if i = i_t, and 0 otherwise
then E[l̂_t[i] | i_1,..., i_{t−1}] = p_t[i] · l_t[i]/p_t[i] + (1 − p_t[i]) · 0 = l_t[i]
this is a recurring theme: unbiased estimates of losses
Online Learning under Weak (Bandit) Information 34 / 48
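The unbiasedness computation above can be checked exactly by enumerating the draw of i_t. A small sketch (the helper name is an illustrative assumption):

```python
def is_estimate_mean(p, losses, arm):
    """Exact expectation over i_t ~ p of the importance-weighted estimate
    l_hat[arm] = losses[arm]/p[arm] if i_t == arm, else 0."""
    return sum(p[i] * (losses[arm] / p[arm] if i == arm else 0.0)
               for i in range(len(p)))
```

For every arm the sum collapses to p[arm] · losses[arm]/p[arm] = losses[arm], i.e., the estimate is unbiased regardless of the sampling distribution, as long as p[arm] > 0.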
36 Exp3 Regret
Theorem
For any γ ∈ (0, 1], N > 0 and any sequence of l_1,..., l_T:
E[Σ_{t=1}^T l_t[i_t]] − min_i Σ_{t=1}^T l_t[i] ≤ 2√(e − 1) √(T N ln N)
Comparison to the full-information Hedge: O(√(T ln N))
Exp3: O(√(T N ln N)), the price of bandit info
Online Learning under Weak (Bandit) Information 35 / 48
37 Proof very similar proof to Hedge, see appendix Exercise 4 (optional): explain why the exploration is necessary; point to where the proof would fail if p_t[i] = w_t[i] / Σ_{j=1}^N w_t[j] Online Learning under Weak (Bandit) Information 36 / 48
38 Stochastic Bandits Online Learning under Weak (Bandit) Information 37 / 48
39 Stochastic Bandits
restrict somewhat the adversarial setting
as we have seen, restrictions lead to nicer regret (possibly under a different definition)
N arms as before
this time the arm's loss l_t[i] is sampled i.i.d. from D_i (unknown and fixed)
l_t[i] and l_t[j] are independent for i ≠ j
mean loss µ_i = E[l_t[i]], µ* = min_i µ_i, i* = arg min_i µ_i
suffered loss l_t[i_t]
note: the D_{i,t}'s may change over time, keeping the means µ_i fixed
Online Learning under Weak (Bandit) Information 38 / 48
40 Stochastic Bandits
mean loss µ_i = E[l_t[i]], µ* = min_i µ_i
n_i(T): number of pulls of the ith arm over the first T plays
Goal: now it makes sense to speak of expected regret
R_T = E[Σ_{t=1}^T l_t[i_t]] − min_i E[Σ_{t=1}^T l_t[i]]
  = E[Σ_{t=1}^T Σ_{i=1}^N l_t[i] [i = i_t]] − min_i T µ_i
  = Σ_{i=1}^N µ_i E[Σ_{t=1}^T [i = i_t]] − T µ*
  = Σ_{i=1}^N µ_i E[n_i(T)] − T µ*
Online Learning under Weak (Bandit) Information 39 / 48
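The last step of the derivation, rewriting the regret through pull counts, is an algebraic identity that is easy to sanity-check numerically. A small sketch (function name and example numbers are illustrative):

```python
def regret_decomposition(mu, n_pulls):
    """Check the identity  sum_i mu_i E[n_i(T)] - T mu*  =  sum_i Delta_i E[n_i(T)],
    where Delta_i = mu_i - mu* and T = sum_i n_i(T)."""
    T = sum(n_pulls)
    mu_star = min(mu)
    lhs = sum(m * n for m, n in zip(mu, n_pulls)) - T * mu_star
    rhs = sum((m - mu_star) * n for m, n in zip(mu, n_pulls))
    return lhs, rhs
```

The identity holds because T = Σ_i n_i(T), so T µ* = Σ_i µ* n_i(T) can be pushed inside the sum; this is the form used on the next slide.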
41 Stochastic Bandits
Gaps: Δ_i = µ_i − µ* ≥ 0   (on the number line: µ* ≤ µ_j ≤ µ_i for suboptimal arms j, i)
R_T = Σ_{i=1}^N µ_i E[n_i(T)] − T µ* = Σ_{i=1}^N Δ_i E[n_i(T)]
Next Theorem(s): R_T ≤ C Σ_{i=2}^N (ln T)/Δ_i + o(ln T)
smaller gaps, bigger regret
intuition: it takes more time to distinguish between close arms
compare to Exp3: O(√(NT ln N))
Online Learning under Weak (Bandit) Information 40 / 48
42 Regret bound
R_T ≤ C Σ_{i=2}^N (ln T)/Δ_i + o(ln T)
Why 1/Δ_i? Intuitive example: imagine Bernoulli losses with parameters p_i
mean estimates µ̂_i = (1/T) Σ_{t=1}^T l_t[i] have variance σ² = p_i(1 − p_i)/T
if T ≈ 1/Δ_i^α then σ² ≈ Δ_i^α p_i(1 − p_i)
if α ≤ 1, it is hard to say if there is a real difference between µ* and µ_i or it's just variance
so we need rather α ≈ 2
as on every pull we lose about Δ_i, for α = 2 we have Δ_i · 1/Δ_i^α = 1/Δ_i
Online Learning under Weak (Bandit) Information 41 / 48
43 ɛ-greedy
Simplest strategy:
ɛ-greedy
1: ɛ > 0, µ̂_i = 0 (empirical means of losses)
2: for t = 1,... do
3:   with prob. 1 − ɛ play the current best arm i_t = arg min_i µ̂_i
4:   with prob. ɛ play a random arm
5:   receive l_t[i_t]
6:   update empirical means: µ̂_{i_t} = (µ̂_{i_t} n_{i_t} + l_t[i_t]) / (n_{i_t} + 1)
Regret: because of the constant ɛ, R_T grows like ɛT, i.e., linearly in T
need to let ɛ go to zero at a certain rate
Online Learning under Weak (Bandit) Information 42 / 48
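A minimal Python sketch of ε-greedy, assuming losses in [0, 1]; `draw_loss(i)` is an illustrative stand-in for sampling from the unknown distribution D_i:

```python
import random

def epsilon_greedy(n_arms, draw_loss, T, eps):
    """epsilon-greedy for stochastic bandits (losses: lower is better)."""
    means = [0.0] * n_arms    # starting at 0 (best possible loss) forces initial tries
    counts = [0] * n_arms
    total = 0.0
    for t in range(T):
        if random.random() < eps:
            i = random.randrange(n_arms)                     # explore
        else:
            i = min(range(n_arms), key=lambda j: means[j])   # exploit
        loss = draw_loss(i)
        total += loss
        # incremental update of the empirical mean, as in line 6 above
        means[i] = (means[i] * counts[i] + loss) / (counts[i] + 1)
        counts[i] += 1
    return total
```

With fixed ε the exploration term alone contributes about εT·(average arm loss) to the total, which is the linear-regret issue noted on the slide.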
44 ɛ_t-decreasing
Modified simplest strategy:
ɛ_t-decreasing
1: c > 0, δ > 0, µ̂_i = 0
2: for t = 1,... do
3:   ɛ_t = min{1, cN/(δ²t)}
4:   with prob. 1 − ɛ_t play the current best arm i_t = arg min_i µ̂_i
5:   with prob. ɛ_t play a random arm
6:   receive l_t[i_t]
7:   update means
Theorem
If 0 < δ ≤ min_{i: µ_i > µ*} Δ_i < 1 and T ≥ cN/δ then
P{i_t ≠ i*} ≤ c/(δ²t) + o(1/t)   and   R_T = O(ln T)
Online Learning under Weak (Bandit) Information 43 / 48
45 UCB (upper confidence bound)
UCB-style strategies:
pull arms and maintain empirical averages of their losses
also calculate confidence intervals (a region [µ̂ − u, µ̂ + u] around our estimate µ̂ such that the true value lies within it with high prob.)
repeated plays shrink the confidence bound; the average becomes more reliable
now stick to the principle: optimism in the face of uncertainty
play the arm whose mean loss combined with the confidence bound promises the least loss
eventually the most optimistic arm will change, because either that arm is really better or it wasn't sampled often enough
deterministic algorithm, unlike ɛ-greedy
Online Learning under Weak (Bandit) Information 44 / 48
46 UCB
UCB
1: play each arm once
2: for t = 1,... do
3:   play the arm i_t = arg min_i ( µ̂_i − √(2 ln t / n_i(t)) )
4:   receive l_t[i_t]
5:   update averages
Theorem
For N > 0, T > 0 and arbitrary distributions D_{i,t} with fixed means µ_i:
R_T ≤ 8 Σ_{i: µ_i > µ*} (ln T)/Δ_i + (1 + π²/3) Σ_{i=1}^N Δ_i
Online Learning under Weak (Bandit) Information 45 / 48
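A minimal Python sketch of the UCB rule above, stated for losses (so optimism subtracts the confidence width); `draw_loss(i)` is again an illustrative stand-in for sampling from D_i:

```python
import math

def ucb(n_arms, draw_loss, T):
    """UCB for losses: play the arm minimizing  mean_i - sqrt(2 ln t / n_i(t)),
    i.e., the lower confidence bound, since lower loss is better."""
    means = [draw_loss(i) for i in range(n_arms)]   # step 1: play each arm once
    counts = [1] * n_arms
    total = sum(means)
    for t in range(n_arms + 1, T + 1):
        i = min(range(n_arms),
                key=lambda j: means[j] - math.sqrt(2 * math.log(t) / counts[j]))
        loss = draw_loss(i)
        total += loss
        means[i] = (means[i] * counts[i] + loss) / (counts[i] + 1)
        counts[i] += 1
    return total
```

Note the algorithm is deterministic given the observed losses: all randomness comes from the environment, in contrast to ε-greedy.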
47 Lessons learned
lots of applications where traditional batch learning may fail
goals are formulated in terms of various regrets
the slower the upper bound on regret grows with T, the better
full information, realizability: Consistent R_T ≤ |H| − 1; Halving R_T ≤ log_2 |H|
full information, commitment + surprise: Randomized R_T ≤ √(2T ln |H|); Hedge R_T = O(√(T ln N))
adversarial bandits: EXP3 R_T = O(√(NT ln N))
stochastic bandits: ɛ-decreasing R_T = O(ln T/δ²); UCB R_T = O(ln T/Δ)
Online Learning under Weak (Bandit) Information 46 / 48
48 Conclusion barely scratched the surface with the most important algorithms advanced algorithms use EXP3, UCB as building blocks Online Learning under Weak (Bandit) Information 47 / 48
49 Recommended Reading
Material for this lecture
1 S. Shalev-Shwartz. Online Learning and Online Convex Optimization
2 Y. Seldin. The Space of Online Learning Algorithms
3 P. Auer et al. The nonstochastic multiarmed bandit problem
4 P. Auer et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002
Advanced Contextual Algorithms
1 EXP4: P. Auer et al. The nonstochastic multiarmed bandit problem
2 LinUCB: Li et al. A contextual-bandit approach to personalized news article recommendation
3 Epoch-greedy: J. Langford. The epoch-greedy algorithm for multi-armed bandits with side information, 2003
Online Learning under Weak (Bandit) Information 48 / 48
50 Appendix Online Learning under Weak (Bandit) Information 1 / 5
51 Analysis of EXP3
similar to the Hedge algorithm
idea: lower- and upper-bound Σ_{i=1}^N w_{t+1}[i] / Σ_{i=1}^N w_t[i]
denote l̂_t[i] = l_t[i]/p_t[i] if i = i_t, and 0 otherwise
Online Learning under Weak (Bandit) Information 2 / 5
52 Analysis of EXP3
Upper bound (write W_t = Σ_{i=1}^N w_t[i]; note w_t[i]/W_t = (p_t[i] − γ/N)/(1 − γ) and the bound (γ/N) l̂_t[i] ≤ 1)
W_{t+1}/W_t = Σ_{i=1}^N (w_t[i]/W_t) e^{−(γ/N) l̂_t[i]}   (use the w_{t+1}[i] definition)
  = Σ_{i=1}^N ((p_t[i] − γ/N)/(1 − γ)) e^{−(γ/N) l̂_t[i]}
  ≤ Σ_{i=1}^N ((p_t[i] − γ/N)/(1 − γ)) [1 − (γ/N) l̂_t[i] + (e − 2)((γ/N) l̂_t[i])²]   (using e^{−x} ≤ 1 − x + (e − 2)x² for x ≤ 1)
  ≤ 1 − ((γ/N)/(1 − γ)) Σ_{i=1}^N p_t[i] l̂_t[i] + ((e − 2)(γ/N)²/(1 − γ)) Σ_{i=1}^N p_t[i] l̂_t[i]²
  ≤ 1 − ((γ/N)/(1 − γ)) l_t[i_t] + ((e − 2)(γ/N)²/(1 − γ)) Σ_{i=1}^N l̂_t[i]
ln(W_{t+1}/W_t) ≤ −((γ/N)/(1 − γ)) l_t[i_t] + ((e − 2)(γ/N)²/(1 − γ)) Σ_{i=1}^N l̂_t[i]   (use ln(1 + x) ≤ x)
Online Learning under Weak (Bandit) Information 3 / 5
53 Analysis of EXP3
Upper bound (cont.): sum over t = 1,..., T
ln(W_{T+1}/W_1) ≤ −((γ/N)/(1 − γ)) Σ_{t=1}^T l_t[i_t] + ((e − 2)(γ/N)²/(1 − γ)) Σ_{t=1}^T Σ_{i=1}^N l̂_t[i]
Lower bound: for any j = 1,..., N
ln(W_{T+1}/W_1) ≥ ln(w_{T+1}[j]/W_1) = −(γ/N) Σ_{t=1}^T l̂_t[j] − ln N
Combining (and simplifying):
Σ_{t=1}^T l_t[i_t] ≤ (1 − γ) Σ_{t=1}^T l̂_t[j] + (N ln N)/γ + (e − 2)(γ/N) Σ_{t=1}^T Σ_{i=1}^N l̂_t[i]
Online Learning under Weak (Bandit) Information 4 / 5
54 Analysis of EXP3
expectation to the rescue: E[l̂_t[i] | i_1,..., i_{t−1}] = p_t[i] · l_t[i]/p_t[i] + (1 − p_t[i]) · 0 = l_t[i]
Take the expectation:
E[Σ_{t=1}^T l_t[i_t]] ≤ (1 − γ) Σ_{t=1}^T l_t[j] + (N ln N)/γ + (e − 2)(γ/N) Σ_{t=1}^T Σ_{i=1}^N l_t[i]
since j was arbitrary, and by assumption Σ_{i=1}^N Σ_{t=1}^T l_t[i] ≤ N L̃:
E[Σ_{t=1}^T l_t[i_t]] ≤ (1 − γ) min_j Σ_{t=1}^T l_t[j] + (N ln N)/γ + (e − 2) γ L̃
choose γ = min{1, √(N ln N / ((e − 1) L̃))} to obtain regret ≤ 2√(e − 1) √(L̃ N ln N)
Online Learning under Weak (Bandit) Information 5 / 5
More informationSparse Linear Contextual Bandits via Relevance Vector Machines
Sparse Linear Contextual Bandits via Relevance Vector Machines Davis Gilton and Rebecca Willett Electrical and Computer Engineering University of Wisconsin-Madison Madison, WI 53706 Email: gilton@wisc.edu,
More informationIntroduction to Computer Science and Programming for Astronomers
Introduction to Computer Science and Programming for Astronomers Lecture 8. István Szapudi Institute for Astronomy University of Hawaii March 7, 2018 Outline Reminder 1 Reminder 2 3 4 Reminder We have
More informationEASINESS IN BANDITS. Gergely Neu. Pompeu Fabra University
EASINESS IN BANDITS Gergely Neu Pompeu Fabra University EASINESS IN BANDITS Gergely Neu Pompeu Fabra University THE BANDIT PROBLEM Play for T rounds attempting to maximize rewards THE BANDIT PROBLEM Play
More informationMachine Learning. Computational Learning Theory. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012
Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Computational Learning Theory Le Song Lecture 11, September 20, 2012 Based on Slides from Eric Xing, CMU Reading: Chap. 7 T.M book 1 Complexity of Learning
More informationPAC Learning. prof. dr Arno Siebes. Algorithmic Data Analysis Group Department of Information and Computing Sciences Universiteit Utrecht
PAC Learning prof. dr Arno Siebes Algorithmic Data Analysis Group Department of Information and Computing Sciences Universiteit Utrecht Recall: PAC Learning (Version 1) A hypothesis class H is PAC learnable
More informationPractical Agnostic Active Learning
Practical Agnostic Active Learning Alina Beygelzimer Yahoo Research based on joint work with Sanjoy Dasgupta, Daniel Hsu, John Langford, Francesco Orabona, Chicheng Zhang, and Tong Zhang * * introductory
More informationNotes from Week 8: Multi-Armed Bandit Problems
CS 683 Learning, Games, and Electronic Markets Spring 2007 Notes from Week 8: Multi-Armed Bandit Problems Instructor: Robert Kleinberg 2-6 Mar 2007 The multi-armed bandit problem The multi-armed bandit
More informationTheory and Applications of A Repeated Game Playing Algorithm. Rob Schapire Princeton University [currently visiting Yahoo!
Theory and Applications of A Repeated Game Playing Algorithm Rob Schapire Princeton University [currently visiting Yahoo! Research] Learning Is (Often) Just a Game some learning problems: learn from training
More informationCS340 Machine learning Lecture 5 Learning theory cont'd. Some slides are borrowed from Stuart Russell and Thorsten Joachims
CS340 Machine learning Lecture 5 Learning theory cont'd Some slides are borrowed from Stuart Russell and Thorsten Joachims Inductive learning Simplest form: learn a function from examples f is the target
More informationLecture 8. Instructor: Haipeng Luo
Lecture 8 Instructor: Haipeng Luo Boosting and AdaBoost In this lecture we discuss the connection between boosting and online learning. Boosting is not only one of the most fundamental theories in machine
More informationBandit models: a tutorial
Gdt COS, December 3rd, 2015 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions) Bandit game: a each round t, an agent chooses
More informationMinimax risk bounds for linear threshold functions
CS281B/Stat241B (Spring 2008) Statistical Learning Theory Lecture: 3 Minimax risk bounds for linear threshold functions Lecturer: Peter Bartlett Scribe: Hao Zhang 1 Review We assume that there is a probability
More informationAlgorithms, Games, and Networks January 17, Lecture 2
Algorithms, Games, and Networks January 17, 2013 Lecturer: Avrim Blum Lecture 2 Scribe: Aleksandr Kazachkov 1 Readings for today s lecture Today s topic is online learning, regret minimization, and minimax
More informationStochastic Gradient Descent
Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular
More informationLearning Theory: Basic Guarantees
Learning Theory: Basic Guarantees Daniel Khashabi Fall 2014 Last Update: April 28, 2016 1 Introduction The Learning Theory 1, is somewhat least practical part of Machine Learning, which is most about the
More informationCS261: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm
CS61: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm Tim Roughgarden February 9, 016 1 Online Algorithms This lecture begins the third module of the
More informationRegret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I. Sébastien Bubeck Theory Group
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Part I Sébastien Bubeck Theory Group i.i.d. multi-armed bandit, Robbins [1952] i.i.d. multi-armed bandit, Robbins [1952] Known
More informationAd Placement Strategies
Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox 2014 Emily Fox January
More informationLecture 4. 1 Learning Non-Linear Classifiers. 2 The Kernel Trick. CS-621 Theory Gems September 27, 2012
CS-62 Theory Gems September 27, 22 Lecture 4 Lecturer: Aleksander Mądry Scribes: Alhussein Fawzi Learning Non-Linear Classifiers In the previous lectures, we have focused on finding linear classifiers,
More informationAccelerating Stochastic Optimization
Accelerating Stochastic Optimization Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem and Mobileye Master Class at Tel-Aviv, Tel-Aviv University, November 2014 Shalev-Shwartz
More informationNew bounds on the price of bandit feedback for mistake-bounded online multiclass learning
Journal of Machine Learning Research 1 8, 2017 Algorithmic Learning Theory 2017 New bounds on the price of bandit feedback for mistake-bounded online multiclass learning Philip M. Long Google, 1600 Amphitheatre
More informationMachine Learning I Reinforcement Learning
Machine Learning I Reinforcement Learning Thomas Rückstieß Technische Universität München December 17/18, 2009 Literature Book: Reinforcement Learning: An Introduction Sutton & Barto (free online version:
More informationEfficient and Principled Online Classification Algorithms for Lifelon
Efficient and Principled Online Classification Algorithms for Lifelong Learning Toyota Technological Institute at Chicago Chicago, IL USA Talk @ Lifelong Learning for Mobile Robotics Applications Workshop,
More informationMulti-armed bandit models: a tutorial
Multi-armed bandit models: a tutorial CERMICS seminar, March 30th, 2016 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions)
More informationLecture 6: Non-stochastic best arm identification
CSE599i: Online and Adaptive Machine Learning Winter 08 Lecturer: Kevin Jamieson Lecture 6: Non-stochastic best arm identification Scribes: Anran Wang, eibin Li, rian Chan, Shiqing Yu, Zhijin Zhou Disclaimer:
More informationTHE first formalization of the multi-armed bandit problem
EDIC RESEARCH PROPOSAL 1 Multi-armed Bandits in a Network Farnood Salehi I&C, EPFL Abstract The multi-armed bandit problem is a sequential decision problem in which we have several options (arms). We can
More informationCS229 Supplemental Lecture notes
CS229 Supplemental Lecture notes John Duchi Binary classification In binary classification problems, the target y can take on at only two values. In this set of notes, we show how to model this problem
More informationEdward L. Thorndike #1874$1949% Lecture 2: Learning from Evaluative Feedback. or!bandit Problems" Learning by Trial$and$Error.
Lecture 2: Learning from Evaluative Feedback Edward L. Thorndike #1874$1949% or!bandit Problems" Puzzle Box 1 2 Learning by Trial$and$Error Law of E&ect:!Of several responses to the same situation, those
More informationOnline Learning and Online Convex Optimization
Online Learning and Online Convex Optimization Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Learning 1 / 49 Summary 1 My beautiful regret 2 A supposedly fun game
More informationLearning Theory. Machine Learning CSE546 Carlos Guestrin University of Washington. November 25, Carlos Guestrin
Learning Theory Machine Learning CSE546 Carlos Guestrin University of Washington November 25, 2013 Carlos Guestrin 2005-2013 1 What now n We have explored many ways of learning from data n But How good
More informationThe Multi-Armed Bandit Problem
The Multi-Armed Bandit Problem Electrical and Computer Engineering December 7, 2013 Outline 1 2 Mathematical 3 Algorithm Upper Confidence Bound Algorithm A/B Testing Exploration vs. Exploitation Scientist
More informationStochastic bandits: Explore-First and UCB
CSE599s, Spring 2014, Online Learning Lecture 15-2/19/2014 Stochastic bandits: Explore-First and UCB Lecturer: Brendan McMahan or Ofer Dekel Scribe: Javad Hosseini In this lecture, we like to answer this
More informationLogistic Regression. Machine Learning Fall 2018
Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes
More informationLecture 4: Lower Bounds (ending); Thompson Sampling
CMSC 858G: Bandits, Experts and Games 09/12/16 Lecture 4: Lower Bounds (ending); Thompson Sampling Instructor: Alex Slivkins Scribed by: Guowei Sun,Cheng Jie 1 Lower bounds on regret (ending) Recap from
More informationBandit Algorithms. Tor Lattimore & Csaba Szepesvári
Bandit Algorithms Tor Lattimore & Csaba Szepesvári Bandits Time 1 2 3 4 5 6 7 8 9 10 11 12 Left arm $1 $0 $1 $1 $0 Right arm $1 $0 Five rounds to go. Which arm would you play next? Overview What are bandits,
More information1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016
AM 1: Advanced Optimization Spring 016 Prof. Yaron Singer Lecture 11 March 3rd 1 Overview In this lecture we will introduce the notion of online convex optimization. This is an extremely useful framework
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationCS229 Supplemental Lecture notes
CS229 Supplemental Lecture notes John Duchi 1 Boosting We have seen so far how to solve classification (and other) problems when we have a data representation already chosen. We now talk about a procedure,
More informationOnline Prediction: Bayes versus Experts
Marcus Hutter - 1 - Online Prediction Bayes versus Experts Online Prediction: Bayes versus Experts Marcus Hutter Istituto Dalle Molle di Studi sull Intelligenza Artificiale IDSIA, Galleria 2, CH-6928 Manno-Lugano,
More informationWeb-Mining Agents Computational Learning Theory
Web-Mining Agents Computational Learning Theory Prof. Dr. Ralf Möller Dr. Özgür Özcep Universität zu Lübeck Institut für Informationssysteme Tanya Braun (Exercise Lab) Computational Learning Theory (Adapted)
More informationName (NetID): (1 Point)
CS446: Machine Learning (D) Spring 2017 March 16 th, 2017 This is a closed book exam. Everything you need in order to solve the problems is supplied in the body of this exam. This exam booklet contains
More informationDoes Unlabeled Data Help?
Does Unlabeled Data Help? Worst-case Analysis of the Sample Complexity of Semi-supervised Learning. Ben-David, Lu and Pal; COLT, 2008. Presentation by Ashish Rastogi Courant Machine Learning Seminar. Outline
More informationGeneralization, Overfitting, and Model Selection
Generalization, Overfitting, and Model Selection Sample Complexity Results for Supervised Classification Maria-Florina (Nina) Balcan 10/03/2016 Two Core Aspects of Machine Learning Algorithm Design. How
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationTutorial: PART 2. Online Convex Optimization, A Game- Theoretic Approach to Learning
Tutorial: PART 2 Online Convex Optimization, A Game- Theoretic Approach to Learning Elad Hazan Princeton University Satyen Kale Yahoo Research Exploiting curvature: logarithmic regret Logarithmic regret
More informationEmpirical Risk Minimization Algorithms
Empirical Risk Minimization Algorithms Tirgul 2 Part I November 2016 Reminder Domain set, X : the set of objects that we wish to label. Label set, Y : the set of possible labels. A prediction rule, h:
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More informationCS 188: Artificial Intelligence Fall 2008
CS 188: Artificial Intelligence Fall 2008 Lecture 23: Perceptrons 11/20/2008 Dan Klein UC Berkeley 1 General Naïve Bayes A general naive Bayes model: C E 1 E 2 E n We only specify how each feature depends
More informationGeneral Naïve Bayes. CS 188: Artificial Intelligence Fall Example: Overfitting. Example: OCR. Example: Spam Filtering. Example: Spam Filtering
CS 188: Artificial Intelligence Fall 2008 General Naïve Bayes A general naive Bayes model: C Lecture 23: Perceptrons 11/20/2008 E 1 E 2 E n Dan Klein UC Berkeley We only specify how each feature depends
More information2 Upper-bound of Generalization Error of AdaBoost
COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #10 Scribe: Haipeng Zheng March 5, 2008 1 Review of AdaBoost Algorithm Here is the AdaBoost Algorithm: input: (x 1,y 1 ),...,(x m,y
More informationOn the tradeoff between computational complexity and sample complexity in learning
On the tradeoff between computational complexity and sample complexity in learning Shai Shalev-Shwartz School of Computer Science and Engineering The Hebrew University of Jerusalem Joint work with Sham
More informationComparison of Modern Stochastic Optimization Algorithms
Comparison of Modern Stochastic Optimization Algorithms George Papamakarios December 214 Abstract Gradient-based optimization methods are popular in machine learning applications. In large-scale problems,
More informationLeast Mean Squares Regression. Machine Learning Fall 2018
Least Mean Squares Regression Machine Learning Fall 2018 1 Where are we? Least Squares Method for regression Examples The LMS objective Gradient descent Incremental/stochastic gradient descent Exercises
More informationCOS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #24 Scribe: Jad Bechara May 2, 2018
COS 5: heoretical Machine Learning Lecturer: Rob Schapire Lecture #24 Scribe: Jad Bechara May 2, 208 Review of Game heory he games we will discuss are two-player games that can be modeled by a game matrix
More informationMachine Learning for NLP
Machine Learning for NLP Uppsala University Department of Linguistics and Philology Slides borrowed from Ryan McDonald, Google Research Machine Learning for NLP 1(50) Introduction Linear Classifiers Classifiers
More informationECE521 Lecture7. Logistic Regression
ECE521 Lecture7 Logistic Regression Outline Review of decision theory Logistic regression A single neuron Multi-class classification 2 Outline Decision theory is conceptually easy and computationally hard
More informationClassification Logistic Regression
Announcements: Classification Logistic Regression Machine Learning CSE546 Sham Kakade University of Washington HW due on Friday. Today: Review: sub-gradients,lasso Logistic Regression October 3, 26 Sham
More informationOutline: Ensemble Learning. Ensemble Learning. The Wisdom of Crowds. The Wisdom of Crowds - Really? Crowd wiser than any individual
Outline: Ensemble Learning We will describe and investigate algorithms to Ensemble Learning Lecture 10, DD2431 Machine Learning A. Maki, J. Sullivan October 2014 train weak classifiers/regressors and how
More information