THE WEIGHTED MAJORITY ALGORITHM
1 THE WEIGHTED MAJORITY ALGORITHM
Csaba Szepesvári, University of Alberta
CMPUT, UofA, October 3, 2006
2 OUTLINE
1. PREDICTION WITH EXPERT ADVICE
2. HALVING: FIND THE PERFECT EXPERT! (0/1 LOSS)
3. NO PERFECT EXPERT? (0/1 LOSS)
4. PREDICTING CONTINUOUS OUTCOMES
5. BIBLIOGRAPHY
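3 FRAMEWORK
Prediction with Expert Advice
- Outcomes: $y_1, y_2, \ldots \in \mathcal{Y}$
- Decisions: $\hat p_1, \hat p_2, \ldots \in \mathcal{D}$
- Loss function: $\ell : \mathcal{D} \times \mathcal{Y} \to \mathbb{R}$
- Advice of expert $i$: $f_{i1}, f_{i2}, \ldots \in \mathcal{D}$, $i \in J$
- (Total) loss of expert $i$: $L_{i,n} = \sum_{t=1}^{n} \ell(f_{it}, y_t)$
- (Total) loss of algorithm: $\hat L_n = \sum_{t=1}^{n} \ell(\hat p_t, y_t)$
- (Total) regret (excess loss) relative to expert $i$: $R_n = \hat L_n - L_{i,n}$
- Goal: Design an algorithm that keeps the regret small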
4 A PERFECT WORLD
- $y_t \in \{0, 1\}$, $\hat p_t \in \{0, 1\}$ ($\mathcal{Y} = \mathcal{D} = \{0, 1\}$)
- Loss: $\ell(p, y) = \mathbb{I}\{p \ne y\}$ (0/1, binary or classification loss)
- $N$ experts ($J = \{1, \ldots, N\}$)
- Expert predictions: $f_{i1}, f_{i2}, \ldots \in \{0, 1\}$
- Assumption: There is an expert that never makes a mistake.
- How to keep the regret small?
5 HALVING ALGORITHM
- Keep regret small: find the perfect expert quickly, i.e., with few mistakes
- Idea: Immediately eliminate experts that make a mistake, and take the majority vote of the remaining experts
- Halving Algorithm [Barzdin and Freivalds, 1972, Angluin, 1988]
- Claim: Whenever the algorithm makes a mistake, at least half of the remaining experts are eliminated!
- There is a perfect expert, hence we cannot halve more than $\log_2 N$ times!
- Theorem: Regret never grows above $\log_2 N$ (finite!). Holds for any sequence $y_1, y_2, \ldots$!
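The elimination-plus-majority-vote idea can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `halving`, the data layout `expert_preds[i][t]`, and the tie-breaking rule (ties go to 0) are assumptions made here.

```python
from typing import Sequence


def halving(expert_preds: Sequence[Sequence[int]], outcomes: Sequence[int]) -> int:
    """Run the Halving algorithm and return the number of mistakes it makes.

    expert_preds[i][t] is the 0/1 prediction of expert i at round t
    (hypothetical layout). Assumes at least one expert is never wrong.
    """
    alive = set(range(len(expert_preds)))
    mistakes = 0
    for t, y in enumerate(outcomes):
        votes_for_one = sum(expert_preds[i][t] for i in alive)
        # Majority vote among the alive experts; ties are broken towards 0.
        p_hat = 1 if 2 * votes_for_one > len(alive) else 0
        if p_hat != y:
            mistakes += 1
        # Eliminate every alive expert that was wrong in this round.
        alive = {i for i in alive if expert_preds[i][t] == y}
    return mistakes


# Four experts; the last one is perfect on this outcome sequence.
experts = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
outcomes = [1, 0, 1, 0]
mistakes = halving(experts, outcomes)
```

With $N = 4$ experts the theorem allows at most $\log_2 4 = 2$ mistakes; on this sequence the sketch makes one.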
6 FORMAL ANALYSIS
- Weights $w_{it} \in \{0, 1\}$: Is expert $i$ alive at time $t$ (after $y_t$ is received)? $w_{i0} = 1$.
- $W_t = \sum_{i=1}^{N} w_{it}$: number of alive experts at time $t$
- $\hat L_t$: number of mistakes up to time $t$ (including time $t$)
- Claim: If mistake ($\ell(\hat p_t, y_t) = 1$) then $W_t \le W_{t-1}/2$.
- Also: $W_t$ never grows. Hence $W_t \le W_0 / 2^{\hat L_t} = N / 2^{\hat L_t}$.
- Lower bound: $1 \le W_t$.
- Putting together: $1 \le N / 2^{\hat L_t}$, hence $\hat L_t \le \log_2 N$.
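7 NO PERFECT EXPERT: WEIGHTED MAJORITY
- Elimination: too strong if there is no perfect expert! Keep weights positive!
- Have weights of experts making a mistake decay: $w_{it} = \beta w_{i,t-1}$ if $f_{it} \ne y_t$ ($0 < \beta < 1$)
- Keep majority vote:
$$
\begin{aligned}
\hat p_t &= \mathbb{I}\Bigl\{ \sum_i w_{i,t-1} \mathbb{I}\{f_{it} = 0\} < \sum_i w_{i,t-1} \mathbb{I}\{f_{it} = 1\} \Bigr\} \\
&= \mathbb{I}\Bigl\{ \sum_i w_{i,t-1} (1 - f_{it}) < \sum_i w_{i,t-1} f_{it} \Bigr\} \\
&= \mathbb{I}\Bigl\{ \sum_i w_{i,t-1} < 2 \sum_i w_{i,t-1} f_{it} \Bigr\} \\
&= \mathbb{I}\Bigl\{ \frac{\sum_i w_{i,t-1} f_{it}}{\sum_i w_{i,t-1}} > \frac{1}{2} \Bigr\}
\end{aligned}
$$
- Weighted Majority [Littlestone and Warmuth, 1994]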
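A minimal Python sketch of the weighted majority update described on this slide. The function name `weighted_majority`, the data layout `expert_preds[i][t]`, and the default `beta=0.5` are illustrative assumptions; the prediction rule is the last form of the majority vote above (predict 1 when the weight behind 1 exceeds half of the total weight).

```python
from typing import List, Sequence, Tuple


def weighted_majority(
    expert_preds: Sequence[Sequence[int]],
    outcomes: Sequence[int],
    beta: float = 0.5,
) -> Tuple[int, List[float]]:
    """Return (number of mistakes, final weights) of Weighted Majority.

    expert_preds[i][t] is the 0/1 prediction of expert i at round t
    (hypothetical layout); beta in (0, 1) is the decay factor.
    """
    weights = [1.0] * len(expert_preds)  # w_{i0} = 1
    mistakes = 0
    for t, y in enumerate(outcomes):
        total = sum(weights)
        weight_for_one = sum(w for w, f in zip(weights, expert_preds) if f[t] == 1)
        # Predict 1 iff the weighted fraction voting for 1 exceeds 1/2.
        p_hat = 1 if 2 * weight_for_one > total else 0
        if p_hat != y:
            mistakes += 1
        # Multiply the weight of every expert that erred by beta.
        weights = [w * beta if f[t] != y else w for w, f in zip(weights, expert_preds)]
    return mistakes, weights


# Three experts, none of them perfect on this sequence.
experts = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
]
outcomes = [1, 1, 0, 1]
mistakes, final_weights = weighted_majority(experts, outcomes, beta=0.5)
```

Each final weight equals $\beta$ raised to the number of mistakes of that expert, which is the identity the analysis uses as its lower bound.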
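8 WEIGHTED MAJORITY: ANALYSIS/1
- Notation: $J_{t,\mathrm{bad}} = \{i : f_{it} \ne y_t\}$, $J_{t,\mathrm{good}} = \{i : f_{it} = y_t\}$, $W_{t,J} = \sum_{i \in J} w_{it}$
- With this notation: $W_t = W_{t-1,J_{t,\mathrm{good}}} + \beta W_{t-1,J_{t,\mathrm{bad}}}$
- Claim: $W_t \le W_{t-1}$, and if $\hat p_t \ne y_t$ then $W_t \le \frac{1+\beta}{2} W_{t-1}$
- Proof: $W_t = W_{t-1,J_{t,\mathrm{good}}} + \beta W_{t-1,J_{t,\mathrm{bad}}}$. Since $\beta < 1$, $W_t \le W_{t-1}$. Assume $\hat p_t \ne y_t$. Then $W_{t-1,J_{t,\mathrm{good}}} \le W_{t-1}/2$ (majority vote), and
$$
\begin{aligned}
W_t &= W_{t-1,J_{t,\mathrm{good}}} + \beta W_{t-1,J_{t,\mathrm{bad}}} \\
&= W_{t-1,J_{t,\mathrm{good}}} + \beta \bigl(W_{t-1} - W_{t-1,J_{t,\mathrm{good}}}\bigr) \\
&= (1 - \beta) W_{t-1,J_{t,\mathrm{good}}} + \beta W_{t-1} \\
&\le (1 - \beta) W_{t-1}/2 + \beta W_{t-1} = \frac{1+\beta}{2} W_{t-1}.
\end{aligned}
$$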
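9 WEIGHTED MAJORITY: ANALYSIS/2
- CLAIM: $W_t \le W_{t-1}$, and if $\hat p_t \ne y_t$ then $W_t \le \frac{1+\beta}{2} W_{t-1}$
- Lower bound: For any $i$, $\beta^{L_{it}} = w_{it} \le W_t$.
- Putting together: $\beta^{L_{it}} \le W_t \le \left(\frac{1+\beta}{2}\right)^{\hat L_t} W_0$.
- Take log, reorder:
$$
\hat L_t \le \frac{\log_2\left(\frac{1}{\beta}\right) L_{it} + \log_2 N}{\log_2\left(\frac{2}{1+\beta}\right)}.
$$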
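To get a feel for the numbers, the mistake bound can be evaluated directly. The helper below is a small sketch; its name `wm_mistake_bound` and its argument names are assumptions introduced here, and it only evaluates the right-hand side of the inequality above.

```python
import math


def wm_mistake_bound(best_expert_mistakes: int, n_experts: int, beta: float) -> float:
    """Evaluate (log2(1/beta) * L + log2 N) / log2(2 / (1 + beta))."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    numerator = math.log2(1.0 / beta) * best_expert_mistakes + math.log2(n_experts)
    denominator = math.log2(2.0 / (1.0 + beta))
    return numerator / denominator


# Best expert makes one mistake, three experts, beta = 1/2.
bound_half = wm_mistake_bound(best_expert_mistakes=1, n_experts=3, beta=0.5)

# With a perfect expert and beta close to 0 the bound approaches log2(N),
# the Halving bound.
bound_near_halving = wm_mistake_bound(best_expert_mistakes=0, n_experts=4, beta=1e-12)
```

For a fixed number of mistakes of the best expert the bound does not depend on the number of rounds, which is the "finite regret" property noted later in the slides.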
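10 PREDICTING CONTINUOUS OUTCOMES
- What if $\mathcal{Y} = \mathcal{D} = [0, 1]$ or $\mathbb{R}^d$?
- More generally: let $\mathcal{Y} = \mathcal{D}$ be convex subsets of some vector space ($\lambda_1 y_1 + \lambda_2 y_2 \in \mathcal{Y}$ whenever $\lambda_1, \lambda_2 \ge 0$, $\lambda_1 + \lambda_2 = 1$, $y_1, y_2 \in \mathcal{Y}$).
- Loss: $\ell : \mathcal{D} \times \mathcal{Y} \to [0, 1]$ (bounded)
- Example: $\mathcal{D} = \mathcal{Y} = [0, 1]$, $\ell(p, y) = \frac{1}{2} |p - y|$.
- Can we generalize the previous idea? Combine the advice of the experts!
$$
\hat p_t = \frac{\sum_{i=1}^{N} w_{i,t-1} f_{it}}{\sum_{i=1}^{N} w_{i,t-1}}
$$
- How to set the weights? Let them decay exponentially as a function of the losses!
$$
w_{i,t} = w_{i,t-1} e^{-\eta \ell(f_{it}, y_t)}.
$$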
11 PREDICTING CONTINUOUS OUTCOMES/2
- For numerical stability we might want to normalize the weights:
$$
w_{i,t} = \frac{w_{i,t-1} e^{-\eta \ell(f_{it}, y_t)}}{\sum_{j=1}^{N} w_{j,t-1} e^{-\eta \ell(f_{jt}, y_t)}}.
$$
- Note: resembles Bayes updates!
- For the analysis we do not normalize.
- Analysis?? Plan:
- Lower bound the sum of weights using the individual total losses of the experts
- Upper bound the sum of weights in terms of the total loss of the algorithm
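The forecaster with normalized weights can be sketched as follows. This is an illustrative implementation under stated assumptions: the names `ewa_forecast` and `abs_loss`, the data layout `expert_preds[i][t]`, and the uniform initial weights $1/N$ are choices made here; the loss is the example $\ell(p, y) = \frac{1}{2}|p - y|$ from the slides.

```python
import math
from typing import Callable, List, Sequence, Tuple


def abs_loss(p: float, y: float) -> float:
    """Example loss from the slides: half the absolute difference."""
    return 0.5 * abs(p - y)


def ewa_forecast(
    expert_preds: Sequence[Sequence[float]],
    outcomes: Sequence[float],
    eta: float,
    loss: Callable[[float, float], float] = abs_loss,
) -> Tuple[float, List[float]]:
    """Return (total loss of the forecaster, final normalized weights)."""
    n_experts = len(expert_preds)
    weights = [1.0 / n_experts] * n_experts  # normalized, so they sum to 1
    total_loss = 0.0
    for t, y in enumerate(outcomes):
        # Weighted average of the advice; no division needed because the
        # weights are kept normalized.
        p_hat = sum(w * f[t] for w, f in zip(weights, expert_preds))
        total_loss += loss(p_hat, y)
        # Exponential decay in the loss, followed by renormalization.
        unnormalized = [
            w * math.exp(-eta * loss(f[t], y)) for w, f in zip(weights, expert_preds)
        ]
        z = sum(unnormalized)
        weights = [u / z for u in unnormalized]
    return total_loss, weights


# Three constant experts on [0, 1]; outcomes are mostly 1.
experts = [[0.0] * 6, [0.5] * 6, [1.0] * 6]
outcomes = [1.0, 1.0, 0.0, 1.0, 1.0, 1.0]
alg_loss, final_weights = ewa_forecast(experts, outcomes, eta=1.0)
best_loss = min(
    sum(abs_loss(f[t], y) for t, y in enumerate(outcomes)) for f in experts
)
```

Normalizing does not change the predictions, since $\hat p_t$ only depends on the ratios of the weights.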
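12 ANALYSIS/1
- Lower bound: for any expert $i$,
$$
W_n = \sum_{j=1}^{N} w_{jn} = \sum_{j=1}^{N} e^{-\eta L_{jn}} \ge e^{-\eta L_{in}}.
$$
- Upper bound: Bound $W_t / W_{t-1}$ in terms of $\ell(\hat p_t, y_t)$! ($W_t \le \mathrm{const} \cdot W_{t-1}$, const = ?)
$$
\frac{W_t}{W_{t-1}} = \sum_i \frac{w_{i,t-1}}{W_{t-1}} e^{-\eta \ell_{it}} = \sum_i \hat w_{i,t-1} e^{-\eta \ell_{it}},
$$
where $\ell_{it} \stackrel{\mathrm{def}}{=} \ell(f_{it}, y_t)$ and $\hat w_{i,t-1} \stackrel{\mathrm{def}}{=} w_{i,t-1} / W_{t-1}$.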
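13 ANALYSIS/2
$$
\frac{W_t}{W_{t-1}} = \sum_i \hat w_{i,t-1} e^{-\eta \ell_{it}}
$$
- Looks like an expectation! Let $P(I = i) = \hat w_{i,t-1}$, $I \in J$. Then
$$
\frac{W_t}{W_{t-1}} = E\bigl[ e^{-\eta \ell_{I,t}} \bigr].
$$
- Observe: $\ell(\hat p_t, y_t) = \ell(E[f_{I,t}], y_t)$ and $E[\ell_{I,t}] = E[\ell(f_{I,t}, y_t)]$.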
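14 ANALYSIS/3
- $\frac{W_t}{W_{t-1}} = E\bigl[ e^{-\eta \ell_{I,t}} \bigr] \le\ ??$
- $\ell(\hat p_t, y_t) = \ell(E[f_{I,t}], y_t)$, $E[\ell_{I,t}] = E[\ell(f_{I,t}, y_t)]$
- What if $\ell(p, y) = \frac{1}{2} |p - y|$, $p, y \in [0, 1]$?
- $\ell(\cdot, y)$ is convex for any $y$, hence
$$
E[\ell(f_{I,t}, y_t)] \ge \ell(E[f_{I,t}], y_t) = \ell(\hat p_t, y_t) \quad \text{(Jensen's inequality)}
$$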
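15 ANALYSIS/4
- $\frac{W_t}{W_{t-1}} = E\bigl[ e^{-\eta \ell_{I,t}} \bigr] \le\ ??$
- $E[\ell(f_{I,t}, y_t)] \ge \ell(E[f_{I,t}], y_t) = \ell(\hat p_t, y_t)$.
- LEMMA (HOEFFDING'S INEQUALITY): Let $0 \le X \le 1$. Then for all $s \in \mathbb{R}$,
$$
E\bigl[ e^{sX} \bigr] \le e^{s E[X] + s^2/8}.
$$
- Applying the lemma with $s = -\eta$:
$$
\begin{aligned}
\frac{W_t}{W_{t-1}} &= E\bigl[ e^{-\eta \ell_{I,t}} \bigr] \\
&\le e^{-\eta E[\ell_{I,t}] + \eta^2/8} && \text{(Hoeffding's inequality)} \\
&\le e^{-\eta \ell(\hat p_t, y_t) + \eta^2/8} && \text{(line 2 above)}
\end{aligned}
$$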
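The lemma is easy to check numerically for a small discrete distribution on $[0, 1]$. The sketch below is only an illustration: the helper name `mgf_and_hoeffding_bound` and the particular distribution are assumptions chosen here, not part of the lecture.

```python
import math
from typing import Sequence, Tuple


def mgf_and_hoeffding_bound(
    values: Sequence[float], probs: Sequence[float], s: float
) -> Tuple[float, float]:
    """Return (E[exp(s X)], exp(s E[X] + s^2 / 8)) for a discrete X in [0, 1]."""
    mean = sum(p * x for x, p in zip(values, probs))
    mgf = sum(p * math.exp(s * x) for x, p in zip(values, probs))
    bound = math.exp(s * mean + s * s / 8.0)
    return mgf, bound


# X takes the values 0, 0.25 and 1 with probabilities 0.2, 0.5 and 0.3.
checks = [
    mgf_and_hoeffding_bound([0.0, 0.25, 1.0], [0.2, 0.5, 0.3], s)
    for s in (-3.0, -1.0, -0.1, 0.5, 2.0)
]
```

In the analysis the lemma is used with $X = \ell_{I,t}$ and $s = -\eta$, which is why negative values of `s` are included.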
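16 ANALYSIS/5
- $W_n \ge e^{-\eta L_{in}}$, $i \in J$
- $\frac{W_t}{W_{t-1}} \le e^{-\eta \ell(\hat p_t, y_t) + \frac{\eta^2}{8}}$
- Hence, using $W_0 = N$ ($w_{i0} = 1$),
$$
\frac{1}{N} e^{-\eta L_{in}} \le \frac{W_n}{W_0} = \frac{W_n}{W_{n-1}} \cdots \frac{W_1}{W_0} \le e^{-\eta \hat L_n + \frac{\eta^2}{8} n}.
$$
- THEOREM (LOSS BOUND FOR THE EWA FORECASTER): Assume that $\mathcal{D}$ is a convex subset of some vector space. Let $\ell : \mathcal{D} \times \mathcal{Y} \to [0, 1]$ be convex in its first argument. Then, for the EWA forecaster it holds that
$$
\hat L_n \le \min_{i \in J} L_{in} + \frac{\ln N}{\eta} + \frac{\eta}{8} n.
$$
With $\eta = \sqrt{\frac{8 \ln N}{n}}$,
$$
\hat L_n \le \min_{i \in J} L_{in} + \sqrt{\frac{n}{2} \ln N}.
$$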
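The step from the chained inequality to the theorem can be spelled out as follows. This is a sketch of the standard argument using only quantities defined on the slides; the function $g(\eta)$ is notation introduced here for convenience. Since the bound holds for every $i \in J$, it also holds for the minimizing expert.

```latex
\begin{align*}
\frac{1}{N} e^{-\eta L_{in}} &\le e^{-\eta \hat L_n + \frac{\eta^2}{8} n}
  && \text{(chained bound)} \\
-\ln N - \eta L_{in} &\le -\eta \hat L_n + \frac{\eta^2}{8} n
  && \text{(take logarithms)} \\
\hat L_n &\le L_{in} + \frac{\ln N}{\eta} + \frac{\eta}{8} n
  && \text{(divide by } \eta > 0 \text{ and reorder)} \\
g(\eta) &= \frac{\ln N}{\eta} + \frac{\eta n}{8}, \qquad
g'(\eta) = -\frac{\ln N}{\eta^2} + \frac{n}{8} = 0
  \iff \eta = \sqrt{\frac{8 \ln N}{n}} \\
g\Bigl(\sqrt{\tfrac{8 \ln N}{n}}\Bigr)
  &= \sqrt{\frac{n \ln N}{8}} + \sqrt{\frac{n \ln N}{8}}
   = \sqrt{\frac{n}{2} \ln N}
\end{align*}
```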
17 NOTES
- Small losses. Loss bound for WM, 0/1 predictions:
$$
\hat L_n \le \frac{\log_2\left(\frac{1}{\beta}\right) L_{in} + \log_2 N}{\log_2\left(\frac{2}{1+\beta}\right)}.
$$
If $L_{in} = 0$ for some expert then the regret is finite!
- Continuous prediction spaces (EWA):
$$
\hat L_n \le \min_{i \in J} L_{in} + \sqrt{\frac{n}{2} \ln N}.
$$
The bound grows to infinity even if $L_{in} = 0$ for some $i$! :-( Can this be improved? If there is a perfect expert, the regret should be finite!
- How to select $\eta$ if the horizon ($n$) is not given a priori? Would $\eta_t = \sqrt{8 (\ln N) / t}$ work? (yes) Cheap solution: doubling trick
- Related: Can we use the doubling trick to improve the bound in case of small losses? (yes)
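One common form of the doubling trick restarts the forecaster in epochs of length $1, 2, 4, \ldots$ and tunes $\eta$ for the length of the current epoch. The sketch below is an illustration under assumptions made here: the names `ewa_doubling` and `abs_loss`, the epoch schedule $2^m$, and the data layout `expert_preds[i][t]` are not taken from the slides. Summing the per-epoch bounds of the theorem gives a regret of at most $\frac{\sqrt{2}}{\sqrt{2} - 1} \sqrt{\frac{n}{2} \ln N}$, the same order as with a known horizon.

```python
import math
from typing import Callable, Sequence


def abs_loss(p: float, y: float) -> float:
    """Example loss from the slides: half the absolute difference."""
    return 0.5 * abs(p - y)


def ewa_doubling(
    expert_preds: Sequence[Sequence[float]],
    outcomes: Sequence[float],
    loss: Callable[[float, float], float] = abs_loss,
) -> float:
    """Total loss of EWA restarted on epochs of length 1, 2, 4, ..."""
    n_experts = len(expert_preds)
    n_rounds = len(outcomes)
    total_loss = 0.0
    t = 0
    epoch = 0
    while t < n_rounds:
        horizon = 2 ** epoch
        # Tune eta as if the horizon were the length of this epoch.
        eta = math.sqrt(8.0 * math.log(n_experts) / horizon)
        weights = [1.0] * n_experts  # restart: forget earlier epochs
        for _ in range(horizon):
            if t >= n_rounds:
                break
            y = outcomes[t]
            w_sum = sum(weights)
            p_hat = sum(w * f[t] for w, f in zip(weights, expert_preds)) / w_sum
            total_loss += loss(p_hat, y)
            weights = [
                w * math.exp(-eta * loss(f[t], y))
                for w, f in zip(weights, expert_preds)
            ]
            t += 1
        epoch += 1
    return total_loss


n = 200
experts = [[0.0] * n, [0.5] * n, [1.0] * n]
outcomes = [1.0 if t % 3 else 0.0 for t in range(n)]
alg_loss = ewa_doubling(experts, outcomes)
best_loss = min(
    sum(abs_loss(f[t], y) for t, y in enumerate(outcomes)) for f in experts
)
regret_bound = math.sqrt(2) / (math.sqrt(2) - 1) * math.sqrt(n / 2 * math.log(len(experts)))
```

The restart throws away information at every epoch boundary, which is why this is the "cheap" solution; a time-varying $\eta_t$ avoids the restarts.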
18 REFERENCES
- Angluin, D. (1988). Queries and concept learning. Journal of Machine Learning, 2.
- Barzdin, Y. and Freivalds, R. (1972). On the prediction of general recursive functions. Soviet Mathematics (Doklady), 13.
- Littlestone, N. and Warmuth, M. (1994). The weighted majority algorithm. Information and Computation, 108.