Optimal and Adaptive Algorithms for Online Boosting
Alina Beygelzimer (Yahoo! Labs, NYC), Satyen Kale (Yahoo! Labs, NYC), Haipeng Luo (Computer Science Department, Princeton University)
July 8, 2015
Boosting: An Example
Idea: combine weak rules of thumb to form a highly accurate predictor.
Example: spam detection.
- Given: a set of training examples, e.g. ("Wanna make fast money?...", spam) and ("How is research going? Rob", not spam).
- Obtain a classifier by asking a weak learning algorithm: e.g. contains the word "money" => spam.
- Reweight the examples so that difficult ones get more attention: e.g. spam that doesn't contain "money".
- Obtain another classifier: e.g. empty "To:" address => spam.
- ...and so on.
- At the end, predict by taking a (weighted) majority vote. (A minimal batch sketch of this loop follows.)
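To make the reweight-and-vote loop concrete, here is a minimal batch sketch in Python. It is illustrative only: the `weak_learner(examples, weights)` helper and the example format are hypothetical stand-ins for whatever weak learning algorithm is available, and the vote-weight update is the standard AdaBoost choice rather than anything specific to this talk.

```python
import math

def batch_boost(examples, weak_learner, rounds):
    """Reweight-and-vote loop: train weak rules on reweighted data,
    then predict with a weighted majority vote.
    `examples` is a list of (x, y) pairs with y in {-1, +1};
    `weak_learner(examples, weights)` (hypothetical) returns a classifier h(x)."""
    n = len(examples)
    weights = [1.0 / n] * n                       # start with uniform example weights
    ensemble = []                                 # list of (vote weight, classifier)
    for _ in range(rounds):
        h = weak_learner(examples, weights)
        # weighted error of this weak rule, clamped away from 0 and 1
        err = sum(w for w, (x, y) in zip(weights, examples) if h(x) != y)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)   # standard AdaBoost vote weight
        ensemble.append((alpha, h))
        # reweight: misclassified examples get more attention
        weights = [w * math.exp(-alpha * y * h(x))
                   for w, (x, y) in zip(weights, examples)]
        total = sum(weights)
        weights = [w / total for w in weights]
    def predict(x):
        return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
    return predict
```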
Online Boosting: Motivation
- Boosting is well studied in the batch setting, but becomes infeasible when the amount of data is huge.
- Online learning has proven extremely useful: one pass over the data, making predictions on the fly, and it works even in an adversarial environment (e.g. spam detection).
- A natural question: how do we extend boosting to the online setting?
Related Work
- Several algorithms exist (Oza and Russell, 2001; Grabner and Bischof, 2006; Liu and Yu, 2007; Grabner et al., 2008): they mimic offline counterparts and achieve great success in many real-world applications, but come with no theoretical guarantees.
- Chen et al. (2012): the first online boosting algorithms with theoretical guarantees, based on an online analogue of the weak learning assumption and a connection between online boosting and smooth batch boosting.
Batch Boosting
Given a batch of $T$ examples $(x_t, y_t) \in X \times \{-1, +1\}$ for $t = 1, \ldots, T$, a learner $A$ predicts $A(x_t) \in \{-1, +1\}$ for example $x_t$.
- Weak learner $A$ (with edge $\gamma$): $\sum_{t=1}^{T} \mathbf{1}\{A(x_t) \neq y_t\} \leq (\tfrac{1}{2} - \gamma)T$.
- Strong learner $A'$ (with any target error rate $\epsilon$): $\sum_{t=1}^{T} \mathbf{1}\{A'(x_t) \neq y_t\} \leq \epsilon T$.
- Boosting (Schapire, 1990; Freund, 1995) turns the former into the latter.
Online Boosting
Examples $(x_t, y_t) \in X \times \{-1, +1\}$ arrive online for $t = 1, \ldots, T$; the learner $A$ observes $x_t$ and predicts $A(x_t) \in \{-1, +1\}$ before seeing $y_t$.
- Weak online learner $A$ (with edge $\gamma$ and excess loss $S$): $\sum_{t=1}^{T} \mathbf{1}\{A(x_t) \neq y_t\} \leq (\tfrac{1}{2} - \gamma)T + S$.
- Strong online learner $A'$ (with any target error rate $\epsilon$ and excess loss $S'$): $\sum_{t=1}^{T} \mathbf{1}\{A'(x_t) \neq y_t\} \leq \epsilon T + S'$.
- Online boosting (our result) turns the former into the latter.
- This talk: $S = \sqrt{T}/\gamma$ (corresponding to $\sqrt{T}$ regret).
Main Results
Parameters of interest:
- $N$ = number of weak learners (of edge $\gamma$) needed to achieve error rate $\epsilon$.
- $T_\epsilon$ = minimal number of examples such that the error rate is $\epsilon$.

Algorithm          | $N$                                          | $T_\epsilon$                              | Optimal? | Adaptive?
Online BBM         | $O(\frac{1}{\gamma^2}\ln\frac{1}{\epsilon})$ | $\tilde{O}(\frac{1}{\epsilon\gamma^2})$   | yes      | no
AdaBoost.OL        | $O(\frac{1}{\epsilon\gamma^2})$              | $\tilde{O}(\frac{1}{\epsilon^2\gamma^4})$ | no       | yes
Chen et al. (2012) | $O(\frac{1}{\epsilon\gamma^2})$              | $\tilde{O}(\frac{1}{\epsilon\gamma^2})$   | no       | no
Structure of Online Boosting
[Diagram: on each round $t = 1, \ldots, T$, the booster routes the example through $N$ weak learners.]
- The booster receives $x_t$ and passes it to weak learners $WL^1, \ldots, WL^N$; each $WL^i$ returns a prediction $\hat{y}_t^i$.
- The booster combines these into its own prediction $\hat{y}_t$, and then the true label $y_t$ is revealed.
- Each $WL^i$ is then updated on $(x_t, y_t)$ with probability $p_t^i$, chosen by the booster.
(A code sketch of this protocol follows.)
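The following Python sketch shows one round of this protocol. The weak-learner interface (`predict(x)`, `update(x, y)`) and the `probs` callback supplying $p_t^i$ are assumed names, not anything from the paper.

```python
import random

def online_boosting_round(weak_learners, probs, x_t, y_t):
    """One round of the protocol above (a sketch under assumed interfaces).
    In the real protocol the booster predicts before y_t is revealed;
    the two phases are shown in one function here for brevity."""
    # Phase 1: every weak learner predicts on x_t; the booster combines
    # the predictions (here: a simple majority vote).
    preds = [wl.predict(x_t) for wl in weak_learners]   # each in {-1, +1}
    y_hat = 1 if sum(preds) >= 0 else -1
    # Phase 2: y_t is revealed; weak learner i is updated on (x_t, y_t)
    # with probability p_t^i chosen by the booster.
    for i, wl in enumerate(weak_learners):
        if random.random() < probs(i, x_t, y_t):
            wl.update(x_t, y_t)
    return y_hat
```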
Boosting as a Drifting Game (Schapire, 2001; Luo and Schapire, 2014)
- Batch boosting can be analyzed using drifting games.
- Online version: a sequence of potentials $\Phi_i(s)$ such that
$$\Phi_N(s) \geq \mathbf{1}\{s \leq 0\}, \qquad \Phi_{i-1}(s) \geq \left(\tfrac{1}{2} - \tfrac{\gamma}{2}\right)\Phi_i(s-1) + \left(\tfrac{1}{2} + \tfrac{\gamma}{2}\right)\Phi_i(s+1).$$
- Online boosting algorithm using the $\Phi_i$:
  - prediction: majority vote.
  - update: $p_t^i = \Pr[(x_t, y_t) \text{ sent to the } i\text{th weak learner}] \propto w_t^i$, where $w_t^i$ is the difference in potentials according to whether the example is misclassified or not.
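Since the recurrence runs backward from $\Phi_N$, the potentials are easy to tabulate numerically. The sketch below does exactly that; the index shift and boundary clamping are implementation conveniences, not part of the paper.

```python
def bbm_potentials(N, gamma):
    """Tabulate drifting-game potentials Phi_i(s) for s in [-N, N] by
    backward recursion over the recurrence above; phi[i][s + N] holds Phi_i(s)."""
    span = 2 * N + 1
    phi = [[0.0] * span for _ in range(N + 1)]
    for s in range(-N, N + 1):                 # base case: Phi_N(s) = 1{s <= 0}
        phi[N][s + N] = 1.0 if s <= 0 else 0.0
    def clamp(s):
        return max(-N, min(N, s))              # keep neighbor indices in the table
    for i in range(N, 0, -1):
        for s in range(-N, N + 1):
            phi[i - 1][s + N] = ((0.5 - gamma / 2) * phi[i][clamp(s - 1) + N]
                                 + (0.5 + gamma / 2) * phi[i][clamp(s + 1) + N])
    return phi

# Phi_0(0) bounds the achievable error rate and shrinks as N grows, e.g.:
# bbm_potentials(100, 0.2)[0][100]
```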
Mistake Bound
The generalized drifting-games analysis implies
$$\sum_{t=1}^{T} \mathbf{1}\{A'(x_t) \neq y_t\} \;\leq\; \underbrace{\Phi_0(0)}_{\leq\,\epsilon}\,T \;+\; \underbrace{\left(S + \tfrac{1}{\gamma}\right)\sum_i \|w^i\|_\infty}_{=\,S'}.$$
- So we want the weights $w^i$ to be small.
- The exponential potential (corresponding to AdaBoost) does not work.
- The Boost-by-Majority (Freund, 1995) potential works well:
$$w_t^i = \Pr\left[k_t^i \text{ heads in } N-i \text{ flips of a } \left(\tfrac{1}{2} + \tfrac{\gamma}{2}\right)\text{-biased coin}\right] \leq O\!\left(\tfrac{1}{\sqrt{N-i}}\right).$$
- Online BBM: to get error rate $\epsilon$, it needs $N = O\left(\frac{1}{\gamma^2}\ln\frac{1}{\epsilon}\right)$ weak learners and $T_\epsilon = O\left(\frac{1}{\epsilon\gamma^2}\right)$ examples. (Optimal.)
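Each BBM weight is just a binomial probability, so computing it is straightforward. A sketch (how $k_t^i$ is derived from the example's running count of correct weak predictions is omitted here):

```python
from math import comb

def bbm_weight(k, N, i, gamma):
    """BBM weight w_t^i: the probability of exactly k heads in N - i flips
    of a (1/2 + gamma/2)-biased coin. In the algorithm, k = k_t^i is
    determined by the example's state after the first i weak learners."""
    n = N - i
    p = 0.5 + gamma / 2.0
    if n < 0 or k < 0 or k > n:
        return 0.0
    return comb(n, k) * (p ** k) * ((1 - p) ** (n - k))
```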
Drawback of Online BBM
The drawback of Online BBM (as with Chen et al. (2012)) is the lack of adaptivity:
- it requires $\gamma$ as a parameter;
- it treats each weak learner equally, predicting via a simple majority vote.
Adaptivity is the key advantage of AdaBoost: different weak learners are weighted differently based on their performance.
Adaptivity via Online Loss Minimization
- Batch boosting finds a combination of weak learners that minimizes some loss function using coordinate descent (Breiman, 1999). AdaBoost uses the exponential loss; AdaBoost.L uses the logistic loss.
- We generalize this to the online setting: replace the line search with online gradient descent.
- The exponential loss does not work here either; using the logistic loss instead gives the adaptive online boosting algorithm AdaBoost.OL. (A rough sketch of one round follows.)
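To illustrate replacing line search with online gradient descent, here is a rough Python sketch of one round. It shows only the central ingredient: each weak learner's vote weight $\alpha_i$ takes a gradient step on the logistic loss of the partial score, and the example is forwarded with probability given by the loss derivative. This is a sketch under assumed interfaces, not the paper's exact AdaBoost.OL pseudocode (which is also more careful about how the final prediction is formed).

```python
import math, random

def adaboost_ol_round(weak_learners, alphas, x_t, y_t, eta=0.1):
    """One round in the spirit of AdaBoost.OL (illustrative sketch).
    Logistic loss: l(z) = log(1 + exp(-y z)) on the running score z."""
    s = 0.0                                      # partial score entering learner i
    for i, wl in enumerate(weak_learners):
        h = wl.predict(x_t)                      # weak prediction in {-1, +1}
        # gradient of log(1 + exp(-y*(s + alpha*h))) with respect to alpha
        grad = -y_t * h / (1.0 + math.exp(y_t * (s + alphas[i] * h)))
        # forward (x_t, y_t) to learner i with probability equal to the
        # magnitude of the loss derivative at the current partial score
        w_i = 1.0 / (1.0 + math.exp(y_t * s))
        if random.random() < w_i:
            wl.update(x_t, y_t)
        s += alphas[i] * h                       # advance the combined score
        alphas[i] -= eta * grad                  # online gradient descent step
    return 1 if s >= 0 else -1                   # (made before y_t in reality)
```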
Mistake Bound
If weak learner $i$ has edge $\gamma_i$, then
$$\sum_{t=1}^{T} \mathbf{1}\{A'(x_t) \neq y_t\} \;\leq\; \frac{2T}{\sum_i \gamma_i^2} + \tilde{O}\!\left(\frac{N^2}{\sum_i \gamma_i^2}\right).$$
- Suppose $\gamma_i \geq \gamma$ for all $i$, so that $\sum_i \gamma_i^2 \geq N\gamma^2$; then to get error rate $\epsilon$, AdaBoost.OL needs $N = O\left(\frac{1}{\epsilon\gamma^2}\right)$ weak learners and $T_\epsilon = \tilde{O}\left(\frac{1}{\epsilon^2\gamma^4}\right)$ examples.
- Not optimal, but adaptive.
Results
- Available in Vowpal Wabbit 8.0; command line option: --boosting.
- VW is used as the default weak learner (a rather strong one!).
- [Table: per-dataset error rates for the VW baseline, Online BBM, AdaBoost.OL, and Chen et al. (2012) on news, a9a, activity, adult, bio, census, covtype, letter, maptaskcoref, nomao, poker, rcv, and vehv2binary; the numeric entries were lost in transcription.]
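As a usage example, a training and evaluation run might look like the following; the file names are placeholders, and `--boosting N` is read here as setting the number of weak learners, which the slide does not spell out.

```
vw --boosting 20 -d train.vw -f model.vw    # train: boosting over VW's default learner
vw -t -d test.vw -i model.vw                # evaluate the saved model on held-out data
```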
Conclusions
We propose:
- a natural framework for online boosting;
- an optimal algorithm, Online BBM;
- an adaptive algorithm, AdaBoost.OL.
Future directions:
- Open problem: an algorithm that is both optimal and adaptive?
- Beyond classification: online gradient boosting for regression (see the companion arXiv preprint).