An overview of Boosting. Yoav Freund UCSD
1 An overview of Boosting Yoav Freund UCSD
2 Plan of talk: generative vs. non-generative modeling; boosting; alternating decision trees; boosting and over-fitting; applications
3 Toy example: a computer receives a telephone call, measures the pitch of the voice, and decides the gender of the caller. [Figure: distribution of human voice pitch, male vs. female]
4 Generative modeling: fit a probability model to each class. [Figure: two Gaussians over voice pitch, with means mean1, mean2 and variances var1, var2]
5 Discriminative approach [Vapnik 85]. [Figure: number of mistakes as a function of the decision threshold on voice pitch]
6 Ill-behaved data. [Figure: pitch data that the fitted Gaussians (mean1, mean2) model poorly; number of mistakes vs. voice pitch]
7 Traditional statistics vs. machine learning. Traditional route: Data -> (Statistics) -> estimated world state -> (Decision Theory) -> predictions/actions. Machine learning maps data to predictions and actions directly.
8 Comparison of methodologies:
  Model                 Generative             Discriminative
  Goal                  probability estimates  classification rule
  Performance measure   likelihood             misclassification rate
  Mismatch problems     outliers               misclassifications
9 Boosting
10 A weighted training set: feature vectors $x_i$, binary labels $y_i \in \{-1,+1\}$, positive weights $w_i$: $(x_1,y_1,w_1), (x_2,y_2,w_2), \dots, (x_m,y_m,w_m)$
11 A weak learner: takes a weighted training set $(x_1,y_1,w_1),\dots,(x_m,y_m,w_m)$ and produces a weak rule $h$ mapping instances $x_1,\dots,x_m$ to predictions $\hat y_1,\dots,\hat y_m$ with $\hat y_i \in \{-1,+1\}$. The weak requirement:
$$\frac{\sum_{i=1}^m y_i \hat y_i w_i}{\sum_{i=1}^m w_i} > \gamma > 0$$
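To make the weak-learner contract concrete, here is a minimal sketch (mine, not from the talk) of a decision-stump weak learner over a weighted sample, together with the edge quantity that the weak requirement lower-bounds:

```python
import numpy as np

def best_stump(X, y, w):
    """Pick the (feature, threshold, sign) stump with lowest weighted error.
    X: (m, d) array, y: (m,) labels in {-1, +1}, w: (m,) positive weights.
    Returns (weighted_error, predict) where predict maps an array to {-1, +1}."""
    best_err, best_rule = np.inf, None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (+1, -1):
                pred = np.where(X[:, j] > thr, sign, -sign)
                err = np.dot(w, pred != y) / w.sum()
                if err < best_err:
                    best_err, best_rule = err, (j, thr, sign)
    j, thr, sign = best_rule
    return best_err, lambda X: np.where(X[:, j] > thr, sign, -sign)

def edge(y_hat, y, w):
    """The weighted correlation that the weak requirement asks to exceed gamma."""
    return np.dot(w, y * y_hat) / w.sum()
```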
12 The boosting process: start with uniform weights $(x_1,y_1,1),\dots,(x_n,y_n,1)$; the weak learner returns $h_1$; the booster reweights the examples and calls the weak learner again to get $h_2$, then $h_3$, ..., up to $h_T$. Final prediction: $\mathrm{Prediction}(x) = \mathrm{sign}(F_T(x))$, where $F_T$ is the weighted combination of the weak rules.
13 A formal description of boosting [slides from Rob Schapire]:
given training set $(x_1,y_1),\dots,(x_m,y_m)$, where $y_i \in \{-1,+1\}$ is the correct label of instance $x_i \in X$
for $t = 1,\dots,T$:
  construct distribution $D_t$ on $\{1,\dots,m\}$
  find weak classifier ("rule of thumb") $h_t : X \to \{-1,+1\}$ with small error $\epsilon_t$ on $D_t$: $\epsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \neq y_i]$
output final classifier $H_{final}$
14 AdaBoost [slides from Rob Schapire] [with Freund]. Constructing $D_t$: $D_1(i) = 1/m$; given $D_t$ and $h_t$:
$$D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } y_i = h_t(x_i) \\ e^{\alpha_t} & \text{if } y_i \neq h_t(x_i) \end{cases} \;=\; \frac{D_t(i)}{Z_t} \exp(-\alpha_t\, y_i\, h_t(x_i))$$
where $Z_t$ is a normalization factor and $\alpha_t = \frac{1}{2}\ln\!\left(\frac{1-\epsilon_t}{\epsilon_t}\right) > 0$.
Final classifier: $H_{final}(x) = \mathrm{sign}\!\left(\sum_t \alpha_t h_t(x)\right)$
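Transcribed into code, the update reads as follows (a sketch assuming the best_stump weak learner above; this is my rendering, not Schapire's reference implementation):

```python
import numpy as np

def adaboost(X, y, T, weak_learn):
    """weak_learn(X, y, D) returns (weighted_error, predict)."""
    m = len(y)
    D = np.full(m, 1.0 / m)                 # D_1(i) = 1/m
    ensemble = []
    for t in range(T):
        eps, h = weak_learn(X, y, D)        # weak rule h_t, error eps_t on D_t
        if eps >= 0.5:                      # no edge left: gamma_t <= 0
            break
        alpha = 0.5 * np.log((1 - eps) / eps)
        D = D * np.exp(-alpha * y * h(X))   # up-weight mistakes, down-weight hits
        D = D / D.sum()                     # divide by the normalizer Z_t
        ensemble.append((alpha, h))
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in ensemble))

# Usage: H_final = adaboost(X, y, T=50, weak_learn=best_stump)
```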
15 Toy example [slides from Rob Schapire]: weak classifiers are vertical or horizontal half-planes. [Figure: initial distribution $D_1$ over the examples]
16 Round 1: weak rule $h_1$ and reweighted distribution $D_2$; $\epsilon_1 = 0.30$, $\alpha_1 = 0.42$
17 Round 2: weak rule $h_2$ and reweighted distribution $D_3$; $\epsilon_2 = 0.21$, $\alpha_2 = 0.65$
18 Round 3: weak rule $h_3$; $\epsilon_3 = 0.14$, $\alpha_3 = 0.92$
19 Final classifier [figure]: $H_{final} = \mathrm{sign}(0.42\,h_1 + 0.65\,h_2 + 0.92\,h_3)$
20 Analyzing the training error [slides from Rob Schapire] [with Freund]. Theorem: write $\epsilon_t$ as $\frac{1}{2} - \gamma_t$ ($\gamma_t$ = "edge"); then
$$\text{training error}(H_{final}) \;\le\; \prod_t \left[ 2\sqrt{\epsilon_t(1-\epsilon_t)} \right] \;=\; \prod_t \sqrt{1 - 4\gamma_t^2} \;\le\; \exp\!\left(-2\sum_t \gamma_t^2\right)$$
So if $\forall t: \gamma_t \ge \gamma > 0$, then $\text{training error}(H_{final}) \le e^{-2\gamma^2 T}$.
AdaBoost is adaptive: it does not need to know $\gamma$ or $T$ a priori, and it can exploit $\gamma_t \gg \gamma$.
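For a concrete sense of the rate (numbers chosen here for illustration only): with a uniform edge $\gamma = 0.1$, i.e. weak rules only 10% better than random guessing, after $T = 1000$ rounds the bound gives

$$\text{training error}(H_{final}) \;\le\; e^{-2\gamma^2 T} \;=\; e^{-2(0.1)^2 \cdot 1000} \;=\; e^{-20} \;\approx\; 2\times 10^{-9},$$

so even barely-better-than-random weak rules drive the training error to zero exponentially fast.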
21 Boosting block diagram: the booster feeds example weights to the weak learner, the weak learner returns a weak rule, and iterating this loop yields a strong learner producing an accurate rule.
22 Boosting with specialists. Specialists predict only when they are confident; in addition to {-1,+1}, specialists output 0 to indicate no prediction. Since boosting allows both positive and negative weights, we can restrict attention to specialists that output {0,1}.
23 AdaBoost as variational optimization. Weak rule $h_t : X \to \{0,1\}$; label $y \in \{-1,+1\}$; training set $\{(x_1,y_1,1),(x_2,y_2,1),\dots,(x_n,y_n,1)\}$.
$$\alpha_t = \frac{1}{2}\ln\frac{\varepsilon_t^+}{\varepsilon_t^-}, \qquad \varepsilon_t^+ = \sum_{i:\, h_t(x_i)=1,\, y_i=1} w_i^t, \qquad \varepsilon_t^- = \sum_{i:\, h_t(x_i)=1,\, y_i=-1} w_i^t$$
$$F_t = F_{t-1} + \alpha_t h_t$$
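A small sketch of this update for a {0,1} specialist (variable names are mine; the smoothing constant is an assumption to avoid division by zero, not from the slides):

```python
import numpy as np

def specialist_alpha(h, y, w, eps=1e-8):
    """h: (n,) array in {0, 1} (0 = abstain); y: (n,) labels in {-1, +1};
    w: (n,) current example weights. Implements alpha_t = 1/2 ln(eps+ / eps-)."""
    fired = h == 1
    w_plus = np.sum(w[fired & (y == 1)]) + eps    # weight the rule covers correctly
    w_minus = np.sum(w[fired & (y == -1)]) + eps  # weight the rule covers incorrectly
    return 0.5 * np.log(w_plus / w_minus)
```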
24 AdaBoost as variational optimization. $x$ = input (a scalar), $y$ = output (-1 or +1); $(x_1,y_1),\dots,(x_n,y_n)$ = training set; $F_{t-1}$ = strong rule after $t-1$ boosting iterations; $h_t$ = weak rule produced at iteration $t$. [Figure: plots of $h_t$ and $F_{t-1}$ over $x$]
25 [Figure: $F_{t-1}(x) + 0\cdot h_t(x)$]
26 [Figure: $F_{t-1}(x) + 0.4\, h_t(x)$]
27 [Figure: $F_t(x) = F_{t-1}(x) + 0.8\, h_t(x)$, the chosen step along $h_t$]
28 Margins. Fix a set of weak rules $\vec h(x) = (h_1(x), h_2(x), \dots, h_T(x))$ and represent the $i$-th example $x_i$ by the outputs of the weak rules, $\vec h_i$. Labels $y \in \{-1,+1\}$; training set $(\vec h_1, y_1), \dots, (\vec h_n, y_n)$. Goal: find a weight vector $\vec\alpha = (\alpha_1,\dots,\alpha_T)$ that minimizes the number of training mistakes. The margin of example $i$ is $y_i\, \vec h_i \cdot \vec\alpha$: a negative margin is a mistake, a positive margin is correct. [Figure: cumulative number of examples vs. margin]
29 Boosting as gradient descent. Our goal is to minimize the number of mistakes (0-1 loss). Unfortunately, the step function has derivative zero at all points other than zero, where the derivative is undefined. AdaBoost therefore minimizes the exponential loss $e^{-\text{margin}}$, which is an upper bound on the 0-1 loss, and finds the vector $\vec\alpha$ through coordinate-wise gradient descent: weak rules are added one at a time. [Figure: 0-1 loss and exponential loss as functions of the margin]
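The upper-bound claim is elementary: for any margin $m$,

$$\mathbf{1}[m \le 0] \;\le\; e^{-m},$$

since $e^{-m} \ge 1$ whenever $m \le 0$, and $e^{-m} > 0$ everywhere.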
30 AdaBoost as gradient descent. Weak rules are added one at a time: fixing $\alpha_1,\dots,\alpha_{t-1}$, find the $\alpha$ for the new rule $h_t$. The weak rule defines how each example would move, i.e., whether its margin increases or decreases. The new margin of $(x_i, y_i)$ is $y_i F_{t-1}(x_i) + \alpha\, y_i h_t(x_i)$, so the new total loss is
$$\sum_{i=1}^n \exp\!\left(-y_i\left(F_{t-1}(x_i) + \alpha h_t(x_i)\right)\right)$$
Its derivative with respect to $\alpha$ at $\alpha = 0$ is
$$\frac{\partial}{\partial\alpha}\bigg|_{\alpha=0} \sum_{i=1}^n \exp\!\left(-y_i\left(F_{t-1}(x_i) + \alpha h_t(x_i)\right)\right) \;=\; -\sum_{i=1}^n y_i\, h_t(x_i) \underbrace{\exp(-y_i F_{t-1}(x_i))}_{\text{weight of example } (x_i,y_i)}$$
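A quick numeric check of the derivative formula (a sketch; the margins and weak-rule agreements below are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
m = rng.normal(size=200)            # stands in for y_i * F_{t-1}(x_i)
yh = rng.choice([-1.0, 1.0], 200)   # stands in for y_i * h_t(x_i)

total_loss = lambda a: np.sum(np.exp(-(m + a * yh)))

h = 1e-6
numeric = (total_loss(h) - total_loss(-h)) / (2 * h)  # central difference at 0
analytic = -np.sum(yh * np.exp(-m))                   # the formula above
print(abs(numeric - analytic) < 1e-4)                 # True
```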
31 LogitBoost as gradient descent. Also called GentleBoost and LogitBoost (Hastie, Friedman & Tibshirani). Instead of the loss $e^{-m}$ (where $m$ is the margin), define the loss to be $\log(1 + e^{-m})$. When the margin $m \ll 0$, $\log(1+e^{-m}) \approx -m$; when $m \gg 0$, $\log(1+e^{-m}) \approx e^{-m}$. The new margin of $(x_i, y_i)$ is again $y_i F_{t-1}(x_i) + \alpha\, y_i h_t(x_i)$, the new total loss is
$$\sum_{i=1}^n \log\!\left(1 + \exp\!\left(-y_i\left(F_{t-1}(x_i) + \alpha h_t(x_i)\right)\right)\right)$$
and its derivative with respect to $\alpha$ at $\alpha = 0$ is
$$-\sum_{i=1}^n y_i\, h_t(x_i) \underbrace{\frac{\exp(-y_i F_{t-1}(x_i))}{1 + \exp(-y_i F_{t-1}(x_i))}}_{\text{weight of example } (x_i,y_i)}$$
[Figure: LogitBoost loss and weight vs. AdaBoost loss and weight, as functions of the margin]
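The practical difference shows up in the example weights (a sketch; the margins below are chosen for illustration):

```python
import numpy as np

m = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])   # margins y_i * F_{t-1}(x_i)
ada = np.exp(-m)                             # AdaBoost weight: unbounded
logit = np.exp(-m) / (1.0 + np.exp(-m))      # LogitBoost weight: at most 1
print(ada.round(3))    # ~[148.413, 2.718, 1.0, 0.368, 0.007]
print(logit.round(3))  # ~[  0.993, 0.731, 0.5, 0.269, 0.007]
```

A badly misclassified example (margin -5) dominates AdaBoost's reweighting but is capped near 1 under the logistic loss, which is exactly the noise-resistance point made on the next slide.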
32 Noise resistance. AdaBoost performs well when the achievable error rate is close to zero (the almost-consistent case); errors, i.e. examples with negative margins, get very large weights, so it can overfit. LogitBoost is inferior to AdaBoost when the achievable error rate is close to zero, but often better when the achievable error rate is high; the weight on any example is never larger than 1.
33 GradientBoost / AnyBoost: a general recipe for learning by incremental optimization, applicable to any differentiable loss function. A labeled example is $(x, y)$ ($y$ can be anything). The prediction is of the form $F_t(x) = \sum_{i=1}^t \alpha_i h_i(x)$. The loss is $L(F_t(x), y)$, and the total loss so far is $\sum_{j=1}^n L(F_{t-1}(x_j), y_j)$. The weight of example $(x, y)$ is the negative gradient $-\frac{\partial}{\partial z} L(F_{t-1}(x) + z,\, y)\big|_{z=0}$. At each iteration: compute the example weights, call the weak learner with the weighted examples, and add the generated weak rule to create the new rule $F_t = F_{t-1} + \alpha_t h_t$.
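The recipe fits in a few lines (a sketch under my naming; a fixed step size stands in for the $\alpha_t$ that AnyBoost would choose, e.g., by line search):

```python
import numpy as np

def anyboost(X, y, T, weak_learn, dloss, alpha=0.1):
    """dloss(F, y): elementwise derivative dL(F, y)/dF.
    weak_learn(X, targets, weights) returns (err, predict), e.g. best_stump."""
    F = np.zeros(len(y))                           # F_0 = 0
    rules = []
    for t in range(T):
        g = -dloss(F, y)                           # weight = -dL/dz at z = 0
        _, h = weak_learn(X, np.sign(g), np.abs(g))  # fit signed pseudo-labels
        F = F + alpha * h(X)                       # F_t = F_{t-1} + alpha_t h_t
        rules.append((alpha, h))
    return rules

# Exponential loss recovers AdaBoost-style weights e^{-yF}:
exp_dloss = lambda F, y: -y * np.exp(-y * F)
# Logistic loss gives the LogitBoost weights from the previous slide:
logit_dloss = lambda F, y: -y * np.exp(-y * F) / (1.0 + np.exp(-y * F))
```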
34 Styles of weak learners. Simple generalists: stumps, e.g. a single split on Season (Winter -> -1, Summer -> +1), scaled by $\alpha$. Specialists: rules that predict only for particular values of Season. Complex: fully grown trees, e.g. splitting on Season and then on Location (Prison vs. Ski Slope). And everything in between: neural networks, nearest neighbors, naive Bayes, ... [Figure: example trees for each style]
35 Alternating Decision Trees Joint work with Llew Mason
36 Decision trees. Root test X>3: if no, predict -1; if yes, test Y>5: if no, predict -1; if yes, predict +1. [Figure: the induced partition of the (X, Y) plane]
37 A decision tree as a sum: the same classifier rewritten so each node contributes an additive score (root score -0.2) and the prediction is the sign of the total. [Figure: the tree with scores at the nodes]
38 An alternating decision tree: prediction nodes (e.g. root score 0.0, +0.7, -0.1) alternate with splitter nodes (X>3, Y>5, Y<1); an example's score is the sum of the prediction nodes on every path it satisfies, and the output is the sign of that sum. [Figure: the alternating tree and the resulting partition of the (X, Y) plane]
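A sketch of how such a tree is evaluated (the scores and conditions below are illustrative, not the exact figure):

```python
def adt_score(x, root_score, rules):
    """rules: list of (precondition, condition, score_yes, score_no).
    A rule contributes only where its precondition holds; the classifier
    output is the sign of the summed scores."""
    total = root_score
    for precond, cond, s_yes, s_no in rules:
        if precond(x):
            total += s_yes if cond(x) else s_no
    return total

# Illustrative tree over points p = (X, Y):
rules = [
    (lambda p: True,      lambda p: p[0] > 3, +0.7, -0.1),  # root splitter X > 3
    (lambda p: p[0] > 3,  lambda p: p[1] > 5, +0.3, -0.2),  # nested under X > 3
    (lambda p: True,      lambda p: p[1] < 1, +0.4, -0.3),  # a second root splitter
]
print(adt_score((5.0, 6.0), 0.0, rules))  # 0.0 + 0.7 + 0.3 - 0.3 ~= 0.7 -> +1
```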
39 Example: medical diagnostics. The Cleve dataset from the UC Irvine database: heart disease diagnostics (+1 = healthy, -1 = sick), 13 features from tests (real-valued and discrete), 303 instances.
40 [Figure: ADtree for the Cleveland heart-disease diagnostics problem]
41 Cross-validated accuracy:
  Learning algorithm   Number of splits   Average test error   Test error variance
  ADtree               —                  —                    0.6%
  C5.0                 —                  —                    0.5%
  C5.0 + boosting      —                  —                    0.5%
  Boost Stumps         —                  —                    0.8%
42 Applications of Boosting
43 Applications of boosting: academic research, applied research, commercial deployment.
44 Academic research (% test error rates):
  Database    Other           Boosting   Error reduction
  Cleveland   27.2 (DT)       —          —%
  Promoters   22.0 (DT)       —          —%
  Letter      13.8 (DT)       —          —%
  Reuters 4   5.8, 6.0, —     —          ~60%
  Reuters 8   11.3, 12.1, —   —          ~40%
45 Applied research (Schapire, Singer, Gorin '98). AT&T's "How may I help you?": classify voice requests, voice -> text -> category. Fourteen categories, including: area code, AT&T service, billing credit, calling card, collect, competitor, dial assistance, directory, how to dial, person to person, rate, third party, time charge.
46 Example transcribed sentences:
"Yes I'd like to place a collect call long distance please" -> collect
"Operator I need to make a call but I need to bill it to my office" -> third party
"Yes I'd like to place a call on my master card please" -> calling card
"I just called a number in Sioux City and I musta rang the wrong number because I got the wrong party and I would like to have that taken off my bill" -> billing credit
47 Weak rules generated by BoosTexter. [Figure: for categories such as calling card, collect call, and third party, each weak rule scores the category up or down depending on whether a particular word occurs or does not occur]
48 Results: 7844 training examples (hand transcribed), 1000 test examples (hand / machine transcribed). Accuracy with 20% rejected: machine transcribed 75%, hand transcribed 90%.
49 Commercial deployment (Freund, Mason, Rogers, Pregibon, Cortes 2000). Distinguish business from residence customers using statistics from call-detail records. Alternating decision trees combine very simple rules; they can over-fit, so cross-validation is used to decide when to stop.
50 Massive datasets (for 1997): 260M calls/day, 230M telephone numbers, label unknown for ~30%. Hancock: software for computing statistical signatures. 100K randomly selected training examples (~10K is enough); training takes about 2 hours. The generated classifier has to be both accurate and efficient.
51 [Figure: alternating tree for bizocity]
52 [Figure: alternating tree, detail]
53 [Figure: precision/recall graphs, accuracy vs. score]
54 Face detection (Viola and Jones, 2001)
55 Face detection (Viola and Jones). Live demo at the 2001 conference; higher accuracy than the manually designed state of the art; real-time on a 2001 laptop. Many uses: a standard feature in cameras, user interfaces, security systems, video compression, image database analysis.
56 Face detection as a filtering process: a detector scans roughly 50,000 locations/scales per image, from the smallest scale up to larger scales; the vast majority of these sub-windows are negative. [Figure: scanning across locations and scales]
57 The classifier is learned from labeled data. Example: a 28x28 image patch; label: face / non-face. 5000 faces and $10^8$ non-faces. Faces are normalized for scale and translation; rotation remains.
58 Image features: rectangle filters, similar to Haar wavelets (Papageorgiou et al.).
$$h_t(x_i) = \begin{cases} 1 & \text{if } f_t(x_i) > \theta_t \\ 0 & \text{otherwise} \end{cases}$$
These unique binary features are very fast to compute using the integral image, and are combined using AdaBoost.
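A sketch of the integral-image trick mentioned above (standard technique; the index conventions here are mine): after one cumulative pass over the image, any rectangle sum costs four lookups, so each rectangle feature is O(1).

```python
import numpy as np

def integral_image(img):
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from the integral image ii (four lookups)."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0: total -= ii[r0 - 1, c1 - 1]
    if c0 > 0: total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0: total += ii[r0 - 1, c0 - 1]
    return total

def two_rect_feature(ii, r, c, h, w):
    """Left-minus-right two-rectangle feature, one of the Haar-like types."""
    left = rect_sum(ii, r, c, r + h, c + w // 2)
    right = rect_sum(ii, r, c + w // 2, r + h, c + w)
    return left - right

img = np.arange(16.0).reshape(4, 4)
ii = integral_image(img)
assert rect_sum(ii, 1, 1, 3, 3) == img[1:3, 1:3].sum()
```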
59 Cascaded boosting. Features are combined using AdaBoost, with the best weak rule found by exhaustive search (the search ran for 2 days on a 250-node cluster). For detection, features are combined in a cascade over each image sub-window: a 1-feature stage passes ~50% of sub-windows, a 5-feature stage ~20%, a 20-feature stage ~2%; a sub-window rejected at any stage is labeled NON-FACE, and one that survives all stages is labeled FACE. Runs in real time, 15 FPS, on a laptop.
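The control flow of the cascade is simple (a sketch; the stage predicates here are hypothetical AdaBoost classifiers tuned for very high recall):

```python
def cascade_detect(window, stages):
    """stages: cheap 1-feature classifier first, then 5 features, then 20, ..."""
    for stage in stages:
        if not stage(window):
            return False   # rejected early: most non-faces exit after cheap stages
    return True            # survived every stage: report a face
```

Ordering stages from cheapest to most expensive is the design point: almost all of the ~50,000 sub-windows per image are discarded by the first one or two stages, which is what makes real-time scanning feasible.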
60 [Figure: example detections, Paul Viola and Mike Jones]
61 Summary. Boosting is a computational method for learning accurate classifiers. Its resistance to over-fitting is explained by margins. Boosting has been applied successfully to a variety of classification problems.
More information