COMS 4721: Machine Learning for Data Science Lecture 13, 3/2/2017
1 COMS 4721: Machine Learning for Data Science, Lecture 13, 3/2/2017. Prof. John Paisley, Department of Electrical Engineering & Data Science Institute, Columbia University.
2 BOOSTING Robert E. Schapire and Yoav Freund, Boosting: Foundations and Algorithms, MIT Press, 2012. See this textbook for many more details. (I borrow some figures from that book.)
3 BAGGING CLASSIFIERS
Algorithm: Bagging binary classifiers
Given $(x_1, y_1), \dots, (x_n, y_n)$, $x \in \mathcal{X}$, $y \in \{-1, +1\}$:
For $b = 1, \dots, B$:
- Sample a bootstrap dataset $\mathcal{B}_b$ of size $n$. For each entry in $\mathcal{B}_b$, select $(x_i, y_i)$ with probability $\frac{1}{n}$. Some $(x_i, y_i)$ will repeat and some won't appear in $\mathcal{B}_b$.
- Learn a classifier $f_b$ using the data in $\mathcal{B}_b$.
Define the classification rule to be $f_{\text{bag}}(x_0) = \text{sign}\big(\sum_{b=1}^{B} f_b(x_0)\big)$.
With bagging, we observe that a committee of classifiers votes on a label. Each classifier is learned on a bootstrap sample from the data set. Learning a collection of classifiers is referred to as an ensemble method.
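As a concrete illustration of the algorithm above, here is a minimal Python sketch (my own example, not the lecture's code), assuming scikit-learn decision trees as the base classifiers:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bag_classifiers(X, y, B=50, seed=0):
    """Learn B classifiers, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    ensemble = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)   # n draws with replacement, each w.p. 1/n
        ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return ensemble

def f_bag(ensemble, X0):
    """Vote: sign of the sum of the individual predictions in {-1, +1}."""
    votes = sum(f_b.predict(X0) for f_b in ensemble)
    return np.sign(votes)  # a tie maps to 0; break it however you like

# Toy usage with labels in {-1, +1}.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
print(f_bag(bag_classifiers(X, y), X[:5]))
```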
4 BOOSTING
"How is it that a committee of blockheads can somehow arrive at highly reasoned decisions, despite the weak judgment of the individual members?" (Schapire & Freund, Boosting: Foundations and Algorithms)
Boosting is another powerful method for ensemble learning. It is similar to bagging in that a set of classifiers is combined to make a better one. It works for any classifier, but a weak one that is easy to learn is usually chosen. (Weak = accuracy a little better than random guessing.)
Short history:
1984: Leslie Valiant and Michael Kearns ask if boosting is possible.
1989: Robert Schapire creates the first boosting algorithm.
1990: Yoav Freund creates an optimal boosting algorithm.
1995: Freund and Schapire create AdaBoost (Adaptive Boosting), the major boosting algorithm.
5 BAGGING VS BOOSTING (OVERVIEW)
[Figure: with bagging, each classifier $f_1(x), f_2(x), f_3(x)$ is learned from an independent bootstrap sample of the training sample; with boosting, each classifier is learned from a reweighted (weighted) sample of the training sample.]
6 THE ADABOOST ALGORITHM (SAMPLING VERSION)
[Figure: at each round $t$, sample a dataset $\mathcal{B}_t$ from the weighted training sample and classify it, producing a weighted error $\epsilon_t$ and a weighted classifier $\alpha_t, f_t(x)$.]
Boosting: $f_{\text{boost}}(x_0) = \text{sign}\big(\sum_{t=1}^{T} \alpha_t f_t(x_0)\big)$
7 THE ADABOOST ALGORITHM (SAMPLING VERSION)
Algorithm: Boosting a binary classifier
Given $(x_1, y_1), \dots, (x_n, y_n)$, $x \in \mathcal{X}$, $y \in \{-1, +1\}$, set $w_1(i) = \frac{1}{n}$ for $i = 1, \dots, n$.
For $t = 1, \dots, T$:
1. Sample a bootstrap dataset $\mathcal{B}_t$ of size $n$ according to distribution $w_t$. Notice we pick $(x_i, y_i)$ with probability $w_t(i)$ and not $\frac{1}{n}$.
2. Learn a classifier $f_t$ using the data in $\mathcal{B}_t$.
3. Set $\epsilon_t = \sum_{i=1}^{n} w_t(i)\, \mathbb{1}\{y_i \neq f_t(x_i)\}$ and $\alpha_t = \frac{1}{2} \ln\big(\frac{1 - \epsilon_t}{\epsilon_t}\big)$.
4. Scale $\hat{w}_{t+1}(i) = w_t(i)\, e^{-\alpha_t y_i f_t(x_i)}$ and set $w_{t+1}(i) = \hat{w}_{t+1}(i) / \sum_j \hat{w}_{t+1}(j)$.
Set the classification rule to be $f_{\text{boost}}(x_0) = \text{sign}\big(\sum_{t=1}^{T} \alpha_t f_t(x_0)\big)$.
Comment: The description is usually simplified to "learn classifier $f_t$ using distribution $w_t$."
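A minimal Python sketch of the sampling version above (again my own illustration, not the course's code; the decision-stump weak learner and the clipping guard on $\epsilon_t$ are assumptions, not part of the slide):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, T=50, seed=0):
    """Sampling version of AdaBoost for labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(n, 1.0 / n)                    # w_1(i) = 1/n
    alphas, classifiers = [], []
    for t in range(T):
        # 1. Bootstrap sample of size n drawn according to w_t (not 1/n).
        idx = rng.choice(n, size=n, p=w)
        # 2. Learn a weak classifier (here a decision stump) on B_t.
        f_t = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        # 3. Weighted error eps_t and vote weight alpha_t on the full data.
        pred = f_t.predict(X)
        eps = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)  # keep eps in (0, 1)
        alpha = 0.5 * np.log((1 - eps) / eps)
        # 4. Scale and renormalize the weights.
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()
        alphas.append(alpha)
        classifiers.append(f_t)
    return alphas, classifiers

def f_boost(alphas, classifiers, X0):
    """Classify by the sign of the alpha-weighted sum of weak votes."""
    h = sum(a * f.predict(X0) for a, f in zip(alphas, classifiers))
    return np.sign(h)
```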
8 BOOSTING A DECISION STUMP (EXAMPLE 1)
Original data. Uniform distribution, $w_1$. Learn a weak classifier. Here: use a decision stump on $x_1$, splitting at $x_1 > 1.7$ and predicting $\hat{y} = +1$ on one side and $\hat{y} = -1$ on the other. A minimal sketch of such a stump follows.
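For concreteness, a decision stump is just a threshold test on one feature. A sketch matching the $x_1 > 1.7$ split above (the sample points are made up):

```python
import numpy as np

def stump_predict(X, feature=0, threshold=1.7):
    """Predict +1 where X[:, feature] > threshold and -1 otherwise."""
    return np.where(X[:, feature] > threshold, 1, -1)

# The round-1 stump "x_1 > 1.7" applied to a few points:
X = np.array([[2.3, 0.5], [1.1, 0.9], [1.8, 2.0]])
print(stump_predict(X))   # -> [ 1 -1  1]
```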
9 BOOSTING A DECISION STUMP (EXAMPLE 1)
Round 1 classifier. Weighted error: $\epsilon_1 = 0.30$. Weight update: $\alpha_1 = 0.42$.
10 BOOSTING A DECISION STUMP (EXAMPLE 1)
[Figure: the weighted data after round 1; misclassified points receive larger weight.]
11 BOOSTING A DECISION STUMP (EXAMPLE 1)
Round 2 classifier. Weighted error: $\epsilon_2 = 0.21$. Weight update: $\alpha_2 = 0.65$.
12 BOOSTING A DECISION STUMP (EXAMPLE 1)
[Figure: the weighted data after round 2.]
13 BOOSTING A DECISION STUMP (EXAMPLE 1)
Round 3 classifier. Weighted error: $\epsilon_3 = 0.14$. Weight update: $\alpha_3 = 0.92$.
14 BOOSTING A DECISION STUMP (EXAMPLE 1)
Classifier after three rounds: $f_{\text{boost}}(x) = \text{sign}\big(0.42\, f_1(x) + 0.65\, f_2(x) + 0.92\, f_3(x)\big)$.
15 BOOSTING A DECISION STUMP (EXAMPLE 2)
Example problem error rates: random guessing, 50%; decision stump, 45.8%; full decision tree, 24.7%; boosted stump, 5.8%.
16 BOOSTING
[Scatter plot: each point is one dataset; its location gives the error rate with and without boosting.] The boosted version of the same classifier almost always produces better results.
17 BOOSTING (left) Boosting a bad classifier is often better than not boosting a good one. (right) Boosting a good classifier is often better, but can take more time.
18 BOOSTING AND FEATURE MAPS
Q: What makes boosting work so well?
A: This is a well-studied question. We will present one analysis later, but we can also give intuition by tying it in with what we've already learned.
The classification for a new $x_0$ from boosting is $f_{\text{boost}}(x_0) = \text{sign}\big(\sum_{t=1}^{T} \alpha_t f_t(x_0)\big)$.
Define $\phi(x) = [f_1(x), \dots, f_T(x)]$, where each $f_t(x) \in \{-1, +1\}$. We can think of $\phi(x)$ as a high-dimensional feature map of $x$. The vector $\alpha = [\alpha_1, \dots, \alpha_T]$ corresponds to a hyperplane. So the classifier can be written $f_{\text{boost}}(x_0) = \text{sign}(\phi(x_0)^\top \alpha)$. Boosting learns the feature mapping and the hyperplane simultaneously.
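In code, this view of boosting is just a dot product. A toy sketch (the `weak` stumps and the test point are invented for illustration; the $\alpha$ values are taken from Example 1):

```python
import numpy as np

# Three toy weak classifiers, each a stump on one coordinate.
weak = [lambda x, j=j: 1.0 if x[j] > 0.5 else -1.0 for j in range(3)]
alpha = np.array([0.42, 0.65, 0.92])   # their votes (values from Example 1)

def phi(x):
    """Feature map: the vector of weak-classifier outputs, each in {-1, +1}."""
    return np.array([f(x) for f in weak])

x0 = np.array([0.8, 0.3, 0.9])
print(int(np.sign(phi(x0) @ alpha)))   # f_boost(x0) = sign(phi(x0)^T alpha)
```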
19 APPLICATION: FACE DETECTION
20 FACE DETECTION (VIOLA & JONES, 2001)
Problem: Locate the faces in an image or video.
Processing: Divide the image into patches of different scales, e.g., $24 \times 24$, $48 \times 48$, etc. Extract features from each patch. Classify each patch as face or not-face using a boosted decision stump. This can be done in real time, for example by your digital camera (at 15 fps).
[Figure: one patch from a larger image, masked with many feature extractors. Each pattern gives one number: the sum of all pixels in the black region minus the sum of pixels in the white region (a total of 45,000 features).]
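A toy sketch of one such black-minus-white rectangle feature (my illustration, not the Viola-Jones implementation, which computes these efficiently with integral images over a large fixed bank of patterns):

```python
import numpy as np

def two_rect_feature(patch, top, left, h, w):
    """Sum of pixels in the left (black) half of a rectangle minus the
    sum of pixels in the right (white) half."""
    black = patch[top:top + h, left:left + w // 2].sum()
    white = patch[top:top + h, left + w // 2:left + w].sum()
    return black - white

patch = np.random.rand(24, 24)   # one 24x24 image patch
print(two_rect_feature(patch, top=4, left=2, h=8, w=12))
```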
21 FACE DETECTION (EXAMPLE RESULTS)
22 ANALYSIS OF BOOSTING
23 ANALYSIS OF BOOSTING
Training error theorem: We can use analysis to make a statement about the accuracy of boosting on the training data.
Theorem: Under the AdaBoost framework, if $\epsilon_t$ is the weighted error of classifier $f_t$, then for the classifier $f_{\text{boost}}(x_0) = \text{sign}\big(\sum_{t=1}^{T} \alpha_t f_t(x_0)\big)$,
training error $= \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{y_i \neq f_{\text{boost}}(x_i)\} \leq \exp\big(-2 \sum_{t=1}^{T} (\tfrac{1}{2} - \epsilon_t)^2\big)$.
Even if each $\epsilon_t$ is only a little better than random guessing, the sum over $T$ classifiers can lead to a large negative value in the exponent when $T$ is large. For example, if we set $\epsilon_t = 0.45$ and $T = 1000$, the bound gives training error $\leq e^{-2 \cdot 1000 \cdot (0.05)^2} = e^{-5} \approx 0.0067$.
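A one-line numerical check of that example (values as above):

```python
import math
T, eps = 1000, 0.45
print(math.exp(-2 * T * (0.5 - eps) ** 2))   # e^{-5} ~ 0.0067
```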
24 PROOF OF THEOREM
Setup: We break the proof into three steps. It is an application of the fact that if $a \leq b$ (Step 2) and $b \leq c$ (Step 3), then $a \leq c$ (conclusion). Step 1 calculates the value of $b$. Steps 2 and 3 prove the two inequalities.
Also recall the following step from AdaBoost:
Update: $\hat{w}_{t+1}(i) = w_t(i)\, e^{-\alpha_t y_i f_t(x_i)}$.
Normalize: $w_{t+1}(i) = \hat{w}_{t+1}(i) / \sum_j \hat{w}_{t+1}(j)$.
Define $Z_t = \sum_j \hat{w}_{t+1}(j)$.
25 PROOF OF THEOREM ($a \leq b \leq c$)
Step 1: We first expand the equation of the weights to show that
$w_{T+1}(i) = \frac{1}{n} \frac{e^{-y_i \sum_{t=1}^{T} \alpha_t f_t(x_i)}}{\prod_{t=1}^{T} Z_t} = \frac{1}{n} \frac{e^{-y_i h_T(x_i)}}{\prod_{t=1}^{T} Z_t}$, where $h_T(x) := \sum_{t=1}^{T} \alpha_t f_t(x)$.
Derivation of Step 1: Notice the update rule $w_{t+1}(i) = \frac{1}{Z_t} w_t(i)\, e^{-\alpha_t y_i f_t(x_i)}$. Do the same expansion for $w_t(i)$ and continue until reaching $w_1(i) = \frac{1}{n}$:
$w_{T+1}(i) = w_1(i)\, \frac{e^{-\alpha_1 y_i f_1(x_i)}}{Z_1} \cdots \frac{e^{-\alpha_T y_i f_T(x_i)}}{Z_T}$.
The product $\prod_{t=1}^{T} Z_t$ is $b$ above. We use this form of $w_{T+1}(i)$ in Step 2.
26 PROOF OF THEOREM ($a \leq b \leq c$)
Step 2: Next show that the training error of $f^{(T)}_{\text{boost}}$ (boosting after $T$ steps) is $\leq \prod_{t=1}^{T} Z_t$.
Currently we know $w_{T+1}(i) = \frac{1}{n} \frac{e^{-y_i h_T(x_i)}}{\prod_{t=1}^{T} Z_t}$, i.e., $w_{T+1}(i) \prod_{t=1}^{T} Z_t = \frac{1}{n} e^{-y_i h_T(x_i)}$, and $f^{(T)}_{\text{boost}}(x) = \text{sign}(h_T(x))$.
Derivation of Step 2: Observe that $0 < e^{z_1}$ and $1 < e^{z_2}$ for any $z_1 < 0 < z_2$. Therefore
$\underbrace{\frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{y_i \neq f^{(T)}_{\text{boost}}(x_i)\}}_{a} \leq \frac{1}{n} \sum_{i=1}^{n} e^{-y_i h_T(x_i)} = \sum_{i=1}^{n} w_{T+1}(i) \prod_{t=1}^{T} Z_t = \underbrace{\prod_{t=1}^{T} Z_t}_{b}$,
where the last equality uses $\sum_i w_{T+1}(i) = 1$. Here $a$ is the training error, the quantity we care about.
27 PROOF OF THEOREM ($a \leq b \leq c$)
Step 3: The final step is to calculate an upper bound on $Z_t$, and by extension $\prod_{t=1}^{T} Z_t$.
Derivation of Step 3: This step is slightly more involved. It also shows why $\alpha_t := \frac{1}{2} \ln\big(\frac{1 - \epsilon_t}{\epsilon_t}\big)$.
$Z_t = \sum_{i=1}^{n} w_t(i)\, e^{-\alpha_t y_i f_t(x_i)} = \sum_{i:\, y_i = f_t(x_i)} e^{-\alpha_t} w_t(i) + \sum_{i:\, y_i \neq f_t(x_i)} e^{\alpha_t} w_t(i) = e^{-\alpha_t} (1 - \epsilon_t) + e^{\alpha_t} \epsilon_t$.
Remember we defined $\epsilon_t = \sum_{i:\, y_i \neq f_t(x_i)} w_t(i)$, the probability of error under $w_t$.
28 PROOF OF THEOREM ($a \leq b \leq c$)
Derivation of Step 3 (continued): Remember from Step 2 that
training error $= \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{y_i \neq f_{\text{boost}}(x_i)\} \leq \prod_{t=1}^{T} Z_t$,
and we just showed that $Z_t = e^{-\alpha_t}(1 - \epsilon_t) + e^{\alpha_t} \epsilon_t$. We want the training error to be small, so we pick $\alpha_t$ to minimize $Z_t$. Minimizing, we get the value of $\alpha_t$ used by AdaBoost:
$\alpha_t = \frac{1}{2} \ln\big(\frac{1 - \epsilon_t}{\epsilon_t}\big)$.
Plugging this value back in gives $Z_t = 2\sqrt{\epsilon_t (1 - \epsilon_t)}$.
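The minimization is one line of calculus; setting $\partial Z_t / \partial \alpha_t = 0$:

```latex
\frac{\partial Z_t}{\partial \alpha_t}
  = -e^{-\alpha_t}(1-\epsilon_t) + e^{\alpha_t}\epsilon_t = 0
\quad\Longrightarrow\quad
e^{2\alpha_t} = \frac{1-\epsilon_t}{\epsilon_t}
\quad\Longrightarrow\quad
\alpha_t = \frac{1}{2}\ln\Big(\frac{1-\epsilon_t}{\epsilon_t}\Big)
```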
29 PROOF OF THEOREM ($a \leq b \leq c$)
Derivation of Step 3 (continued): Next, rewrite $Z_t$ as
$Z_t = 2\sqrt{\epsilon_t (1 - \epsilon_t)} = \sqrt{1 - 4(\tfrac{1}{2} - \epsilon_t)^2}$.
Then, use the inequality $1 - x \leq e^{-x}$ [plot: $e^{-x}$ vs. $1 - x$] to conclude that
$Z_t = \big(1 - 4(\tfrac{1}{2} - \epsilon_t)^2\big)^{\frac{1}{2}} \leq \big(e^{-4(\frac{1}{2} - \epsilon_t)^2}\big)^{\frac{1}{2}} = e^{-2(\frac{1}{2} - \epsilon_t)^2}$.
30 PROOF OF THEOREM
Concluding the right inequality ($b \leq c$): Because both sides of $Z_t \leq e^{-2(\frac{1}{2} - \epsilon_t)^2}$ are positive, we can say that
$\prod_{t=1}^{T} Z_t \leq \prod_{t=1}^{T} e^{-2(\frac{1}{2} - \epsilon_t)^2} = e^{-2 \sum_{t=1}^{T} (\frac{1}{2} - \epsilon_t)^2}$.
This concludes the $b \leq c$ portion of the proof.
Combining everything:
training error $= \underbrace{\frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{y_i \neq f_{\text{boost}}(x_i)\}}_{a} \leq \underbrace{\prod_{t=1}^{T} Z_t}_{b} \leq \underbrace{e^{-2 \sum_{t=1}^{T} (\frac{1}{2} - \epsilon_t)^2}}_{c}$.
We set out to prove $a \leq c$, and we did so by using $b$ as a stepping-stone.
31 TRAINING VS TESTING ERROR
Q: Driving the training error to zero leads one to ask: does boosting overfit?
A: Sometimes, but very often it doesn't!
[Plot: error vs. rounds of boosting, showing the C4.5 (tree) testing error, the AdaBoost testing error, and the AdaBoost training error.]