TDT4173 Machine Learning

TDT4173 Machine Learning, Lecture 9: Learning Classifiers: Bagging & Boosting
Helge Langseth, Norwegian University of Science and Technology
IT-VEST 310, helgel@idi.ntnu.no

Outline
1. Wrap-up from last time
2. Ensemble methods
   - Background
   - Bagging
   - Boosting

Wrap-up from last time
Summary points from the last lesson:
- Evaluating hypotheses: sample error and true error; estimators; confidence intervals for the observed hypothesis error; the central limit theorem
- Comparing hypotheses; comparing learners
- Computational learning theory: bounding the true error; PAC learning

Ensemble methods: Background
Fundamental setup:
1. Generate a collection (ensemble) of classifiers from a given dataset.
2. Each classifier is given a different dataset; the datasets are generated by resampling.
3. The final classification is defined by voting.

Nice property: Each classifier in the ensemble may be fairly simple, yet the ensemble of classifiers can still be made to learn anything well. The only requirement on the weak learner is that it does better than choosing blindly.

Weak learner example (Background)
Consider a two-dimensional dataset with two classes. [Figure: scatter plot of data for a two-class problem]
What is a weak classifier in this case? One example: the half-space classifier, which aggregates well.

Background: How to learn the half-space classifier
Consider a dataset D = {D_1, D_2, ..., D_m}, where each D_j is an observation ((x_1, x_2), c) and c is either red or blue.

For each dimension d:
  For each observation j:
    1. Let b(d, j) be x_d from observation D_j.
    2. r_l ← (number of red observations with x_d ≤ b(d, j)) + (number of blue observations with x_d > b(d, j))
    3. quality(d, j) ← max{ r_l, m − r_l }
(d_0, j_0) ← argmax_{d,j} quality(d, j)
Define the half-space classifier by the threshold x_{d_0} = b(d_0, j_0).

[Figure: the data and the resulting classifier in iteration 1]
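The search above is just an exhaustive scan over axis-aligned thresholds. As a minimal sketch in Python (assuming NumPy arrays and string labels "red"/"blue"; the function and variable names are mine, not from the lecture):

```python
import numpy as np

def learn_half_space(X, y):
    """Exhaustive search over axis-aligned thresholds, as in the slide above.
    X is an (m, 2) NumPy array, y an array of labels "red"/"blue".
    Returns the best (dimension d0, threshold b0, quality)."""
    m, n_dims = X.shape
    best = (None, None, -1)
    for d in range(n_dims):
        for j in range(m):
            b = X[j, d]                                   # candidate threshold b(d, j)
            # r_l: red points on/below the threshold plus blue points above it
            r_l = np.sum((X[:, d] <= b) & (y == "red")) + \
                  np.sum((X[:, d] > b) & (y == "blue"))
            quality = max(r_l, m - r_l)                   # allow either orientation
            if quality > best[2]:
                best = (d, b, quality)
    return best
```

The returned quality is the number of correctly classified points for the better of the two orientations of the split at x_{d_0} = b_0.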

Background: Two methods of ensemble learning
We will consider two methods of ensemble learning:
- Bagging (Bootstrap Aggregating): generate new datasets by drawing at random, with replacement, from the original data.
- Boosting: the data for the classifiers are adaptively resampled (re-weighted), so that data points the classifiers have had problems classifying are given more weight.

Bootstrapping (Bagging)
What is a bootstrap sample? Consider a dataset D = {D_1, D_2, ..., D_m} with m data points. A bootstrap sample D_i is created from D by choosing m points from D at random; the selection is done with replacement.

[Figure: the original data and one bootstrap sample]
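In code, drawing a bootstrap sample is a one-liner. A minimal sketch, assuming the data are NumPy arrays (the helper name is mine, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_sample(X, y):
    """Draw m indices uniformly at random, with replacement, from the m points."""
    m = len(X)
    idx = rng.integers(0, m, size=m)   # some points repeat, others are left out
    return X[idx], y[idx]
```

On average roughly 63% (1 − 1/e) of the original points appear in a given bootstrap sample; the remaining slots are filled by duplicates.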

Bagging
Bagging (= Bootstrap aggregating) produces replications of the training set by sampling with replacement. Each replication has the same size as the original set, but some examples can appear more than once while others don't appear at all. A classifier is generated from each replication. All classifiers are used to classify each sample from the test set using a majority voting scheme.

Bagging more formally
1. Start with dataset D.
2. Generate B bootstrap samples D_1, D_2, ..., D_B.
3. Learn a weak classifier for each dataset: c_i ← L(D_i), where L(·) is the weak learner and c_i(·): X → {−1, +1}.
4. The aggregated classification of a new instance x is simply found by taking a majority vote over the classifiers:
   c(x) ← sign( Σ_{b=1}^{B} c_b(x) )

[DEMO]
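A minimal sketch of this procedure in Python, assuming labels in {−1, +1} and using a depth-1 decision tree from scikit-learn as the weak learner L (the lecture's demo uses the half-space classifier; the tree is just a stand-in):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, B=25, rng=None):
    """Learn B weak classifiers, one per bootstrap sample."""
    if rng is None:
        rng = np.random.default_rng(0)
    m = len(X)
    classifiers = []
    for _ in range(B):
        idx = rng.integers(0, m, size=m)       # bootstrap sample D_b
        classifiers.append(DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx]))
    return classifiers

def bagging_predict(classifiers, X):
    """c(x) = sign(sum_b c_b(x)): an unweighted majority vote (0 on a tie)."""
    votes = np.sum([c.predict(X) for c in classifiers], axis=0)
    return np.sign(votes)
```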

Bagging summary
When is it useful? Bagging should only be used if the learner is unstable. A learner is said to be unstable if a small change in the training set yields large variations in the classification. Examples of unstable learners: neural networks, decision trees. Example of a stable learner: k-nearest neighbours.

Bagging properties:
1. Improves the estimate if the learning algorithm is unstable.
2. Reduces the variance of predictions.
3. Degrades the estimate if the learning algorithm is stable.

Boosting
1. Main idea, as for bagging: invent new datasets and learn a separate classifier for each dataset.
2. Clever trick: the data are now adaptively resampled, so that previously misclassified examples count more in the next dataset, while previously well-classified observations are weighted less.
3. The classifiers are aggregated by adding them together, but each classifier is weighted differently.

Boosting: How to learn the WEIGHTED half-space classifier
Consider a dataset D = {D_1, D_2, ..., D_m}, where each D_j is an observation ((x_1, x_2), c), c is either red or blue, and each observation i carries a weight w(i).

For each dimension d:
  For each observation j:
    1. Let b(d, j) be x_d from observation D_j.
    2. r_l ← (sum of weights of red observations with x_d ≤ b(d, j)) + (sum of weights of blue observations with x_d > b(d, j))
    3. quality(d, j) ← max{ r_l, Σ_i w(i) − r_l }
(d_0, j_0) ← argmax_{d,j} quality(d, j)
Define the half-space classifier by the threshold x_{d_0} = b(d_0, j_0).

[Figure: the weighted data and the resulting classifier]
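The weighted variant only replaces the counts in the earlier stump search with sums of weights. A sketch under the same assumptions as before (NumPy arrays, labels "red"/"blue", a weight array w):

```python
import numpy as np

def learn_weighted_half_space(X, y, w):
    """Like learn_half_space above, but each observation i contributes weight w[i]."""
    m, n_dims = X.shape
    total = w.sum()                    # plays the role of m in the unweighted search
    best = (None, None, -1.0)
    for d in range(n_dims):
        for j in range(m):
            b = X[j, d]
            r_l = w[(X[:, d] <= b) & (y == "red")].sum() + \
                  w[(X[:, d] > b) & (y == "blue")].sum()
            quality = max(r_l, total - r_l)
            if quality > best[2]:
                best = (d, b, quality)
    return best
```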

Boosting
Boosting produces replications of the training set by re-weighting the importance of each observation in the original dataset. A classifier is generated from each replication, using the weighted dataset as input. All classifiers are used to classify each sample from the test set. Each classifier gets a weight based on its calculated accuracy on the weighted training set.

Boosting more formally
1. Start with dataset D and weights w_1 = [1/m, 1/m, ..., 1/m].
2. Repeat for t = 1, ..., T:
   1. c_t is the classifier learned from the weighted data (D, w_t).
   2. ε_t is the error of c_t measured on the weighted data (D, w_t).
   3. Calculate α_t ← (1/2) · ln((1 − ε_t) / ε_t).
   4. Update weights (1):
      w_{t+1}(i) ← w_t(i) · exp(−α_t) if x_i was classified correctly, and w_t(i) · exp(+α_t) otherwise.
   5. Update weights (2): normalise w_{t+1}.
3. The aggregated classification of a new instance x is found by taking a weighted vote over the classifiers:
   c(x) ← sign( Σ_{t=1}^{T} α_t c_t(x) )

[DEMO]
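This is the AdaBoost scheme. A minimal sketch in Python, assuming labels y in {−1, +1} and again using a weight-aware depth-1 tree as the weak learner (my substitution for the weighted half-space classifier):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_fit(X, y, T=25):
    """Run T rounds of the boosting loop on the slide."""
    m = len(X)
    w = np.full(m, 1.0 / m)                                      # w_1 = [1/m, ..., 1/m]
    classifiers, alphas = [], []
    for _ in range(T):
        c_t = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = c_t.predict(X)
        eps_t = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)  # weighted error on (D, w_t)
        alpha_t = 0.5 * np.log((1 - eps_t) / eps_t)
        w = w * np.exp(-alpha_t * y * pred)                      # exp(-a) if correct, exp(+a) if not
        w = w / w.sum()                                          # normalise w_{t+1}
        classifiers.append(c_t)
        alphas.append(alpha_t)
    return classifiers, alphas

def boost_predict(classifiers, alphas, X):
    """c(x) = sign(sum_t alpha_t c_t(x)): a weighted vote."""
    scores = sum(a * c.predict(X) for a, c in zip(alphas, classifiers))
    return np.sign(scores)
```

The clipping of eps_t is only a guard against a weak learner that is perfect (or useless) on the weighted data, which would otherwise make α_t blow up.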

Boosting summary
Boosting properties:
- The error rate of c on the un-weighted training instances approaches zero exponentially quickly as T increases.
- Boosting forces the classifier to have at least 50% accuracy on the re-weighted training instances. This causes the learning system to produce a quite different classifier on the following trial (since next time we focus on the misclassifications of this round), which leads to an extensive exploration of the classifier space.
- Boosting is more clever than bagging, in the sense that it will not reduce classification performance.
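For context, the first property can be made precise with the standard AdaBoost training-error bound of Freund and Schapire (not stated on the slide): writing γ_t = 1/2 − ε_t for the edge of the weak classifier in round t, the training error of the combined classifier c satisfies

```latex
\frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\{c(x_i)\neq y_i\}
  \;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t(1-\epsilon_t)}
  \;=\; \prod_{t=1}^{T}\sqrt{1-4\gamma_t^2}
  \;\le\; \exp\!\Big(-2\sum_{t=1}^{T}\gamma_t^2\Big)
```

So if every weak classifier beats random guessing by at least some γ > 0, the training error drops below exp(−2γ²T), i.e. exponentially fast in T.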

Plans for the remainder of the course
Next week off: Since we have covered both boosting and SVMs today, we are ahead of schedule. Therefore, the lecture on November 14th is cancelled, and the next time we meet is for the last lecture (November 21st).

Plans for the last lecture:
- Wrap-up of the course
- Q & A
- I will go through last year's examination, which is also the last assignment.