
What makes a good ensemble? CS789: Machine Learning and Neural Network
Ensemble methods
Jakramate Bootkrajang
Department of Computer Science, Chiang Mai University

Introduction

We have seen how a single classifier works. We will now explore the possibility of combining the outputs of several classifiers to make a (more accurate) prediction. This method is called ensemble learning.

What makes a good ensemble?

1. Each member of the ensemble is accurate. An accurate classifier is one whose error rate is better than random guessing, i.e. ϵ < 0.5.
2. The ensemble is composed of diverse classifiers. Two classifiers are diverse if they make different errors on new data points.

More on diversity

To see why diversity is important, imagine there are three classifiers in the ensemble, h_1, h_2, h_3.
If the three classifiers predict the same thing (not diverse), then when h_1 makes a mistake the others will too.
But if the classifiers are uncorrelated (diverse), when h_1 makes a mistake, h_2 and h_3 might not, and by majority voting the final prediction is still correct (a small sketch of this voting effect follows below).
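To make the majority-voting argument concrete, here is a minimal sketch; the three simulated classifiers, their 30% error rate, and the use of 0/1 labels are assumptions for illustration, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)           # ground-truth labels (0/1)

def noisy_copy(y, error_rate):
    """Simulate a classifier that flips each label independently with prob. error_rate."""
    flips = rng.random(y.shape) < error_rate
    return np.where(flips, 1 - y, y)

# Three classifiers, each with 30% error, whose mistakes are independent (diverse)
preds = np.stack([noisy_copy(y_true, 0.3) for _ in range(3)])
majority = (preds.sum(axis=0) >= 2).astype(int)   # majority vote of the three

print("single classifier error:", np.mean(preds[0] != y_true))
print("majority-vote error:    ", np.mean(majority != y_true))
# With independent errors the majority vote is wrong only when at least two members are,
# i.e. roughly 3*0.3^2*0.7 + 0.3^3 ≈ 0.216, lower than the 0.3 of any single member.
```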

Reasons why an ensemble is often more accurate [1/3]

It addresses the statistical problem of learning from a limited amount of training data. An ensemble reduces the risk of choosing the wrong hypothesis.

Reasons why an ensemble is often more accurate [2/3]

Even with abundant data, the learning problem may have several local optima. An ensemble reduces the risk of getting stuck in a local optimum.

Reasons why an ensemble is often more accurate [3/3]

An ensemble alleviates a wrong choice of hypothesis space, for example when the data is not linearly separable but a linear hypothesis class is chosen. An ensemble of linear classifiers can have a non-linear decision boundary.

How to construct an ensemble?

An ensemble of G members is in general given by

    f(x) = \sum_{i=1}^{G} w_i h_i(x)

Methods for constructing an ensemble differ in
- how to determine w_i, the contribution of h_i(x), and
- how to get a diverse set of h_i(x), e.g. by introducing some randomness into the problem or the learner.

A minimal sketch of this weighted combination follows below.
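As a concrete illustration of f(x) = \sum_i w_i h_i(x), the sketch below combines three toy base classifiers with fixed weights; the base classifiers and the weights are hypothetical, chosen only to show the mechanics:

```python
import numpy as np

# Hypothetical base classifiers: each returns a label in {-1, +1}
def h1(x): return np.sign(x[0])             # split on feature 0
def h2(x): return np.sign(x[1])             # split on feature 1
def h3(x): return np.sign(x[0] + x[1])      # split on the sum of features

members = [h1, h2, h3]
weights = [0.5, 0.3, 0.2]                   # w_i: assumed contributions of the members

def f_ensemble(x):
    # f(x) = sum_i w_i * h_i(x); the sign of the weighted sum is the final prediction
    score = sum(w * h(x) for w, h in zip(weights, members))
    return np.sign(score)

# h1 and h3 vote +1, h2 votes -1; the weighted sum is 0.5 - 0.3 + 0.2 = 0.4, so +1
print(f_ensemble(np.array([1.0, -0.5])))
```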

Bootstrap Aggregating: Bagging

How to get diverse h_i(x)? By manipulating the input data:
- Sample m examples from the training set randomly, with replacement.
- Train a classifier on the bootstrap replicate.
- Because of the resampling, each classifier only sees part of the whole data.

What are the weights w_i? Classifiers are combined using identical weights (a plain vote over the members):

    f_{bagging}(x) = \frac{1}{G} \sum_{i=1}^{G} h_i(x)

A minimal sketch of this procedure follows below.

Bagging: Example (1/3) [figure]

Bagging: Example (2/3) [figure]

Bagging: Example (3/3) [figure]
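A minimal bagging sketch, under the assumption that the base learner is a decision tree (scikit-learn is used for brevity); the slides do not prescribe a particular library, base learner, or dataset:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_members=11, seed=0):
    """Train n_members trees, each on a bootstrap replicate of (X, y)."""
    rng = np.random.default_rng(seed)
    members = []
    m = len(X)
    for _ in range(n_members):
        idx = rng.integers(0, m, size=m)           # sample m examples with replacement
        members.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return members

def bagging_predict(members, X):
    """Identical weights: average the members' votes and take the sign."""
    votes = np.mean([h.predict(X) for h in members], axis=0)
    return np.sign(votes)                           # assumes labels in {-1, +1}

# Usage on toy data: the label is the sign of the first feature
X = np.random.default_rng(1).normal(size=(200, 2))
y = np.sign(X[:, 0])
ensemble = bagging_fit(X, y)
print(np.mean(bagging_predict(ensemble, X) == y))  # training accuracy of the bag
```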

Adaptive Boosting: AdaBoost

Place more weight on difficult examples: instead of sampling, AdaBoost uses training-set re-weighting.
Classifiers are combined using

    f_{ada}(x) = \sum_{i} \alpha_i h_i(x)

where \alpha_i is set according to h_i's accuracy on the weighted training set, and h_i(x) is called a weak learner.

AdaBoost algorithm

Data: S = \{x_i, y_i\}_{i=1}^{N}, with x_i \in X and y_i \in \{-1, +1\}
Initialisation: uniform weight on the initial data, D_1(i) = \frac{1}{N}
for t = 1 ... T do
    Learn a classifier h_t : X \to \{-1, +1\} that minimises the weighted training error
        \epsilon_t = \sum_{i=1}^{N} D_t(i)\,[y_i \neq h_t(x_i)]
    if \epsilon_t > 0.5 then STOP
    else
        Set \alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}
        Reweight by D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}
        (Z_t is a normaliser making \sum_{i=1}^{N} D_{t+1}(i) = 1)
end
Result: f_{ada}(x) = \sum_{t=1}^{T} \alpha_t h_t(x)

(A runnable sketch of this loop is given below, after the reweighting and contribution slides.)

AdaBoost: reweighting

    D_{t+1}(i) = \begin{cases} D_t(i)\,\exp(-\alpha_t) & \text{if } y_i = h_t(x_i) \\ D_t(i)\,\exp(\alpha_t) & \text{if } y_i \neq h_t(x_i) \end{cases}    (1)

    \alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}    (2)

\alpha_t is set according to h_t's accuracy (1 - \epsilon_t) on the weighted training set.

Effect of ϵ on α (contribution)

ϵ = 0.5 gives α = 0; ϵ = 0.4 gives α = 0.20; ϵ = 0.3 gives α = 0.42; ϵ = 0.1 gives α = 1.09.
[figure: plot of the contribution α against the error ϵ, decreasing from about 4 near ϵ = 0 to 0 at ϵ = 0.5]
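The following is a minimal sketch of the AdaBoost loop above, assuming decision stumps (one-level trees from scikit-learn) as the weak learners; the choice of weak learner and the toy data are assumptions, not part of the slides:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    """AdaBoost with decision stumps; y must be in {-1, +1}."""
    N = len(X)
    D = np.full(N, 1.0 / N)                    # D_1(i) = 1/N
    alphas, stumps = [], []
    for t in range(T):
        h = DecisionTreeClassifier(max_depth=1)
        h.fit(X, y, sample_weight=D)           # learn h_t on the weighted data
        pred = h.predict(X)
        eps = np.sum(D * (pred != y))          # eps_t = sum_i D_t(i)[y_i != h_t(x_i)]
        if eps > 0.5:                          # weak learner no better than chance: stop
            break
        eps = max(eps, 1e-12)                  # guard against division by zero
        alpha = 0.5 * np.log((1 - eps) / eps)  # alpha_t = 1/2 ln((1 - eps_t)/eps_t)
        D = D * np.exp(-alpha * y * pred)      # up-weight mistakes, down-weight correct ones
        D /= D.sum()                           # Z_t: renormalise so the weights sum to 1
        alphas.append(alpha)
        stumps.append(h)
    return alphas, stumps

def adaboost_predict(alphas, stumps, X):
    # f_ada(x) = sum_t alpha_t h_t(x); the sign gives the label
    score = sum(a * h.predict(X) for a, h in zip(alphas, stumps))
    return np.sign(score)

# Toy usage; note that eps_t = 0.1 yields alpha_t = 0.5*ln(9) ≈ 1.09, matching the table above
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)     # a non-linear (XOR-like) target
alphas, stumps = adaboost_fit(X, y)
print("training accuracy:", np.mean(adaboost_predict(alphas, stumps, X) == y))
```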

Boosting: Example (1/2) [figure]

Boosting: Example (2/2) [figure]

Summary

Ensembles are methods for obtaining highly accurate classifiers by combining less accurate ones.
This is another approach to solving non-linear problems.
Well-known algorithms include bagging and boosting.

References

Tom Dietterich. Ensemble Methods in Machine Learning. web.engr.oregonstate.edu/~tgd/publications/mcs-ensembles.pdf
Freund, Y.; Schapire, R. (1999). A Short Introduction to Boosting (PDF): an introduction to AdaBoost.