Bagging and Other Ensemble Methods

Sargur N. Srihari, srihari@buffalo.edu

Regularization Strategies
1. Parameter Norm Penalties
2. Norm Penalties as Constrained Optimization
3. Regularization and Underconstrained Problems
4. Data Set Augmentation
5. Noise Robustness
6. Semi-supervised learning
7. Multi-task learning
8. Early Stopping
9. Parameter tying and parameter sharing
10. Sparse representations
11. Bagging and other ensemble methods
12. Dropout
13. Adversarial training
14. Tangent methods

What is bagging?
- Bagging is short for "bootstrap aggregating".
- It is a technique for reducing generalization error by combining several models.
- The idea is to train several models separately, then have all of the models vote on the output for test examples.
- This strategy is called model averaging; techniques employing it are known as ensemble methods.
- Model averaging works because different models will usually not make all of the same errors on the test set.

Example: ensemble error rate
- Consider a set of k regression models, where each model i = 1, ..., k makes an error ε_i on a given example.
- The errors are drawn from a zero-mean multivariate normal distribution with variances E[ε_i²] = v and covariances E[ε_i ε_j] = c.
- The error of the average prediction of all the ensemble models is (1/k) Σ_i ε_i.
- The expected squared error of the ensemble prediction is therefore
  E[((1/k) Σ_i ε_i)²] = (1/k²) E[Σ_i (ε_i² + Σ_{j≠i} ε_i ε_j)] = v/k + ((k−1)/k) c.
- If the errors are perfectly correlated, c = v, the mean squared error reduces to v, and model averaging does not help.
- If the errors are perfectly uncorrelated, c = 0, the expected squared error of the ensemble is only v/k, so it shrinks in inverse proportion to the ensemble size.
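
The derivation above can be checked numerically. Below is a minimal sketch (mine, not from the slides; the variable names are illustrative) that draws k correlated zero-mean errors with variance v and covariance c, averages them, and compares the empirical mean squared error of the average with v/k + ((k−1)/k) c.

```python
# Monte Carlo check of the ensemble-error formula (illustrative sketch).
import numpy as np

k, v, c = 10, 1.0, 0.3            # ensemble size, error variance, error covariance
n_trials = 200_000

# Covariance matrix with v on the diagonal and c everywhere else
cov = np.full((k, k), c) + np.eye(k) * (v - c)
rng = np.random.default_rng(0)
errors = rng.multivariate_normal(np.zeros(k), cov, size=n_trials)  # shape (n_trials, k)

ensemble_error = errors.mean(axis=1)          # error of the averaged prediction
empirical_mse = np.mean(ensemble_error ** 2)
predicted_mse = v / k + (k - 1) / k * c       # the formula from the slide

print(f"empirical: {empirical_mse:.4f}   predicted: {predicted_mse:.4f}")
```

With these numbers the predicted value is 0.1 + 0.27 = 0.37, and the empirical estimate should land very close to it; setting c = v recovers v, and setting c = 0 recovers v/k.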

Ensemble vs Bagging
- Different ensemble methods construct the ensemble of models in different ways.
- For example, each member of the ensemble could be formed by training a completely different kind of model, using a different algorithm or objective function.
- Bagging is a method that allows the same kind of model, training algorithm and objective function to be reused several times.
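
To make the distinction concrete, here is a small sketch (my own, using scikit-learn; the slides do not name any library) that builds a heterogeneous ensemble from three different model families and, separately, a bagged ensemble that reuses one model type on bootstrap samples of a synthetic dataset.

```python
# Heterogeneous ensemble vs bagging, sketched with scikit-learn (illustrative).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Different kinds of models, different objective functions, one shared training set
heterogeneous = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("tree", DecisionTreeClassifier()),
                ("svm", SVC())],
    voting="hard",
).fit(X, y)

# Same kind of model, trained k times on different bootstrap samples
bagged_trees = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10).fit(X, y)
```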

The Bagging Technique
- Given a training set D of size N, generate k data sets, each with the same number of examples as the original, by sampling with replacement.
- Each such data set D_i is known as a bootstrap sample: with high probability it omits some of the original examples and contains duplicates of others.
- These differences between the bootstrap samples result in differences between the trained models.
- The k models are combined by averaging their outputs (for regression) or by voting (for classification).
- An example is given next.

Example of the Bagging Principle
- Task: training a detector for the digit 8.
- The bagging training procedure makes different data sets by resampling the given data set: one bootstrap sample omits the 9 and repeats the 8, while another omits the 6 and repeats the 9.
- The detector trained on the first sample learns that a loop on top of the digit means an 8; the detector trained on the second learns that a loop at the bottom means an 8.
- Each detector on its own is brittle, but their average is robust, achieving maximum confidence only when both loops are present.
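
A minimal from-scratch sketch of the bagging procedure described above; the names (`bagging_fit`, `bagging_predict`, `train_model`) are placeholders of my own and not part of the slides.

```python
# Bagging from scratch: k bootstrap samples, k models, averaged or voted output.
import numpy as np

def bagging_fit(X, y, train_model, k=10, seed=0):
    """Train k models, each on a bootstrap sample of (X, y) of the same size N."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)        # sample N indices with replacement
        models.append(train_model(X[idx], y[idx]))
    return models

def bagging_predict(models, X, task="regression"):
    """Combine the k models: average for regression, majority vote for classification."""
    preds = np.stack([m(X) for m in models])    # assumes each trained model is a callable
    if task == "regression":
        return preds.mean(axis=0)
    # Majority vote; assumes non-negative integer class labels.
    preds = preds.astype(int)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)
```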

Neural nets and bagging
- Neural networks reach a wide enough variety of solution points that they often benefit from model averaging even when all of the models are trained on the same dataset.
- Differences in random initialization, in the random selection of minibatches, and in hyperparameters cause different members of the ensemble to make partially independent errors.
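
As a sketch of this effect (my own example; the slides do not prescribe a library), the snippet below trains five scikit-learn MLPs that share the same data and architecture but differ only in random seed, then averages their predicted probabilities.

```python
# Averaging neural nets that differ only in random seed (illustrative sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Same dataset, same architecture; only the seed (initialization, minibatch order) differs.
nets = [MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=s).fit(X_tr, y_tr)
        for s in range(5)]

avg_proba = np.mean([net.predict_proba(X_te) for net in nets], axis=0)
ensemble_acc = np.mean(avg_proba.argmax(axis=1) == y_te)
single_acc = nets[0].score(X_te, y_te)
print(f"single net: {single_acc:.3f}   ensemble of 5: {ensemble_acc:.3f}")
```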

Model averaging is powerful
- Model averaging is a reliable method for reducing generalization error.
- Machine learning contests are usually won by model averaging over dozens of models; the Netflix Grand Prize is a well-known example.
- Because the performance of model averaging comes at the expense of increased computation and memory, benchmark comparisons are usually made using a single model.

Boosting
- Boosting builds the ensemble by incrementally adding models to it.
- After a weak learner is added, the data are reweighted: examples that are misclassified gain weight, and examples that are classified correctly lose weight.
- Boosting has been applied to ensembles of neural networks, by incrementally adding neural networks to the ensemble.
- It has also been applied by interpreting an individual neural network as an ensemble, incrementally adding hidden units to the network.
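
The reweighting step can be made concrete with a compact AdaBoost-style sketch (illustrative, not from the slides; it assumes scikit-learn decision stumps as the weak learners and class labels coded as ±1).

```python
# AdaBoost-style boosting: misclassified examples gain weight, correct ones lose weight.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=20):
    """y must contain labels in {-1, +1}."""
    n = len(X)
    w = np.full(n, 1.0 / n)                     # start with uniform example weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))   # the weak learner's vote weight
        # Reweight: misclassified examples (y * pred = -1) gain weight, correct ones lose it.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    score = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(score)
```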