Low Bias Bagged Support Vector Machines

Low Bias Bagged Support Vector Machines. Giorgio Valentini, Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Italy, valentini@dsi.unimi.it. Thomas G. Dietterich, Department of Computer Science, Oregon State University, Corvallis, Oregon 97331, USA, http://www.cs.orst.edu/~tgd. ICML 2003, August 22, 2003.

Two Questions: Can bagging help SVMs? If so, how should SVMs be tuned to give the best bagged performance?

The Answers: Can bagging help SVMs? Yes. If so, how should SVMs be tuned to give the best bagged performance? Tune to minimize the bias of each SVM.

Soft Margin Classifier. SVMs minimize ‖w‖² + C Σᵢ ξᵢ subject to yᵢ(w · xᵢ + b) ≥ 1 - ξᵢ, ξᵢ ≥ 0. This maximizes the margin subject to soft separation of the training data, which controls the VC dimension. The dot product can be generalized using kernels K(xⱼ, xᵢ). C and the kernel parameter are set using an internal validation set. This gives excellent control of the bias/variance tradeoff: is there any room for improvement?
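
To make the tuning step concrete, here is a minimal sketch (not the authors' code) of setting C and an RBF kernel parameter by internal cross-validation with scikit-learn; the synthetic data set, the grid values, and the use of GridSearchCV are assumptions made for illustration.

```python
# Hypothetical sketch: choosing C and the RBF kernel parameter by internal
# validation, as described on the slide. Data and grid values are made up.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Candidate values for C and the kernel parameter (scikit-learn's gamma).
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]}

# 5-fold internal validation picks the pair with the lowest validation error.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```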

Bias/Variance Error Decomposition for Squared Loss. For regression problems, the loss is (ŷ - y)², and error² = bias² + variance + noise: E_S[(ŷ - y)²] = (E_S[ŷ] - f(x))² + E_S[(ŷ - E_S[ŷ])²] + E[(y - f(x))²]. Bias: systematic error at data point x, averaged over all training sets S of size N. Variance: variation around the average. Noise: errors in the observed labels of x.
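
A small Monte-Carlo sketch of this decomposition at a single test point, using the regression example from the following slides (y = x + 2 sin(1.5x) with N(0, 0.2) noise); the linear learner, the sampling range, and the test point are assumptions made for illustration.

```python
# Sketch: estimating bias^2, variance, and noise at one test point by
# repeatedly drawing training sets, as in the slides' regression example.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: x + 2 * np.sin(1.5 * x)   # true target function f(x)
noise_sd = 0.2                          # label noise N(0, 0.2)

x0 = 5.0                                # fixed test point x (assumed)
preds = []
for _ in range(50):                     # 50 training sets of 20 examples each
    x = rng.uniform(0, 10, 20)
    y = f(x) + rng.normal(0, noise_sd, 20)
    coeffs = np.polyfit(x, y, deg=1)    # a simple linear fit as the learner
    preds.append(np.polyval(coeffs, x0))
preds = np.array(preds)

bias_sq  = (preds.mean() - f(x0)) ** 2  # (E_S[y_hat] - f(x))^2
variance = preds.var()                  # E_S[(y_hat - E_S[y_hat])^2]
noise    = noise_sd ** 2                # E[(y - f(x))^2]
print(bias_sq, variance, noise)
```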

Example: 20 points drawn from y = x + 2 sin(1.5x) + N(0, 0.2) (figure)

Example: 50 fits (20 examples each) (figure)

Bias (figure)

Variance (figure)

Noise (figure)

Variance Reduction and Bagging. Bagging attempts to simulate a large number of training sets and compute the average prediction y_m over those training sets; it then predicts y_m. If the simulation is good enough, this eliminates all of the variance.
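
As a concrete illustration, a minimal bagging sketch on synthetic data (the data set, base-learner settings, and ensemble size are assumptions): each bootstrap resample plays the role of a simulated training set, and the majority vote approximates the averaged prediction.

```python
# Sketch: bagging an SVM by bootstrap resampling and majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=100, random_state=0)

rng = np.random.default_rng(0)
n_bags = 25
votes = np.zeros(len(X_te))
for _ in range(n_bags):
    idx = rng.integers(0, len(X_tr), len(X_tr))       # bootstrap replicate
    member = SVC(kernel="rbf", C=10, gamma=0.1).fit(X_tr[idx], y_tr[idx])
    votes += member.predict(X_te)                      # labels are 0/1 here

y_bag = (votes / n_bags >= 0.5).astype(int)            # majority vote
print("bagged test accuracy:", np.mean(y_bag == y_te))
```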

Bias and Variance for 0/1 Loss (Domingos, 2000). At each test point x we have 100 estimates ŷ₁, ..., ŷ₁₀₀ ∈ {-1, +1}. Main prediction: y_m = majority vote. Bias(x) = 0 if y_m is correct and 1 otherwise. Variance(x) = probability that ŷ ≠ y_m. Unbiased variance V_U(x): variance when Bias = 0. Biased variance V_B(x): variance when Bias = 1. Error rate(x) = Bias(x) + V_U(x) - V_B(x). Noise is assumed to be zero.
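
A sketch of this decomposition at a single test point, with toy predictions standing in for the 100 estimates (the prediction distribution is invented for illustration):

```python
# Sketch: Domingos-style 0/1 bias/variance at one test point x, given
# 100 predictions in {-1, +1}. The toy prediction probabilities are made up.
import numpy as np

rng = np.random.default_rng(0)
y_true = +1
y_hats = rng.choice([-1, +1], size=100, p=[0.3, 0.7])

y_main = +1 if (y_hats == +1).sum() >= 50 else -1   # main prediction: majority vote
bias = 0 if y_main == y_true else 1                 # 0/1 bias at x
variance = np.mean(y_hats != y_main)                # P(y_hat != y_main)

V_u = variance if bias == 0 else 0.0                # unbiased variance
V_b = variance if bias == 1 else 0.0                # biased variance
error = bias + V_u - V_b                            # equals P(y_hat != y_true) when noise = 0
print(bias, variance, V_u, V_b, error)
```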

Good Variance and Bad Variance. Error rate(x) = Bias(x) + V_U(x) - V_B(x). V_B(x) is good variance, but it only helps when the bias is high; V_U(x) is bad variance. Bagging will reduce both types of variance, which gives good results if Bias(x) is small. Goal: tune classifiers to have small bias and rely on bagging to reduce the variance.

Lobag. Given: training examples {(xᵢ, yᵢ)}, i = 1, ..., N; a learning algorithm with tuning parameters; a set of parameter settings to try. Do: apply internal bagging to compute out-of-bag estimates of the bias of each parameter setting; take the setting that gives minimum bias; perform bagging using that setting.
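
The sketch below shows one way the Lobag procedure could look in code, under stated assumptions (scikit-learn SVC as the learner, binary labels encoded as 0/1, 25 bootstrap replicates, and a simple out-of-bag majority vote for the bias estimate); it is an illustration, not the authors' implementation.

```python
# Hypothetical Lobag sketch: estimate the 0/1 bias of each parameter setting
# from out-of-bag predictions, pick the minimum-bias setting, then bag it.
import numpy as np
from sklearn.svm import SVC

def oob_bias(make_model, X, y, n_boot=25, seed=0):
    """Out-of-bag estimate of the 0/1 bias (main prediction wrong) on (X, y).
    Assumes binary labels encoded as 0/1."""
    rng = np.random.default_rng(seed)
    n = len(X)
    votes = np.zeros((n, 2))
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                  # bootstrap replicate
        oob = np.setdiff1d(np.arange(n), idx)        # out-of-bag examples
        if len(oob) == 0:
            continue
        pred = make_model().fit(X[idx], y[idx]).predict(X[oob])
        votes[oob, pred] += 1                        # accumulate OOB votes
    covered = votes.sum(axis=1) > 0
    main_pred = votes.argmax(axis=1)                 # OOB main prediction (majority vote)
    return np.mean(main_pred[covered] != y[covered])

def lobag(X, y, settings, n_bags=25, seed=1):
    """settings: list of dicts of SVC keyword arguments (e.g. C and gamma)."""
    biases = [oob_bias(lambda s=s: SVC(**s), X, y) for s in settings]
    best = settings[int(np.argmin(biases))]          # minimum-bias setting
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(n_bags):                          # final bagged ensemble
        idx = rng.integers(0, len(X), len(X))
        ensemble.append(SVC(**best).fit(X[idx], y[idx]))
    return ensemble, best
```

A new point would then be classified by majority vote over the ensemble members, exactly as in ordinary bagging.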

Experimental Study. Seven data sets: P2, waveform, greylandsat, spam, musk, letter2 (letter recognition, B vs. R), letter2+noise (20% added noise). Three kernels: dot product, RBF (parameter = Gaussian width), polynomial (parameter = degree). Training sets of 100 examples. Bias and variance estimated on the test set from 100 replicates.

Example: Letter2, RBF kernel with Gaussian width 100 (figure; the minimum-error and minimum-bias settings are marked)

Results: Dot Product Kernel (figure)

Results (2): Gaussian Kernel (figure)

Results (3): Polynomial Kernel (figure)

McNemar's Tests: Bagging versus Single SVM (bar chart of wins, ties, and losses for the linear, polynomial, and RBF kernels)

McNemar's Test: Lobag versus Single SVM (bar chart of wins, ties, and losses for the linear, polynomial, and RBF kernels)

McNemar's Test: Lobag versus Bagging (bar chart of wins, ties, and losses for the linear, polynomial, and RBF kernels)

Kernel Results: McNemar's Test (wins-ties-losses)

Kernel        Lobag vs Bagging   Lobag vs Single   Bagging vs Single
Linear        3-26-1             23-7-0            21-9-0
Polynomial    12-23-0            17-17-1           9-24-2
Gaussian      17-17-1            18-15-2           9-22-4
Total         32-66-2            58-39-3           39-55-6
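
For reference, a small sketch of McNemar's test (with the usual continuity correction) as it might be used to produce such win/tie/loss counts from two classifiers' predictions on a shared test set; the choice of significance threshold is left to the caller and is not specified by the slides.

```python
# Sketch: McNemar's test with continuity correction for comparing two
# classifiers on the same test set.
import numpy as np
from scipy.stats import chi2

def mcnemar_p(y_true, pred_a, pred_b):
    a_ok = pred_a == y_true
    b_ok = pred_b == y_true
    n01 = int(np.sum(a_ok & ~b_ok))     # A right, B wrong
    n10 = int(np.sum(~a_ok & b_ok))     # A wrong, B right
    if n01 + n10 == 0:
        return 1.0                      # no discordant pairs: no evidence of difference
    stat = (abs(n01 - n10) - 1) ** 2 / (n01 + n10)
    return chi2.sf(stat, df=1)          # small p-value indicates a significant difference
```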

Discussion. For small training sets, bagging can improve SVM error rates, especially for linear kernels, and Lobag is at least as good as bagging and often better. This is consistent with previous experience: bagging works better with unpruned trees, and bagging works better with neural networks that are trained longer or with less weight decay.

Conclusions. Lobag is recommended for SVM problems with high variance (small training sets, high noise, many features). The added cost is small: SVMs already require internal validation to set C and the kernel parameter, while Lobag requires internal bagging to estimate the bias of each setting of C and the kernel parameter. Future research: smart search for low-bias settings of C and the kernel parameter, and experiments with larger training sets.