Holdout and Cross-Validation Methods: Overfitting Avoidance

Overview. Decision trees: reduced-error pruning, cost-complexity pruning. Neural networks: early stopping. Adjusting regularizers via cross-validation: nearest neighbor (choosing the number of neighbors k); support vector machines (choosing C and choosing σ for Gaussian kernels).

Reduced-Error Pruning. Given a data set S, subdivide S into S_train and S_dev. Build a tree using S_train. Pass all of the S_dev examples through the tree and estimate the error rate of each node on S_dev. Convert a node to a leaf if it would have lower estimated error than the sum of the errors of its children.
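
A minimal sketch of this bottom-up procedure, assuming a toy dictionary-based tree in which every node already stores the number of S_dev examples it would misclassify if converted to a leaf (the representation and the function are illustrative, not the lecture's implementation):

# Assumed tree representation: each node is a dict with
#   "leaf_error": number of S_dev examples misclassified if this node were a
#                 leaf predicting its majority class
#   "children":   list of child nodes (empty list for an actual leaf)

def reduced_error_prune(node):
    """Prune bottom-up; return the S_dev error of the (possibly pruned) subtree."""
    if not node["children"]:
        return node["leaf_error"]

    # Prune the children first and add up their dev-set errors.
    subtree_error = sum(reduced_error_prune(child) for child in node["children"])

    # Convert this node to a leaf if that gives lower (or equal; ties favor
    # the simpler tree) estimated error than keeping its children.
    if node["leaf_error"] <= subtree_error:
        node["children"] = []
        return node["leaf_error"]
    return subtree_error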

Reduced-Error Pruning: Example [figure]

Cost-Complexity Pruning. The CART system (Breiman et al., 1984) employs cost-complexity pruning: J(Tree, S) = ErrorRate(Tree, S) + α·|Tree|, where |Tree| is the number of nodes in the tree and α is a parameter that controls the tradeoff between the error rate and the size penalty. α is set by cross-validation.
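
As a tiny worked example of the criterion (the function name and the numbers below are made up for illustration):

def cost_complexity(error_rate, num_nodes, alpha):
    """J(Tree, S) = ErrorRate(Tree, S) + alpha * |Tree|."""
    return error_rate + alpha * num_nodes

# e.g. a 41-node tree with a 12% error rate, evaluated at alpha = 0.01:
print(cost_complexity(0.12, 41, 0.01))   # 0.12 + 0.41 = 0.53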

Determining Important Values of α. Goal: identify a finite set of candidate values for α, then evaluate them via cross-validation. Set α = α_0 = 0 and k = 0. Train on S to produce a tree T. Repeat until T is completely pruned: determine the next larger value α_{k+1} that would cause a node to be pruned from T; prune that node; set k := k + 1. This can be done efficiently.
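
If scikit-learn is available, its minimal cost-complexity pruning implementation computes this sequence of candidate α values directly; a sketch on synthetic data (the data set here is an assumption for illustration, not the lecture's example):

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# cost_complexity_pruning_path grows the full tree and returns the "effective
# alphas" at which successive nodes would be pruned (alpha_0 = 0 comes first).
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
candidate_alphas = path.ccp_alphas
print(candidate_alphas)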

Choosing α by Cross-Validation. Divide S into 10 subsets S_0, ..., S_9. In fold v: train a tree on the union of the S_i for i ≠ v; for each α_k, prune that tree to the corresponding level and measure its error rate on S_v. Compute ε_k as the average error rate over the 10 folds when α = α_k. Choose the α_k that minimizes ε_k; call it α* and let ε* be the corresponding error rate. Finally, prune the original tree (built on all of S) according to α*.
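
Continuing the sketch above, a 10-fold cross-validation over the candidate α values might look like this (each fold simply refits a tree with ccp_alpha = α_k, which scikit-learn's minimal cost-complexity pruning makes equivalent to pruning that fold's tree to the corresponding level):

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

cv_error = np.array([
    1.0 - cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                          X, y, cv=10).mean()
    for a in candidate_alphas
])
best = int(np.argmin(cv_error))
alpha_star, eps_star = candidate_alphas[best], cv_error[best]
print(f"alpha* = {alpha_star:.4f}, epsilon* = {eps_star:.3f}")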

The 1-SE Rule for Setting α. Compute a confidence interval on ε* and let U be the upper bound of this interval. Choose the largest α_k whose ε_k ≤ U, i.e., the most heavily pruned tree whose error is still within the bound. If we use Z = 1 for the confidence interval computation, this is called the 1-SE rule, because the bound is one standard error above ε*.
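
Continuing the same sketch, one way to apply the 1-SE rule (using a binomial standard error for ε*; other choices, such as the standard error across folds, are also common):

import numpy as np

n = len(y)
std_err = np.sqrt(eps_star * (1.0 - eps_star) / n)   # SE of an error-rate estimate
U = eps_star + 1.0 * std_err                          # Z = 1  ->  the 1-SE rule

# Most heavily pruned tree whose CV error is still within the bound.
alpha_1se = max(a for a, e in zip(candidate_alphas, cv_error) if e <= U)
print(f"1-SE rule selects alpha = {alpha_1se:.4f}")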

Notes on Decision Tree Pruning. Cost-complexity pruning usually gives the best results in experimental studies. Pessimistic pruning is the most efficient (it does not require a holdout set or cross-validation) and is quite robust. Reduced-error pruning is rarely used, because it consumes training data. Pruning is more important for regression trees than for classification trees; for classification trees it has relatively little effect, because there are only a small number of possible prunings of a tree, and the serious errors made by the tree-growing process (i.e., splitting on the wrong features) usually cannot be repaired by pruning. Ensemble methods work much better than pruning.

Holdout Methods for Neural Networks. Early stopping using a development set. Adjusting regularizers using a development set or via cross-validation: the amount of weight decay, the number of hidden units, the learning rate, and the number of epochs.

Early Stopping Using an Evaluation Set. Split S into S_train and S_dev. Train on S_train; after every epoch, evaluate on S_dev. If the error rate is the best observed so far, save the weights. [Figure: development-set and test-set error curves over training epochs.]
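
A small, self-contained sketch of this loop using scikit-learn's MLPClassifier on synthetic data (the data, network size, and epoch budget are arbitrary choices for illustration; warm_start=True with max_iter=1 trains one additional epoch per call to fit):

import copy
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

warnings.filterwarnings("ignore", category=ConvergenceWarning)  # one-epoch fits warn

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_dev, y_tr, y_dev = train_test_split(X, y, test_size=0.25, random_state=0)

# warm_start=True + max_iter=1: each call to fit() runs one more training epoch.
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1, warm_start=True,
                    random_state=0)

best_err, best_epoch, best_weights = float("inf"), 0, None
for epoch in range(1, 201):
    net.fit(X_tr, y_tr)                        # one more epoch on S_train
    dev_err = 1.0 - net.score(X_dev, y_dev)    # error rate on S_dev
    if dev_err < best_err:                     # best so far: save these weights
        best_err, best_epoch = dev_err, epoch
        best_weights = (copy.deepcopy(net.coefs_), copy.deepcopy(net.intercepts_))

print(f"best dev error {best_err:.3f} at epoch {best_epoch}")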

Reconstituted Early Stopping. Recombine S_train and S_dev to produce S. Train on S and stop at the point (number of epochs, or mean squared error) identified using S_dev.

Reconstituted Early Stopping (continued). [Figure: development-set and test-set error curves.] We can stop either when the MSE on the training set matches the predicted optimal MSE or when the number of epochs matches the predicted optimal number of epochs. Experimental studies show little or no advantage for reconstituted early stopping; most people just use simple holdout.
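
Continuing the early-stopping sketch above, a reconstituted variant using the epoch-count criterion might simply recombine the data and retrain for the dev-selected number of epochs (an illustration only):

import numpy as np
from sklearn.neural_network import MLPClassifier

X_all = np.vstack([X_tr, X_dev])
y_all = np.concatenate([y_tr, y_dev])

# Retrain on all of S, stopping at the number of epochs chosen on S_dev.
final_net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=best_epoch,
                          random_state=0).fit(X_all, y_all)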

Nearest Neighbor: Choosing k. [Figure: error rate vs. k on the development set, the test set, and under LOOCV.] k = 9 gives the best performance on the development set and on the test set; k = 13 gives the best performance based on leave-one-out cross-validation.
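
A minimal sketch of selecting k by leave-one-out cross-validation with scikit-learn (the synthetic data and the candidate k values are assumptions for illustration):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

loocv_error = {
    k: 1.0 - cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X, y, cv=LeaveOneOut()).mean()
    for k in range(1, 22, 2)              # odd k values avoid voting ties
}
best_k = min(loocv_error, key=loocv_error.get)
print(f"LOOCV selects k = {best_k}")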

SVM: Choosing C and σ. [Figures: results on the BR data set (100 examples; Valentini 2003), including a run with 20% label noise and a plot varying σ for a fixed C.]
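
A minimal sketch of tuning C and the Gaussian-kernel width by cross-validated grid search in scikit-learn; note that sklearn parameterizes the RBF kernel by gamma rather than σ (gamma = 1/(2σ²)), and the data and grids here are placeholders, not the BR data set from the slides:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

param_grid = {
    "C": np.logspace(-2, 3, 6),        # 0.01 ... 1000
    "gamma": np.logspace(-4, 1, 6),    # wide kernels ... narrow kernels
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10).fit(X, y)
print(search.best_params_, f"CV accuracy = {search.best_score_:.3f}")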

Summary. Holdout methods are the best way to choose a classifier: reduced-error pruning for trees, early stopping for neural networks. Cross-validation methods are the best way to set a regularization parameter: the cost-complexity pruning parameter α, the neural network weight-decay setting, the number of neighbors k in k-NN, and C and σ for SVMs.