C4.5 - pruning decision trees


Quiz 1
Q: Is a tree with only pure leaves always the best classifier you can have?
A: No. Such a tree is the best classifier on the training set, but possibly not on new and unseen data. Because of overfitting, the tree may not generalize well.

Pruning
Goal: prevent overfitting to noise in the data.
Two strategies for pruning the decision tree:
- Post-pruning: take a fully grown decision tree and discard unreliable parts.
- Pre-pruning: stop growing a branch when the information becomes unreliable.

Pre-pruning
Based on a statistical significance test: stop growing the tree when there is no statistically significant association between any attribute and the class at a particular node.
- Most popular test: the chi-squared test.
- ID3 used the chi-squared test in addition to information gain: only statistically significant attributes were allowed to be selected by the information gain procedure.
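
To make the test concrete, here is a minimal sketch of such a significance check in Python (an illustration, not Quinlan's code): it builds the attribute/class contingency table for the instances at a node and applies scipy's chi2_contingency. The weather-style data, the function name, and the 5% significance level are assumptions.

# Minimal sketch of a chi-squared pre-pruning check (illustrative only).
import numpy as np
from scipy.stats import chi2_contingency

def significant_association(attr_values, class_labels, alpha=0.05):
    """True if the attribute/class association at this node is statistically significant."""
    values = sorted(set(attr_values))
    classes = sorted(set(class_labels))
    # Contingency table: rows = attribute values, columns = class labels.
    table = np.array([[sum(1 for a, c in zip(attr_values, class_labels) if a == v and c == k)
                       for k in classes] for v in values])
    _, p_value, _, _ = chi2_contingency(table)
    return p_value < alpha

# Pre-pruning: only attributes with a significant association may be selected;
# if no attribute passes the test at a node, stop growing that branch.
outlook = ['sunny', 'sunny', 'overcast', 'rain', 'rain', 'overcast', 'rain', 'sunny']
play    = ['no',    'no',    'yes',      'yes',  'yes',  'yes',      'no',   'no']
print(significant_association(outlook, play))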

Early stopping
Pre-pruning may stop the growth process prematurely: early stopping.
Classic example: the XOR/parity problem.

  #  a  b  class
  1  0  0  0
  2  0  1  1
  3  1  0  1
  4  1  1  0

- No individual attribute exhibits any significant association with the class.
- The structure is only visible in the fully expanded tree, so pre-pruning won't even expand the root node.
- But: XOR-type problems are rare in practice, and pre-pruning is faster than post-pruning.
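
A quick check of this claim (a small illustrative computation, not part of the original slides): on the four XOR rows above, the information gain of each attribute taken alone is exactly zero, even though the fully expanded tree classifies the data perfectly.

# Information gain of 'a' and 'b' on the XOR table is zero (illustrative check).
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n) for c in set(labels))

def info_gain(rows, attr):
    gain = entropy([r['class'] for r in rows])
    for v in set(r[attr] for r in rows):
        subset = [r['class'] for r in rows if r[attr] == v]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

xor = [{'a': 0, 'b': 0, 'class': 0},
       {'a': 0, 'b': 1, 'class': 1},
       {'a': 1, 'b': 0, 'class': 1},
       {'a': 1, 'b': 1, 'class': 0}]
print(info_gain(xor, 'a'), info_gain(xor, 'b'))  # 0.0 0.0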

Post-pruning
First build the full tree, then prune it.
- A fully grown tree shows all attribute interactions.
- Problem: some subtrees might be due to chance effects.
Two pruning operations:
1. Subtree replacement
2. Subtree raising

Subtree replacement
Bottom-up: consider replacing a tree only after considering all its subtrees.

Estimating error rates
Prune only if pruning reduces the estimated error.
- The error on the training data is NOT a useful estimator.
- Option 1: use a hold-out set for pruning ("reduced-error pruning").
- C4.5's method: derive a confidence interval from the training data and use a heuristic limit, derived from it, for pruning. This is a standard Bernoulli-process-based method, but it rests on shaky statistical assumptions (it is based on the training data).
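
To show how bottom-up subtree replacement and the hold-out criterion fit together, here is a minimal reduced-error-pruning sketch. The dict-based tree format, the helper names, and the toy data are assumptions for illustration, not C4.5's data structures.

# Minimal sketch of reduced-error pruning via bottom-up subtree replacement.
def classify(node, x):
    while 'leaf' not in node:
        node = node['children'].get(x[node['attr']], {'leaf': node['majority']})
    return node['leaf']

def errors(node, examples):
    return sum(1 for x, y in examples if classify(node, x) != y)

def prune(node, examples):
    """Prune all subtrees first (bottom-up), then consider replacing this node by a leaf."""
    if 'leaf' in node or not examples:
        return node
    for v, child in list(node['children'].items()):
        reaching = [(x, y) for x, y in examples if x[node['attr']] == v]
        node['children'][v] = prune(child, reaching)
    leaf = {'leaf': node['majority']}
    # Replace the subtree if a single majority-class leaf is no worse on the hold-out examples.
    return leaf if errors(leaf, examples) <= errors(node, examples) else node

# Tiny usage example (hypothetical tree and hold-out set).
tree = {'attr': 'a', 'majority': 'no',
        'children': {0: {'leaf': 'no'},
                     1: {'attr': 'b', 'majority': 'yes',
                         'children': {0: {'leaf': 'yes'}, 1: {'leaf': 'no'}}}}}
holdout = [({'a': 0, 'b': 1}, 'no'),
           ({'a': 1, 'b': 0}, 'yes'),
           ({'a': 1, 'b': 1}, 'yes')]
print(prune(tree, holdout))  # the 'b' subtree is replaced by a leaf; the root test survives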

Estimating Error Rates
Q: What is the error rate on the training set?
A: 0.33 (2 out of 6).
Q: Will the error on the test set be bigger, smaller, or equal?
A: Bigger.

Estimating the error
Assume that making an error is a Bernoulli trial with probability p.
- p is unknown (the true error rate).
- We observe f, the success rate: f = S/N (here a "success" is an observed error).
- For large enough N, f follows a normal distribution.
- Mean and variance of f: p and p(1 − p)/N.
[Figure: normal density centred at p, with p − σ and p + σ marked.]
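
A small simulation (illustrative, not from the slides) that checks this normal approximation: sample many error counts from a Bernoulli process and compare the mean and standard deviation of f with p and sqrt(p(1 − p)/N). The values of p, N, and the number of trials are arbitrary choices.

# Check the normal approximation for f = S/N by simulation.
import numpy as np

rng = np.random.default_rng(0)
p, N, trials = 0.25, 100, 10_000
f = rng.binomial(N, p, size=trials) / N          # observed error rates over many samples
print(f.mean(), p)                               # close to p
print(f.std(), np.sqrt(p * (1 - p) / N))         # close to sqrt(p(1-p)/N)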

Estimating the error
A c% confidence interval [−z ≤ X ≤ z] for a random variable X with zero mean is given by:
  Pr[−z ≤ X ≤ z] = c
With a symmetric distribution:
  Pr[−z ≤ X ≤ z] = 1 − 2 · Pr[X ≥ z]

z-transforming f
Transformed value for f (subtract the mean and divide by the standard deviation):
  (f − p) / sqrt(p(1 − p)/N)
Resulting equation:
  Pr[−z ≤ (f − p) / sqrt(p(1 − p)/N) ≤ z] = c
Solving for p:
  p = ( f + z²/(2N) ± z · sqrt(f/N − f²/N + z²/(4N²)) ) / (1 + z²/N)

C4.5's method
- The error estimate for a subtree is the weighted sum of the error estimates for all its leaves.
- Error estimate for a node (the upper bound of the confidence interval):
  e = ( f + z²/(2N) + z · sqrt(f/N − f²/N + z²/(4N²)) ) / (1 + z²/N)
- If c = 25% then z = 0.69 (from the normal distribution):

  Pr[X ≥ z]    z
  1%           2.33
  5%           1.65
  10%          1.28
  20%          0.84
  25%          0.69
  40%          0.25

C4.5's method
- f is the observed error.
- z = 0.69.
- e > f: the estimate can be written as e = (f + ε₁) / (1 + ε₂) with ε₁, ε₂ > 0.
- As N → ∞, the ε terms vanish and e → f.
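
A minimal sketch of this estimate in Python (the function name and the rounding are mine; z = 0.69 follows the slides). It reproduces the leaf estimates and the combined value used in the example that follows, and checks that e is the upper end of the confidence interval for f.

# Pessimistic (upper-bound) error estimate used by C4.5, with z = 0.69 (c = 25%).
from math import sqrt

def pessimistic_error(f, N, z=0.69):
    """Upper confidence limit on the true error rate, given observed error rate f over N instances."""
    return (f + z**2 / (2 * N)
            + z * sqrt(f / N - f**2 / N + z**2 / (4 * N**2))) / (1 + z**2 / N)

e1 = pessimistic_error(2 / 6, 6)   # leaf with f = 0.33 on 6 instances
e2 = pessimistic_error(1 / 2, 2)   # leaf with f = 0.5  on 2 instances
print(round(e1, 2), round(e2, 2))                 # 0.47 0.72
print(round((6 * e1 + 2 * e2 + 6 * e1) / 14, 2))  # 0.51, the combined subtree estimate

# Sanity check: e1 is the p that solves (f - p) / sqrt(p(1-p)/N) = -z for f = 2/6, N = 6.
print(round((2 / 6 - e1) / sqrt(e1 * (1 - e1) / 6), 2))  # -0.69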

Example
The subtree has three leaves, covering 6, 2, and 6 training instances respectively:
  f = 0.33, e = 0.47
  f = 0.5,  e = 0.72
  f = 0.33, e = 0.47
Combining them with weights 6:2:6 gives (6 · 0.47 + 2 · 0.72 + 6 · 0.47) / 14 ≈ 0.51.
The parent node has f = 5/14 = 0.36 and e = 0.46.
Since 0.46 < 0.51, the estimated error goes down, so prune (replace the subtree by a leaf)!

Summary
Decision trees:
- splits: binary, multi-way
- split criteria: information gain, gain ratio, ...
- pruning
No method is always superior: experiment!