Decision Trees. Lewis Fishgold. (Material in these slides adapted from Ray Mooney's slides on Decision Trees)

Classification using Decision Trees
Internal nodes test features, there is one branch for each value of the feature, and leaves specify the classification.
Example tree: the root tests color. The red branch tests shape (circle → +, square → -, triangle → -); the blue branch is a - leaf; the green branch is a + leaf.
So f([color = red, shape = circle, size = big]) = +.
Logical view: (red ∧ circle) ∨ green
Geometrical view: the tree represents axis-parallel decision boundaries in feature space.
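As a concrete illustration (my own sketch, not from the slides), the example tree can be hard-coded as nested dictionaries and an instance classified by following the branch matching each tested feature; example_tree and classify are hypothetical names.

```python
# Internal nodes test a feature and have one branch per value; leaves hold a label.
example_tree = {
    "feature": "color",
    "branches": {
        "red": {
            "feature": "shape",
            "branches": {"circle": "+", "square": "-", "triangle": "-"},
        },
        "blue": "-",
        "green": "+",
    },
}

def classify(tree, instance):
    """Follow the branch matching the instance's value for each tested feature until a leaf."""
    while isinstance(tree, dict):
        tree = tree["branches"][instance[tree["feature"]]]
    return tree

print(classify(example_tree, {"color": "red", "shape": "circle", "size": "big"}))  # "+"
```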

Top-Down Decision Tree Induction
Recursively build a tree by finding good splits and partitioning the examples.
Training examples: <big, red, circle>: +, <small, red, circle>: +, <small, red, square>: -, <big, blue, circle>: -
First split on color: the red branch receives <big, red, circle>: +, <small, red, circle>: +, and <small, red, square>: -; the blue branch receives only <big, blue, circle>: -, so it becomes a - leaf, as does the empty green branch.
Then split the red branch on shape: circle → + (covering both positive examples), square → -, triangle → -.

Top-Down Decision Tree Induction Pseudocode

DTree(examples, features) returns a tree:
    If all examples are in one category, return a leaf node with that category label.
    Else if the set of features is empty, return a leaf node with the category label that is most common in examples.
    Else:
        Pick a good feature F and create a node R for it.
        For each possible value v_i of F:
            Let examples_i be the subset of examples that have value v_i for F.
            Add an outgoing edge E to node R labeled with the value v_i.
            If examples_i is empty, attach to edge E a leaf node labeled with the category that is most common in examples.
            Else call DTree(examples_i, features - {F}) and attach the resulting tree as the subtree under edge E.
        Return the subtree rooted at R.
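As an illustration (not the authors' code), here is a rough Python translation of this pseudocode. The data layout (examples as (feature_dict, label) pairs) and the names dtree, most_common_label, and choose_feature are assumptions of mine; the feature-selection heuristic is passed in rather than fixed.

```python
from collections import Counter

def most_common_label(examples):
    """Majority class label among (feature_dict, label) pairs."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def dtree(examples, features, values, choose_feature):
    """Recursive top-down induction following the DTree pseudocode above.

    examples:       list of (feature_dict, label) pairs
    features:       set of feature names still available for splitting
    values:         dict mapping each feature name to its possible values
    choose_feature: heuristic picking the feature to split on (e.g. information gain)
    """
    labels = {label for _, label in examples}
    if len(labels) == 1:                 # all examples are in one category
        return labels.pop()
    if not features:                     # no features left: use the majority class
        return most_common_label(examples)
    f = choose_feature(examples, features)
    node = {"feature": f, "branches": {}}
    for v in values[f]:
        subset = [(x, y) for x, y in examples if x[f] == v]
        if not subset:                   # empty partition: fall back to the majority class
            node["branches"][v] = most_common_label(examples)
        else:
            node["branches"][v] = dtree(subset, features - {f}, values, choose_feature)
    return node
```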

Picking good features to split on
Goal: the resulting tree should be as small as possible, in accord with Occam's razor.
Finding a minimal decision tree (in nodes, leaves, or depth) is an NP-hard optimization problem.
So we use a greedy heuristic search, which might find suboptimal solutions.
Heuristic: pick a feature that creates subsets of examples that are relatively pure in a single class, so they are closer to being leaf nodes.
Sounds like a job for information theory.

Entropy
Entropy (i.e., impurity) of a set of examples S, for binary classification:
$\mathrm{Entropy}(S) = -p_+ \log_2(p_+) - p_- \log_2(p_-)$
where $p_+$ is the fraction of positive examples in S and $p_-$ is the fraction of negative examples.
If all examples are in one category, entropy is zero (we define $0 \log_2 0 = 0$).
If examples are equally mixed ($p_+ = p_- = 0.5$), entropy is at its maximum of 1.
For multi-class problems with c categories, entropy generalizes to:
$\mathrm{Entropy}(S) = -\sum_{i=1}^{c} p_i \log_2(p_i)$
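A short Python sketch of this definition (my own, not from the slides); entropy is a hypothetical helper that takes a list of class labels:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i); classes with zero count contribute nothing."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["+", "+", "-", "-"]))  # 1.0  (equally mixed)
print(entropy(["+", "+", "+", "+"]))  # -0.0 (pure: all one category)
```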

Entropy Plot for Binary Classification (figure: entropy as a function of p+, rising from 0 at p+ = 0 to a maximum of 1 at p+ = 0.5 and back down to 0 at p+ = 1)

Information Gain
The information gain of a feature F is the expected reduction in entropy resulting from splitting on this feature:
$\mathrm{Gain}(S, F) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(F)} \frac{|S_v|}{|S|}\, \mathrm{Entropy}(S_v)$
where $S_v$ is the subset of S having value v for feature F. The second term is the entropy of each resulting subset, weighted by its relative size.
Example, with S = {<big, red, circle>: +, <small, red, circle>: +, <small, red, square>: -, <big, blue, circle>: -} (2+, 2-, so Entropy(S) = 1):
Split on size: big → 1+, 1- (E = 1); small → 1+, 1- (E = 1); Gain = 1 - (0.5·1 + 0.5·1) = 0
Split on color: red → 2+, 1- (E = 0.918); blue → 0+, 1- (E = 0); Gain = 1 - (0.75·0.918 + 0.25·0) = 0.311
Split on shape: circle → 2+, 1- (E = 0.918); square → 0+, 1- (E = 0); Gain = 1 - (0.75·0.918 + 0.25·0) = 0.311
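The worked example can be checked with a small information-gain function; this sketch assumes the same (feature_dict, label) layout used above and redefines a local entropy helper:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, feature):
    """Gain(S, F): entropy of S minus the size-weighted entropy of each subset S_v."""
    labels = [y for _, y in examples]
    gain = entropy(labels)
    for v in {x[feature] for x, _ in examples}:
        subset = [y for x, y in examples if x[feature] == v]
        gain -= (len(subset) / len(examples)) * entropy(subset)
    return gain

S = [({"size": "big",   "color": "red",  "shape": "circle"}, "+"),
     ({"size": "small", "color": "red",  "shape": "circle"}, "+"),
     ({"size": "small", "color": "red",  "shape": "square"}, "-"),
     ({"size": "big",   "color": "blue", "shape": "circle"}, "-")]

for f in ("size", "color", "shape"):
    print(f, round(information_gain(S, f), 3))  # size 0.0, color 0.311, shape 0.311
```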

Decision Trees in the Real World
In real-world data, we can't expect leaves to be pure; the features might not be adequate for perfect classification.
So, the leaves contain probability distributions over classes.
When doing classification, we pick the class in the leaf with the greatest probability.
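A minimal sketch of what an impure leaf might store and how prediction works; leaf_distribution and predict_from_leaf are illustrative names, not from the slides:

```python
from collections import Counter

def leaf_distribution(examples):
    """Class-probability distribution stored at an impure leaf."""
    counts = Counter(label for _, label in examples)
    total = sum(counts.values())
    return {label: c / total for label, c in counts.items()}

def predict_from_leaf(dist):
    """Pick the class with the greatest probability."""
    return max(dist, key=dist.get)

dist = leaf_distribution([({}, "+"), ({}, "+"), ({}, "-")])
print(dist, predict_from_leaf(dist))  # roughly {'+': 0.67, '-': 0.33} '+'
```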

When to stop growing the tree?
Use a (chi-squared) statistical significance test: are the post-split distributions significantly different from the pre-split distribution?
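One way this test could be implemented, sketched with scipy.stats.chi2_contingency (the slides do not prescribe a particular library or significance level, so the alpha=0.05 threshold is an assumption): treat the class counts in each child of the candidate split as the rows of a contingency table, and keep the split only if the test rejects independence between child and class.

```python
from scipy.stats import chi2_contingency

def split_is_significant(child_class_counts, alpha=0.05):
    """child_class_counts: one [n_pos, n_neg] row per child node of the candidate split."""
    chi2, p_value, dof, expected = chi2_contingency(child_class_counts)
    return p_value < alpha

print(split_is_significant([[30, 5], [4, 28]]))    # children differ sharply -> True
print(split_is_significant([[16, 15], [17, 16]]))  # children mirror the parent -> False
```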

Relation to other methods
Perceptrons: can learn non-axis-parallel decision boundaries, but can't learn nonlinear decision boundaries; better suited for continuous input.
SVMs: can learn nonlinear functions, but you have to pick a good kernel; can efficiently find a global solution to the optimization problem; better suited for continuous input.
Decision trees: can learn nonlinear decision boundaries (but have trouble with non-axis-parallel boundaries); better suited for discrete input; greedy hill climbing can get stuck in local optima; automatically perform variable selection; the classifier is human-readable (sometimes).