Decision Trees. Gavin Brown


Every Learning Method has Limitations Linear model? KNN? SVM?

Explain your decisions Sometimes we need interpretable results from our techniques. How do you explain the above decision?

Different types of data Rugby players - height, weight can be plotted in 2-d. How do you plot hair colour? (Black, Brown, Blonde?) Predicting heart disease - how do you plot blood type? (A, B, O)? In general, how do you deal with categorical data?

The Tennis Problem You are working for the local tennis club. They want a program that will advise inexperienced new members on whether they are likely to enjoy a game today, given the current weather conditions. However, they need the program to pop out interpretable rules so they can be sure it's not giving bad advice. They provide you with some historical data...

The Tennis Problem

   Outlook   Temperature  Humidity  Wind    Play Tennis?
 1 Sunny     Hot          High      Weak    No
 2 Sunny     Hot          High      Strong  No
 3 Overcast  Hot          High      Weak    Yes
 4 Rain      Mild         High      Weak    Yes
 5 Rain      Cool         Normal    Weak    Yes
 6 Rain      Cool         Normal    Strong  No
 7 Overcast  Cool         Normal    Strong  Yes
 8 Sunny     Mild         High      Weak    No
 9 Sunny     Cool         Normal    Weak    Yes
10 Rain      Mild         Normal    Weak    Yes
11 Sunny     Mild         Normal    Strong  Yes
12 Overcast  Mild         High      Strong  Yes
13 Overcast  Hot          Normal    Weak    Yes
14 Rain      Mild         High      Strong  No

Note: 9 examples say yes, 5 examples say no.

A Decision Tree for the Tennis Problem. This tree works for any example in the table; try it!

Learning a Decision Tree : Basic recursive algorithm

tree learntree( data )
    if all examples in data have same label
        return leaf node with that label
    else
        pick the most important feature, call it F
        for each possible value v of F
            data(v) = all examples where F == v
            add branch learntree( data(v) )
        endfor
        return tree
    endif

Example: partitioning data by the Wind feature

Wind = Strong:
   Outlook   Temp  Humid   Wind    Play?
 2 Sunny     Hot   High    Strong  No
 6 Rain      Cool  Normal  Strong  No
 7 Overcast  Cool  Normal  Strong  Yes
11 Sunny     Mild  Normal  Strong  Yes
12 Overcast  Mild  High    Strong  Yes
14 Rain      Mild  High    Strong  No
3 examples say yes, 3 say no.

Wind = Weak:
   Outlook   Temp  Humid   Wind    Play?
 1 Sunny     Hot   High    Weak    No
 3 Overcast  Hot   High    Weak    Yes
 4 Rain      Mild  High    Weak    Yes
 5 Rain      Cool  Normal  Weak    Yes
 8 Sunny     Mild  High    Weak    No
 9 Sunny     Cool  Normal  Weak    Yes
10 Rain      Mild  Normal  Weak    Yes
13 Overcast  Hot   Normal  Weak    Yes
6 examples say yes, 2 examples say no.

Learning a Decision Tree : Basic recursive algorithm

tree learntree( data )
    if all examples in data have same label
        return leaf node with that label
    else
        pick the most important feature, call it F
        for each possible value v of F
            data(v) = all examples where F == v
            add branch learntree( data(v) )
        endfor
        return tree
    endif

Which is the most important feature?

Thinking in Probabilities...

Before the split: 9 yes, 5 no, so p(yes) = 9/14 ≈ 0.64
On the left branch (Wind = Strong): 3 yes, 3 no, so p(yes) = 3/6 = 0.5
On the right branch (Wind = Weak): 6 yes, 2 no, so p(yes) = 6/8 = 0.75

Remember: p(no) = 1 - p(yes)
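As a quick sanity check (not from the original slides), the same arithmetic in Python:

p_yes_before = 9 / 14             # ~0.64, whole dataset
p_yes_strong = 3 / 6              # 0.5, left branch (Wind = Strong)
p_yes_weak   = 6 / 8              # 0.75, right branch (Wind = Weak)
p_no_before  = 1 - p_yes_before   # p(no) = 1 - p(yes)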

The Information contained in a variable - Entropy. More uncertainty = Less information. H(X) = 1.0

The Information contained in a variable - Entropy. Lower uncertainty = More information. H(X) = 0.72193

Entropy. The amount of randomness in a variable X is called the entropy.

H(X) = - Σ_i p(x_i) log p(x_i)     (1)

The log is base 2, giving us units of measurement of bits.
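To make the formula concrete, here is a minimal Python sketch (my own illustration, not part of the lecture code) that estimates entropy from a list of observed outcomes:

import math
from collections import Counter

def entropy(labels):
    # H(X) = -sum_i p(x_i) * log2 p(x_i), with probabilities estimated by counting
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# A fair coin has maximum uncertainty: entropy(['H', 'T']) gives 1.0 bit.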

Reducing Entropy = Maximise Information Gain

The variable of interest is T (for tennis), taking on yes or no values.
Before the split: 9 yes, 5 no, so p(yes) = 9/14 ≈ 0.64.
In the whole dataset, the entropy is:

H(T) = - Σ_i p(x_i) log p(x_i) = - { 5/14 log(5/14) + 9/14 log(9/14) } = 0.94029

H(T) is the entropy before we split. See worked example in the supporting material.

Reducing Entropy = Maximise Information Gain

H(T) is the entropy before we split.
H(T | W = strong) is the entropy of the data on the left branch.
H(T | W = weak) is the entropy of the data on the right branch.
H(T | W) is the weighted average of the two.

Choose the feature with maximum value of H(T) - H(T | W).

See worked example in the supporting material.
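As an illustrative sketch (again, not the course's reference code), the Wind split can be checked by reusing the entropy function above:

def information_gain(parent_labels, branches):
    # H(T) - H(T | F): parent entropy minus the weighted average of branch entropies
    n = len(parent_labels)
    weighted = sum(len(b) / n * entropy(b) for b in branches)
    return entropy(parent_labels) - weighted

# Wind split from the tables above: 9 yes / 5 no overall,
# Strong branch = 3 yes / 3 no, Weak branch = 6 yes / 2 no.
parent = ['yes'] * 9 + ['no'] * 5
strong = ['yes'] * 3 + ['no'] * 3
weak   = ['yes'] * 6 + ['no'] * 2
gain_wind = information_gain(parent, [strong, weak])   # roughly 0.048 bits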

Learning a Decision Tree : the ID3 algorithm

tree learntree( data )
    if all examples in data have same label
        return leaf node with that label
    else
        pick the most important feature, call it F
        for each possible value v of F
            data(v) = all examples where F == v
            add branch learntree( data(v) )
        endfor
        return tree
    endif

Or, in very simple terms:
Step 1. Pick the feature that maximises information gain.
Step 2. Recurse on each branch.

The ID3 algorithm

function id3( examples ) returns tree T
    if all the items in examples have the same conclusion,
        return a leaf node with value = majority conclusion
    let A be the feature with the largest information gain
    create a blank tree T
    let s(1), s(2), s(3) etc. be the data subsets produced by splitting examples on feature A
    for each subset s(n)
        tree t(n) = id3( s(n) )
        add t(n) as a new branch of T
    endfor
    return T
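Putting the pieces together, here is a minimal runnable Python sketch of ID3 (an illustration under an assumed data representation, not the course's code). It assumes each example is a dict of feature values plus a 'label' key, and reuses the entropy and information_gain helpers sketched above:

from collections import Counter

def id3(examples, features):
    # Returns a nested dict {feature: {value: subtree or leaf label}}
    labels = [ex['label'] for ex in examples]
    # Leaf node: all labels agree, or there are no features left to split on
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    # Pick the feature with the largest information gain
    def gain(f):
        values = set(ex[f] for ex in examples)
        branches = [[e['label'] for e in examples if e[f] == v] for v in values]
        return information_gain(labels, branches)
    best = max(features, key=gain)
    tree = {best: {}}
    for value in set(ex[best] for ex in examples):
        subset = [ex for ex in examples if ex[best] == value]
        tree[best][value] = id3(subset, [f for f in features if f != best])
    return tree

# Usage sketch, with the tennis data loaded as a list of dicts:
# tree = id3(data, ['Outlook', 'Temperature', 'Humidity', 'Wind'])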

A Decision Tree for the Tennis Problem

Following each path down the tree, we can make up a list of rules:

if ( sunny AND high ) NO
if ( sunny AND normal ) YES
if ( overcast ) YES
if ( rain AND strong ) NO
if ( rain AND weak ) YES
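As a small aside (not from the slides), a hypothetical helper that extracts such rules from the nested-dict tree produced by the id3 sketch above:

def rules(tree, conditions=()):
    # Walk a nested {feature: {value: subtree}} dict; emit one rule per leaf
    if not isinstance(tree, dict):
        test = " AND ".join(f"{f} = {v}" for f, v in conditions)
        return ["if ( " + test + " ) " + str(tree)]
    (feature, branches), = tree.items()
    out = []
    for value, subtree in branches.items():
        out += rules(subtree, conditions + ((feature, value),))
    return out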

Overfitting a tree. The number of possible paths tells you the number of rules. More rules = more complicated. We could have N rules, where N is the size of the dataset. This would mean no generalisation outside of the training data; the tree is overfitted. Overfitting = fine-tuning to the training data.

Overfitting. What if it's rainy and hot? (Look back at the table of 14 examples: no such combination appears there, yet the tree must still give an answer.)

Overfitting. How do you know if you've overfitted? Use a validation dataset: another dataset that you do not use to train, but just to check whether you've overfitted or not. How can we avoid it? Stop after a certain depth (i.e. keep the tree short), or post-prune the final tree; both in order to control validation error.
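For instance (an illustrative sketch, not part of the course material), scikit-learn's decision tree exposes both remedies, assuming the categorical features have already been numerically encoded:

from sklearn.tree import DecisionTreeClassifier

# 1) Keep the tree short: refuse to split beyond a fixed depth
shallow = DecisionTreeClassifier(max_depth=3)

# 2) Post-prune: grow the full tree, then apply cost-complexity pruning;
#    the strength ccp_alpha would be chosen to minimise validation error
pruned = DecisionTreeClassifier(ccp_alpha=0.01)

# Either way, overfitting is judged on held-out data, e.g.
# shallow.fit(X_train, y_train); shallow.score(X_val, y_val)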


Missing data?

   Outlook   Temperature  Humidity  Wind    Play Tennis?
 1 Sunny     Hot          High      Weak    No
 2 Sunny     Hot          High      Strong  No
 3 Overcast  ?            High      Weak    Yes
 4 Rain      Mild         High      Weak    Yes
 5 Rain      Cool         Normal    Weak    Yes
 6 Rain      ?            Normal    ?       No
 7 Overcast  Cool         Normal    ?       Yes
 8 Sunny     ?            High      ?       No
 9 Sunny     Cool         Normal    Weak    Yes
10 Rain      Mild         Normal    Weak    Yes
11 Sunny     ?            Normal    Strong  Yes
12 Overcast  ?            High      Strong  Yes
13 Overcast  ?            Normal    Weak    Yes
14 Rain      Mild         High      Strong  No

One option: insert the average (mean, median or mode) of the available values. Or use more complex strategies, such as Bayes' Rule... NEXT WEEK... Ultimately, the best strategy is problem dependent.
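A minimal sketch of the "insert the most common value" strategy, assuming the table is loaded as a pandas DataFrame with NaN marking the missing cells (my own illustration, not from the slides):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Outlook':  ['Overcast', 'Rain', 'Sunny'],
    'Temp':     [np.nan, np.nan, 'Cool'],       # two missing temperatures
    'Humidity': ['High', 'Normal', 'High'],
})

# For categorical features the mode (most frequent value) plays the role of
# the average; df.mode() gives the per-column mode, fillna() inserts it
filled = df.fillna(df.mode().iloc[0])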

Conclusion. Decision Trees provide a flexible and interpretable model. There are many variations on the simple ID3 algorithm. Further reading: www.decisiontrees.net (a site written by a former student of this course). Why wasn't the Temperature feature used in the tree? Answer in the next session.