Informal Definition: Telling things apart

9. Decision Trees

Informal Definition: Telling things apart

Nominal data
No numeric feature vector
Just a list of properties:
Banana: longish, yellow
Apple: round, medium sized, different colors like red and green but not blue

Basic Classification Tree

Advantages of Decision Trees
Trees can be interpreted
Possibility to include prior knowledge

Conversion to Binary Tree
Every tree can be converted to a binary tree
Consider binary trees for simplicity

Issues when building a decision tree
Structure of tree (binary?)
What are the right questions to ask?
When should we add a node (splitting)?
When should we remove a node (pruning)?

Types of questions (also for continuous data)

Questions for nominal data (Text!)
E.g. for text classification:
Is key word x present?
Is frequency of word x larger than y?
Is a key phrase present?
It all depends on your intuition about the task
Usually asking for presence/absence of words is a good baseline
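The presence and frequency questions can be sketched as small yes/no functions. This is only an illustration: the helper names `keyword_present` and `frequency_above`, the whitespace tokenization, and the sample sentence are assumptions, not part of the slides.

```python
def keyword_present(keyword):
    """Build the question 'Is key word x present?' (hypothetical helper)."""
    def question(text):
        # Assumption: words are separated by whitespace, case is ignored.
        return keyword.lower() in text.lower().split()
    return question


def frequency_above(keyword, threshold):
    """Build the question 'Is frequency of word x larger than y?'."""
    def question(text):
        words = text.lower().split()
        return words.count(keyword.lower()) > threshold
    return question


has_free = keyword_present("free")
many_money = frequency_above("money", 1)

doc = "Free money now and more money later"
answers = (has_free(doc), many_money(doc))
```

Each question maps a document to True or False, which is exactly what a node of a binary tree needs to route the document left or right.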

Application of Decision Trees to Feature Vectors
Feature vector $(x_1, \dots, x_N)$
Type of questions: $x_i < a$
This gives an ordinary binary tree
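A minimal sketch of an ordinary binary tree that asks only questions of the form $x_i < a$. The tuple layout of the nodes, the class labels "A", "B", "C" and the thresholds are made up for illustration.

```python
# Internal node: (feature_index, threshold, left_subtree, right_subtree).
# Leaf: a class label given as a string.
# The left branch is taken when x[feature_index] < threshold.
tree = (0, 2.0,
        "A",
        (1, 1.0, "B", "C"))


def classify(node, x):
    """Walk down the tree until a leaf (class label) is reached."""
    while not isinstance(node, str):
        i, a, left, right = node
        node = left if x[i] < a else right
    return node


labels = [classify(tree, x) for x in [(1.0, 5.0), (3.0, 0.5), (3.0, 1.5)]]
```

Because every test looks at a single coordinate, each split cuts the feature space with a line (or hyperplane) parallel to one axis.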

Decision boundary induced by ordinary binary tree in 2 dimensions
Decision boundaries are always parallel to the axes
Only very simple decision boundaries can be modelled

Decision boundary induced by ordinary binary tree in 3 dimensions

Difficult case for ordinary binary tree
What does the decision boundary look like?

Difficult case for ordinary binary tree
Simple questions of type $x_i < a$ are not capable of efficiently describing sophisticated decision boundaries

Better Decision Boundary
What's the corresponding question and the resulting tree?

Corresponding Tree and Question
However, the number of possible questions of this type is huge

Binary space partition (BSP) trees
Type of questions: threshold tests on the linear combination $\sum_{i=1}^{d} a_i x_i$
Induced partitioning of feature space: (figure)
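A BSP question can be sketched as a threshold test on a weighted sum of the features. The comparison direction (`<`) and the explicit `threshold` argument are assumptions, since the transcript only preserves the sum itself; `bsp_question` is a hypothetical helper name.

```python
def bsp_question(a, threshold):
    """Question on the linear combination sum_i a_i * x_i.

    Assumption: the answer is 'yes' when the sum is below `threshold`.
    """
    def question(x):
        return sum(a_i * x_i for a_i, x_i in zip(a, x)) < threshold
    return question


# The oblique line x_1 + x_2 = 1 in two dimensions.
below_diagonal = bsp_question(a=(1.0, 1.0), threshold=1.0)
answers = (below_diagonal((0.2, 0.3)), below_diagonal((0.8, 0.9)))
```

A single question of this kind describes a diagonal boundary that an axis-parallel tree could only approximate with many splits.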

Sphere trees
Questions are threshold tests on the squared distance $\sum_{i=1}^{d} (z_i - x_i)^2$ to a center $z$
Induced partitioning of feature space: (figure)
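A sphere-tree question can be sketched as a test of whether a point lies inside a ball around a center $z$. The `radius` parameter and the strict `<` comparison are assumptions made for this illustration; `sphere_question` is a hypothetical name.

```python
def sphere_question(z, radius):
    """Question on the squared distance sum_i (z_i - x_i)^2 to the center z.

    Assumption: the answer is 'yes' for points strictly inside the sphere.
    """
    def question(x):
        squared_distance = sum((z_i - x_i) ** 2 for z_i, x_i in zip(z, x))
        return squared_distance < radius ** 2
    return question


inside_unit_circle = sphere_question(z=(0.0, 0.0), radius=1.0)
answers = (inside_unit_circle((0.5, 0.5)), inside_unit_circle((1.0, 1.0)))
```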

Nonlinear generalizations
Type of questions: comparison of $\Psi(x, a)$ with 0
With $\Psi$ some nonlinear function
May be different at each node
Issue: lack of simplicity

Adding nodes: splitting

Impurity of a node
Do all the data at a leaf belong to one class?
Example: At one of the present nodes, the distribution is 70% Grapefruit, 20% Lemons, 10% Apples
Introduce $P_N(\omega_i)$, the fraction of data at node N that belongs to class $\omega_i$
What's a measure to define which node to split next?

Misclassification Impurity
$i(N) = 1 - \max_j P_N(\omega_j)$
Measures fraction of data that does not belong to the dominant class
Turns out to be too greedy for multiple splits
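As a worked example, the misclassification impurity can be evaluated for the node with 70% grapefruit, 20% lemons and 10% apples. The function name and the dictionary layout are illustrative choices.

```python
def misclassification_impurity(probabilities):
    """i(N) = 1 - max_j P_N(w_j): share of data outside the dominant class."""
    return 1.0 - max(probabilities)


# Class distribution of the example node.
node = {"grapefruit": 0.7, "lemons": 0.2, "apples": 0.1}
impurity = misclassification_impurity(node.values())
print(f"{impurity:.2f}")  # prints 0.30
```

The value 0.30 is the 30% of the data (lemons and apples) that would be misclassified if the node predicted its dominant class.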

Entropy Impurity
$i(N) = -\sum_j P_N(\omega_j) \log_2 P_N(\omega_j)$
Very popular
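A short sketch of the entropy impurity, applied to the same 70% / 20% / 10% node. Skipping classes with zero probability (so that 0 log 0 counts as 0) is a common convention assumed here.

```python
import math


def entropy_impurity(probabilities):
    """i(N) = -sum_j P_N(w_j) * log2 P_N(w_j), measured in bits."""
    # Assumption: terms with probability 0 contribute nothing.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


impurity = entropy_impurity([0.7, 0.2, 0.1])
print(f"{impurity:.3f}")  # prints 1.157
```

A pure node has entropy 0, and a two-class node with a 50/50 split has entropy 1 bit.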

Gini Impurity
$i(N) = \frac{1}{2}\left[1 - \sum_j P_N^2(\omega_j)\right]$
Very popular as well
No practical difference from entropy impurity
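The Gini impurity can be sketched in the same way. The factor 1/2 follows the formula on this slide; many libraries omit it, which rescales the values but does not change which split is best.

```python
def gini_impurity(probabilities):
    """i(N) = 1/2 * (1 - sum_j P_N(w_j)^2), with the 1/2 factor of the slide."""
    return 0.5 * (1.0 - sum(p * p for p in probabilities))


impurity = gini_impurity([0.7, 0.2, 0.1])
print(f"{impurity:.2f}")  # prints 0.23
```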

Comparison of impurity functions: Example for 2-class problem
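The comparison for a two-class problem can be reproduced numerically. The sketch below tabulates the three impurity functions as a function of the probability p of the first class; the grid of p values is an arbitrary choice, and the Gini impurity is assumed to carry a factor 1/2.

```python
import math


def two_class_impurities(p):
    """Return (misclassification, entropy, gini) for class probabilities (p, 1 - p)."""
    probs = (p, 1.0 - p)
    misclassification = 1.0 - max(probs)
    entropy = sum(-q * math.log2(q) for q in probs if q > 0)
    gini = 0.5 * (1.0 - sum(q * q for q in probs))  # assumed 1/2 factor
    return misclassification, entropy, gini


for p in (0.0, 0.1, 0.3, 0.5):
    m, e, g = two_class_impurities(p)
    print(f"p={p:.1f}  misclassification={m:.3f}  entropy={e:.3f}  gini={g:.3f}")
```

All three functions are zero for a pure node and largest at p = 0.5; they differ in scale and in how strongly they curve in between.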

Training Algorithms (Greedy)
Until the tree is large enough
Test all nodes N:
Test all questions Q:
Calculate change in impurity
$\Delta i(N) = i(N) - P_L \, i(N_L(Q)) - (1 - P_L) \, i(N_R(Q))$
Add node/question to achieve largest drop in impurity
$N_L(Q)$: new left node
$N_R(Q)$: new right node
$P_L$: fraction of data in $N_L(Q)$
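One step of the greedy procedure can be sketched as a search over questions for a single node. The sketch assumes questions of the form $x_i < a$ with thresholds taken from the observed feature values, uses Gini impurity with a factor 1/2, and runs on a made-up four-point data set; all function names are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini impurity (with factor 1/2) of a list of class labels."""
    n = len(labels)
    return 0.5 * (1.0 - sum((c / n) ** 2 for c in Counter(labels).values()))


def impurity_drop(X, y, i, a, impurity=gini):
    """Drop i(N) - P_L * i(N_L) - (1 - P_L) * i(N_R) for the question x_i < a."""
    left = [label for x, label in zip(X, y) if x[i] < a]
    right = [label for x, label in zip(X, y) if not x[i] < a]
    if not left or not right:
        return 0.0  # the question does not split the node
    p_left = len(left) / len(y)
    return impurity(y) - p_left * impurity(left) - (1 - p_left) * impurity(right)


def best_question(X, y):
    """Test all candidate questions and return the (i, a) with the largest drop."""
    candidates = [(i, x[i]) for x in X for i in range(len(x))]
    return max(candidates, key=lambda q: impurity_drop(X, y, q[0], q[1]))


# Made-up data: two features, two classes.
X = [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0), (4.0, 2.0)]
y = ["lemon", "lemon", "apple", "apple"]
i, a = best_question(X, y)
drop = impurity_drop(X, y, i, a)
```

A full training loop would repeat this search for every current leaf, apply the best question found, and stop once the tree is large enough.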

Training of a CART Tree: Spam-Mail detection (From Hastie et al.)
Figure: misclassification rate
Green: 10-fold cross-validation
Red: test set
Root node question: frequency of $ sign

Influence of Amount of Training Data: Text Classification (From Manning+Schütze)
High variability for small amounts of training data
Saturation for large amounts of data

Pruning a CART-tree (Text Classification)
Tune the pruning on a validation set

Example for Instability of Tree
Example data for a two-class problem

Example for Instability of Tree

Example for Instability of Tree
Small changes in the data can result in large changes of the tree
However, classification accuracy is usually stable

Summary
Decision trees are easy to understand
Decision trees can classify both categorical and numerical data, but the output attribute must be categorical
Decision tree algorithms are unstable
Trees created from numeric data sets can be quite complex