Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation


Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation
Lecture Notes for Chapter 4, Part I
Introduction to Data Mining by Tan, Steinbach, Kumar
Adapted by Qiang Yang (2010)

Classification: Definition

Given a collection of records (the training set), where each record contains a set of attributes and one of the attributes is the class: find a model for the class attribute as a function of the values of the other attributes.

Goal: previously unseen records should be assigned a class as accurately as possible. A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
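To make the training/test division concrete, here is a minimal sketch using scikit-learn's train_test_split (an assumption: scikit-learn is available; the records and labels are illustrative stand-ins, not the textbook's data):

```python
from sklearn.model_selection import train_test_split

records = [[i] for i in range(1, 16)]    # stand-ins for 15 full records
labels = ["No"] * 10 + ["Yes"] * 5       # illustrative class labels

# Hold out one third of the records to estimate accuracy on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    records, labels, test_size=1/3, random_state=0)
print(len(X_train), len(X_test))         # 10 training records, 5 test records
```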

Illustrating Classification Task

Training set (used to learn the model):

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test set (the learned model is then applied to assign the unknown classes):

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Examples of Classification Task
- Predicting tumor cells as benign or malignant
- Classifying credit card transactions as legitimate or fraudulent
- Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
- Categorizing news stories as finance, weather, entertainment, sports, etc.

Classification Techniques
- Decision Tree based Methods
- Rule-based Methods
- Memory-based reasoning
- Neural Networks
- Naïve Bayes and Bayesian Belief Networks
- Support Vector Machines

Example of a Decision Tree

Training data (Refund: categorical; Marital Status: categorical; Taxable Income: continuous; class: Cheat):

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model: a decision tree with splitting attributes Refund, MarSt, and TaxInc:

Refund?
├─ Yes → NO
└─ No → MarSt?
         ├─ Married → NO
         └─ Single, Divorced → TaxInc?
                                ├─ < 80K → NO
                                └─ > 80K → YES

Another Example of Decision Tree

The same training data also fits this tree, which splits on MarSt first:

MarSt?
├─ Married → NO
└─ Single, Divorced → Refund?
         ├─ Yes → NO
         └─ No → TaxInc?
                  ├─ < 80K → NO
                  └─ > 80K → YES

There could be more than one tree that fits the same data!

Decision Tree Classification Task

The same learn-model / apply-model workflow as in "Illustrating Classification Task" above (training records 1 to 10, unlabeled test records 11 to 15), with a decision tree as the learned model.

Apply Model to Test Data

Test record: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?

Start from the root of the tree and, at each node, follow the branch that matches the record:
- Refund? The record has Refund = No, so take the No branch to MarSt.
- MarSt? The record has Marital Status = Married, so take the Married branch, which is a leaf labeled NO.

Assign Cheat to No.

Decision Trees as a Computer Program

Rewrite the previous decision tree as an if-then-else statement, as in the sketch below.
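A minimal sketch of the first example tree as nested if/else statements (the function name and the use of raw dollar amounts for income are illustrative choices):

```python
def classify(refund, marital_status, taxable_income):
    """Predict the Cheat label for one record by walking the example tree."""
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # Single or Divorced: fall through to the Taxable Income test
    if taxable_income < 80_000:
        return "No"
    return "Yes"

print(classify("No", "Married", 80_000))  # -> "No", as in the worked example above
```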

Decision Boundary

The border between two neighboring regions of different classes is known as the decision boundary. For decision trees, the decision boundary is parallel to the axes because each test condition involves a single attribute at a time.

Interpretation of a Node in a Tree

Each node can be interpreted as a subset of the training data set: the records that satisfy the test conditions on the path from the root to that node.

Tree Evaluation
- Test set: ground truth, data labeling (e.g., via Mechanical Turk)
- Confusion matrix and cost matrix: true positives, true negatives, false positives, false negatives
- Accuracy
- Error rates
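A small sketch of how these quantities relate, in plain Python (the labels are illustrative; "Yes" is treated as the positive class):

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive="Yes"):
    """Count TP, TN, FP, FN for a binary labeling."""
    c = Counter()
    for t, p in zip(y_true, y_pred):
        if p == positive:
            c["TP" if t == positive else "FP"] += 1
        else:
            c["TN" if t != positive else "FN"] += 1
    return c

y_true = ["No", "No", "Yes", "Yes", "No"]
y_pred = ["No", "Yes", "Yes", "No", "No"]
c = confusion_counts(y_true, y_pred)
accuracy = (c["TP"] + c["TN"]) / sum(c.values())  # (1 + 2) / 5 = 0.6
error_rate = 1 - accuracy
print(dict(c), accuracy, error_rate)
```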


Decision Tree Induction

Many algorithms:
- CART (Classification and Regression Trees)
- ID3, C4.5, C5.0
- Other, more scalable algorithms
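For concreteness, a minimal sketch using scikit-learn, whose DecisionTreeClassifier is an optimized CART-style implementation (assumptions: scikit-learn is installed, and the nominal Marital Status attribute is integer-encoded, a simplification since the library requires numeric features):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# The cheat example, encoded: Refund (1=Yes, 0=No),
# MarSt (0=Single, 1=Married, 2=Divorced), income in thousands
X = [[1, 0, 125], [0, 1, 100], [0, 0, 70], [1, 1, 120], [0, 2, 95],
     [0, 1, 60], [1, 2, 220], [0, 0, 85], [0, 1, 75], [0, 0, 90]]
y = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(tree, feature_names=["Refund", "MarSt", "TaxInc"]))
```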

Tree Induction

Greedy strategy: split the records based on an attribute test that optimizes a certain criterion.

Issues:
- Determine how to split the records:
  - How to specify the attribute test condition?
  - How to determine the best split?
- Determine when to stop splitting.
- Determine how to cut back if the tree is too deep. What is wrong with a tree that is too deep? (It tends to overfit the training data.)
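A compact, runnable sketch of this greedy strategy for categorical attributes, using entropy-based information gain (defined on the following slides) as the splitting criterion; it stops only when a node is pure or attributes run out, and does no pruning:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def grow_tree(rows, labels, attrs):
    """Greedy recursive induction: pick the attribute with the highest
    information gain, partition the records, and recurse."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class

    def gain(a):
        rem = 0.0
        for v in set(row[a] for row in rows):
            sub = [l for row, l in zip(rows, labels) if row[a] == v]
            rem += len(sub) / len(rows) * entropy(sub)
        return entropy(labels) - rem

    best = max(attrs, key=gain)
    node = {"attr": best, "children": {}}
    for v in set(row[best] for row in rows):
        sub = [(row, l) for row, l in zip(rows, labels) if row[best] == v]
        node["children"][v] = grow_tree([r for r, _ in sub],
                                        [l for _, l in sub], attrs - {best})
    return node

# Usage: rows are dicts, attrs is a set, e.g.
# grow_tree([{"Refund": "Yes"}, {"Refund": "No"}], ["No", "Yes"], {"Refund"})
```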

How to Specify the Test Condition?

Depends on attribute type:
- Nominal
- Ordinal
- Continuous

Depends on the number of ways to split:
- 2-way split
- Multi-way split, based on the number of distinct values

Entropy-Based Evaluation and Splitting

Entropy at a given node t:

    Entropy(t) = − Σ_j p(j|t) log₂ p(j|t)

where p(j|t) is the relative frequency of class j at node t. Entropy measures the impurity of a node:
- Maximum (log₂ n_c) when records are equally distributed among all classes, implying least information, where n_c is the number of classes.
- Minimum (0) when all records belong to one class, implying most information.

Examples of Computing Entropy

C1 = 0, C2 = 6: P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
    Entropy = −0 log₂ 0 − 1 log₂ 1 = 0 − 0 = 0

C1 = 1, C2 = 5: P(C1) = 1/6, P(C2) = 5/6
    Entropy = −(1/6) log₂(1/6) − (5/6) log₂(5/6) = 0.65

C1 = 2, C2 = 4: P(C1) = 2/6, P(C2) = 4/6
    Entropy = −(2/6) log₂(2/6) − (4/6) log₂(4/6) = 0.92
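A small sketch that reproduces these numbers (plain Python; 0 · log 0 is taken as 0 by convention):

```python
from math import log2

def entropy(counts):
    """Entropy of a node given per-class record counts; 0*log(0) treated as 0."""
    n = sum(counts)
    return sum(-(c / n) * log2(c / n) for c in counts if c > 0)

print(entropy([0, 6]))            # 0.0
print(round(entropy([1, 5]), 2))  # 0.65
print(round(entropy([2, 4]), 2))  # 0.92
```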

Splitting Based on INFO: Information Gain

    GAIN_split = Entropy(p) − Σ_{i=1..k} (n_i / n) · Entropy(i)

Parent node p is split into k partitions; n_i is the number of records in partition i.
- Measures the reduction in entropy achieved because of the split. Choose the split that achieves the most reduction (maximizes GAIN).
- Used in ID3 and C4.5.
- Disadvantage: tends to prefer splits that result in a large number of partitions, each being small but pure.
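A sketch of the gain computation (the parent and partition class counts in the example are illustrative, not from the slides):

```python
from math import log2

def entropy(counts):
    n = sum(counts)
    return sum(-(c / n) * log2(c / n) for c in counts if c > 0)

def information_gain(parent_counts, partition_counts):
    """GAIN_split = Entropy(parent) - weighted average of child entropies."""
    n = sum(parent_counts)
    remainder = sum(sum(part) / n * entropy(part) for part in partition_counts)
    return entropy(parent_counts) - remainder

# Parent with 10 records (4 Yes, 6 No) split into two partitions
print(round(information_gain([4, 6], [[4, 1], [0, 5]]), 2))  # 0.61
```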

Splitting Based on INFO: Gain Ratio

    GainRATIO_split = GAIN_split / SplitINFO
    SplitINFO = − Σ_{i=1..k} (n_i / n) log₂(n_i / n)

Parent node p is split into k partitions; n_i is the number of records in partition i.
- Adjusts Information Gain by the entropy of the partitioning (SplitINFO). Higher-entropy partitioning (a large number of small partitions) is penalized!
- Used in C4.5.
- Designed to overcome the disadvantage of Information Gain.
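A sketch showing how SplitINFO penalizes many-way splits (the partition sizes are illustrative):

```python
from math import log2

def split_info(partition_sizes):
    """SplitINFO = entropy of the partition-size distribution."""
    n = sum(partition_sizes)
    return sum(-(s / n) * log2(s / n) for s in partition_sizes if s > 0)

def gain_ratio(gain, partition_sizes):
    """GainRATIO_split = GAIN_split / SplitINFO."""
    return gain / split_info(partition_sizes)

# A 2-way split of 10 records (5/5) vs. a 10-way split into singletons:
print(split_info([5, 5]))               # 1.0
print(round(split_info([1] * 10), 2))   # 3.32 -> the many-way split is penalized
print(round(gain_ratio(0.61, [5, 5]), 2))  # 0.61 / 1.0 = 0.61
```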

Comparison among Splitting Criteria

For a 2-class problem: [figure not reproduced in the transcription]

Example: Build a Decision Tree

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N
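As a worked start on this example, a sketch that scores each attribute by information gain on this table; Outlook comes out highest (about 0.247) and would be chosen as the root split:

```python
from collections import Counter
from math import log2

data = [  # (Outlook, Temperature, Humidity, Windy, Class)
    ("sunny", "hot", "high", "false", "N"), ("sunny", "hot", "high", "true", "N"),
    ("overcast", "hot", "high", "false", "P"), ("rain", "mild", "high", "false", "P"),
    ("rain", "cool", "normal", "false", "P"), ("rain", "cool", "normal", "true", "N"),
    ("overcast", "cool", "normal", "true", "P"), ("sunny", "mild", "high", "false", "N"),
    ("sunny", "cool", "normal", "false", "P"), ("rain", "mild", "normal", "false", "P"),
    ("sunny", "mild", "normal", "true", "P"), ("overcast", "mild", "high", "true", "P"),
    ("overcast", "hot", "normal", "false", "P"), ("rain", "mild", "high", "true", "N"),
]
attrs = ["Outlook", "Temperature", "Humidity", "Windy"]

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def gain(i):
    labels = [r[-1] for r in data]
    parts = Counter(r[i] for r in data)
    rem = sum((cnt / len(data)) * entropy([r[-1] for r in data if r[i] == v])
              for v, cnt in parts.items())
    return entropy(labels) - rem

for i, a in enumerate(attrs):
    print(a, round(gain(i), 3))
# Outlook has the highest gain (~0.247), so it becomes the root split.
```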