Classification and Prediction

Classification

Classification and Prediction

Classification: predict categorical class labels
- Build a model for a set of classes/concepts
- Example: classify loan applications (approve/decline)

Prediction: model continuous-valued functions
- Example: predict the economic growth in 2015

Classification: A Two-Step Process

Model construction: describe a set of predetermined classes
- Training dataset: tuples used for model construction
- Each tuple/sample belongs to a predefined class
- The model is expressed as classification rules, decision trees, or mathematical formulae

Model application: classify unseen objects
- Estimate the accuracy of the model using an independent test set
- If the accuracy is acceptable, apply the model to classify tuples with unknown class labels

Model Construction

Training data is fed to a classification algorithm, which produces the classifier (model).

Training data:

Name  Rank        Years  Tenured
Mike  Ass. Prof   3      No
Mary  Ass. Prof   7      Yes
Bill  Prof        2      Yes
Jim   Asso. Prof  7      Yes
Dave  Ass. Prof   6      No
Anne  Asso. Prof  3      No

Resulting classifier (model):
IF rank = professor OR years > 6 THEN tenured = yes

Model Application

The classifier is evaluated on testing data, then applied to unseen data.

Testing data:

Name     Rank        Years  Tenured
Tom      Ass. Prof   2      No
Merlisa  Asso. Prof  7      No
George   Prof        5      Yes
Joseph   Ass. Prof   7      Yes

Unseen data: (Jeff, Professor, 4). Tenured?

Supervised/Unsupervised Learning

Supervised learning (classification):
- Supervision: objects in the training data set have labels
- New data is classified based on the training set

Unsupervised learning (clustering):
- The class labels of the training data are unknown
- Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data

Data Preparation

- Data cleaning: preprocess the data to reduce noise and handle missing values
- Relevance analysis (feature selection): remove irrelevant or redundant attributes
- Data transformation: generalize and/or normalize data

Measurements of Quality

- Prediction accuracy
- Speed and scalability: construction speed and application speed
- Robustness: handling of noise and missing values
- Scalability: ability to build a model for large training data sets
- Interpretability: understandability of the model

Decision Tree Induction

- Decision tree representation
- Construction of a decision tree
- Inductive bias and overfitting
- Scalable enhancements for large databases

Decision Tree

- An internal node: a test of some attribute
- A branch: a possible value of the attribute
- Classification: start at the root, test the attribute, and move down the matching branch until a leaf is reached

The PlayTennis tree:

Outlook
├─ Sunny → Humidity
│   ├─ High → No
│   └─ Normal → Yes
├─ Overcast → Yes
└─ Rain → Wind
    ├─ Strong → No
    └─ Weak → Yes
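
To make the traversal concrete, here is a minimal Python sketch (an illustration, not from the slides) that encodes the PlayTennis tree as nested tuples and dicts, a hypothetical representation, and walks it from the root to a leaf:

# The PlayTennis tree above as nested tuples/dicts (a hypothetical encoding):
# an internal node is (attribute, {value: subtree}); a leaf is a class label.
tree = ("Outlook", {
    "Sunny":    ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain":     ("Wind", {"Strong": "No", "Weak": "Yes"}),
})

def classify(node, record):
    # Start at the root; test the attribute and follow the matching branch.
    while isinstance(node, tuple):
        attribute, branches = node
        node = branches[record[attribute]]
    return node                                  # a leaf: the class label

print(classify(tree, {"Outlook": "Sunny", "Humidity": "Normal"}))  # Yes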

Training Dataset

Outlook   Temp  Humidity  Wind    PlayTennis
Sunny     Hot   High      Weak    No
Sunny     Hot   High      Strong  No
Overcast  Hot   High      Weak    Yes
Rain      Mild  High      Weak    Yes
Rain      Cool  Normal    Weak    Yes
Rain      Cool  Normal    Strong  No
Overcast  Cool  Normal    Strong  Yes
Sunny     Mild  High      Weak    No
Sunny     Cool  Normal    Weak    Yes
Rain      Mild  Normal    Weak    Yes
Sunny     Mild  Normal    Strong  Yes
Overcast  Mild  High      Strong  Yes
Overcast  Hot   Normal    Weak    Yes
Rain      Mild  High      Strong  No

Appropriate Problems

- Instances are represented by attribute-value pairs
- Extensions of decision trees can handle real-valued attributes
- Disjunctive descriptions may be required
- The training data may contain errors or missing values

Basic Algorithm (ID3)

Construct the tree in a top-down, recursive, divide-and-conquer manner:
- Choose the attribute that is best at the current node
- Create a branch for each possible attribute value
- Partition the training data into the descendant nodes

Conditions for stopping the recursion:
- All samples at a given node belong to the same class
- No attributes remain for further partitioning (majority voting labels the leaf)
- There are no samples at the node
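
The recursion above can be sketched in a few lines of Python. This is a minimal illustration, assuming each example is a dict from attribute names to values; edge cases beyond the stopping conditions listed are glossed over:

from collections import Counter
import math

def entropy(labels):
    # Entropy of a list of class labels (defined formally two slides below).
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, attributes):
    # Stopping: all samples in one class, or no attributes left (majority vote).
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    def gain(a):
        # Expected entropy reduction from partitioning on attribute a.
        rem = 0.0
        for v in set(r[a] for r in rows):
            sub = [l for r, l in zip(rows, labels) if r[a] == v]
            rem += len(sub) / len(labels) * entropy(sub)
        return entropy(labels) - rem

    best = max(attributes, key=gain)               # best attribute at this node
    rest = [a for a in attributes if a != best]
    branches = {}
    for v in sorted(set(r[best] for r in rows)):   # only non-empty partitions recurse
        branches[v] = id3([r for r in rows if r[best] == v],
                          [l for r, l in zip(rows, labels) if r[best] == v],
                          rest)
    return (best, branches)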

Which Attribute Is the Best?

The attribute most useful for classifying examples. Information gain and the Gini index are statistical properties that measure how well an attribute separates the training examples.

Entropy

Entropy measures the homogeneity of a set of examples:

Entropy(S) = -\sum_{i=1}^{c} p_i \log_2 p_i

where S is the training data set and p_i is the proportion of S belonging to class i. The smaller the entropy, the purer the data set.
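
As a quick numeric check, a small Python sketch (an illustration, not part of the lecture) computing two-class entropy from counts:

import math

def two_class_entropy(pos, neg):
    # Entropy from class counts; 0 * log2(0) is taken to be 0.
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h

print(round(two_class_entropy(9, 5), 2))                  # 0.94: the PlayTennis data above
print(two_class_entropy(7, 7), two_class_entropy(14, 0))  # 1.0 (impure) and 0.0 (pure)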

Information Gain

The expected reduction in entropy caused by partitioning the examples according to an attribute:

Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)

where Values(A) is the set of all possible values for attribute A, and S_v is the subset of S for which attribute A has value v.

Example

For the training dataset above (9 Yes, 5 No):

Entropy(S) = -\frac{9}{14} \log_2 \frac{9}{14} - \frac{5}{14} \log_2 \frac{5}{14} = 0.94

Gain(S, Wind) = Entropy(S) - \sum_{v \in \{Weak, Strong\}} \frac{|S_v|}{|S|} Entropy(S_v)
              = 0.94 - \frac{8}{14} \cdot 0.811 - \frac{6}{14} \cdot 1.00 = 0.048
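
These numbers can be verified with a short Python sketch (an illustration; the wind and play lists are the Wind column and PlayTennis labels from the training dataset above):

from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    # Gain(S, A), where `values` is attribute A's column and `labels` the classes.
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [l for x, l in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

wind = ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
        "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"]
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
print(round(information_gain(wind, play), 3))   # 0.048, matching the slide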

Hypothesis Space Search in Decision Tree Building

- Hypothesis space: the set of possible decision trees
- ID3 performs a simple-to-complex, hill-climbing search
- Evaluation function: information gain

Capabilities and Limitations

- The hypothesis space is complete
- Maintains only a single current hypothesis
- No backtracking: may converge to a locally optimal solution
- Uses all training examples at each step to make statistics-based decisions, so it is not sensitive to errors in individual examples

Natural Bias

The information gain measure favors attributes with many values. An extreme example: an attribute such as date may have the highest information gain, producing a very broad decision tree of depth one that fits the training data but is inapplicable to any future data.

Alternative Measures

Gain ratio penalizes attributes like date by incorporating split information:

SplitInformation(S, A) = -\sum_{i=1}^{c} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}

GainRatio(S, A) = \frac{Gain(S, A)}{SplitInformation(S, A)}

- Split information is sensitive to how broadly and uniformly the attribute splits the data
- The gain ratio can be undefined or very large when the split information approaches zero
- Heuristic: apply the gain ratio test only to attributes with above-average gain
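
A small sketch of both measures (an illustration with branch sizes taken from the Wind example above); note how a date-like attribute is penalized:

import math

def split_information(sizes):
    # sizes: number of examples that fall into each branch of the split.
    n = sum(sizes)
    return -sum(s / n * math.log2(s / n) for s in sizes if s)

def gain_ratio(gain, sizes):
    si = split_information(sizes)
    return gain / si if si else float("inf")    # undefined for a trivial split

# Wind splits the 14 PlayTennis examples into 8 Weak / 6 Strong:
print(round(gain_ratio(0.048, [8, 6]), 3))      # ≈ 0.049
# An attribute with one distinct value per example (like date) is penalized:
print(round(split_information([1] * 14), 3))    # log2(14) ≈ 3.807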

Measuring Inequality

The Gini index originates in economics as a measure of income inequality, illustrated by the Lorenz curve:
- X-axis: population quintiles
- Y-axis: cumulative share of income earned up to the plotted quintile
- The gap between the actual curve and the diagonal line of perfect equality reflects the degree of inequality: the greater the distance, the more unequal the distribution
- Gini = 0: perfectly even distribution; Gini = 1: perfectly unequal

Gini Index (Adjusted)

For a data set T containing examples from n classes:

gini(T) = 1 - \sum_{j=1}^{n} p_j^2

where p_j is the relative frequency of class j in T. If T is split into two subsets T_1 and T_2 with sizes N_1 and N_2 respectively:

gini_{split}(T) = \frac{N_1}{N} gini(T_1) + \frac{N_2}{N} gini(T_2)

The attribute that provides the smallest gini_{split}(T) is chosen to split the node.
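
A minimal sketch of the two formulas, using the Wind split of the PlayTennis data above (Weak branch: 6 Yes, 2 No; Strong branch: 3 Yes, 3 No):

from collections import Counter

def gini(labels):
    # gini(T) = 1 - sum_j p_j^2
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(left, right):
    # Weighted Gini of a binary split into subsets `left` and `right`.
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

weak = ["Yes"] * 6 + ["No"] * 2
strong = ["Yes"] * 3 + ["No"] * 3
print(round(gini_split(weak, strong), 3))       # ≈ 0.429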

Extracting Classification Rules

Classification rules can be extracted from a decision tree:
- Each path from the root to a leaf becomes an IF-THEN rule
- All attribute-value pairs along the path form a conjunctive condition
- The leaf node holds the class prediction

Example: IF age <= 30 AND student = no THEN buys_computer = no

Rules are easy to understand.
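
Rule extraction is a straightforward tree walk. A minimal sketch, reusing the nested tuple/dict tree encoding introduced earlier (a hypothetical representation, not from the slides):

def extract_rules(node, conditions=()):
    # Depth-first walk: each root-to-leaf path yields one IF-THEN rule.
    if not isinstance(node, tuple):             # a leaf: emit the rule
        body = " AND ".join(f"{a} = {v}" for a, v in conditions)
        return [f"IF {body} THEN class = {node}"]
    attribute, branches = node
    rules = []
    for value, child in branches.items():
        rules += extract_rules(child, conditions + ((attribute, value),))
    return rules

tree = ("Outlook", {
    "Sunny":    ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain":     ("Wind", {"Strong": "No", "Weak": "Yes"}),
})
for rule in extract_rules(tree):
    print(rule)   # e.g. IF Outlook = Sunny AND Humidity = High THEN class = No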

Inductive Bias

The set of assumptions that, together with the training data, deductively justifies the classifications assigned to future instances. ID3's bias takes the form of preferences during classifier construction:
- Shorter trees are preferred over longer trees
- Trees that place high-information-gain attributes close to the root are preferred

Why Prefer Short Trees?

Occam's razor: prefer the simplest hypothesis that fits the data ("one should not increase, beyond what is necessary, the number of entities required to explain anything"), also known as the principle of parsimony.
- There are fewer short trees than long trees
- A short tree that fits the data is therefore less likely to be a statistical coincidence

Overfitting

A decision tree T may overfit the training data if there exists an alternative tree T' such that T has higher accuracy than T' over the training examples, but T' has higher accuracy than T over the entire distribution of data.

Why does overfitting happen?
- Noise in the data
- Bias in the training data

(Figure: accuracy of trees T and T' on the training data versus all data.)

Avoid Overfitting

- Prepruning: stop growing the tree earlier; it is difficult to choose an appropriate threshold
- Postpruning: remove branches from a fully grown tree, using an independent set of data to guide pruning

Key question: how to determine the correct final tree size.

Determine the Final Tree Size

- Separate training (2/3) and testing (1/3) sets
- Use cross-validation, e.g., 10-fold cross-validation (see the sketch below)
- Use all the data for training, and apply a statistical test (e.g., chi-square) to estimate whether expanding or pruning a node is likely to improve performance over the entire distribution
- Use the minimum description length (MDL) principle: halt growth of the tree when the encoding is minimized
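
As one concrete illustration of the cross-validation option, a sketch assuming scikit-learn is available (it is not part of the lecture); DecisionTreeClassifier's max_depth stands in for tree size, and the iris data is just a placeholder:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
for depth in (1, 2, 3, 5, None):              # None grows the tree fully
    scores = cross_val_score(
        DecisionTreeClassifier(max_depth=depth, random_state=0), X, y, cv=10)
    print(depth, round(scores.mean(), 3))     # pick the smallest adequate depth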

Enhancements

- Allow attributes with continuous values by dynamically discretizing them (see the sketch below)
- Handle missing attribute values
- Attribute construction: create new attributes based on existing ones that are sparsely represented, reducing fragmentation, repetition, and replication
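
Dynamic discretization can be as simple as choosing the binary threshold with the highest information gain, in the style of C4.5. A minimal sketch with hypothetical temperature readings (the values and labels below are made up for illustration):

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    # Candidate thresholds are midpoints between adjacent sorted values;
    # return (information gain, threshold) for the best binary split.
    pairs = sorted(zip(values, labels))
    base, n = entropy(labels), len(labels)
    best = (-1.0, None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                                   # no boundary here
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for _, l in pairs[:i]]
        right = [l for _, l in pairs[i:]]
        gain = base - (len(left) / n * entropy(left)
                       + len(right) / n * entropy(right))
        best = max(best, (gain, t))
    return best

print(best_threshold([64, 65, 68, 69, 70, 71],
                     ["Yes", "No", "Yes", "Yes", "Yes", "No"]))  # ≈ (0.317, 70.5)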

To-Do List

- Read Sections 8.1-8.2
- Figure out how to use a decision tree for classification in Weka