Supervised Learning: Algorithm Implementations. Inferring Rudimentary Rules and Decision Trees


Summary. Input knowledge representation: preparing data for learning; input as concepts, instances, and attributes. Output knowledge representation. Algorithms: the basic methods.

Knowledge Representation. Input: Concepts, Instances, Attributes.

Preparing for learning. Components of the input: concepts, the kinds of things that can be learned (we want an intelligible and operational concept description); instances, the individual, independent examples of a concept (more complicated forms of input are possible); and attributes, which measure aspects of an instance (nominal and numeric). Practical issue: the input file format.

What is a concept? The concept is the thing to be learned; the concept description is the output of the learning scheme. Four major styles of learning: 1. classification learning, predicting a discrete class; 2. association learning, detecting associations between features; 3. clustering, grouping similar instances into clusters; 4. numeric prediction, predicting a numeric quantity.

What's in an example? An instance is a specific kind of example: the thing to be classified, associated, or clustered; an individual, independent example of the target concept; characterized by a predetermined set of attributes. The input to a learning scheme is a set of instances (a dataset), represented as a single relation or flat file, which is the most common form in practical data mining.

What is an attribute? Each instance is described by a fixed, predefined set of features or attributes. The number of attributes may vary in practice, and the existence of an attribute may depend on the value of another attribute. Possible attribute types: nominal, ordinal, interval, and ratio; in practice, nominal (categorical) vs. numeric (continuous).

Jargon. A collection of records is the training set; each record contains a set of attributes, one of which is the class. The model expresses the class attribute as a function of the values of the other attributes, so that previously unseen data can be assigned to the appropriate class. A test set determines the accuracy of the model on unseen data.

Basic methods in depth: 1R; statistical modeling; decision trees; hands-on decision trees.

Simplicity first. Simple algorithms often work surprisingly well. Different kinds of structure exist: one attribute may do all the work; all attributes may contribute with equal importance; a linear combination may suffice; an instance-based representation may fit; or simple logical structures may capture the concept. Which method works best depends on the domain.

Rudimentary rules. 1R learns a 1-level decision tree: a set of rules that all test one particular attribute. Basic version: one branch for each of the attribute's values; each branch assigns the most frequent class; the error rate is the proportion of instances that don't belong to the majority class of their branch; choose the attribute with the lowest error rate.

Pseudo-code for 1R:
  For each attribute:
    For each value of the attribute, make a rule:
      count how often each class appears
      find the most frequent class
      assign that class to this attribute value
    Calculate the error rate of the rules
  Choose the rules with the smallest error rate

Weather Data Set

Day  Outlook   Temp  Humidity  Wind    PlayTennis
D1   Sunny     Hot   High      Weak    No
D2   Sunny     Hot   High      Strong  No
D3   Overcast  Hot   High      Weak    Yes
D4   Rain      Mild  High      Weak    Yes
D5   Rain      Cool  Normal    Weak    Yes
D6   Rain      Cool  Normal    Strong  No
D7   Overcast  Cool  Normal    Strong  Yes
D8   Sunny     Mild  High      Weak    No
D9   Sunny     Cool  Normal    Weak    Yes
D10  Rain      Mild  Normal    Weak    Yes
D11  Sunny     Mild  Normal    Strong  Yes
D12  Overcast  Mild  High      Strong  Yes
D13  Overcast  Hot   Normal    Weak    Yes
D14  Rain      Mild  High      Strong  No
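As a concrete illustration, here is a minimal Python sketch of the 1R pseudocode applied to this table. The `one_r` helper and the tuple encoding of the data are my own, not part of the original slides.

```python
from collections import Counter

# PlayTennis data from the table above: (Outlook, Temp, Humidity, Wind, PlayTennis)
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]
attributes = ["Outlook", "Temp", "Humidity", "Wind"]

def one_r(data, attributes):
    """1R: build one rule per value of each attribute, keep the attribute with the fewest errors."""
    best = None
    for i, attr in enumerate(attributes):
        # Count how often each class appears for each value of this attribute.
        counts = {}
        for row in data:
            counts.setdefault(row[i], Counter())[row[-1]] += 1
        # Assign the most frequent class to each attribute value.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors: instances that do not belong to the majority class of their branch.
        errors = sum(sum(c.values()) - c.most_common(1)[0][1] for c in counts.values())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

print(one_r(data, attributes))
# Expect Outlook (Humidity ties at 4/14 errors but comes later):
# ('Outlook', {'Sunny': 'No', 'Overcast': 'Yes', 'Rain': 'Yes'}, 4)
```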

Summary. 1R was described in a paper by Holte (1993), evaluated on 16 datasets with the minimum number of instances per rule set to 6. 1R's simple rules performed not much worse than much more complex decision trees. Simplicity first pays off.

Statistical modeling. The opposite of 1R: use all the attributes. Two assumptions: attributes are equally important and statistically independent, i.e. knowledge about the value of one attribute tells us nothing about the value of another (once the class is known). These assumptions are almost never correct, yet the scheme works well in practice.

Weather Data Set (the same 14-day PlayTennis table shown above).

Weather Data Counts and Probabilities.

A new day to be classified: Outlook = Sunny, Temperature = Cool, Humidity = High, Windy = True (class unknown).

Likelihood of the New Day Outcome.

Bayes's rule. Probability of event H given evidence E: Pr[H | E] = Pr[E | H] Pr[H] / Pr[E]. A priori probability of H, Pr[H]: the probability of the event before the evidence has been seen. A posteriori probability of H, Pr[H | E]: the probability of the event after the evidence has been seen.

Naïve Bayes for classification. Classification learning asks: what is the probability of the class given an instance? Here the evidence E is the instance and the event H is the class value of the instance. Naïve Bayes assumption: the evidence can be split into independent parts, one per attribute, so Pr[H | E] = Pr[E1 | H] × Pr[E2 | H] × ... × Pr[En | H] × Pr[H] / Pr[E].

Probabilities for class yes. Evidence E: Pr[yes | E] = (Pr[Outlook=Sunny | yes] × Pr[Temperature=Cool | yes] × Pr[Humidity=High | yes] × Pr[Windy=True | yes] × Pr[yes]) / Pr[E].
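A minimal sketch of this calculation in Python, reusing the `data` list from the 1R example above. The `naive_bayes` helper is mine, and Windy=True is taken to correspond to Wind=Strong in the table.

```python
from collections import Counter

def naive_bayes(data, instance):
    """Pr[class | instance]: product of per-attribute relative frequencies times the prior, normalised."""
    class_counts = Counter(row[-1] for row in data)
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / len(data)                       # prior Pr[class]
        for i, value in enumerate(instance):
            n_match = sum(1 for row in data if row[-1] == cls and row[i] == value)
            score *= n_match / n_cls                    # Pr[attribute_i = value | class]
        scores[cls] = score
    total = sum(scores.values())                        # plays the role of Pr[E]
    return {cls: s / total for cls, s in scores.items()}

new_day = ("Sunny", "Cool", "High", "Strong")
print(naive_bayes(data, new_day))
# Likelihoods are about 0.0206 for no and 0.0053 for yes, i.e. roughly {'No': 0.795, 'Yes': 0.205}
```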

Summary. Naïve Bayes works amazingly well even when the independence assumption is violated, because classification doesn't require accurate probability estimates as long as the maximum probability is assigned to the correct class. Problem: adding too many redundant attributes (for example, identical attributes) hurts it. Conditional probability visualization: http://setosa.io/conditional/

DECISION TREE INDUCTION. A method for approximating discrete-valued functions that is robust to noisy and missing data, can learn non-linear relationships, and has an inductive bias towards shorter trees.

Decision trees. A divide-and-conquer approach. Internal nodes test a particular attribute, usually by comparing its value to a constant; other options are comparing the values of two attributes or using a function of one or more attributes. Leaves assign a classification, a set of classifications, or a probability distribution to the instances that reach them. An unknown instance is routed down the tree.

Decision Tree Learning. Applications: medical diagnosis (e.g., heart disease); analysis of complex chemical compounds; classifying equipment malfunctions; assessing the risk of loan applicants; Boston housing price prediction.

DECISION TREE FOR THE CONCEPT Sunburn

Name   Hair    Height   Weight   Lotion  Result
Sarah  blonde  average  light    no      sunburned (positive)
Dana   blonde  tall     average  yes     none (negative)
Alex   brown   short    average  yes     none
Annie  blonde  short    average  no      sunburned
Emily  red     average  heavy    no      sunburned
Pete   brown   tall     heavy    no      none
John   brown   average  heavy    no      none
Katie  blonde  short    light    yes     none

DECISION TREE FOR THE CONCEPT Sunburn. Copyright 2010, Natasha Balac.

DT for Medical Diagnosis and Prognosis: Heart Disease (Breiman et al., 1984). Root test: minimum systolic blood pressure over the 24-hour period following admission to the hospital. If <= 91: Class 2, early death. If > 91, test the age of the patient: if <= 62.5, Class 1, survivors; if > 62.5, ask whether there was sinus tachycardia. If no: Class 1, survivors; if yes: Class 2, early death.

Occam's Razor. The world is inherently simple; therefore the smallest decision tree that is consistent with the samples is the one most likely to identify unknown objects correctly.

Decision Tree Representation. Each internal node tests an attribute; each branch corresponds to an attribute value; each leaf node assigns a classification.

When to Consider Decision Trees. Instances are describable by attribute-value pairs, with each attribute taking a small number of disjoint possible values; the target function has a discrete output value; the training data may be noisy, containing errors or missing attribute values.

Weather Data Set: Make the Tree (the same 14-day PlayTennis table as above).

DECISION TREE FOR THE CONCEPT PlayTennis (the same 14-day table as above) [Mitchell, 1997].

Constructing decision trees. Normal procedure: top-down, in recursive divide-and-conquer fashion. First, an attribute is selected for the root node and a branch is created for each possible attribute value. Then the instances are split into subsets, one for each branch extending from the node. Finally, the procedure is repeated recursively for each branch, using only the instances that reach that branch. The process stops when all instances at a node have the same class.

Induction of Decision Trees: top-down method. Main loop: A <- pick the "best" decision attribute for the next node; assign A as the decision attribute for the node; for each value of A, create a new descendant of the node; sort the training examples to the leaf nodes; if the training examples are perfectly classified, then STOP, else iterate over the new leaf nodes.
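The main loop above maps onto a short recursive function. This sketch is my own, not from the slides; it leaves the choice of the "best" attribute as a parameter, since information gain, the usual criterion, is only introduced on the slides that follow.

```python
from collections import Counter

def build_tree(rows, attributes, choose_attribute):
    """Top-down induction: pick an attribute, split, and recurse until a node is pure (or attributes run out)."""
    classes = Counter(row[-1] for row in rows)
    if len(classes) == 1 or not attributes:
        return classes.most_common(1)[0][0]              # leaf: (majority) class label
    attr = choose_attribute(rows, attributes)            # pick the "best" decision attribute
    i = attributes.index(attr)
    tree = {attr: {}}
    for value in {row[i] for row in rows}:               # one descendant per attribute value
        subset = [row[:i] + row[i + 1:] for row in rows if row[i] == value]
        remaining = [a for a in attributes if a != attr]
        tree[attr][value] = build_tree(subset, remaining, choose_attribute)
    return tree
```

With information gain plugged in as the chooser (see the sketch after the information-gain slides below), this reproduces the familiar PlayTennis tree rooted at Outlook.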

Which is the best attribute?

Attribute selection. How to choose the best attribute? We want the smallest tree, so the heuristic is to pick the attribute that produces the purest nodes. Purity criterion: information gain, which increases with the average purity of the subsets produced by the attribute split. Choose the attribute that yields the greatest information gain.

Computing information. Information is measured in bits. Given a probability distribution, the information required to predict an event is the distribution's entropy, expressed in bits (which can involve fractions of a bit). Formula for computing the entropy: entropy(p1, p2, ..., pn) = -p1 log2 p1 - p2 log2 p2 - ... - pn log2 pn.
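The formula transcribes directly into Python; the helper name is mine.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """entropy(p1, ..., pn) = -sum p_i * log2(p_i), computed from a list of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

print(round(entropy(["Yes"] * 9 + ["No"] * 5), 3))   # info([9,5]) for the weather data: 0.940 bits
```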

Expected information for attribute Outlook. Outlook = Sunny: info([2,3]) = 0.971 bits. Outlook = Overcast: info([4,0]) = 0 bits. Outlook = Rainy: info([3,2]) = 0.971 bits. Total expected information: info([2,3],[4,0],[3,2]) = (5/14) x 0.971 + (4/14) x 0 + (5/14) x 0.971 = 0.693 bits.

Computing the information gain. Information gain = information before splitting - information after splitting. Gain(Outlook) = info([9,5]) - info([2,3],[4,0],[3,2]) = 0.940 - 0.693 = 0.247 bits. Information gain for the attributes of the weather data: Gain(Outlook) = 0.247 bits; Gain(Temp) = 0.029 bits; Gain(Humidity) = 0.152 bits; Gain(Windy) = 0.048 bits.
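These numbers can be reproduced directly. A small sketch of my own, reusing the `entropy` helper and the `build_tree` and `data`/`attributes` definitions from the earlier sketches:

```python
def information_gain(rows, i):
    """Entropy before the split minus the weighted entropy of the subsets after splitting on column i."""
    before = entropy([row[-1] for row in rows])
    after = 0.0
    for value in {row[i] for row in rows}:
        subset = [row[-1] for row in rows if row[i] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after

def best_by_gain(rows, attrs):
    """Chooser for build_tree: the attribute with the greatest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, attrs.index(a)))

for i, attr in enumerate(attributes):
    print(attr, round(information_gain(data, i), 3))
# Outlook 0.247, Temp 0.029, Humidity 0.152, Wind 0.048

print(build_tree(data, attributes, best_by_gain))
# Up to branch ordering:
# {'Outlook': {'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}},
#              'Overcast': 'Yes',
#              'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}}}}
```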

Further splits (at the Outlook = Sunny node): Gain(Temp) = 0.571 bits; Gain(Humidity) = 0.971 bits; Gain(Windy) = 0.020 bits.

Final product: the finished decision tree.

Purity measure: desirable properties. When a node is pure, the measure should be zero; when impurity is maximal, the measure should be maximal. Multistage property: decisions can be made in several stages, e.g. measure([2,3,4]) = measure([2,7]) + (7/9) x measure([3,4]). Entropy is the only function that satisfies all these properties.
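The multistage property is easy to check numerically with the `entropy` helper defined above; the `info` wrapper over raw counts is mine.

```python
def info(counts):
    """Entropy of a class distribution given as counts, e.g. info([2, 3, 4])."""
    labels = [i for i, n in enumerate(counts) for _ in range(n)]
    return entropy(labels)

# measure([2,3,4]) should equal measure([2,7]) + (7/9) * measure([3,4])
print(round(info([2, 3, 4]), 4), round(info([2, 7]) + 7 / 9 * info([3, 4]), 4))
# both come out to about 1.5305 bits
```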

Highly-branching attributes. Attributes with a large number of values (example: an ID code) produce subsets that are more likely to be pure, so information gain is biased towards attributes with a large number of values, which leads to overfitting.

New version of the weather data (with an ID code attribute added).

ID code attribute split. Info([9,5]) = 0.940 bits before the split; each ID-code branch contains a single instance, so the information after the split is 0 and the gain is the full 0.940 bits.

Gain ratio. A modification of information gain that reduces this bias by taking the number and size of branches into account when choosing an attribute, i.e. by taking the intrinsic information of a split into account. Intrinsic information: the entropy of the distribution of instances into branches, i.e. how much information we need to tell which branch an instance belongs to. Gain ratio = gain / intrinsic information.
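In code the adjustment is one extra division. The sketch below is mine and reuses the `info` and `information_gain` helpers from above; it shows why a hypothetical ID-code attribute that uniquely identifies each of the 14 days has a huge gain but a modest gain ratio.

```python
def intrinsic_info(rows, i):
    """Entropy of the distribution of instances over the branches of a split on column i."""
    return info([sum(1 for row in rows if row[i] == v) for v in {row[i] for row in rows}])

def gain_ratio(rows, i):
    return information_gain(rows, i) / intrinsic_info(rows, i)

# Prepend a hypothetical ID-code column that is different for every day.
with_id = [(f"D{k + 1}",) + row for k, row in enumerate(data)]
print(round(information_gain(with_id, 0), 3), round(intrinsic_info(with_id, 0), 3),
      round(gain_ratio(with_id, 0), 3))
# Gain is the full 0.940 bits, intrinsic information is log2(14) = 3.807 bits, gain ratio only about 0.247
```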

Summary. Algorithms for top-down induction of decision trees: ID3 was developed by Ross Quinlan; C4.5 adds handling of numeric attributes, missing values, and noisy data; CART is due to Breiman, Friedman, Olshen, and Stone; and there are many others.

Avoiding overfitting. How can we avoid overfitting? Stop growing the tree when a data split is not statistically significant, or grow the full tree and then post-prune. How to select the best tree? Measure performance over the training data, or measure performance over a separate validation data set.

Pruning. Pruning simplifies a decision tree to prevent overfitting to noise in the data. Post-pruning takes a fully-grown decision tree and discards unreliable parts; pre-pruning stops growing a branch when the information becomes unreliable. Post-pruning is preferred in practice, because pre-pruning can stop too early.

Converting the tree to rules.
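The conversion is mechanical: every path from the root to a leaf becomes one rule whose conditions are the attribute tests along the path. A short sketch of my own that walks the nested-dict tree produced by the `build_tree` sketch above:

```python
def tree_to_rules(tree, conditions=()):
    """Yield (conditions, class) pairs: one rule per root-to-leaf path."""
    if not isinstance(tree, dict):                       # leaf: emit the accumulated rule
        yield conditions, tree
        return
    (attr, branches), = tree.items()
    for value, subtree in branches.items():
        yield from tree_to_rules(subtree, conditions + ((attr, value),))

for conds, cls in tree_to_rules(build_tree(data, attributes, best_by_gain)):
    print("IF " + " AND ".join(f"{a} = {v}" for a, v in conds) + f" THEN PlayTennis = {cls}")
# e.g. IF Outlook = Sunny AND Humidity = High THEN PlayTennis = No
```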

Thank you!