Induction on Decision Trees

«IDT» session of the «machine learning» course unit. Bruno Bouzy bruno.bouzy@parisdescartes.fr www.mi.parisdescartes.fr/~bouzy

Outline
- The induction task
- ID3
- Entropy (disorder) minimization
- Noise
- Unknown attribute values
- Selection criterion

The induction task. Formalism: objects described by attributes. Example: objects = Saturday mornings; attributes: outlook {sunny, overcast, rain}, temperature {cool, mild, hot}, humidity {high, normal}, windy {true, false}.

The induction task. One particular Saturday: outlook = overcast, temperature = cool, humidity = normal, windy = false. Classes are mutually exclusive; here there are 2 classes: Positive (P) and Negative (N).

The induction task. Training set: objects whose class is known. Goal: develop a classification rule.

A small training set

 n   outlook    temperature  humidity  windy  class
 1   sunny      hot          high      false  N
 2   sunny      hot          high      true   N
 3   overcast   hot          high      false  P
 4   rain       mild         high      false  P
 5   rain       cool         normal    false  P
 6   rain       cool         normal    true   N
 7   overcast   cool         normal    true   P
 8   sunny      mild         high      false  N
 9   sunny      cool         normal    false  P
10   rain       mild         normal    false  P
11   sunny      mild         normal    true   P
12   overcast   mild         high      true   P
13   overcast   hot          normal    false  P
14   rain       mild         high      true   N
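To make the later formulas concrete, here is one possible encoding of this training set as plain Python data; the dictionary keys and the name TRAINING_SET are illustrative choices, not part of the original slides.

```python
# The 14 Saturday-morning objects from the table above.
TRAINING_SET = [
    {"outlook": "sunny",    "temperature": "hot",  "humidity": "high",   "windy": False, "class": "N"},
    {"outlook": "sunny",    "temperature": "hot",  "humidity": "high",   "windy": True,  "class": "N"},
    {"outlook": "overcast", "temperature": "hot",  "humidity": "high",   "windy": False, "class": "P"},
    {"outlook": "rain",     "temperature": "mild", "humidity": "high",   "windy": False, "class": "P"},
    {"outlook": "rain",     "temperature": "cool", "humidity": "normal", "windy": False, "class": "P"},
    {"outlook": "rain",     "temperature": "cool", "humidity": "normal", "windy": True,  "class": "N"},
    {"outlook": "overcast", "temperature": "cool", "humidity": "normal", "windy": True,  "class": "P"},
    {"outlook": "sunny",    "temperature": "mild", "humidity": "high",   "windy": False, "class": "N"},
    {"outlook": "sunny",    "temperature": "cool", "humidity": "normal", "windy": False, "class": "P"},
    {"outlook": "rain",     "temperature": "mild", "humidity": "normal", "windy": False, "class": "P"},
    {"outlook": "sunny",    "temperature": "mild", "humidity": "normal", "windy": True,  "class": "P"},
    {"outlook": "overcast", "temperature": "mild", "humidity": "high",   "windy": True,  "class": "P"},
    {"outlook": "overcast", "temperature": "hot",  "humidity": "normal", "windy": False, "class": "P"},
    {"outlook": "rain",     "temperature": "mild", "humidity": "high",   "windy": True,  "class": "N"},
]
```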

A simple decision tree

outlook?
- sunny    -> humidity? (high -> N, normal -> P)
- overcast -> P
- rain     -> windy? (true -> N, false -> P)
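A minimal sketch of this hand-built tree as a Python function (the function name is an assumption); applied to the particular Saturday described earlier (overcast, cool, normal, not windy) it returns P.

```python
def classify(obj):
    """Classify one object with the simple decision tree above."""
    if obj["outlook"] == "sunny":
        return "N" if obj["humidity"] == "high" else "P"
    if obj["outlook"] == "overcast":
        return "P"
    # outlook == "rain"
    return "N" if obj["windy"] else "P"

print(classify({"outlook": "overcast", "temperature": "cool",
                "humidity": "normal", "windy": False}))   # -> P
```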

The induction task. If the attributes are adequate, it is possible to build a correct decision tree, and many correct decision trees are possible. Will such a tree correctly classify unseen objects? It depends... Between two correct decision trees, choose the simpler one.

ID3. Systematic approach: generate all decision trees and choose the simplest; possible for small induction tasks only. ID3 approach: many objects, many attributes; a reasonably good decision tree is required; use the entropy minimization principle to select the «best» attribute.

ID3. Result: correct decision trees are found on training sets of 30,000 examples, with examples described by 50 attributes. No convergence guarantee.

ID3. How to form a DT for a set C of objects? Let T be a test of the value of a given attribute on an object, with possible values (outcomes) O_1, O_2, ..., O_w. The test induces a partition {C_1, C_2, ..., C_w} of C, where C_i contains the objects of C whose outcome is O_i.

A structuring tree of C: the test T at the root, with outcomes O_1, O_2, ..., O_w leading to the subsets C_1, C_2, ..., C_w.
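A hedged sketch of this partitioning step in Python (the helper name partition is an assumption, not from the slides):

```python
from collections import defaultdict

def partition(objects, attribute):
    """Split a set C of objects into subsets C_i according to their value of one attribute."""
    subsets = defaultdict(list)
    for obj in objects:
        subsets[obj[attribute]].append(obj)
    return dict(subsets)

# e.g. partition(TRAINING_SET, "outlook") -> {"sunny": [...], "overcast": [...], "rain": [...]}
```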

Choice of the test. Two assumptions: (1) the test set has the same class proportions as the training set, with p = number of positive (+) examples, n = number of negative (-) examples, P_+ = probability of being positive = p/(p+n), P_- = probability of being negative = n/(p+n); (2) the information gain is based on the entropy E(p, n): E(p, n) = -P_+ log(P_+) - P_- log(P_-) (entropy = disorder).

Choice of the test. Let A be an attribute with values in {A_1, A_2, ..., A_w}, and C = {C_1, C_2, ..., C_w}, where the objects in C_i have A = A_i. C_i contains p_i objects of P and n_i objects of N; E(p_i, n_i) is the entropy of C_i.

Entropy function. A measure of disorder. For x in ]0, 1[: E(x) = -x log(x) - (1-x) log(1-x), with E(0) = E(1) = 0 (no disorder). E is a bell-shaped function with its maximum at x = 1/2 (maximal disorder) and vertical tangents at 0 and 1. E(1/2) = log(2) ≈ 0.7 (approximate values: log(3) ≈ 1.1, log(4) ≈ 1.4, log(5) ≈ 1.6, log(7) ≈ 2).

Entropy function. With p positive objects and n negative objects, what is the entropy E(p, n) of the proportion (p, n)? E(p, n) = -p/(p+n) log(p/(p+n)) - n/(p+n) log(n/(p+n)) = log(p+n) - p/(p+n) log(p) - n/(p+n) log(n).
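As a minimal sketch, the same quantity in Python, using natural logarithms, which is consistent with the approximate values quoted above (log 2 ≈ 0.7):

```python
from math import log

def entropy(p, n):
    """E(p, n): disorder of a set with p positive and n negative objects."""
    if p == 0 or n == 0:
        return 0.0                      # a pure set has no disorder
    pp, pn = p / (p + n), n / (p + n)
    return -pp * log(pp) - pn * log(pn)

print(entropy(1, 1))                    # log(2), about 0.69: maximal disorder
print(entropy(4, 0))                    # 0.0: no disorder
```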

Choice of the test. «Entropy a priori» (Eap) of attribute A: a measure of what the average entropy would be if we asked for the value of attribute A. It is a weighted sum of the entropies associated with each value of A, where the weight of value A_i is proportional to the number of objects with value A_i: Eap(A) = Σ_i E(p_i, n_i) (p_i + n_i)/(p+n). Choose the attribute A* = argmin_b Eap(b), i.e. the attribute that minimizes disorder.

Choice of the test. Example: the entropy «a priori» of each attribute: Eap(outlook) = 0.45, Eap(temperature) = 0.65, Eap(humidity) = 0.55, Eap(windy) = 0.65. ID3 chooses «outlook» as the root attribute of the DT.
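A sketch of the computation behind these numbers, reusing the TRAINING_SET, partition and entropy helpers sketched earlier (the exact decimals depend on the logarithm base and on rounding, but «outlook» comes out minimal either way):

```python
def eap(objects, attribute):
    """Entropy a priori of an attribute: weighted average of the entropies of its subsets C_i."""
    total = len(objects)
    value = 0.0
    for subset in partition(objects, attribute).values():
        p = sum(1 for o in subset if o["class"] == "P")
        n = len(subset) - p
        value += entropy(p, n) * len(subset) / total
    return value

attributes = ["outlook", "temperature", "humidity", "windy"]
print({a: round(eap(TRAINING_SET, a), 2) for a in attributes})
print("root attribute:", min(attributes, key=lambda a: eap(TRAINING_SET, a)))
```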

ID3. Complexity: O(|C| · |A| · D), where |C| is the size of the training set, |A| the number of attributes, and D the depth of the decision tree.

Noise. Error in attribute values: suppose object 1's outlook is corrupted to overcast; objects 1 and 3 then become identical but belong to different classes. Misclassification: suppose object 3 is corrupted to belong to N; the DT becomes complex (12 nodes).

Noise Two requirements: (R1) Being able to work with inadequate attributes (R2) Being able to decide that testing further attributes will not improve the predictive accuracy of the DT.

Noise. What to do when an attribute is inadequate or irrelevant? Create a leaf, but labelled with which kind of value? Either the most numerous class (P or N), or the probability of belonging to P.

Unknown attribute values. Two questions: how to build the DT? How to deal with unknown values during classification?

Unknown attribute values. How to build the DT? Candidate approaches, with their ratings:
- Bayesian approach: -
- DT approach: -
- «most common value» approach: -
- «unknown» as a value: - -
- the «proportion» approach: ++

Unknown attribute values. Assume the value of A is unknown for a few objects (marked '?'). Let p_u be the number of objects in P with A unknown, and n_u the number of objects in N with A unknown. Objects with unknown values are distributed across the values of A in proportion to the relative frequency of these values in C: p_i := p_i + p_u r_i, where r_i = (p_i + n_i)/((p+n) - (p_u + n_u)). Here p_i + n_i is the number of objects with value A_i, and (p+n) - (p_u + n_u) is the number of objects whose value of A is known.
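A minimal sketch of this proportional redistribution for one attribute A. Variable names follow the slide (p_counts[A_i] = p_i, n_counts[A_i] = n_i); the function itself is an illustrative assumption, not part of the original slides.

```python
def redistribute(p_counts, n_counts, p_u, n_u):
    """Spread the objects whose value of A is unknown (p_u positive, n_u negative)
    over the known values A_i, in proportion to each value's frequency."""
    known = sum(p_counts.values()) + sum(n_counts.values())   # (p+n) - (p_u + n_u)
    new_p, new_n = {}, {}
    for value in p_counts:
        r = (p_counts[value] + n_counts[value]) / known       # r_i
        new_p[value] = p_counts[value] + p_u * r
        new_n[value] = n_counts[value] + n_u * r
    return new_p, new_n
```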

Summary
- Induction task = find a DT for classification (2 classes, ~1000 attributes, ~50 values)
- Choice of the root test based on information theory: minimization of entropy
- Noise
- Unknown attribute values
- An approximate method

Reference: J.R. Quinlan, «Induction of Decision Trees», Machine Learning, 1(1):81-106, 1986.