Chapter 3: Decision Tree Learning (part 2)


CS 536: Machine Learning, Littman (Wu, TA)

Administration
Books? Two are on reserve in the math library.
icml-03: instructional Conference on Machine Learning. Join the mailing list! (Weka instructions via the mailing list.)

What is ID3 Optimizing?
How would you find a tree that minimizes:
- misclassified examples?
- expected entropy?
- expected number of tests?
- depth of tree, given a fixed accuracy?
- etc.?
How do you decide whether one tree beats another?

Hypothesis Space Search by ID3
ID3:
- representation: trees
- scoring: entropy
- search: greedy
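ID3's entropy scoring can be sketched in a few lines. This is a minimal illustration, not the course's code; the dict-per-example format and function names are assumptions made for the sketch.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Expected entropy reduction from splitting on `attribute`.

    `examples` is a list of dicts mapping attribute names to values;
    `labels` holds the corresponding class labels.
    """
    n = len(labels)
    gain = entropy(labels)
    for v in set(ex[attribute] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[attribute] == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain
```

ID3's greedy search simply picks, at each node, the attribute with the highest `information_gain` and recurses on each resulting subset.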

Hypothesis Space Search by ID3
- Hypothesis space is complete! The target function is surely in there...
- Outputs a single hypothesis (which one?). Can't play 20 questions...
- No backtracking: local minima...
- Statistically-based search choices: robust to noisy data...
- Inductive bias: prefers shorter trees

Inductive Bias in ID3
Note H is the power set of instances X.
Unbiased? Not really...
- Preference for short trees, and for those with high-information-gain attributes near the root
- Bias is a preference for some hypotheses, rather than a restriction of the hypothesis space H
- Occam's razor: prefer the shortest hypothesis that fits the data

Occam's Razor
Why prefer short hypotheses?
Argument in favor: there are fewer short hypotheses than long ones, so
- a short hypothesis that fits the data is unlikely to be a coincidence, while
- a long hypothesis that fits the data might be a coincidence.
Argument opposed: there are many ways to define small sets of hypotheses,
e.g., all trees with a prime number of nodes that use attributes beginning with Z.
What's so special about small sets based on the size of the hypothesis?

Overfitting
Consider adding noisy training example #15:
  Sunny, Hot, Normal, Strong, PlayTennis = No
What effect does it have on the earlier tree?

Overfitting
Consider the error of hypothesis h over
- the training data: error_train(h)
- the entire distribution D of data: error_D(h)
Hypothesis h in H overfits the training data if there is an alternative hypothesis h' in H such that
  error_train(h) < error_train(h'), and
  error_D(h) > error_D(h')

Overfitting in Learning

Avoiding Overfitting
How can we avoid overfitting?
- Stop growing when a data split is not statistically significant
- Grow the full tree, then post-prune (a DP algorithm!)
How to select the best tree:
- Measure performance over the training data
- Measure performance over a separate validation data set
- MDL: minimize size(tree) + size(misclassifications(tree))

Reduced-Error Pruning
Split the data into a training set and a validation set.
Do until further pruning is harmful:
1. Evaluate the impact on the validation set of pruning each possible node (plus those below it)
2. Greedily remove the one that most improves validation-set accuracy
This produces the smallest version of the most accurate subtree.
What if data is limited?

Effect of Pruning
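The greedy loop above can be sketched as follows. The dict-based tree representation (internal nodes carry "attribute", "children", and a "majority" label to prune to; leaves are plain labels) is an assumption made for illustration, not the slides' notation.

```python
def classify(tree, example):
    """Walk the tree: internal nodes are dicts, leaves are class labels."""
    while isinstance(tree, dict):
        tree = tree["children"][example[tree["attribute"]]]
    return tree

def accuracy(tree, data):
    """Fraction of (example, label) pairs classified correctly."""
    return sum(classify(tree, ex) == lab for ex, lab in data) / len(data)

def internal_nodes(tree, path=()):
    """Yield the path (sequence of child values) to every internal node."""
    if isinstance(tree, dict):
        yield path
        for value, child in tree["children"].items():
            yield from internal_nodes(child, path + (value,))

def pruned_copy(tree, path):
    """Copy of `tree` with the node at `path` replaced by its majority label."""
    if not path:
        return tree["majority"]
    children = dict(tree["children"])
    children[path[0]] = pruned_copy(children[path[0]], path[1:])
    return {**tree, "children": children}

def reduced_error_prune(tree, validation):
    """Repeatedly prune whichever node most improves validation accuracy."""
    while isinstance(tree, dict):
        best, best_acc = None, accuracy(tree, validation)
        for path in internal_nodes(tree):
            candidate = pruned_copy(tree, path)
            if accuracy(candidate, validation) > best_acc:
                best, best_acc = candidate, accuracy(candidate, validation)
        if best is None:          # no pruning step helps: stop
            return tree
        tree = best
    return tree
```

Note that each pass re-evaluates every internal node, matching the "evaluate each possible node, then greedily remove one" loop of the slide.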

Rule Post-Pruning
1. Convert the tree to an equivalent set of rules
2. Prune each rule independently of the others
3. Sort the final rules into the desired sequence for use
Perhaps the most frequently used method (e.g., C4.5)

Converting a Tree to Rules
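Step 1 (tree to rules) is a root-to-leaf traversal: each leaf yields one rule whose preconditions are the attribute tests along the path. A sketch, assuming a simple dict representation (internal nodes are dicts with "attribute" and "children"; leaves are class labels):

```python
def tree_to_rules(tree, conditions=()):
    """Collect one (conditions, label) rule per leaf of the tree."""
    if not isinstance(tree, dict):          # leaf: emit the accumulated path
        return [(conditions, tree)]
    rules = []
    for value, child in tree["children"].items():
        rules += tree_to_rules(child, conditions + ((tree["attribute"], value),))
    return rules

def format_rule(conditions, label):
    """Render a rule in the IF ... THEN ... form used on the slide."""
    test = " ^ ".join(f"({a} = {v})" for a, v in conditions)
    return f"IF {test} THEN PlayTennis = {label}"
```

Each rule can then be pruned (dropping preconditions) and evaluated on its own, which is what makes rule post-pruning more flexible than pruning whole subtrees.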

The Rules
IF (Outlook = Sunny) ^ (Humidity = High) THEN PlayTennis = No
IF (Outlook = Sunny) ^ (Humidity = Normal) THEN PlayTennis = Yes

Continuous-Valued Attributes
Create a discrete attribute to test a continuous one:
  Temp = 82.5
  (Temp > 72.3) = t, f

  Temp:       40  48  60  72  80  90
  PlayTennis: No  No  Yes Yes Yes No
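A threshold like 72.3 is found by sorting the examples by the continuous value and scoring candidate thresholds at midpoints between adjacent examples whose labels differ (only these boundaries can maximize information gain). A sketch, with assumed function names:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Boundary threshold on a continuous attribute maximizing information gain."""
    pairs = sorted(zip(values, labels))
    base, n = entropy(labels), len(labels)
    best_gain, best_t = 0.0, None
    for (v1, l1), (v2, l2) in zip(pairs, pairs[1:]):
        if l1 == l2:                       # not a class boundary: skip
            continue
        t = (v1 + v2) / 2                  # midpoint candidate
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        gain = base - (len(left) / n) * entropy(left) \
                    - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t
```

On the slide's table, the candidate boundaries are (48+60)/2 = 54 and (80+90)/2 = 85, and the first has the higher gain.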

Attributes with Many Values
Problem: if one attribute has many values compared to the others, Gain will select it.
Imagine using Date = Jun_3_1996 as an attribute.
One approach: use GainRatio instead:
  GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)
  SplitInformation(S, A) = - sum_{i=1}^{c} |S_i|/|S| log2 |S_i|/|S|
where S_i is the subset of S for which A has value v_i.

Attributes with Costs
Consider
- medical diagnosis: BloodTest has cost $150
- robotics: Width_from_1ft has cost 23 sec.
How can we learn a consistent tree with low expected cost? Find the minimum-cost tree.
Another approach: replace Gain by
- Tan and Schlimmer (1990): Gain^2(S, A) / Cost(A)
- Nunez (1988): (2^Gain(S, A) - 1) / (Cost(A) + 1)^w, where w in [0, 1] determines the importance of cost
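GainRatio penalizes broad splits through the SplitInformation denominator. A minimal sketch of the two formulas above (function names and the dict-per-example format are assumptions):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(examples, labels, attribute):
    """Gain(S, A) / SplitInformation(S, A) over a list of dict examples."""
    n = len(labels)
    gain, split_info = entropy(labels), 0.0
    for v in set(ex[attribute] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[attribute] == v]
        p = len(subset) / n
        gain -= p * entropy(subset)        # Gain: subtract expected entropy
        split_info -= p * log2(p)          # SplitInformation term
    if split_info == 0:                    # attribute takes a single value
        return 0.0
    return gain / split_info
```

A Date-like attribute with one example per value gets Gain = entropy(S) but a large SplitInformation, so a binary attribute achieving the same Gain scores a higher ratio.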

Unknown Attribute Values
What if some examples are missing values of A?
Use the training example anyway, and sort it through the tree. If node n tests A:
- assign the most common value of A among the other examples sorted to node n, or
- assign the most common value of A among the other examples with the same target value, or
- assign probability p_i to each possible value v_i of A (perhaps estimated as above), and assign fraction p_i of the example to each descendant in the tree.
Classify new examples in the same fashion.
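The first strategy (impute the most common observed value) can be sketched as follows; the use of None for a missing value and the function name are assumptions for illustration:

```python
from collections import Counter

def fill_missing(examples, attribute):
    """Replace missing (None) values of `attribute` with the most common
    value observed among the other examples."""
    observed = [ex[attribute] for ex in examples if ex[attribute] is not None]
    most_common = Counter(observed).most_common(1)[0][0]
    return [{**ex, attribute: most_common} if ex[attribute] is None else ex
            for ex in examples]
```

The class-conditional and fractional-weight variants restrict `observed` to examples with the same target value, or split the example across children with weights p_i, respectively.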