EECS 349: Machine Learning. Bryan Pardo. Topic 2: Decision Trees. (Includes content provided by: Russell & Norvig, D. Downie, P. Domingos)

General Learning Task. There is a set of possible examples $X = \{\vec{x}_1, \ldots, \vec{x}_n\}$. Each example is a k-tuple of attribute values $\vec{x} = \langle a_1, \ldots, a_k \rangle$. There is a target function $f : X \to Y$ that maps X onto some finite set Y. The DATA is a set of pairs <example, target function value>: $D = \{\langle \vec{x}_1, f(\vec{x}_1)\rangle, \ldots, \langle \vec{x}_m, f(\vec{x}_m)\rangle\}$. Find a hypothesis h such that $\forall \vec{x},\ h(\vec{x}) \approx f(\vec{x})$.
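A minimal Python sketch of this setup (the attribute names, labels, and the particular h below are invented for illustration, not taken from the lecture):

```python
# DATA D: each element pairs an example (a tuple of attribute values,
# here stored as a dict) with its target value f(x).
D = [
    ({"outlook": "Sunny",    "humidity": "High"},   "no"),
    ({"outlook": "Overcast", "humidity": "High"},   "yes"),
    ({"outlook": "Rain",     "humidity": "Normal"}, "yes"),
]

def h(x):
    """One candidate hypothesis h: X -> Y. Learning searches for an h
    that agrees with the target function f on (and beyond) D."""
    return "no" if x["outlook"] == "Sunny" else "yes"

# This h fits the tiny training set exactly; real learners usually
# settle for approximate agreement.
print(all(h(x) == y for x, y in D))  # True
```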

Attribute-based representations

Decision Tree

Expressiveness of D-Trees

Decision Trees represent disjunctions of conjunctions: f(x) = yes iff (Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
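The same rule as executable Python, one disjunct per path to a "yes" leaf (a sketch; the function name and argument order are mine):

```python
def f(outlook, humidity, wind):
    """PlayTennis decision written as a disjunction of conjunctions."""
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))

print(f("Sunny", "Normal", "Strong"))  # True: the first conjunction fires
print(f("Rain", "High", "Strong"))     # False: no disjunct matches
```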

Decision Tree Boundaries

A learned decision tree

Choosing an attribute. The more skewed the examples in a bin, the better. We're going to use ENTROPY as a measure of how skewed each bin is.
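A short sketch of the measure (the standard entropy formula; the helper names are mine):

```python
import math

def probs(counts):
    """Turn per-class counts in a bin into probabilities."""
    total = sum(counts)
    return [c / total for c in counts]

def entropy(ps):
    """H = -sum_j p_j log2(p_j), in bits. A pure (maximally skewed) bin
    scores 0; an even two-class split scores the maximum, 1 bit."""
    return sum(-p * math.log2(p) for p in ps if p > 0)

print(entropy(probs([3, 3])))            # 1.0   (least skewed: worst bin)
print(entropy(probs([6, 0])))            # 0.0   (pure: best bin)
print(round(entropy(probs([2, 4])), 3))  # 0.918 (in between)
```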

Counts as probabilities. P1 = probability I will wait for a table; P2 = probability I will NOT wait for a table. Example bins: P1 = 0.5, P2 = 0.5; P1 = 0, P2 = 1; P1 = 1, P2 = 0; P1 = 0.333, P2 = 0.667.

Information

About ID3. A recursive, greedy algorithm to build a decision tree. At each step it picks the best variable to split the data on, and then moves on. It is greedy because it makes the optimal choice at the current step, without considering anything beyond that step. This can lead to trouble when a good split requires considering multiple variables jointly. (Try it on XOR.)

Decision Tree Learning (ID3)
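A compact rendering of the algorithm as described above, assuming examples are stored as dicts (that data layout, and all names here, are my choices rather than the lecture's):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (bits) of a list of class labels."""
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())

def split_entropy(examples, attr, target):
    """Entropy remaining AFTER splitting on attr: the entropy of each
    subset, weighted by the subset's share of the examples."""
    n, total = len(examples), 0.0
    for value in set(ex[attr] for ex in examples):
        subset = [ex[target] for ex in examples if ex[attr] == value]
        total += len(subset) / n * entropy(subset)
    return total

def id3(examples, attributes, target):
    """Recursive, greedy ID3: split on the locally best attribute, recurse."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:          # pure node: stop with that class
        return labels[0]
    if not attributes:                 # nothing left to split on: majority class
        return Counter(labels).most_common(1)[0][0]
    best = min(attributes, key=lambda a: split_entropy(examples, a, target))
    return {best: {value: id3([ex for ex in examples if ex[best] == value],
                              [a for a in attributes if a != best], target)
                   for value in set(ex[best] for ex in examples)}}

data = [{"outlook": "Sunny",    "wind": "Weak",   "play": "no"},
        {"outlook": "Overcast", "wind": "Weak",   "play": "yes"},
        {"outlook": "Rain",     "wind": "Strong", "play": "no"},
        {"outlook": "Rain",     "wind": "Weak",   "play": "yes"}]
print(id3(data, ["outlook", "wind"], "play"))
# e.g. {'outlook': {'Sunny': 'no', 'Overcast': 'yes', 'Rain': {'wind': ...}}}
```

On XOR-style data every single attribute looks equally useless at the root, so the greedy step has nothing to prefer, which is exactly the failure mode noted above.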

Choosing an attribute in ID3. For each attribute, find the entropy H of the example set AFTER splitting on that attribute. (Note: this means taking the entropy of each subset created by splitting on the attribute, and then combining these entropies, weighted by the size of each subset.) Pick the attribute that creates the lowest overall entropy.

Entropy prior to splitting. Instances where I waited vs. instances where I didn't. P1 = probability I will wait for a table; P2 = probability I will NOT wait for a table. $H_0(P_1, P_2) = -\sum_j P_j \log_2 P_j = -P_1 \log_2 P_1 - P_2 \log_2 P_2 = 1$

If we split on Patrons: $H_{\text{Patrons}} = W_{\text{none}} H_{\text{none}} + W_{\text{some}} H_{\text{some}} + W_{\text{full}} H_{\text{full}} = \frac{2}{12} \cdot 0 + \frac{4}{12} \cdot 0 + \frac{6}{12}\left(-\frac{2}{6}\log_2\frac{2}{6} - \frac{4}{6}\log_2\frac{4}{6}\right) \approx 0.459$

If we split on Type: $H_{\text{Type}} = W_{\text{french}} H_{\text{french}} + W_{\text{italian}} H_{\text{italian}} + W_{\text{thai}} H_{\text{thai}} + W_{\text{burger}} H_{\text{burger}} = \frac{2}{12} \cdot 1 + \frac{2}{12} \cdot 1 + \frac{4}{12} \cdot 1 + \frac{4}{12} \cdot 1 = 1$
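Checking both computations numerically (the counts are read off the slides: the None bin has 2 examples, none of whom wait; Some has 4, all of whom wait; Full has 6, split 2 wait / 4 don't; each Type bin is an even mix):

```python
import math

def entropy(ps):  # as in the sketch above
    return sum(-p * math.log2(p) for p in ps if p > 0)

h_patrons = (2/12) * entropy([0/2, 2/2]) \
          + (4/12) * entropy([4/4, 0/4]) \
          + (6/12) * entropy([2/6, 4/6])
print(round(h_patrons, 3))  # 0.459

# Every Type subset is a 50/50 mix, so each contributes a full bit:
h_type = (2/12 + 2/12 + 4/12 + 4/12) * entropy([0.5, 0.5])
print(h_type)  # 1.0
```

Patrons wins: it leaves far less entropy than Type, which leaves us exactly as uncertain as before the split.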

Measuring Performance Bryan Pardo, EECS 349 Fall 2009 18

What the learning curve tells us Bryan Pardo, EECS 349 Fall 2009 19

Rule #2 of Machine Learning: The best hypothesis (i.e. the one that generalizes well) almost never achieves 100% accuracy on the training data. (Rule #1 was: you can't learn anything without inductive bias.)

Overfitting

Avoiding Overfitting. Approaches: Stop splitting when information gain is low or when the split is not statistically significant. Or grow the full tree and then prune it when done. How to pick the best tree? Performance on training data? Performance on validation data? A complexity penalty?
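A minimal sketch of the second approach, grow then prune (a local variant of reduced-error pruning against a validation set; it reuses the nested-dict trees from the ID3 sketch above, and the rest of the scaffolding is my own):

```python
from collections import Counter

def classify(tree, x):
    """Walk a nested-dict tree to a leaf label (None for unseen branch values)."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr].get(x[attr])
    return tree

def prune(tree, train, val, target):
    """Bottom-up pruning: replace a subtree with the majority training label
    at that node whenever that does not hurt accuracy on the validation
    examples reaching the node. Assumes `train` is the data the tree was
    grown from, so every branch has at least one training example."""
    if not isinstance(tree, dict):
        return tree
    attr = next(iter(tree))
    for value, child in tree[attr].items():   # prune the children first
        tree[attr][value] = prune(child,
                                  [ex for ex in train if ex[attr] == value],
                                  [ex for ex in val if ex[attr] == value],
                                  target)
    majority = Counter(ex[target] for ex in train).most_common(1)[0][0]
    keep_score = sum(classify(tree, ex) == ex[target] for ex in val)
    leaf_score = sum(majority == ex[target] for ex in val)
    return majority if leaf_score >= keep_score else tree  # ties favor the simpler tree
```

Scoring each candidate prune only on the validation examples that reach the node is a simplification of full reduced-error pruning, but it keeps the same bias: prefer the smaller tree unless the validation data argues otherwise.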

Effect of Reduced Error Pruning

C4.5 Algorithm. Builds a decision tree from labeled training data. Also by Ross Quinlan. Generalizes ID3 by allowing continuous-valued attributes, allowing missing attribute values in examples, and pruning the tree after building to improve generality.
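The continuous-attribute case is usually handled by turning each numeric attribute into candidate binary splits "value <= t", with t taken at midpoints between adjacent sorted values. A sketch of that idea (scored here with weighted entropy, not C4.5's actual gain-ratio criterion; all names and data are mine):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(pairs):
    """pairs: (numeric value, class label). Returns the threshold t whose
    binary split value <= t leaves the lowest weighted entropy."""
    pairs = sorted(pairs)
    n = len(pairs)
    best_t, best_h = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                      # no boundary between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for v, label in pairs if v <= t]
        right = [label for v, label in pairs if v > t]
        h = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if h < best_h:
            best_t, best_h = t, h
    return best_t, best_h

# Hypothetical temperature readings with play/don't-play labels:
print(best_threshold([(64, "yes"), (65, "no"), (68, "yes"), (85, "no")]))
```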

Rule post-pruning (used in C4.5). Steps: 1. Build the decision tree. 2. Convert it to a set of logical rules. 3. Prune each rule independently. 4. Sort the rules into the desired sequence for use.
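A sketch of step 2, converting a tree into rules, one per root-to-leaf path (nested-dict trees as in the ID3 sketch above; names are mine):

```python
def tree_to_rules(tree, conditions=()):
    """Return (conditions, label) pairs: each rule is the conjunction of
    attribute tests along one root-to-leaf path."""
    if not isinstance(tree, dict):        # leaf: emit the finished rule
        return [(list(conditions), tree)]
    attr = next(iter(tree))
    rules = []
    for value, child in tree[attr].items():
        rules.extend(tree_to_rules(child, conditions + ((attr, value),)))
    return rules

tree = {"outlook": {"Overcast": "yes",
                    "Sunny": {"humidity": {"Normal": "yes", "High": "no"}}}}
for conds, label in tree_to_rules(tree):
    print("IF", " AND ".join(f"{a} = {v}" for a, v in conds), "THEN", label)
# IF outlook = Overcast THEN yes
# IF outlook = Sunny AND humidity = Normal THEN yes
# IF outlook = Sunny AND humidity = High THEN no
```

Each printed rule can then be pruned independently (step 3): drop any condition whose removal does not hurt estimated accuracy.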

Takeaways about decision trees. They are used as classifiers. They are supervised learning algorithms (ID3, C4.5). They (mostly) use batch processing. They are good for situations where: the classification categories are finite; the data can be represented as vectors of attributes; and you want to be able to UNDERSTAND how the classifier makes its choices.