Notes on Machine Learning for 16.410 and 16.413

(Notes adapted from Tom Mitchell and Andrew Moore.)

Learning = improving with experience: improve over task T (e.g., classification, control tasks) with respect to performance measure P (e.g., accuracy, speed, etc.) based on experience E (direct, indirect, teacher-provided, from exploration).

Notation:
- Instances $x_1, x_2, \ldots$
- Target concept $c$ labels each instance $x$ with label $c(x)$
- Labelled data $D$ is a set of pairs $\langle x, c(x) \rangle$
- Hypothesis $h$ is a candidate target concept that also labels (correctly or incorrectly) each instance $x$ with a label $h(x)$

The inductive learning hypothesis: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.

Hypotheses and Version Spaces

A hypothesis $h$ is consistent with a set of training examples $D$ of target concept $c$ if and only if $h(x) = c(x)$ for each training example $\langle x, c(x) \rangle$ in $D$:

$Consistent(h, D) \equiv (\forall \langle x, c(x) \rangle \in D)\; h(x) = c(x)$

The version space $VS_{H,D}$, with respect to hypothesis space $H$ and training examples $D$, is the subset of hypotheses from $H$ consistent with all training examples in $D$:

$VS_{H,D} \equiv \{ h \in H \mid Consistent(h, D) \}$

List-Then-Eliminate Algorithm

1. $VersionSpace \leftarrow$ a list containing every hypothesis in $H$
2. For each training example $\langle x, c(x) \rangle$, remove from $VersionSpace$ any hypothesis $h$ for which $h(x) \neq c(x)$
3. Output the list of hypotheses in $VersionSpace$

What's wrong with this algorithm? It must enumerate and test every hypothesis in $H$, which is intractable for all but the smallest hypothesis spaces.
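As a concrete illustration, here is a minimal Python sketch of List-Then-Eliminate. The encoding is an assumption for illustration, not from the notes: hypotheses are represented as callables h(x) -> label, and the labelled data as (x, label) pairs.

```python
def list_then_eliminate(hypotheses, examples):
    """List-Then-Eliminate: keep only the hypotheses consistent with D.

    hypotheses: iterable of callables h(x) -> label. Note that H must be
                enumerable, which is exactly this algorithm's weakness.
    examples:   list of (x, label) pairs, i.e. <x, c(x)>.
    """
    version_space = list(hypotheses)
    for x, label in examples:
        # Drop every hypothesis that disagrees with this training example.
        version_space = [h for h in version_space if h(x) == label]
    return version_space

# Toy hypothesis space: threshold classifiers over the integers 0..5.
H = [lambda x, t=t: x >= t for t in range(6)]
D = [(1, False), (4, True)]
vs = list_then_eliminate(H, D)  # survivors: thresholds 2, 3, and 4
```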

Decision Trees

- Each internal node tests an attribute
- Each branch corresponds to an attribute value
- Each leaf node assigns a classification

Decision tree learning is appropriate when:
- Instances are describable by attribute-value pairs
- The target function is discrete valued
- A disjunctive hypothesis may be required
- The training data are possibly noisy

Top-Down Induction of Decision Trees

Main loop:
1. $A \leftarrow$ the best decision attribute for the next node $n$
2. Assign $A$ as the decision attribute for node $n$
3. For each value of $A$, create a new descendant of node $n$
4. Assign the training examples to the leaf nodes
5. If the training examples are perfectly classified, then stop; else iterate over the new leaf nodes

ID3 (Quinlan 1986)

The best decision attribute is the one that maximizes information gain. Let $S$ be a sample of data taken from $D$. $Entropy(S)$ is the expected number of bits needed to encode the label $c(x)$ of a randomly drawn member of $S$ (under the optimal, shortest-length code). Information theory (Shannon 1951): the optimal-length code assigns $-\log_2 p$ bits to a message having probability $p$, so the expected number of bits to encode $c(x)$ of a random member $x$ of $S$ is

$Entropy(S) \equiv E[-\log_2 p(c(x))] = -\sum_i p(c_i) \log_2 p(c_i)$

where $p(c_i)$ is the proportion of $S$ having label $c_i$. Information gain is the expected reduction in entropy that results from partitioning the data on attribute $A$:

$Gain(S, A) \equiv Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)$

Properties of ID3:
- The hypothesis space includes all possible (discrete-valued) target concepts
- Outputs a single hypothesis
- Statistically-based search choices
- Robust to noisy data
- Inductive bias: (approximately) prefer the shortest tree
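These two definitions translate directly into code. A minimal sketch, assuming each example is encoded as a (feature-dict, label) pair; the encoding and function names are chosen here for illustration, not taken from the notes.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p(c_i) * log2 p(c_i) over the class labels in S."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def information_gain(examples, attribute):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v|/|S|) * Entropy(S_v)."""
    n = len(examples)
    total = entropy([label for _, label in examples])
    # Partition S by the value each example takes on attribute A.
    partitions = {}
    for features, label in examples:
        partitions.setdefault(features[attribute], []).append(label)
    remainder = sum((len(subset) / n) * entropy(subset)
                    for subset in partitions.values())
    return total - remainder

# PlayTennis-style check: Gain(S, Wind) = 0.811 - 0.5 = 0.311 bits.
S = [({"Wind": "Weak"}, "Yes"), ({"Wind": "Strong"}, "No"),
     ({"Wind": "Weak"}, "Yes"), ({"Wind": "Strong"}, "Yes")]
print(information_gain(S, "Wind"))
```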

Overfitting

Consider the error of hypothesis $h$ over
- the training data: $error_{train}(h)$
- the entire distribution $D$ of data: $error_D(h)$

Hypothesis $h \in H$ overfits the training data if there is an alternative hypothesis $h' \in H$ such that

$error_{train}(h) < error_{train}(h')$ and $error_D(h) > error_D(h')$

How can we avoid overfitting?
- Stop growing when a data split is not statistically significant
- Grow the full tree, then post-prune

How to select the best tree:
- Measure performance over the training data
- Measure performance over a separate validation data set
- MDL: minimize $size(tree) + size(misclassifications(tree))$

Reduced-Error Pruning

Split the data into a training set and a validation set. Do until further pruning is harmful:
1. Evaluate the impact on the validation set of pruning each possible node (plus those below it)
2. Greedily remove the node whose removal most improves validation-set accuracy

This produces the smallest version of the most accurate subtree. What if data is limited?

Rule Post-Pruning

1. Convert the tree to an equivalent set of rules
2. Prune each rule independently of the others
3. Sort the final rules into the desired sequence for use

This is perhaps the most frequently used method (e.g., in C4.5).

Continuous-Valued Attributes

Create a discrete attribute to test a continuous one; e.g., for $Temperature = 82.5$, test $(Temperature > 72.3) \in \{t, f\}$.
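The threshold for such a boolean test can itself be chosen to maximize information gain. A sketch of the standard C4.5-style approach, reusing entropy() from the ID3 sketch above; the function name and the midpoint-between-label-changes heuristic are an assumed illustration, not from the notes.

```python
def best_threshold(values, labels):
    """Choose t maximizing the gain of the boolean test (value > t).

    values: numeric attribute values (e.g. temperatures);
    labels: the parallel class labels. Candidate thresholds are midpoints
    between adjacent sorted values where the class label changes.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([lab for _, lab in pairs])  # entropy() from the ID3 sketch
    best_t, best_gain = None, -1.0
    for i in range(1, n):
        if pairs[i - 1][1] == pairs[i][1]:
            continue  # no label change here, so the midpoint can't be optimal
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        gain = base - (len(left) / n) * entropy(left) \
                    - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain
```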

Attributes with Many Values

Problem: if an attribute has many values, $Gain$ will select it. Imagine using Date = Jun 3 1996 as an attribute. One approach: use $GainRatio$ instead:

$GainRatio(S, A) \equiv \frac{Gain(S, A)}{EntropyOfSplit(S, A)}$

$EntropyOfSplit(S, A) \equiv -\sum_{v \in Values(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}$

Unknown Attribute Values

What if some examples are missing values of attribute $A$? Use the training example anyway, and sort it through the tree. If node $n$ tests $A$:
- assign the most common value of $A$ among the other examples sorted to node $n$, or
- assign the most common value of $A$ among the other examples with the same target value, or
- assign probability $p_i$ to each possible value $v_i$ of $A$, and assign fraction $p_i$ of the example to each descendant in the tree

Classify new examples in the same fashion.

Some Issues in Machine Learning

- What algorithms can approximate functions well (and when)?
- How does the number of training examples influence accuracy?
- How does the complexity of the hypothesis representation influence it?
- How does noisy data influence accuracy?
- What are the theoretical limits of learnability?
- How can prior knowledge of the learner help?
- What clues can we get from biological learning systems?
- How can systems alter their own representations?
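GainRatio is a small extension of the earlier sketches: EntropyOfSplit is just the entropy of the attribute's own value distribution. A sketch reusing entropy() and information_gain() from above, under the same assumed example encoding.

```python
def gain_ratio(examples, attribute):
    """GainRatio(S, A) = Gain(S, A) / EntropyOfSplit(S, A).

    EntropyOfSplit is the entropy of the attribute's own value distribution,
    so it grows when A splits S into many small subsets -- penalizing
    many-valued attributes such as Date.
    """
    values = [features[attribute] for features, _ in examples]
    split_entropy = entropy(values)           # entropy() from the ID3 sketch
    if split_entropy == 0.0:                  # A has one value: useless split
        return 0.0
    return information_gain(examples, attribute) / split_entropy
```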

Designing a Learning Algorithm

Type of training experience:
- games against experts
- games against self
- a table of correct moves

Target function:
- board → move
- board → value

Representation of the learned function:
- polynomial
- linear function of six features
- artificial neural network

Learning algorithm:
- gradient descent
- linear programming

Together, these choices yield the completed design.
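As an illustration of one path through these choices (a linear function of board features, trained by gradient descent), here is a sketch of the standard LMS-style weight update; the function names, learning rate, and feature values are assumptions for illustration, not from the notes.

```python
def v_hat(weights, features):
    """Linear evaluation function: V_hat(b) = w0 + sum_i w_i * x_i(b)."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, v_train, lr=0.01):
    """One LMS gradient-descent step: w_i <- w_i + lr * (V_train - V_hat) * x_i."""
    error = v_train - v_hat(weights, features)
    weights[0] += lr * error                # bias weight; its feature is 1
    for i, x in enumerate(features):
        weights[i + 1] += lr * error * x
    return weights

# E.g., with six board features, weights has seven entries (bias + six).
w = [0.0] * 7
w = lms_update(w, [3, 0, 1, 2, 0, 0], v_train=100.0)
```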