Overview. Machine Learning, Chapter 2: Concept Learning

Overview. Concept Learning; Representation; Inductive Learning Hypothesis; Concept Learning as Search; The Structure of the Hypothesis Space; Find-S Algorithm; Version Space; List-Eliminate Algorithm; Candidate-Elimination Algorithm; Using Partially Learned Concepts; Inductive Bias; An Unbiased Learner; The Futility of Bias-Free Learning; Inductive and Equivalent Deductive Systems; Three Learners with Different Biases.

The Example. Windsurfing can be fun, but this depends on the weather. The following data is collected:

Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
Sunny  Warm     Normal    Strong  Warm   Same      Yes
Sunny  Warm     High      Strong  Warm   Same      Yes
Rainy  Cold     High      Strong  Warm   Change    No
Sunny  Warm     High      Strong  Cool   Change    Yes

Is there a pattern here? What is the general concept?
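
For concreteness, the same training set can be written down in code; a minimal sketch in Python (the tuple layout and the name DATA are illustrative choices, not from the slides):

```python
# EnjoySport training examples: (Sky, AirTemp, Humidity, Wind, Water, Forecast) -> label
DATA = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
```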

Concept Learning. Concept = a boolean-valued function defined over a large set of objects or events. Concept learning = inferring a boolean-valued function from training examples of its input and output. How can we represent a concept?

Representation (1/2). Many possible representations, but here:
Instance x: a collection of attribute values (Sky, AirTemp, Humidity, etc.)
Target function c: EnjoySport: X -> {0, 1}
Hypothesis h: a conjunction of constraints on the attributes. A constraint can be:
  a specific value (e.g. Water = Warm)
  don't care (e.g. Water = ?)
  no value allowed (e.g. Water = Ø, the empty set)
Training example d: an instance x_i paired with its target value, d = <x_i, c(x_i)>, where c(x_i) = 0 marks a negative example and c(x_i) = 1 a positive example.
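
One way to mirror this representation in code is a tuple per hypothesis, with the string "?" standing for don't-care and None for Ø; a sketch under that assumption (the names are illustrative, not from the slides):

```python
from typing import Optional, Tuple

Hypothesis = Tuple[Optional[str], ...]   # a value, "?" (don't care) or None (no value allowed)
Instance = Tuple[str, ...]

def h_of_x(h: Hypothesis, x: Instance) -> bool:
    """h(x): positive iff every constraint accepts the corresponding attribute value."""
    return all(c is not None and (c == "?" or c == v) for c, v in zip(h, x))

# Example: <Sunny, ?, ?, Strong, ?, ?> applied to the first training instance.
h = ("Sunny", "?", "?", "Strong", "?", "?")
x = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
print(h_of_x(h, x))   # True
```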

Representation (2/2). Instances X: the set of possible instances x. Hypotheses H: the set of possible hypotheses h. Training examples D: the set of available training examples d.

Inductive Learning Hypothesis. Thus, concept learning is to find an h ∈ H so that h(x) = c(x) for all x ∈ X. Problem: we might not have access to all x! Solution: find an h that fits the examples, and take the resulting function as an approximation of the true c(x). This principle is called induction. Inductive learning hypothesis: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.

Concept Learning as Search. Concept learning is to find an h ∈ H so that h(x) = c(x) for all x ∈ X. This is a search problem: search through the hypothesis space H for the best h.

The Structure of H. Many algorithms for concept learning organize the search by using a useful structure in the hypothesis space: the general-to-specific ordering. An example:
h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
h3 = <Sunny, ?, ?, ?, Cool, ?>
Because h2 imposes fewer constraints, it covers more instances x in X than both h1 and h3: h2 is more general than h1 and h3. In a similar manner, we can define more-specific-than. (Figure: instances x1, x2, x3 in the instance space and hypotheses h1, h2, h3 in the hypothesis space, ordered from specific to general.)
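
The more-general-than-or-equal-to relation can be checked attribute by attribute; a small sketch, assuming the tuple representation above ("?" for don't-care, None for Ø):

```python
def more_general_or_equal(h1, h2):
    """True iff h1 covers every instance that h2 covers (h1 is at least as general as h2)."""
    def covers(c1, c2):
        if c2 is None:          # Ø accepts no value, so any constraint covers it
            return True
        if c1 == "?":           # '?' accepts every value
            return True
        return c1 == c2         # otherwise the constraints must agree
    return all(covers(c1, c2) for c1, c2 in zip(h1, h2))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))   # True: h2 is more general than h1
print(more_general_or_equal(h1, h2))   # False
```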

Find-S Algorithm. Finds the most specific hypothesis consistent with the training examples (hence the name).
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x:
     For each attribute constraint a_i in h:
       If the constraint a_i in h is satisfied by x, then do nothing
       Else replace a_i in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
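
A minimal Python sketch of Find-S for the conjunctive representation above (tuples with "?" and None; the function names are my own). Running it on the four EnjoySport examples reproduces the trace on the next slide:

```python
def matches(h, x):
    """h classifies x as positive iff every constraint is '?' or equals x's value."""
    return all(c is not None and (c == "?" or c == v) for c, v in zip(h, x))

def find_s(examples, n_attributes):
    h = (None,) * n_attributes                      # most specific hypothesis: all Ø
    for x, label in examples:
        if not label:                               # Find-S ignores negative examples
            continue
        if not matches(h, x):
            # minimally generalize each violated constraint
            h = tuple(v if c is None else (c if c == v else "?")
                      for c, v in zip(h, x))
    return h

DATA = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
print(find_s(DATA, 6))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```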

Find-S algorithm example.
h <- <Ø, Ø, Ø, Ø, Ø, Ø>
x1 = <Sunny, Warm, Normal, Strong, Warm, Same> (positive): h <- <Sunny, Warm, Normal, Strong, Warm, Same>
x2 = <Sunny, Warm, High, Strong, Warm, Same> (positive):   h <- <Sunny, Warm, ?, Strong, Warm, Same>
x3 = <Rainy, Cold, High, Strong, Warm, Change> (negative): h unchanged: <Sunny, Warm, ?, Strong, Warm, Same>
x4 = <Sunny, Warm, High, Strong, Cool, Change> (positive): h <- <Sunny, Warm, ?, Strong, ?, ?>

Problems with Find-S. There are several problems with the Find-S algorithm:
It cannot determine whether it has learnt the concept. There might be several other hypotheses that match equally well; has it found the only one?
It cannot detect when the training data is inconsistent. We would like to detect, and be tolerant to, errors and noise.
Why do we want the most specific hypothesis? Some other hypothesis might be more useful.
Depending on H, there might be several maximally specific consistent hypotheses, and there is no way for Find-S to find them all. All of them are equally likely.

Version Space. Definition: a hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for each example <x, c(x)> in D. Version space = the subset of all hypotheses in H consistent with the training examples D. Definition: the version space, denoted VS_H,D, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D.
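
Consistency with a set of examples is the natural extension of the match test; a small sketch under the same representation assumptions as before:

```python
def consistent(h, examples):
    """h is consistent with D iff h(x) == c(x) for every <x, c(x)> in D."""
    def matches(h, x):
        return all(c is not None and (c == "?" or c == v) for c, v in zip(h, x))
    return all(matches(h, x) == label for x, label in examples)

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
]
print(consistent(("Sunny", "?", "?", "?", "?", "?"), examples))   # True
print(consistent(("?", "?", "?", "?", "?", "?"), examples))       # False: it matches the negative example
```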

Representation of Version Space. Option 1: list all of the members of the version space. Works only when the hypothesis space H is finite! Option 2: store only the set of most general members G and the set of most specific members S. Given these two sets, it is possible to generate any member of the version space as needed.

List-Eliminate Algorithm.
1. VS_H,D <- a list containing every hypothesis in H
2. For each training example <x, c(x)>: remove from VS_H,D any hypothesis h that is inconsistent with the training example, i.e. h(x) ≠ c(x)
3. Output the list of hypotheses in VS_H,D
Advantage: guaranteed to output all hypotheses consistent with the training examples. But inefficient! Even in this simple example, there are 1 + 4·3·3·3·3·3 = 973 semantically distinct hypotheses.
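
The 973 figure can be checked by brute force, which is also exactly what List-Eliminate does; a sketch assuming the attribute domains of Mitchell's EnjoySport task (the slide's count presupposes these; "Cloudy" and "Light" never appear in the four examples):

```python
from itertools import product

DOMAINS = [
    ["Sunny", "Cloudy", "Rainy"],   # Sky
    ["Warm", "Cold"],               # AirTemp
    ["Normal", "High"],             # Humidity
    ["Strong", "Light"],            # Wind
    ["Warm", "Cool"],               # Water
    ["Same", "Change"],             # Forecast
]

def matches(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

# Semantically distinct hypotheses: a value or '?' per attribute (4*3*3*3*3*3 = 972),
# plus one all-Ø hypothesis (any Ø classifies every instance as negative).
hypotheses = list(product(*[d + ["?"] for d in DOMAINS]))
print(1 + len(hypotheses))          # 973

DATA = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]

# List-Eliminate: keep every hypothesis consistent with all training examples.
# (The all-Ø hypothesis is inconsistent with the positives, so it drops out anyway.)
version_space = [h for h in hypotheses
                 if all(matches(h, x) == label for x, label in DATA)]
print(len(version_space))           # 6 -- the version space shown on the later slides
```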

Candidate-Elimination Algorithm.
G <- maximally general hypotheses in H
S <- maximally specific hypotheses in H
For each training example d = <x, c(x)>: modify G and S so that G and S remain consistent with d.

Candidate-Elimination Algorithm (detailed).
G <- maximally general hypotheses in H
S <- maximally specific hypotheses in H
For each training example d = <x, c(x)>:
  If d is a positive example:
    Remove from G any hypothesis that is inconsistent with d
    For each hypothesis s in S that is not consistent with d:
      Remove s from S
      Add to S all minimal generalizations h of s such that h is consistent with d and some member of G is more general than h
      Remove from S any hypothesis that is more general than another hypothesis in S
  If d is a negative example:
    Remove from S any hypothesis that is inconsistent with d
    For each hypothesis g in G that is not consistent with d:
      Remove g from G
      Add to G all minimal specializations h of g such that h is consistent with d and some member of S is more specific than h
      Remove from G any hypothesis that is less general than another hypothesis in G
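
A compact Python sketch of Candidate-Elimination for the conjunctive representation above (tuples with "?" and None for Ø). This is my reading of the pseudocode, not code from the slides, and it assumes the EnjoySport attribute domains; for conjunctions the minimal generalization of s is unique, so the positive branch replaces s directly:

```python
def matches(h, x):
    return all(c is not None and (c == "?" or c == v) for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    def covers(c1, c2):
        return c2 is None or c1 == "?" or c1 == c2
    return all(covers(c1, c2) for c1, c2 in zip(h1, h2))

def generalize(s, x):
    """Minimal generalization of s that covers the positive instance x."""
    return tuple(v if c is None else (c if c == v else "?") for c, v in zip(s, x))

def specialize(g, x, domains):
    """Minimal specializations of g that exclude the negative instance x."""
    out = []
    for i, (c, v) in enumerate(zip(g, x)):
        if c == "?":
            out += [g[:i] + (u,) + g[i + 1:] for u in domains[i] if u != v]
    return out

def candidate_elimination(examples, domains):
    S = [(None,) * len(domains)]        # most specific boundary
    G = [("?",) * len(domains)]         # most general boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if matches(g, x)]
            S = [generalize(s, x) if not matches(s, x) else s for s in S]
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
            S = [s for s in S if not any(t != s and more_general_or_equal(s, t) for t in S)]
        else:
            S = [s for s in S if not matches(s, x)]
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)
                else:
                    new_G += [h for h in specialize(g, x, domains)
                              if any(more_general_or_equal(h, s) for s in S)]
            G = [g for g in new_G if not any(h != g and more_general_or_equal(h, g) for h in new_G)]
    return S, G

DOMAINS = [["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
           ["Strong", "Light"], ["Warm", "Cool"], ["Same", "Change"]]
DATA = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(DATA, DOMAINS)
print(S)   # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)   # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```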

Example of Candidate-Elimination.
S0: {<Ø, Ø, Ø, Ø, Ø, Ø>}                         G0: {<?, ?, ?, ?, ?, ?>}
x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, positive:
S1: {<Sunny, Warm, Normal, Strong, Warm, Same>}  G1: {<?, ?, ?, ?, ?, ?>}
x2 = <Sunny, Warm, High, Strong, Warm, Same>, positive:
S2: {<Sunny, Warm, ?, Strong, Warm, Same>}       G2: {<?, ?, ?, ?, ?, ?>}
x3 = <Rainy, Cold, High, Strong, Warm, Change>, negative:
S3: {<Sunny, Warm, ?, Strong, Warm, Same>}       G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
x4 = <Sunny, Warm, High, Strong, Cool, Change>, positive:
S4: {<Sunny, Warm, ?, Strong, ?, ?>}             G4: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

Properties of Candidate-Elimination. Converges to the target concept when there is no error in the training examples and the target concept is in H. Converges to an empty version space when the training data is inconsistent or the target concept cannot be described by the hypothesis representation.

Using Partially Learned Concepts. The version space after the four training examples:
S: {<Sunny, Warm, ?, Strong, ?, ?>}
   <Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>
G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
New instances can be classified by letting all hypotheses in the version space vote:
x = <Sunny, Warm, Normal, Strong, Cool, Change>: 6 positive, 0 negative
x = <Rainy, Cold, Normal, Light, Warm, Same>:    0 positive, 6 negative
x = <Sunny, Warm, Normal, Light, Warm, Same>:    3 positive, 3 negative
x = <Sunny, Cold, Normal, Strong, Warm, Same>:   2 positive, 4 negative
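
The vote counts above can be reproduced by applying all six version-space members to each new instance; a small sketch (hypothesis tuples as before):

```python
def matches(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

# The six hypotheses in the final version space: S, the two members of G,
# and the three hypotheses in between.
VERSION_SPACE = [
    ("Sunny", "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "?",    "?", "Strong", "?", "?"),
    ("Sunny", "Warm", "?", "?",      "?", "?"),
    ("?",     "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "?",    "?", "?",      "?", "?"),
    ("?",     "Warm", "?", "?",      "?", "?"),
]

new_instances = [
    ("Sunny", "Warm", "Normal", "Strong", "Cool", "Change"),
    ("Rainy", "Cold", "Normal", "Light",  "Warm", "Same"),
    ("Sunny", "Warm", "Normal", "Light",  "Warm", "Same"),
    ("Sunny", "Cold", "Normal", "Strong", "Warm", "Same"),
]

for x in new_instances:
    pos = sum(matches(h, x) for h in VERSION_SPACE)
    print(x, "->", pos, "positive,", len(VERSION_SPACE) - pos, "negative")
# Prints 6/0, 0/6, 3/3 and 2/4, matching the votes above.
```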

Inductive Bias. New example: each instance is a single integer in [1, 4]. Training examples:
x  c(x)
1  1
2  0
3  1
There is no way to represent this simple concept using our representation. Why? The hypothesis space H is biased, since it only allows conjunctions of attribute values.

An Unbiased Learner. Solution(?): let H be the set of all possible subsets of X, the power set of X, denoted P(X). It contains 2^|X| hypotheses, where |X| is the number of distinct instances. For our example, where each instance is a single integer in [1, 4]:
P(X) = { {}, {1}, {2}, {3}, {4}, {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}, {1,2,3}, {1,2,4}, {1,3,4}, {2,3,4}, {1,2,3,4} }

The Futility of Bias-free Learning. For our example, after three training examples:
x  c(x)
1  1
2  0
3  1
we get S: { {1,3} } and G: { X − {2} } = { {1,3,4} }.
How do we classify a new instance x = 4? The only instances that are classified unambiguously are the training examples themselves. In order to learn the target concept, one would have to present every single instance in X as a training example! Each unobserved instance will be classified positive by precisely half the hypotheses in VS_H,D and negative by the other half.
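
For the unbiased learner, both the power-set hypothesis space and the resulting version space are small enough to enumerate; a sketch of the integer example above (subsets represented as frozensets, names illustrative):

```python
from itertools import combinations

X = [1, 2, 3, 4]
# The unbiased hypothesis space: every subset of X (the power set, 2^4 = 16 hypotheses).
power_set = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]
print(len(power_set))                       # 16

examples = [(1, True), (2, False), (3, True)]
version_space = [h for h in power_set
                 if all((x in h) == label for x, label in examples)]
print(sorted(map(sorted, version_space)))   # [[1, 3], [1, 3, 4]], i.e. S = {1,3}, G = X - {2}

# The unobserved instance 4 is classified positive by exactly half of the version space.
print(sum(4 in h for h in version_space))   # 1 of 2
```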

Inductive Bias Revisited. Inductive bias is necessary in order to be able to generalize! Definition: the inductive bias of a learner L is any minimal set of assertions B such that for any target concept c and corresponding training examples D_c:
(∀ x ∈ X) [ (B ∧ D_c ∧ x) ⊢ L(x, D_c) ]
where y ⊢ z means that z deductively follows from y (∀ = for all, ∈ = member of, ∧ = logical and).

Inductive and Equivalent Deductive Systems. (Figure: an inductive learner modelled by an equivalent deductive system that is given the learner's inductive bias as an explicit assertion.)

Three Learners with Different Biases.
1. Rote learner: simply stores instances. No bias; unable to generalize.
2. Candidate-Elimination. Bias: the target concept is in the hypothesis space H.
3. Find-S. Bias: the target concept is in the hypothesis space H, and all instances are negative unless taught otherwise.

Summary. Concept learning as search through H. Many algorithms use the general-to-specific ordering of H. Find-S algorithm. Version space and version space representations. List-Eliminate algorithm. Candidate-Elimination algorithm. Inductive leaps are possible only if the learner is biased. Inductive learners can be modelled by equivalent deductive systems.