
1 CONCEPT LEARNING AND THE GENERAL-TO-SPECIFIC ORDERING
[read Chapter 2] [suggested exercises 2.2, 2.3, 2.4, 2.6]

- Learning from examples
- General-to-specific ordering over hypotheses
- Version spaces and the candidate elimination algorithm
- Picking new examples
- The need for inductive bias

Note: a simple approach that assumes no noise and illustrates the key concepts.

2 The Concept Learning Problem

Concept:
- a subset of objects defined over a set of instances
- equivalently, a Boolean-valued function defined over that set
- described by a syntactic definition

Concept learning:
- inferring a Boolean-valued function from training examples of its input and output
- automatic inference of the general definition of some concept, given examples labelled as members or non-members of that concept

Given: a set E = {e_1, e_2, ..., e_n} of training instances of concepts, each labelled with the name of a concept C_1, C_2, ..., C_k to which it belongs.
Determine: a definition of each of C_1, C_2, ..., C_k that correctly covers E. Each definition is a concept description.

3 Example: Learning Conjunctive Boolean Concepts

- Instance space: {0,1}^n
- Concept is a binary function c : {0,1}^n → {0,1}
  - Inputs: n-bit patterns
  - Outputs: 0 or 1
- C = the set of all c which have a conjunctive representation
- Learning task: identify a conjunctive concept that is consistent with the examples

[Diagram: a target concept c ∈ C generates labelled examples <x, c(x)>, from which the learner outputs a hypothesis in C]

4 Example: Learning Conjunctive Boolean Concepts

Learning algorithm:
1. Initialize: L = {x_1, ¬x_1, ..., x_n, ¬x_n} (all literals and their negations)
2. Predict the label of input x as the conjunction of the literals in L
3. If a mistake is made, eliminate the offending literals from L (mistakes can only be false negatives, i.e., they occur only on positive examples)

Theorem: The above algorithm is guaranteed to learn any conjunctive Boolean concept given a non-contradictory sequence of examples in a noise-free environment. The bound on the number of mistakes is n + 1.
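To make the update rule concrete, here is a minimal Python sketch of the elimination algorithm above; the bit encoding, function names, and toy target concept are illustrative choices, not part of the slides.

```python
# A minimal sketch of the literal-elimination algorithm for conjunctive Boolean
# concepts. A literal is represented as a pair (i, b): "bit i must equal b".

def predict(literals, x):
    """Predict 1 iff every literal in the current conjunction is satisfied by x."""
    return int(all(x[i] == b for (i, b) in literals))

def learn_conjunction(examples, n):
    """Online elimination algorithm; examples is a sequence of (x, label) pairs,
    where x is a tuple of n bits and label is 0 or 1."""
    # Start with all 2n literals: x_i and (not x_i) for every bit position i.
    literals = {(i, b) for i in range(n) for b in (0, 1)}
    mistakes = 0
    for x, label in examples:
        if predict(literals, x) != label:
            mistakes += 1
            # In the noise-free conjunctive setting, mistakes only occur on positive
            # examples (the hypothesis is too specific): drop every falsified literal.
            literals = {(i, b) for (i, b) in literals if x[i] == b}
    return literals, mistakes

# Toy example: target concept is x_0 AND (not x_2) over 3-bit inputs.
examples = [((1, 0, 0), 1), ((1, 1, 0), 1), ((0, 1, 0), 0), ((1, 1, 1), 0)]
final_literals, mistakes = learn_conjunction(examples, n=3)
print(sorted(final_literals), "mistakes:", mistakes)  # at most n + 1 = 4 mistakes
```

On this toy run the learner ends with the literals {(0, 1), (2, 0)}, i.e. x_0 ∧ ¬x_2, after two mistakes, well within the n + 1 bound.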

5 Concept Learning: Learning Input-Output Functions

- Target function f ∈ F: unknown to the learner
- Hypothesis h ∈ H: a guess about what f might be (H is the hypothesis space)
- Instance space X: domain of f and h
- Output space Y: range of f and h
- Example: an ordered pair (x, y), with x ∈ X and f(x) = y ∈ Y
- F and H may or may not be the same!
- Training set E: a multi-set of examples
- Learning algorithm L: a procedure which, given some E, outputs an h ∈ H

6 Dimensions of Concept Learning

Representation:
1. Instances: symbolic or numeric
2. Hypotheses (i.e., the concept description): attribute-value (propositional logic) or relational (first-order logic)
(Semantics are associated with both representations.)

Level of learning: symbolic vs. sub-symbolic

Method of learning:
1. Bottom-up (covering)
2. Top-down

7 Case Study: Concept of EnjoySport

Sky    Temp  Humid   Wind    Water  Forecast  EnjoySport
Sunny  Warm  Normal  Strong  Warm   Same      Yes
Sunny  Warm  High    Strong  Warm   Same      Yes
Rainy  Cold  High    Strong  Warm   Change    No
Sunny  Warm  High    Strong  Cool   Change    Yes

What is the general concept?

Representing hypotheses (many possible representations). Here, h is a conjunction of constraints on the attributes. Each constraint can be:
- a specific value (e.g., Water = Warm)
- "don't care" (e.g., Water = ?)
- no value allowed (e.g., Water = ∅)

For example:
  Sky     AirTemp  Humid  Wind     Water  Forecast
  <Sunny,    ?,      ?,   Strong,    ?,   Same>
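To make this representation concrete, here is a small Python sketch (my own illustration, not from the slides) that stores a hypothesis as a tuple of constraints, with '?' for "don't care" and None standing in for the empty constraint ∅, plus a helper that checks whether a hypothesis covers an instance.

```python
# A minimal sketch of the attribute-constraint representation above.
# '?' stands for "don't care"; None stands for the empty constraint (no value allowed);
# the names and the encoding are illustrative choices.

ATTRIBUTES = ["Sky", "AirTemp", "Humidity", "Wind", "Water", "Forecast"]

def matches(hypothesis, instance):
    """True iff every attribute constraint in the hypothesis is satisfied by the instance."""
    return all(h == "?" or h == x for h, x in zip(hypothesis, instance))

h = ("Sunny", "?", "?", "Strong", "?", "Same")
x1 = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
x3 = ("Rainy", "Cold", "High", "Strong", "Warm", "Change")

print(matches(h, x1))  # True  -> h classifies x1 as a positive example
print(matches(h, x3))  # False -> h classifies x3 as a negative example
```

Note that a hypothesis containing None can never match anything, which is exactly the semantics of the ∅ constraint.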

8 Prototypical Concept Learning Task

Given:
- Instances X: possible days, each described by the attributes Sky, AirTemp, Humidity, Wind, Water, Forecast
- Target function c = EnjoySport : X → {0,1}
- Hypotheses H: conjunctions of literals, e.g. <?, Cold, High, ?, ?, ?>
- Training examples D: positive and negative examples of the target function, <x_1, c(x_1)>, ..., <x_m, c(x_m)>

Determine: a hypothesis h in H such that h(x) = c(x) for all x in D.

The inductive learning hypothesis: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.

9 Instances, Hypotheses, and More-General-Than

[Figure: instances X on the left and hypotheses H on the right, ordered from specific to general]

x_1 = <Sunny, Warm, High, Strong, Cool, Same>
x_2 = <Sunny, Warm, High, Light, Warm, Same>

h_1 = <Sunny, ?, ?, Strong, ?, ?>
h_2 = <Sunny, ?, ?, ?, ?, ?>
h_3 = <Sunny, ?, ?, ?, Cool, ?>

Find-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x:
   For each attribute constraint a_i in h:
     If the constraint a_i in h is satisfied by x, then do nothing;
     else replace a_i in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
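A minimal Python sketch of Find-S on the EnjoySport data, reusing the tuple encoding above; the function name and trace output are illustrative. Running it reproduces the hypothesis sequence shown on the next slide.

```python
# A minimal sketch of Find-S for the EnjoySport task. The data and the tuple
# encoding ('?' = don't care, None = the empty constraint) are illustrative.

train = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), 1),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), 1),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), 0),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), 1),
]

def find_s(examples, n_attributes=6):
    # 1. Start with the most specific hypothesis <None, ..., None> (all constraints empty).
    h = [None] * n_attributes
    for x, label in examples:
        if label != 1:          # Find-S simply ignores negative examples
            continue
        for i, value in enumerate(x):
            if h[i] is None:    # the first positive example fixes each attribute
                h[i] = value
            elif h[i] != value: # constraint violated -> generalize to "don't care"
                h[i] = "?"
        print("after", x, "->", tuple(h))
    return tuple(h)

print("final:", find_s(train))
# The trace climbs from <None, ..., None> to <Sunny, Warm, ?, Strong, ?, ?>,
# matching the h_0 ... h_4 sequence on slide 10.
```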

10 Hypothesis Space Search by Find-S

[Figure: instances X and hypotheses H, with Find-S climbing from the most specific hypothesis h_0 toward more general hypotheses]

x_1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +
x_2 = <Sunny, Warm, High, Strong, Warm, Same>, +
x_3 = <Rainy, Cold, High, Strong, Warm, Change>, -
x_4 = <Sunny, Warm, High, Strong, Cool, Change>, +

h_0 = <∅, ∅, ∅, ∅, ∅, ∅>
h_1 = <Sunny, Warm, Normal, Strong, Warm, Same>
h_2 = <Sunny, Warm, ?, Strong, Warm, Same>
h_3 = <Sunny, Warm, ?, Strong, Warm, Same>
h_4 = <Sunny, Warm, ?, Strong, ?, ?>

Complaints about Find-S:
- Can't tell whether it has learned the concept
- Can't tell when the training data are inconsistent
- Picks a maximally specific h (why?)
- Depending on H, there might be several!

11 Version Spaces

A hypothesis h is consistent with a set of training examples D of target concept c if and only if h(x) = c(x) for each training example <x, c(x)> in D:

    Consistent(h, D) ≡ (∀ <x, c(x)> ∈ D) h(x) = c(x)

The version space VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with all training examples in D:

    VS_{H,D} ≡ {h ∈ H | Consistent(h, D)}

The List-Then-Eliminate Algorithm:
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example <x, c(x)>: remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
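Since the EnjoySport hypothesis space is tiny, List-Then-Eliminate can be run literally. The sketch below (my own, with an illustrative encoding) enumerates the semantically distinct conjunctive hypotheses and filters them against the training data.

```python
# A sketch of List-Then-Eliminate for the EnjoySport task. It enumerates the
# (small) hypothesis space explicitly, which is only feasible for toy problems.

from itertools import product

VALUES = [
    ["Sunny", "Cloudy", "Rainy"],   # Sky
    ["Warm", "Cold"],               # AirTemp
    ["Normal", "High"],             # Humidity
    ["Strong", "Light"],            # Wind
    ["Warm", "Cool"],               # Water
    ["Same", "Change"],             # Forecast
]

def matches(h, x):
    return all(hi == "?" or hi == xi for hi, xi in zip(h, x))

def hypothesis_space():
    # Every combination of a concrete value or '?' per attribute, plus the single
    # semantically distinct all-empty hypothesis (classifies every instance negative).
    for h in product(*[vals + ["?"] for vals in VALUES]):
        yield h
    yield (None,) * len(VALUES)

def list_then_eliminate(examples):
    version_space = list(hypothesis_space())
    for x, label in examples:
        version_space = [h for h in version_space if int(matches(h, x)) == label]
    return version_space

train = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), 1),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), 1),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), 0),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), 1),
]

for h in list_then_eliminate(train):
    print(h)   # prints the six hypotheses of the version space shown on slide 12
```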

12 Example Version Space

S: { <Sunny, Warm, ?, Strong, ?, ?> }

    <Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>

G: { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> }

Representing Version Spaces
- The general boundary, G, of version space VS_{H,D} is the set of its maximally general members
- The specific boundary, S, of version space VS_{H,D} is the set of its maximally specific members
- Every member of the version space lies between these boundaries:

    VS_{H,D} = {h ∈ H | (∃ s ∈ S)(∃ g ∈ G)(g ≥_g h ≥_g s)}

  where x ≥_g y means x is more general than or equal to y

13 Candidate Elimination Algorithm

G ← the set of maximally general hypotheses in H
S ← the set of maximally specific hypotheses in H

For each training example d, do:
  If d is a positive example:
    - Remove from G any hypothesis inconsistent with d
    - For each hypothesis s in S that is not consistent with d:
        Remove s from S
        Add to S all minimal generalizations h of s such that
          1. h is consistent with d, and
          2. some member of G is more general than h
        Remove from S any hypothesis that is more general than another hypothesis in S

14 Candidate Elimination Algorithm (Continued)

  If d is a negative example:
    - Remove from S any hypothesis inconsistent with d
    - For each hypothesis g in G that is not consistent with d:
        Remove g from G
        Add to G all minimal specializations h of g such that
          1. h is consistent with d, and
          2. some member of S is more specific than h
        Remove from G any hypothesis that is less general than another hypothesis in G
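Putting the positive-example case (slide 13) and the negative-example case together, here is a compact Python sketch of candidate elimination for the conjunctive EnjoySport representation. The encoding and helper names are my own, and the generalization step exploits the fact that the minimal generalization is unique in this representation.

```python
# A sketch of the candidate elimination algorithm (slides 13-14) for the
# conjunctive EnjoySport representation ('?' = don't care, None = empty constraint).

DOMAINS = {
    0: ["Sunny", "Cloudy", "Rainy"],   # Sky
    1: ["Warm", "Cold"],               # AirTemp
    2: ["Normal", "High"],             # Humidity
    3: ["Strong", "Light"],            # Wind
    4: ["Warm", "Cool"],               # Water
    5: ["Same", "Change"],             # Forecast
}
N = len(DOMAINS)

def matches(h, x):
    return all(a == "?" or a == b for a, b in zip(h, x))

def more_general_or_equal(h1, h2):
    # Constraint-wise: '?' covers anything; the empty constraint (None) is covered by anything.
    return all(a == "?" or b is None or a == b for a, b in zip(h1, h2))

def min_generalization(s, x):
    # The unique minimal generalization of s that covers the positive example x.
    return tuple(xi if si is None else (si if si == xi else "?")
                 for si, xi in zip(s, x))

def min_specializations(g, x):
    # All minimal specializations of g that exclude the negative example x.
    for i, gi in enumerate(g):
        if gi == "?":
            for v in DOMAINS[i]:
                if v != x[i]:
                    yield g[:i] + (v,) + g[i + 1:]

def candidate_elimination(examples):
    S = [(None,) * N]          # maximally specific boundary
    G = [("?",) * N]           # maximally general boundary
    for x, label in examples:
        if label == 1:                               # positive example
            G = [g for g in G if matches(g, x)]
            S = [min_generalization(s, x) if not matches(s, x) else s for s in S]
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
            S = [s for s in S                        # drop non-minimal members
                 if not any(s != t and more_general_or_equal(s, t) for t in S)]
        else:                                        # negative example
            S = [s for s in S if not matches(s, x)]
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)
                    continue
                for h in min_specializations(g, x):
                    if any(more_general_or_equal(h, s) for s in S):
                        new_G.append(h)
            G = [g for g in new_G                    # drop non-maximal members
                 if not any(g != h and more_general_or_equal(h, g) for h in new_G)]
    return S, G

train = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), 1),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), 1),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), 0),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), 1),
]
S, G = candidate_elimination(train)
print("S:", S)   # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print("G:", G)   # <Sunny,?,?,?,?,?> and <?,Warm,?,?,?,?>
```

On the four EnjoySport examples this sketch reproduces the S and G boundaries of the version space on slide 12.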

15 Example Trace

S_0: { <∅, ∅, ∅, ∅, ∅, ∅> }
G_0: { <?, ?, ?, ?, ?, ?> }

16 Selecting New Training Instances

So far we assumed that a teacher provides the examples to the learner. What if the learner selects its own instances for learning?
1. A set of new instances is given to the learner, without classification as 0/1
2. The learner selects a new instance; such a selected instance is called a query.
   How to select the new instance? Choose the one that comes closest to matching half of the hypotheses in the version space.
3. The learner asks the teacher for the correct classification of the query
4. The learner updates the version space accordingly

S: { <Sunny, Warm, ?, Strong, ?, ?> }

    <Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>

G: { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> }
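A small sketch (my own illustration) of the query-selection heuristic above: among a pool of unlabeled candidates, pick the instance on which the current version space is split most evenly, since either answer then eliminates roughly half of the remaining hypotheses. The candidate pool and helper names are illustrative.

```python
# Sketch of the query-selection heuristic: prefer the unlabeled instance on which
# the current version space is most evenly split. The version space below is the
# six-hypothesis EnjoySport example from slide 12; the candidates are hypothetical.

def matches(h, x):
    return all(a == "?" or a == b for a, b in zip(h, x))

version_space = [
    ("Sunny", "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "?", "?", "Strong", "?", "?"),
    ("Sunny", "Warm", "?", "?", "?", "?"),
    ("?", "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "?", "?", "?", "?", "?"),
    ("?", "Warm", "?", "?", "?", "?"),
]

def pick_query(candidates, version_space):
    def imbalance(x):
        positive_votes = sum(matches(h, x) for h in version_space)
        return abs(2 * positive_votes - len(version_space))   # 0 = perfect 50/50 split
    return min(candidates, key=imbalance)

candidates = [
    ("Sunny", "Warm", "Normal", "Light", "Warm", "Same"),     # splits the space 3/3
    ("Sunny", "Warm", "Normal", "Strong", "Cool", "Change"),  # all 6 hypotheses say positive
]
print(pick_query(candidates, version_space))   # picks the 3/3 instance
```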

17 How Should These Be Classified?

S: { <Sunny, Warm, ?, Strong, ?, ?> }

    <Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>

G: { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> }

- <Sunny, Warm, Normal, Strong, Cool, Change>
- <Rainy, Cool, Normal, Light, Warm, Same>
- <Sunny, Warm, Normal, Light, Warm, Same>

What Justifies this Inductive Leap?

+ <Sunny, Warm, Normal, Strong, Cool, Change>
+ <Sunny, Warm, Normal, Light, Warm, Same>

S: <Sunny, Warm, Normal, ?, ?, ?>

Why believe we can classify the unseen instance <Sunny, Warm, Normal, Strong, Warm, Same>?
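The classifications asked for above can be read off by voting over the version space: a unanimous vote gives a definite label, otherwise the learner must answer "don't know". A short sketch (my own, reusing the slide-12 version space):

```python
# Sketch: classify an unseen instance by a vote over the version space; only a
# unanimous vote yields a definite label, otherwise the answer is "unknown".

def matches(h, x):
    return all(a == "?" or a == b for a, b in zip(h, x))

version_space = [
    ("Sunny", "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "?", "?", "Strong", "?", "?"),
    ("Sunny", "Warm", "?", "?", "?", "?"),
    ("?", "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "?", "?", "?", "?", "?"),
    ("?", "Warm", "?", "?", "?", "?"),
]

def classify(x, version_space):
    votes = [matches(h, x) for h in version_space]
    if all(votes):
        return "positive"
    if not any(votes):
        return "negative"
    return "unknown ({} of {} hypotheses say positive)".format(sum(votes), len(votes))

for x in [
    ("Sunny", "Warm", "Normal", "Strong", "Cool", "Change"),  # unanimously positive
    ("Rainy", "Cool", "Normal", "Light", "Warm", "Same"),     # unanimously negative
    ("Sunny", "Warm", "Normal", "Light", "Warm", "Same"),     # 3/6 split -> unknown
]:
    print(x, "->", classify(x, version_space))
```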

18 Convergence and Bias

The candidate elimination algorithm converges toward the true target concept, provided:
1. There are no errors in the training examples.
   If there is an error, the algorithm will remove the true target concept from the version space, because it eliminates every hypothesis that is inconsistent with any training example.
2. H contains some hypothesis h that correctly describes the target concept.
   If not, consider the example below of a disjunctive concept such as "Sky = Sunny or Sky = Cloudy":

Sky     Temp  Humid   Wind    Water  Forecast  EnjoySport
Sunny   Warm  Normal  Strong  Cool   Change    Yes
Cloudy  Warm  Normal  Strong  Cool   Change    Yes
Rainy   Warm  Normal  Strong  Cool   Change    No

S_2: <?, Warm, Normal, Strong, Cool, Change>

S_2 is overly general, since it covers the third training instance. The learner is biased to consider only conjunctive hypotheses; it cannot learn disjunctive concepts.

19 An Unbiased Learner

Idea: choose an H that can express every teachable concept (i.e., H is the power set of X).
Consider a new hypothesis space of disjunctions, conjunctions, and negations over the previous H, e.g.

    <Sunny, Warm, Normal, ?, ?, ?> ∨ ¬<?, ?, ?, ?, ?, Change>

What are S and G in this case?

Problem: the learner is unable to generalize beyond the observed training examples.
1. The S boundary is the disjunction of the observed positive examples
2. The G boundary is the negated disjunction of the observed negative examples
So only the observed training examples will be unambiguously classified by S and G; equivalently, only the observed training examples will be unanimously classified by the version space.

20 Learning and Bias

- Absolute bias: restricted hypothesis space bias
  - Weaker bias: more open to experience, more expressive hypothesis space, lower generalizability
  - Stronger bias: higher generalizability
  - Implicit vs. explicit bias
- Preferential bias: selection based on some ordering criterion
  - Occam's Razor: simpler (shorter) hypotheses are preferred
- Learning in practice requires a tradeoff between the complexity of the hypothesis space and the goodness of fit

[Figure: many different curves h(x) passing through the same finite set of training points]

There is an infinite number of functions that match any finite number of training examples! Bias-free function learning is impossible!

21 Inductive Bias

Consider a concept learning algorithm L:
- instances X, target concept c
- training examples D_c = { <x, c(x)> }
- let L(x_i, D_c) denote the classification assigned to instance x_i by L after training on data D_c

Definition: the inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples D_c,

    (∀ x_i ∈ X) [ (B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c) ]

where A ⊢ B means "A logically entails B".

22 Inductive Systems and Equivalent Deductive Systems

Inductive learning: a hypothesis (e.g. a classifier) that is consistent with a sufficiently large number of representative training examples is likely to classify novel instances drawn from the same universe accurately. With a stronger bias, there is less reliance on the training data.

[Diagram]
Inductive system: training examples + new instance → candidate elimination algorithm using hypothesis space H → classification of the new instance, or "don't know"
Equivalent deductive system: training examples + new instance + the assertion "H contains the target concept" → theorem prover → classification of the new instance, or "don't know" (the inductive bias is made explicit)

23 Three Learners with Different Biases

1. Rote learner: store the examples; classify x if and only if it matches a previously observed example. No bias.
2. Version space candidate elimination algorithm: stronger bias.
3. Find-S: strongest bias.

Summary Points
1. Concept learning as search through H
2. General-to-specific ordering over H
3. Version space candidate elimination algorithm
4. S and G boundaries characterize the learner's uncertainty
5. The learner can generate useful queries
6. Inductive leaps are possible only if the learner is biased
7. Inductive learners can be modelled by equivalent deductive systems