Lecture 2: Foundations of Concept Learning

Cognitive Systems II - Machine Learning, WS 2005/2006
Part I: Basic Approaches to Concept Learning
Version Space, Candidate Elimination, Inductive Bias

Definition of Concept Learning

Learning involves acquiring general concepts from a specific set of training examples D.
Each concept c can be thought of as a boolean-valued function defined over a larger set, e.g. a function defined over all animals whose value is true for birds and false for all other animals.
Concept learning: inferring a boolean-valued function from training examples.

A Concept Learning Task - Informal

Example target concept Enjoy: days on which Aldo enjoys his favorite sport.
Set of example days D, each represented by a set of attributes:

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  Enjoy
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes

The task is to learn to predict the value of Enjoy for an arbitrary day, based on the values of its other attributes.
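For illustration, the table above might be encoded in Python as follows (a minimal sketch of our own; the names ATTRIBUTES and TRAINING_EXAMPLES are not part of the lecture):

```python
# Hypothetical encoding of the Enjoy training data from the table above.
ATTRIBUTES = ["Sky", "AirTemp", "Humidity", "Wind", "Water", "Forecast"]

# Each entry: (attribute value tuple x, target concept value c(x))
TRAINING_EXAMPLES = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),   # example 1
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),   # example 2
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),  # example 3
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),   # example 4
]
```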

A Concept Learning Task - Informal

Hypothesis representation: each hypothesis h consists of a conjunction of constraints on the instance attributes, in this case a vector of six constraints.
Possible constraints:
  ?  : any value is acceptable
  a single required value for the attribute
  ∅  : no value is acceptable
If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a positive example (h(x) = 1).
Most general hypothesis: <?, ?, ?, ?, ?, ?>
Most specific hypothesis: <∅, ∅, ∅, ∅, ∅, ∅>
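The semantics of these constraints can be made concrete with a small, hypothetical matches() predicate implementing h(x) for this representation (our own sketch; the string "?" stands for the ?-constraint and None for the ∅-constraint):

```python
# A hypothesis is a tuple of six constraints:
#   "?"                      - any value is acceptable
#   a string such as "Sunny" - exactly this value is required
#   None                     - no value is acceptable (the ∅ constraint on the slides)

def matches(hypothesis, instance):
    """Return True iff the instance satisfies every constraint of the hypothesis (h(x) = 1)."""
    return all(
        constraint == "?" or constraint == value
        for constraint, value in zip(hypothesis, instance)
    )

MOST_GENERAL  = ("?",) * 6   # <?, ?, ?, ?, ?, ?>
MOST_SPECIFIC = (None,) * 6  # <∅, ∅, ∅, ∅, ∅, ∅>
```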

A Concept Learning Task - Formal

Given:
  Instances X: possible days, each described by the attributes
    Sky       (with values Sunny, Cloudy, and Rainy)
    AirTemp   (with values Warm and Cold)
    Humidity  (with values Normal and High)
    Wind      (with values Strong and Weak)
    Water     (with values Warm and Cool)
    Forecast  (with values Same and Change)
  Hypotheses H: each h ∈ H is described as a conjunction of constraints on the above attributes
  Target concept c = Enjoy : X → {0, 1}
  Training examples D: the positive and negative examples of the table above

Determine: a hypothesis h ∈ H such that (∀x ∈ X)[h(x) = c(x)]

A Concept Learning Task - Example

Example hypothesis h_e = <Sunny, ?, ?, ?, Warm, ?>
According to h_e, Aldo enjoys his favorite sport whenever the sky is sunny and the water is warm (independent of the other weather conditions!).

Example 1: <Sunny, Warm, Normal, Strong, Warm, Same>
  This example satisfies h_e, because the sky is sunny and the water is warm. Hence, h_e predicts that Aldo enjoys his favorite sport on this day, which agrees with its label (Yes).

Example 4: <Sunny, Warm, High, Strong, Cool, Change>
  This example does not satisfy h_e, because the water is cool. Hence, h_e predicts that Aldo would not enjoy his favorite sport on this day, although the example is labeled Yes.

Therefore h_e is not consistent with the training examples D.
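Using the hypothetical matches() helper sketched above, the same check could be written as:

```python
h_e = ("Sunny", "?", "?", "?", "Warm", "?")

example_1 = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")    # labeled Yes
example_4 = ("Sunny", "Warm", "High",   "Strong", "Cool", "Change")  # labeled Yes

print(matches(h_e, example_1))  # True  -> h_e agrees with the label
print(matches(h_e, example_4))  # False -> h_e disagrees, so h_e is not consistent with D
```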

Concept Learning as Search

Concept learning can be viewed as search through the space of hypotheses H (implicitly defined by the hypothesis representation), with the goal of finding the hypothesis that best fits the training examples.
Most practical learning tasks involve very large, even infinite, hypothesis spaces.
Many concept learning algorithms organize the search through the hypothesis space by relying on the general-to-specific ordering of hypotheses.
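The general-to-specific ordering is not defined formally on this slide; in Mitchell's formulation, h_j ≥_g h_k iff every instance that satisfies h_k also satisfies h_j. For the conjunctive representation used here, the ordering can be tested syntactically, as in the following sketch (our own helper, reused in the algorithm sketches below):

```python
def covers(g, s):
    """True iff constraint g accepts every value that constraint s accepts."""
    if s is None:       # the ∅ constraint accepts nothing, so it is covered by anything
        return True
    if g == "?":        # the ? constraint accepts everything
        return True
    return g == s       # otherwise g must require exactly the same value as s

def more_general_or_equal(h1, h2):
    """h1 >=_g h2: every instance satisfying h2 also satisfies h1."""
    return all(covers(c1, c2) for c1, c2 in zip(h1, h2))
```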

FIND-S

FIND-S exploits the general-to-specific ordering to find a maximally specific hypothesis h consistent with the observed training examples D.

Algorithm:
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
     For each attribute constraint a_i in h
       If the constraint a_i is satisfied by x
       Then do nothing
       Else replace a_i in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
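A minimal Python sketch of FIND-S for this representation (our own illustration, assuming the hypothetical TRAINING_EXAMPLES encoding from above):

```python
def find_s(training_examples, n_attributes=6):
    """FIND-S: start with the most specific hypothesis and minimally
    generalize it on every positive example; negative examples are ignored."""
    h = [None] * n_attributes                 # <∅, ∅, ∅, ∅, ∅, ∅>
    for instance, is_positive in training_examples:
        if not is_positive:
            continue                          # FIND-S ignores negative examples
        for i, value in enumerate(instance):
            if h[i] is None:
                h[i] = value                  # first generalization: require this value
            elif h[i] != value and h[i] != "?":
                h[i] = "?"                    # next more general constraint satisfied by x
    return tuple(h)
```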

FIND-S - Example

Initialize h ← <∅, ∅, ∅, ∅, ∅, ∅>

Example 1: <Sunny, Warm, Normal, Strong, Warm, Same> (positive)
  h ← <Sunny, Warm, Normal, Strong, Warm, Same>
Example 2: <Sunny, Warm, High, Strong, Warm, Same> (positive)
  h ← <Sunny, Warm, ?, Strong, Warm, Same>
Example 3: <Rainy, Cold, High, Strong, Warm, Change> (negative)
  This example can be omitted because it is negative. Notice that the current hypothesis is already consistent with this example, because it correctly classifies it as negative!
Example 4: <Sunny, Warm, High, Strong, Cool, Change> (positive)
  h ← <Sunny, Warm, ?, Strong, ?, ?>
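Running the find_s() sketch on the hypothetical encoding of the four examples ends with the same hypothesis as the trace above:

```python
h = find_s(TRAINING_EXAMPLES)
print(h)   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```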

FIND-S - Example

[figure-only slide; illustration not included in the transcription]

Remarks on FIND-S

In each step, h is consistent with the training examples observed up to this point.

Unanswered questions:
  Has the learner converged to the correct target concept?
    There is no way to determine whether FIND-S found the only consistent hypothesis h or whether there are many other consistent hypotheses as well.
  Why prefer the most specific hypothesis?
  Are the training examples consistent?
    FIND-S is only correct if D itself is consistent, i.e. D has to be free of classification errors.
  What if there are several maximally specific consistent hypotheses?

CANDIDATE-ELIMINATION

CANDIDATE-ELIMINATION addresses several limitations of the FIND-S algorithm.
Key idea: describe the set of all hypotheses consistent with D without explicitly enumerating them.
It still performs poorly with noisy data, but it provides a useful conceptual framework for introducing fundamental issues in machine learning.

Version Spaces

To incorporate the key idea mentioned above, a compact representation of all consistent hypotheses is necessary.

The version space VS_{H,D}, with respect to hypothesis space H and training data D, is the subset of hypotheses from H consistent with D:
  VS_{H,D} ≡ {h ∈ H | Consistent(h, D)}

VS_{H,D} can be represented by the most general and the most specific consistent hypotheses, in the form of boundary sets within the partial ordering.
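The Consistent predicate used in this definition is the usual one: h agrees with the label of every training example. Continuing the hypothetical helpers from above:

```python
def consistent(hypothesis, training_examples):
    """Consistent(h, D): h(x) = c(x) for every training example <x, c(x)> in D."""
    return all(matches(hypothesis, x) == label for x, label in training_examples)
```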

Version Spaces

The general boundary set G, with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D:
  G ≡ {g ∈ H | Consistent(g, D) ∧ ¬(∃g' ∈ H)[(g' >_g g) ∧ Consistent(g', D)]}

The specific boundary set S, with respect to hypothesis space H and training data D, is the set of minimally general (i.e., maximally specific) members of H consistent with D:
  S ≡ {s ∈ H | Consistent(s, D) ∧ ¬(∃s' ∈ H)[(s >_g s') ∧ Consistent(s', D)]}

Version Spaces

A version space for the Enjoy example, represented by its boundary sets (the hypotheses between them are the remaining members of the version space):

S_4: { <Sunny, Warm, ?, Strong, ?, ?> }

      <Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>

G_4: { <Sunny, ?, ?, ?, ?, ?>,  <?, Warm, ?, ?, ?, ?> }

CANDIDATE-ELIMINATION Algorithm

Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d ∈ D, do
  If d is a positive example
    Remove from G any hypothesis inconsistent with d
    For each hypothesis s in S that is inconsistent with d
      Remove s from S
      Add to S all minimal generalizations h of s such that h is consistent with d and some member of G is more general than h
      Remove from S any hypothesis that is more general than another hypothesis in S
  If d is a negative example
    Remove from S any hypothesis inconsistent with d
    For each hypothesis g in G that is inconsistent with d
      Remove g from G
      Add to G all minimal specializations h of g such that h is consistent with d and some member of S is more specific than h
      Remove from G any hypothesis that is less general than another hypothesis in G
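Below is a sketch of this algorithm specialized to the conjunctive representation of the Enjoy example (our own illustration; it reuses the hypothetical matches() and more_general_or_equal() helpers from above and assumes attribute domains DOMAINS that are not part of the slides):

```python
DOMAINS = [
    ("Sunny", "Cloudy", "Rainy"),   # Sky
    ("Warm", "Cold"),               # AirTemp
    ("Normal", "High"),             # Humidity
    ("Strong", "Weak"),             # Wind
    ("Warm", "Cool"),               # Water
    ("Same", "Change"),             # Forecast
]

def min_generalization(s, x):
    """The (unique) minimal generalization of s that covers instance x."""
    h = []
    for si, xi in zip(s, x):
        if si is None:
            h.append(xi)        # empty constraint -> require exactly this value
        elif si == "?" or si == xi:
            h.append(si)        # constraint already satisfied by x
        else:
            h.append("?")       # conflicting values -> generalize to "?"
    return tuple(h)

def min_specializations(g, x, domains):
    """All minimal specializations of g that exclude instance x."""
    specs = []
    for i, (gi, xi) in enumerate(zip(g, x)):
        if gi == "?":
            for value in domains[i]:
                if value != xi:
                    specs.append(g[:i] + (value,) + g[i + 1:])
    return specs

def candidate_elimination(training_examples, domains):
    """Maintain the boundary sets S and G while processing D example by example."""
    n = len(domains)
    G = {("?",) * n}      # most general hypothesis
    S = {(None,) * n}     # most specific hypothesis

    for x, positive in training_examples:
        if positive:
            G = {g for g in G if matches(g, x)}          # drop inconsistent members of G
            new_S = set()
            for s in S:
                if matches(s, x):
                    new_S.add(s)
                else:
                    h = min_generalization(s, x)
                    if any(more_general_or_equal(g, h) for g in G):
                        new_S.add(h)
            # remove members of S that are more general than another member of S
            S = {s for s in new_S
                 if not any(s != t and more_general_or_equal(s, t) for t in new_S)}
        else:
            S = {s for s in S if not matches(s, x)}      # drop inconsistent members of S
            new_G = set()
            for g in G:
                if not matches(g, x):
                    new_G.add(g)
                else:
                    for h in min_specializations(g, x, domains):
                        if any(more_general_or_equal(h, s) for s in S):
                            new_G.add(h)
            # remove members of G that are less general than another member of G
            G = {g for g in new_G
                 if not any(g != t and more_general_or_equal(t, g) for t in new_G)}
    return S, G
```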

Illustrative Example

Initialization of the boundary sets:
  G_0 ← {<?, ?, ?, ?, ?, ?>}
  S_0 ← {<∅, ∅, ∅, ∅, ∅, ∅>}

Example 1: <Sunny, Warm, Normal, Strong, Warm, Same> (positive)
  S is overly specific, because it wrongly classifies example 1 as false. So S has to be revised by moving it to the least more general hypothesis that covers example 1 and is still more specific than some hypothesis in G.
  S_1 = {<Sunny, Warm, Normal, Strong, Warm, Same>}
  G_1 = G_0

Example 2: <Sunny, Warm, High, Strong, Warm, Same> (positive)
  S_2 = {<Sunny, Warm, ?, Strong, Warm, Same>}
  G_2 = G_1 = G_0

Illustrative Example

S_0: { <∅, ∅, ∅, ∅, ∅, ∅> }
S_1: { <Sunny, Warm, Normal, Strong, Warm, Same> }
S_2: { <Sunny, Warm, ?, Strong, Warm, Same> }

G_0, G_1, G_2: { <?, ?, ?, ?, ?, ?> }

Training examples:
1. <Sunny, Warm, Normal, Strong, Warm, Same>, Enjoy = Yes
2. <Sunny, Warm, High, Strong, Warm, Same>, Enjoy = Yes

Illustrative Example

Example 3: <Rainy, Cold, High, Strong, Warm, Change> (negative)
  G is overly general, because it wrongly classifies example 3 as true. So G has to be revised by moving it to the least more specific hypotheses that correctly exclude example 3 and are still more general than some hypothesis in S. There are several alternative minimally more specific hypotheses.
  S_3 = S_2
  G_3 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}

Illustrative Example

S_2, S_3: { <Sunny, Warm, ?, Strong, Warm, Same> }

G_3: { <Sunny, ?, ?, ?, ?, ?>  <?, Warm, ?, ?, ?, ?>  <?, ?, ?, ?, ?, Same> }
G_2: { <?, ?, ?, ?, ?, ?> }

Training example:
3. <Rainy, Cold, High, Strong, Warm, Change>, Enjoy = No

Illustrative Example

Example 4: <Sunny, Warm, High, Strong, Cool, Change> (positive)
  S is minimally generalized to cover example 4; the member <?, ?, ?, ?, ?, Same> of G_3 is removed because it wrongly classifies this positive example as negative.
  S_4 = {<Sunny, Warm, ?, Strong, ?, ?>}
  G_4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
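Running the candidate_elimination() sketch on the hypothetical encoding of the training data reproduces these boundary sets:

```python
S, G = candidate_elimination(TRAINING_EXAMPLES, DOMAINS)
print(S)  # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(G)  # {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}
          # (the order in which set elements are printed may differ)
```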

Illustrative Example

The resulting version space, bounded by S_4 and G_4:

S_4: { <Sunny, Warm, ?, Strong, ?, ?> }

      <Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>

G_4: { <Sunny, ?, ?, ?, ?, ?>,  <?, Warm, ?, ?, ?, ?> }

Remarks

Will the algorithm converge to the correct hypothesis?
  Convergence is assured provided there are no errors in D and H includes the target concept.
  The algorithm has converged once G and S each contain only one, identical hypothesis.

How can partially learned concepts be used?
  Some unseen examples can be classified unambiguously, as if the target concept had been fully learned:
    positive iff the instance satisfies every member of S
    negative iff it satisfies no member of G
  Otherwise an instance x is classified by a majority vote among the version-space members (if possible); see the sketch below.
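A small sketch of this classification rule for a partially learned version space, continuing the hypothetical helpers above (the majority-vote fallback is left out):

```python
def classify_with_version_space(S, G, instance):
    """Classify an unseen instance using only the boundary sets:
    positive if every member of S matches it, negative if no member of G
    matches it, otherwise undecided (a majority vote over the version
    space could be used as a fallback)."""
    if all(matches(s, instance) for s in S):
        return True
    if not any(matches(g, instance) for g in G):
        return False
    return None  # ambiguous: some but not all version-space members agree
```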

Inductive Bias

Fundamental property of inductive learning: a learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying unseen examples.

Inductive bias: the policy by which the learner generalizes beyond the observed training data to infer the classification of new instances.

Definition: Consider a concept learning algorithm L for the set of instances X. Let c be an arbitrary concept defined over X, and D_c = {<x, c(x)>} an arbitrary set of training examples of c. Let L(x_i, D_c) denote the classification assigned to the instance x_i by L after training on the data D_c. The inductive bias of L is any minimal set of assertions B such that
  (∀x_i ∈ X)[(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)]

For example, the inductive bias of CANDIDATE-ELIMINATION is the assumption that the target concept is contained in the hypothesis space H.

Kinds of Inductive Bias

Restriction Bias (aka Language Bias)
  The entire hypothesis space H is searched by the learning algorithm, but the hypothesis representation is not expressive enough to encompass all possible concepts.
  e.g. CANDIDATE-ELIMINATION: for the hypothesis language used in the Enjoy example, H only includes conjunctive concepts.

Preference Bias (aka Search Bias)
  The hypothesis representation encompasses all possible concepts, but the learning algorithm does not consider each possible hypothesis.
  e.g. use of heuristics, greedy strategies

A preference bias is more desirable, because it assures that
  (∃h ∈ H)[(∀x ∈ X)[h(x) = c(x)]]
i.e. the target concept can still be represented within H.

An Unbiased Learner

An unbiased hypothesis space H = 2^X (the power set of X) would contain every teachable concept.
For such an H,
  G would always contain the negation of the disjunction of the observed negative examples, and
  S would always contain the disjunction of the observed positive examples.
Hence, only the observed examples are classified unambiguously; in order to converge to a single target concept, every x ∈ X would have to appear in D.
Such a learning algorithm is unable to generalize beyond the observed training data.

Inductive System vs. Theorem Prover

Inductive system:
  Inputs: training examples, new instance
  Candidate Elimination Algorithm using hypothesis space H
  Output: classification of new instance, or "don't know"

Equivalent deductive system (inductive bias made explicit):
  Inputs: training examples, new instance, assertion "H contains the target concept"
  Theorem Prover
  Output: classification of new instance, or "don't know"