Version Spaces.


Machine Learning: Version Spaces
Prof. Dr. Martin Riedmiller
AG Maschinelles Lernen und Natürlichsprachliche Systeme, Institut für Informatik, Technische Fakultät, Albert-Ludwigs-Universität Freiburg
riedmiller@informatik.uni-freiburg.de

Acknowledgment: Slides were adapted from slides provided by Tom Mitchell, Carnegie Mellon University, and Peter Geibel, University of Osnabrück.

Overview of Today's Lecture: Version Spaces
Read T. Mitchell, Machine Learning, Chapter 2.
Learning from examples
General-to-specific ordering over hypotheses
Version spaces and the candidate elimination algorithm
Picking new examples
The need for inductive bias
Note: this simple approach assumes no noise; it illustrates the key concepts.

Introduction
Assume a given domain, e.g. objects, animals, etc. A concept can be seen as a subset of the domain, e.g. birds ⊆ animals; this subset is the extension of the concept.
Task: acquire an intensional concept description from training examples.
Generally we cannot look at all objects in the domain.

Training Examples for EnjoySport
Examples: days on which my friend Aldo enjoys his favorite water sport.
Result: a classifier for days = a description of Aldo's behavior.

Sky    Temp  Humid   Wind    Water  Forecast  EnjoySport
Sunny  Warm  Normal  Strong  Warm   Same      Yes
Sunny  Warm  High    Strong  Warm   Same      Yes
Rainy  Cold  High    Strong  Warm   Change    No
Sunny  Warm  High    Strong  Cool   Change    Yes

What is the general concept?

Representing Hypotheses
Many possible representations; in the following, h is a conjunction of constraints on attributes.
Each constraint can be
a specific value (e.g., Water = Warm),
don't care (e.g., Water = ?), or
no value allowed (e.g., Water = ∅).
For example (over Sky, AirTemp, Humid, Wind, Water, Forecast):
⟨Sunny, ?, ?, Strong, ?, Same⟩
We write h(x) = 1 for a day x if x satisfies the description.
Note that much more expressive languages exist.
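To make this representation concrete, here is a minimal Python sketch (not from the slides; the names ATTRIBUTES and satisfies, and the use of "?" for don't care and None for the empty constraint ∅, are illustrative choices):

```python
# A hypothesis is a tuple with one constraint per attribute:
# a specific value (e.g. "Warm"), "?" for "don't care", or None for "no value allowed".
ATTRIBUTES = ("Sky", "AirTemp", "Humid", "Wind", "Water", "Forecast")

def satisfies(h, x):
    """h(x) = 1 iff the instance x meets every constraint of hypothesis h."""
    return all(c == "?" or c == v for c, v in zip(h, x))

h = ("Sunny", "?", "?", "Strong", "?", "Same")
x = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
print(satisfies(h, x))  # True: the day x satisfies the description h
```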

Most General / Most Specific Hypothesis
Most general hypothesis: ⟨?, ?, ?, ?, ?, ?⟩
Most specific hypothesis: ⟨∅, ∅, ∅, ∅, ∅, ∅⟩

Prototypical Concept Learning Task
Given:
Instances X: possible days, each described by the attributes Sky, AirTemp, Humidity, Wind, Water, Forecast
Target function c: EnjoySport : X → {0, 1}
Hypotheses H: conjunctions of literals, e.g. ⟨?, Cold, High, ?, ?, ?⟩
Training examples D: positive and negative examples of the target function, ⟨x1, c(x1)⟩, ..., ⟨xm, c(xm)⟩
Determine: a hypothesis h in H with h(x) = c(x) for all x in D.

The Inductive Learning Hypothesis
Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
I.e. the training set needs to represent the whole domain (which may be infinite).
Even if we have a good training set, we can still construct bad hypotheses!
Can this be defined formally?

Concept Learning as Search
The hypothesis representation language defines a potentially large space.
Learning can be viewed as a task of searching this space.
Assume that Sky has three possible values and each of the remaining attributes has two possible values:
The instance space contains 96 distinct instances.
The hypothesis space contains 5120 syntactically different hypotheses.
What about the semantically different ones?
Different learning algorithms search this space in different ways!
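The counts can be checked directly; the third line (973 semantically distinct hypotheses) is the count given in Mitchell, Chapter 2, since every hypothesis containing ∅ classifies all instances as negative:

```latex
\begin{aligned}
|X| &= 3 \cdot 2 \cdot 2 \cdot 2 \cdot 2 \cdot 2 = 96 && \text{distinct instances} \\
|H_{\mathrm{syn}}| &= 5 \cdot 4 \cdot 4 \cdot 4 \cdot 4 \cdot 4 = 5120 && \text{per attribute: its values, ?, or } \emptyset \\
|H_{\mathrm{sem}}| &= 1 + 4 \cdot 3 \cdot 3 \cdot 3 \cdot 3 \cdot 3 = 973 && \text{all } \emptyset\text{-hypotheses collapse into one}
\end{aligned}
```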

General-to-Specific Ordering of Hypotheses
Many algorithms rely on an ordering of hypotheses.
Consider
h1 = ⟨Sunny, ?, ?, Strong, ?, ?⟩ and
h2 = ⟨Sunny, ?, ?, ?, ?, ?⟩
h2 is more general than h1! How do we formalize this?
Definition: h2 is more general than or equal to h1 if h1(x) = 1 implies h2(x) = 1. In symbols: h2 ≥g h1.
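For the conjunctive language, ≥g can be tested attribute by attribute; a small sketch under the same representation as above (the function name is mine):

```python
def more_general_or_equal(h2, h1):
    """h2 >=_g h1: every instance that satisfies h1 also satisfies h2."""
    if None in h1:  # h1 matches nothing, so any h2 is at least as general
        return True
    return all(c2 == "?" or c2 == c1 for c2, c1 in zip(h2, h1))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))  # True
print(more_general_or_equal(h1, h2))  # False, so h2 is strictly more general
```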

Instance, Hypotheses, and More-General-Than (figure)

General-to-Specific Ordering of Hypotheses
≥g does not depend on the concept to be learned.
It defines a partial order over the set of hypotheses:
strictly more general than: >g
more specific than: ≤g
Basis for the learning algorithms presented in the following!
Find-S: start with the most specific hypothesis ⟨∅, ∅, ∅, ∅, ∅, ∅⟩ and generalize whenever a positive example is not covered!

Find-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x:
   For each attribute constraint ai in h:
     If the constraint ai in h is satisfied by x, then do nothing;
     else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h

Hypothesis Space Search by Find-S
Training instances:
x1 = ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩, +
x2 = ⟨Sunny, Warm, High, Strong, Warm, Same⟩, +
x3 = ⟨Rainy, Cold, High, Strong, Warm, Change⟩, -
x4 = ⟨Sunny, Warm, High, Strong, Cool, Change⟩, +
Hypothesis sequence (from specific towards general):
h0 = ⟨∅, ∅, ∅, ∅, ∅, ∅⟩
h1 = ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩
h2 = h3 = ⟨Sunny, Warm, ?, Strong, Warm, Same⟩
h4 = ⟨Sunny, Warm, ?, Strong, ?, ?⟩
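A compact Python sketch of Find-S; run on the four EnjoySport examples it reproduces the sequence h0, ..., h4 above (helper names and data encoding are mine, with "?" for don't care and None for ∅):

```python
def find_s(examples):
    """Start with the most specific hypothesis and minimally generalize it
    on every positive example; negative examples are simply ignored."""
    n = len(examples[0][0])
    h = (None,) * n                     # h0 = <∅, ∅, ∅, ∅, ∅, ∅>
    print("h0 =", h)
    for i, (x, positive) in enumerate(examples, 1):
        if positive:                    # generalize each violated constraint
            h = tuple(v if c in (None, v) else "?" for c, v in zip(h, x))
        print(f"h{i} =", h)
    return h

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
find_s(examples)  # ends with h4 = ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```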

The Role of Negative Examples
Basically, the negative examples are simply ignored!
What about a negative example like ⟨Sunny, Warm, High, Strong, Freezing, Change⟩?
It is incorrectly classified as positive by the hypothesis ⟨Sunny, Warm, ?, Strong, ?, ?⟩.
The correct hypothesis would be ⟨Sunny, Warm, ?, Strong, (Warm ∨ Cool), ?⟩, which is not in the hypothesis space H.
If we assume the existence of a consistent hypothesis in H, then negative examples can be safely ignored.

Complaints about Find-S
Assume a consistent and unknown h that has generated the training set.
The algorithm can't tell whether it has learned the right concept, because it picks one hypothesis out of many possible ones.
It can't tell when the training data are inconsistent, because it ignores the negative examples: it doesn't account for noise.
It picks a maximally specific h (why?). Is this reasonable?
Depending on H, there might be several correct hypotheses!
Version spaces: characterize the set of all consistent hypotheses without enumerating all of them.

Version Spaces
Definition: A hypothesis h is consistent with a set of training examples D of target concept c if and only if h(x) = c(x) for each training example ⟨x, c(x)⟩ in D.
Consistent(h, D) ≡ (∀⟨x, c(x)⟩ ∈ D) h(x) = c(x)
Definition: The version space, VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with all training examples in D.
VS_{H,D} ≡ {h ∈ H | Consistent(h, D)}
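The two definitions translate directly into code; a sketch (repeating the satisfies predicate so the snippet stands on its own):

```python
def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def consistent(h, examples):
    """Consistent(h, D): h(x) = c(x) for every training example <x, c(x)> in D."""
    return all(satisfies(h, x) == label for x, label in examples)

# The version space VS_{H,D} is then simply {h in H | consistent(h, examples)}.
```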

The List-Then-Eliminate Algorithm
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example ⟨x, c(x)⟩: remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
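List-Then-Eliminate is only feasible when H can be enumerated; here the 5120 syntactic hypotheses can be generated by brute force. A sketch (the attribute domains and function names are my encoding of the EnjoySport task, not code from the slides):

```python
from itertools import product

DOMAINS = [
    ["Sunny", "Cloudy", "Rainy"],  # Sky
    ["Warm", "Cold"],              # AirTemp
    ["Normal", "High"],            # Humidity
    ["Strong", "Light"],           # Wind
    ["Warm", "Cool"],              # Water
    ["Same", "Change"],            # Forecast
]

def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def list_then_eliminate(examples):
    # 1. Enumerate every syntactic hypothesis: each attribute is a value, "?", or None.
    version_space = list(product(*[vals + ["?", None] for vals in DOMAINS]))  # 5 * 4^5 = 5120
    # 2. Remove every hypothesis that misclassifies some training example.
    for x, label in examples:
        version_space = [h for h in version_space if satisfies(h, x) == label]
    # 3. Output the remaining hypotheses.
    return version_space
```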

Example Version Space
S: { ⟨Sunny, Warm, ?, Strong, ?, ?⟩ }
⟨Sunny, ?, ?, Strong, ?, ?⟩   ⟨Sunny, Warm, ?, ?, ?, ?⟩   ⟨?, Warm, ?, Strong, ?, ?⟩
G: { ⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩ }

Representing Version Spaces
1. The general boundary, G, of version space VS_{H,D} is the set of its maximally general members that are consistent with the given training set.
2. The specific boundary, S, of version space VS_{H,D} is the set of its maximally specific members that are consistent with the given training set.

3. Every member of the version space lies between these boundaries:
VS_{H,D} = {h ∈ H | (∃s ∈ S)(∃g ∈ G)(g ≥g h ≥g s)}
where x ≥g y means x is more general than or equal to y.

Candidate Elimination Algorithm: Positive Examples
Input: training set
Output: G = maximally general hypotheses in H, S = maximally specific hypotheses in H
Algorithm: For each training example d, do
If d is a positive example:
Remove from G any hypothesis inconsistent with d
For each hypothesis s in S that is not consistent with d:
Remove s from S
Add to S all minimal generalizations h of s such that (a) h is consistent with d, and (b) some member of G is more general than h
Remove from S any hypothesis that is more general than another hypothesis in S

Candidate Elimination Algorithm: Negative Examples
If d is a negative example:
Remove from S any hypothesis inconsistent with d
For each hypothesis g in G that is not consistent with d:
Remove g from G
Add to G all minimal specializations h of g such that (a) h is consistent with d, and (b) some member of S is more specific than h
Remove from G any hypothesis that is less general than another hypothesis in G
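Putting the positive and negative cases together, here is a runnable Python sketch of the candidate elimination algorithm for this conjunctive language (representation and helper names as in the earlier snippets; an illustrative implementation, not the slides' own code):

```python
def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(h2, h1):
    if None in h1:
        return True
    return all(c2 == "?" or c2 == c1 for c2, c1 in zip(h2, h1))

def min_generalization(s, x):
    """Minimal generalization of s that covers the positive instance x."""
    return tuple(v if c in (None, v) else "?" for c, v in zip(s, x))

def min_specializations(g, x, domains):
    """Minimal specializations of g that exclude the negative instance x."""
    specs = []
    for i, c in enumerate(g):
        if c == "?":
            specs += [g[:i] + (v,) + g[i + 1:] for v in domains[i] if v != x[i]]
    return specs

def candidate_elimination(examples, domains):
    n = len(domains)
    S, G = {(None,) * n}, {("?",) * n}
    for x, positive in examples:
        if positive:
            G = {g for g in G if satisfies(g, x)}                  # drop inconsistent g
            S = {min_generalization(s, x) if not satisfies(s, x) else s for s in S}
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
            S = {s for s in S                                      # keep only minimal members
                 if not any(s != s2 and more_general_or_equal(s, s2) for s2 in S)}
        else:
            S = {s for s in S if not satisfies(s, x)}              # drop inconsistent s
            G_new = set()
            for g in G:
                if not satisfies(g, x):
                    G_new.add(g)
                else:
                    G_new |= {h for h in min_specializations(g, x, domains)
                              if any(more_general_or_equal(h, s) for s in S)}
            G = {g for g in G_new                                  # keep only maximal members
                 if not any(g != g2 and more_general_or_equal(g2, g) for g2 in G_new)}
    return S, G
```

With examples and DOMAINS defined as in the earlier sketches, candidate_elimination(examples, DOMAINS) returns S = {('Sunny', 'Warm', '?', 'Strong', '?', '?')} and G = {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}, matching the S4 and G4 boundaries of the trace that follows.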

Example Trace
S0: { ⟨∅, ∅, ∅, ∅, ∅, ∅⟩ }
G0: { ⟨?, ?, ?, ?, ?, ?⟩ }

Example Trace
Training examples:
1. ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩, EnjoySport = Yes
2. ⟨Sunny, Warm, High, Strong, Warm, Same⟩, EnjoySport = Yes
S1: { ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩ }
S2: { ⟨Sunny, Warm, ?, Strong, Warm, Same⟩ }
G1, G2: { ⟨?, ?, ?, ?, ?, ?⟩ }

Example Trace
Training example:
3. ⟨Rainy, Cold, High, Strong, Warm, Change⟩, EnjoySport = No
S2, S3: { ⟨Sunny, Warm, ?, Strong, Warm, Same⟩ }
G2: { ⟨?, ?, ?, ?, ?, ?⟩ }
G3: { ⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩, ⟨?, ?, ?, ?, ?, Same⟩ }

Example Trace
Training example:
4. ⟨Sunny, Warm, High, Strong, Cool, Change⟩, EnjoySport = Yes
S3: { ⟨Sunny, Warm, ?, Strong, Warm, Same⟩ }
S4: { ⟨Sunny, Warm, ?, Strong, ?, ?⟩ }
G3: { ⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩, ⟨?, ?, ?, ?, ?, Same⟩ }
G4: { ⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩ }

Properties of the Two Sets
S can be seen as a summary of the positive examples:
Any hypothesis more general than S covers all positive examples.
Other hypotheses fail to cover at least one positive example.
G can be seen as a summary of the negative examples:
Any hypothesis more specific than G covers no previous negative example.
Other hypotheses cover at least one negative example.

Resulting Version Space
S: { ⟨Sunny, Warm, ?, Strong, ?, ?⟩ }
⟨Sunny, ?, ?, Strong, ?, ?⟩   ⟨Sunny, Warm, ?, ?, ?, ?⟩   ⟨?, Warm, ?, Strong, ?, ?⟩
G: { ⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩ }

Properties
If there is a consistent hypothesis, then the algorithm will converge to S = G = {h} when enough examples are provided.
False examples may cause the removal of the correct h.
If the examples are inconsistent, S and G become empty.
This can also happen when the concept to be learned is not in H.

What Next Training Example?
S: { ⟨Sunny, Warm, ?, Strong, ?, ?⟩ }
⟨Sunny, ?, ?, Strong, ?, ?⟩   ⟨Sunny, Warm, ?, ?, ?, ?⟩   ⟨?, Warm, ?, Strong, ?, ?⟩
G: { ⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩ }
If the algorithm is allowed to select the next example, which is best?
Ideally, choose an instance that is classified positive by half of the hypotheses in the version space and negative by the other half. In either case (positive or negative example), this eliminates half of the hypotheses.
E.g.: ⟨Sunny, Warm, Normal, Light, Warm, Same⟩

How Should These Be Classified?
S: { ⟨Sunny, Warm, ?, Strong, ?, ?⟩ }
⟨Sunny, ?, ?, Strong, ?, ?⟩   ⟨Sunny, Warm, ?, ?, ?, ?⟩   ⟨?, Warm, ?, Strong, ?, ?⟩
G: { ⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩ }
⟨Sunny, Warm, Normal, Strong, Cool, Change⟩
⟨Rainy, Cool, Normal, Light, Warm, Same⟩
⟨Sunny, Warm, Normal, Light, Warm, Same⟩

Classification
Classify a new example as positive or negative if all hypotheses in the version space agree in their classification.
Otherwise: rejection or majority vote (used in the following).
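A brute-force sketch of this classification rule: enumerate the version space (feasible here, as in the List-Then-Eliminate snippet) and let its hypotheses vote on a new instance; the helper names and data encoding are mine.

```python
from itertools import product

DOMAINS = [["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
           ["Strong", "Light"], ["Warm", "Cool"], ["Same", "Change"]]

def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def version_space(examples):
    hypotheses = product(*[vals + ["?", None] for vals in DOMAINS])
    return [h for h in hypotheses
            if all(satisfies(h, x) == label for x, label in examples)]

def classify(vs, x):
    """Unanimous decision where possible, otherwise report the vote split."""
    votes = sum(satisfies(h, x) for h in vs)
    if votes == len(vs): return "positive (unanimous)"
    if votes == 0:       return "negative (unanimous)"
    return f"ambiguous: {votes} positive vs. {len(vs) - votes} negative votes"

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
vs = version_space(examples)  # the 6 hypotheses of the resulting version space
print(classify(vs, ("Sunny", "Warm", "Normal", "Strong", "Cool", "Change")))  # unanimous positive
print(classify(vs, ("Sunny", "Warm", "Normal", "Light", "Warm", "Same")))     # 3 vs. 3 split
```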

Inductive Bias
What if the target concept is not contained in the hypothesis space?
Should we include every possible hypothesis?
How does this influence the generalisation ability?

Inductive Leap
Induction vs. deduction (= theorem proving): induction provides us with new knowledge!
What justifies this inductive leap?
+ ⟨Sunny, Warm, Normal, Strong, Cool, Change⟩
+ ⟨Sunny, Warm, Normal, Light, Warm, Same⟩
S: ⟨Sunny, Warm, Normal, ?, ?, ?⟩
Question: why believe we can classify the unseen ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩?

An UNBiased Learner
Idea: choose H' that expresses every teachable concept, i.e., H' corresponds to the power set of X.
|H'| = 2^|X|, much bigger than before, where |H| = 973 (semantically distinct hypotheses).
Consider H' = disjunctions, conjunctions, negations over the previous H, e.g.
⟨Sunny, Warm, Normal, ?, ?, ?⟩ ∨ ⟨?, ?, ?, ?, ?, Change⟩
It holds that h(x) = 1 if x satisfies the logical expression.
What are S, G in this case?
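For comparison, the size of this unbiased hypothesis space, using the instance count from earlier (a quick back-of-the-envelope calculation):

```latex
|H'| = |2^{X}| = 2^{|X|} = 2^{96} \approx 7.9 \times 10^{28} \;\gg\; 973
```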

The Futility of Bias-Free Learning
S = {s}, with s = the disjunction of the positive examples.
G = {g}, with g = the negated disjunction of the negative examples.
Only the training examples will be unambiguously classified.
A learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances.
Inductive bias = underlying assumptions.
These assumptions explain the result of learning.
The inductive bias explains the inductive leap!

Inductive Bias
Concept learning algorithm L, instances X, target concept c.
Training examples D_c = {⟨x, c(x)⟩}.
Let L(x_i, D_c) denote the classification assigned to the instance x_i by L after training on data D_c, e.g. EnjoySport = yes.
Definition: The inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples D_c
(∀x_i ∈ X) [(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)]
where A ⊢ B means A logically entails B.

Inductive Bias for Candidate Elimination
Assume an instance x_i and a training set D_c.
The algorithm computes the version space; x_i is classified by unanimous voting (using the hypotheses in the version space). This is how L(x_i, D_c) is computed.
Now assume that the underlying concept c is in H.
This means that c is a member of its version space.
If the vote yields EnjoySport = k, then all members of the version space, including c, vote for class k.
Because unanimous voting is required, k = c(x_i).
This is also the output of the algorithm, L(x_i, D_c).
The inductive bias of the candidate elimination algorithm is: c is in H.

Inductive Systems and Equivalent Deductive Systems (figure)

Three Learners with Different Biases
Note that the inductive bias is often only implicitly encoded in the learning algorithm.
In the general case, it is much more difficult to determine the inductive bias; often properties of the learning algorithm have to be included, e.g. its search strategy.
What is the inductive bias of:
Rote learner: store examples, classify x iff it matches a previously observed example.
No inductive bias (⇒ no generalisation!)
Candidate elimination algorithm: c is in H (see above).
Find-S: c is in H, and all instances are negative examples unless the opposite is entailed by its training data.
A good generalisation capability of course depends on the appropriate choice of the inductive bias!

Summary Points
Concept learning as search through H
General-to-specific ordering over H
Version space candidate elimination algorithm
S and G boundaries characterize the learner's uncertainty
The learner can generate useful queries
Inductive leaps are possible only if the learner is biased
Inductive learners can be modelled by equivalent deductive systems