Lecture Notes in Machine Learning
Chapter 4: Version space learning

Zdravko Markov

February 17, 2004

Let us consider an example. We shall use an attribute-value language for both the examples and the hypotheses: L = {[A, B] | A ∈ T1, B ∈ T2}, where T1 and T2 are taxonomic trees of attribute values. Let us consider the taxonomies of colors (T1) and planar geometric shapes (T2), defined by the relation son.

Taxonomy of colors:

    son(primary_color, any_color).
    son(composite_color, any_color).
    son(red, primary_color).
    son(blue, primary_color).
    son(green, primary_color).
    son(orange, composite_color).
    son(pink, composite_color).
    son(yellow, composite_color).
    son(grey, composite_color).

Taxonomy of shapes:

    son(polygon, any_shape).
    son(oval, any_shape).
    son(triangle, polygon).
    son(quadrangle, polygon).
    son(rectangle, quadrangle).
    son(square, quadrangle).
    son(trapezoid, quadrangle).
    son(circle, oval).
    son(ellipse, oval).

Using the hierarchically ordered attribute values in the taxonomies we can define the derivability relation between hypotheses and examples by the cover relation (≥), as follows: [A1, B1] ≥ [A2, B2] if A2 is a successor of A1 in T1 and B2 is a successor of B1 in T2 (every value is considered a successor of itself). For example, [red, polygon] ≥ [red, triangle], and [any_color, any_shape] covers all possible examples expressed in the language L. Let P ∈ L, Q ∈ L and P ≥ Q. Then P is a generalization of Q, and Q is a specialization of P.

Let E+ = {E1+, E2+}, E1+ = [red, square], E2+ = [blue, rectangle], E- = [orange, triangle], and B = ∅. Then the problem is to find a hypothesis H such that H ≥ E1+ and H ≥ E2+, i.e. H is a generalization of E+. Clearly there are a number of such generalizations, i.e. we have a hypothesis space SH = {[primary_color, quadrangle], [primary_color, polygon], ..., [any_color, any_shape]}. However, not all hypotheses from SH satisfy the consistency requirement (H must not cover E-), i.e. some of them are overgeneralized. So the elements H ∈ SH such that H ≥ E- have to be excluded, i.e. they have to be specialized so that they no longer cover the negative example. Thus we obtain a set of correct (consistent with the examples) hypotheses, which is called a version space: VS = {[primary_color, quadrangle], [any_color, quadrangle]}.

Now we can add the obtained hypotheses to the background knowledge and further process other positive and negative examples. Learning systems which process a sequence of examples one at a time and at each step maintain a consistent hypothesis are called incremental learning systems. Clearly the basic task of these systems is to search through the version space. As we have shown above, this search can be directed in two ways: specific to general and general to specific.
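The cover relation can be encoded directly in Prolog on top of the son/2 facts above. The following is a minimal sketch (the predicate names desc/2 and covers/2 are introduced here and are not part of the notes): desc(X, Y) holds if Y is X itself or a descendant of X in a taxonomy, and covers(H, E) implements H ≥ E.

    % Sketch: reflexive-transitive closure of son/2, read downwards.
    % desc(X, Y) holds if Y is X itself or a descendant of X in its taxonomy.
    desc(X, X).
    desc(X, Y) :- son(Z, X), desc(Z, Y).

    % covers(H, E): hypothesis H = [A1,B1] covers description E = [A2,B2], i.e. H >= E.
    covers([A1, B1], [A2, B2]) :-
        desc(A1, A2),
        desc(B1, B2).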

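With these definitions the cover examples given above can be checked directly; a query session would look something like this:

    ?- covers([red, polygon], [red, triangle]).
    true.

    ?- covers([any_color, any_shape], [orange, triangle]).
    true.

    ?- covers([primary_color, quadrangle], [orange, triangle]).
    false.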
1 Search strategies in version space

To solve the induction problem the version space has to be searched in order to find the best hypothesis. The simplest algorithm for this search would be generate-and-test, where the generator produces all generalizations of the positive examples and the tester filters out those which cover the negative examples. Since the version space can be very large, such an algorithm is obviously unsuitable. Hence the version space has to be structured and directed search strategies have to be applied.

1.1 Specific to general search

This search strategy maintains a set S (a part of the version space) of maximally specific generalizations. The aim here is to avoid overgeneralization. A hypothesis H is maximally specific if it covers all positive examples and none of the negative examples, and no other hypothesis with this property is more specific than H. The algorithm is the following:

    Begin
      Initialize S to the first positive example
      Initialize N to all negative examples seen so far
      For each positive example E+ do begin
        Replace every H ∈ S that does not cover E+ with all its generalizations that cover E+
        Delete from S all hypotheses that cover other hypotheses in S
        Delete from S all hypotheses that cover any element of N
      end
      For every negative example E- do begin
        Delete all members of S that cover E-
        Add E- to N
      end
    End
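In the language L the generalizations of a description [A, B] are obtained by climbing the two taxonomies, so the replacement step above amounts to computing a least general generalization, i.e. the least common ancestors of the corresponding attribute values. A possible Prolog sketch, assuming the desc/2 predicate from the earlier sketch (lgg_value/3 and lgg/3 are names introduced here):

    % lgg_value(V1, V2, G): G is the least common ancestor of V1 and V2 in their
    % taxonomy, i.e. the minimal generalization of the two values.
    lgg_value(V1, V2, V1) :- desc(V1, V2), !.
    lgg_value(V1, V2, V2) :- desc(V2, V1), !.
    lgg_value(V1, V2, G)  :- son(V1, P), lgg_value(P, V2, G).

    % lgg(E1, E2, H): H is the minimal generalization of two descriptions in L.
    lgg([A1, B1], [A2, B2], [A, B]) :-
        lgg_value(A1, A2, A),
        lgg_value(B1, B2, B).

Since T1 and T2 are trees, this minimal generalization is unique, which keeps the set S small.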

1.2 General to specific search

This strategy maintains a set G (a part of the version space) of maximally general hypotheses. A hypothesis H is maximally general if it covers none of the negative examples and no other hypothesis with this property is more general than H. The algorithm is the following:

    Begin
      Initialize G to the most general concept in the version space
      Initialize P to all positive examples seen so far
      For each negative example E- do begin
        Replace every H ∈ G that covers E- with all its specializations that do not cover E-
        Delete from G all hypotheses more specific than (covered by) other hypotheses in G
        Delete from G all hypotheses that fail to cover some example from P
      end
      For every positive example E+ do begin
        Delete all members of G that fail to cover E+
        Add E+ to P
      end
    End
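The specialization step of this algorithm can be sketched in the same style: a hypothesis is minimally specialized by moving one of its attribute values down its taxonomy just far enough to exclude the negative example. The predicates spec_value/3 and min_spec/3 are again names introduced here; desc/2 is the predicate sketched after the introductory example.

    % spec_value(V, N, Vs): Vs is a most general descendant of V that does not cover
    % the value N taken from the negative example.
    spec_value(V, N, Vs) :-
        son(V1, V),
        ( \+ desc(V1, N) -> Vs = V1 ; spec_value(V1, N, Vs) ).

    % min_spec(H, Neg, Hs): Hs is a minimal specialization of hypothesis H that no
    % longer covers the negative example Neg (one attribute is specialized at a time).
    min_spec([A, B], [An, _], [As, B]) :- spec_value(A, An, As).
    min_spec([A, B], [_, Bn], [A, Bs]) :- spec_value(B, Bn, Bs).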

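For the running example (E1+ = [red, square], E2+ = [blue, rectangle], E- = [orange, triangle]) the two sketches reproduce the boundary hypotheses; a query session would look roughly as follows, where the second query keeps only those specializations of [any_color, any_shape] that still cover both positive examples:

    ?- lgg([red, square], [blue, rectangle], H).
    H = [primary_color, quadrangle].

    ?- min_spec([any_color, any_shape], [orange, triangle], H),
       covers(H, [red, square]), covers(H, [blue, rectangle]).
    H = [primary_color, any_shape] ;
    H = [any_color, quadrangle] ;
    false.

The first answer is the maximally specific generalization (the single element of S); the second query enumerates maximally general consistent hypotheses, among them [any_color, quadrangle], the version space element obtained earlier by specialization.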
2 Candidate Elimination Algorithm

The algorithms shown above generate a number of plausible hypotheses. Actually the sets S and G can be seen as boundary sets defining all hypotheses in the version space. This is expressed by the boundary set theorem [1], which says that for every element H of the version space there exist Hs ∈ S and Hg ∈ G such that H ≥ Hs and Hg ≥ H. In other words the boundary sets S and G allow us to generate every possible hypothesis by generalization and specialization of their elements, i.e. every element of the version space can be found along the generalization/specialization links between elements of G and S. This suggests an algorithm combining the two search strategies, called the candidate elimination algorithm [2].

The candidate elimination algorithm uses bi-directional search of the version space. It can easily be obtained by putting together the algorithms from Sections 1.1 and 1.2 and replacing the following items in them:

1. Replace "Delete from S all hypotheses that cover any element of N" with "Delete from S any hypothesis not more specific than some hypothesis in G".

2. Replace "Delete from G all hypotheses that fail to cover some example from P" with "Delete from G any hypothesis more specific than some hypothesis in S".

These alterations are possible since each original condition implies its replacement. Thus collecting all positive and negative examples in the sets P and N becomes unnecessary, which clearly makes the bi-directional algorithm more efficient. Furthermore, using the boundary sets, two stopping conditions can be imposed:

1. If G = S and both are singletons, then stop. The algorithm has found a single hypothesis consistent with the examples.

2. If G or S becomes empty, then stop. Indicate that there is no hypothesis that covers all positive examples and none of the negative ones.

3 Experiment Generation, Interactive Learning

The standard definition of the inductive learning problem assumes that the training examples are given by an independent agent and the learner has no control over them. In many cases, however, it is possible to select an example and then to acquire information about its classification. Learning systems exploiting this strategy are called interactive learning systems. Such a system uses an agent (called an oracle) which provides the classification of any example the system asks for. Clearly the basic problem here is to ask in such a way that the number of further questions is minimal. A common strategy in such situations is to select an example which halves the number of hypotheses, i.e. one that satisfies one half of the hypotheses and does not satisfy the other half. Within the framework of the version space algorithm the halving strategy would be to find an example that does not belong to the current version space (otherwise its classification is known: it has to be positive) and to check it against all the other hypotheses outside the version space. Clearly this could be very costly. Therefore a simpler strategy is the following (this is actually an interactive version of the candidate elimination algorithm):

1. Ask for the first positive example.
2. Calculate S and G using the candidate elimination algorithm.
3. Find an example E such that G ≥ E (E is covered by G) and there exists s ∈ S with s ≱ E (E is not in the version space).
4. Ask about the classification of E.
5. Go to step 2.

The exit from this loop is through the stopping conditions of the candidate elimination algorithm (step 2). A graphical illustration of the experiment generation algorithm is shown in Figure 1.

[Figure 1: Graphical representation of the version space, showing the hypothesis space with the boundary sets G and S and the example space. Question marks denote the areas from which new examples are chosen.]
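Step 3 of this procedure can also be sketched in Prolog: generate a ground example (a leaf colour paired with a leaf shape) and keep it if it is covered by some hypothesis in G but not by some hypothesis in S. The predicates example/1 and informative/3 are introduced here; covers/2 and desc/2 are the ones sketched earlier, and member/2 is the standard list predicate.

    % example(E): E is a ground example of the language L, i.e. a pair of leaf values.
    example([A, B]) :-
        desc(any_color, A), \+ son(_, A),
        desc(any_shape, B), \+ son(_, B).

    % informative(G, S, E): E is covered by some hypothesis in G but not covered by
    % some hypothesis in S, so its classification is not yet determined.
    informative(G, S, E) :-
        example(E),
        member(Hg, G), covers(Hg, E),
        member(Hs, S), \+ covers(Hs, E).

For instance, with the boundary sets of the running example the query informative([[primary_color, any_shape], [any_color, quadrangle]], [[primary_color, quadrangle]], E) would first propose E = [red, triangle], whose classification then either generalizes S or specializes G.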

References

[1] M. Genesereth and N. Nilsson. Logical Foundations of Artificial Intelligence. Morgan Kaufmann, 1987.

[2] T. M. Mitchell. Generalization as search. Artificial Intelligence, 18:203-226, 1982.