Machine Learning 2010

1 Machine Learning 2010 Concept Learning: The Logical Approach Michael M Richter mrichter@ucalgary.ca

2 Part 1 Basic Concepts and Representation Languages

3 Why Concept Learning? Concepts describe properties of a given set of objects. Examples: The set of prime numbers: this set is precisely described; by its definition there is no doubt about membership. The property of having a certain type of lung infection: here we have certain indicators, but a more precise definition could improve the therapy. Special cases of concepts are definitions of functions. Where precise definitions are lacking, we have to rely on observations and learn from them.

4 Learning a Function Examples for the values of a function f: f(0) = -1, f(1) = -1, f(2) = -1, f(3) = 17, f(4) = 399. A possible hypothesis that is correct for the shown examples is f(x) = x^(x+1) - (x+1)^x. Occam's principle says: use the simplest description that describes the seen examples correctly. Depending on the criteria for simplicity, this is at least a good (and correct) description.
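
To make Occam's principle concrete, here is a minimal check (a Python sketch added for illustration, not part of the original slides) that the hypothesized f reproduces the five observed values:

def f(x: int) -> int:
    # hypothesis f(x) = x^(x+1) - (x+1)^x suggested for the observed values
    return x ** (x + 1) - (x + 1) ** x

examples = {0: -1, 1: -1, 2: -1, 3: 17, 4: 399}
for x, y in examples.items():
    assert f(x) == y, f"hypothesis fails on x={x}"
print("hypothesis matches all seen examples")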

5 Concept Learning (1) Basics; different representation languages; learning by searching: complete enumeration; searching by generalization (search by most special generalization, search by most general generalization); version space method.

6 Concept Learning (2) There are different ways to represent concepts: logical formulas (this chapter), trees (decision tree chapter), numerical methods (support vector machine chapter). They all have different advantages and disadvantages which we will discuss.

7 Classification Classification task: Given: basic set M, set of class indices I. Goal: assign to each element from M a class index from I. Example: basic set = set of bank customers; class indices I = { credit worthy, not credit worthy }; goal: classify the bank customers.

8 Classifier Definition: A classifier for some set M is a mapping f : M → I, where I is a set, called the index set. If I = {0, 1}, then we denote by P = { x | f(x) = 1 } the set of positive elements and by N = { x | f(x) = 0 } the set of negative elements.
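
As a small illustration (a Python sketch with a toy classifier of my own choosing, not from the slides), a classifier f : M → I with I = {0, 1} and the induced sets P and N:

M = range(10)

def f(x: int) -> int:
    # classify even numbers as positive (1), odd numbers as negative (0)
    return 1 if x % 2 == 0 else 0

P = {x for x in M if f(x) == 1}   # positive elements
N = {x for x in M if f(x) == 0}   # negative elements
print(sorted(P), sorted(N))       # [0, 2, 4, 6, 8] [1, 3, 5, 7, 9]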

9 Classifier Descriptions Distinguish: classifiers and classifier descriptions. Different possibilities for descriptions: enumeration of all elements (possible only for finite sets); presenting a formula of predicate logic; presenting a C program; ... Observe: a classifier description uniquely determines a classifier; a classifier can have different classifier descriptions.

10 Different Representation Languages How are examples and hypotheses (possible concepts) represented? Examples and hypotheses often use the same representation language. This has an influence on the learning algorithms and their complexity, and on the definition of the more-general relation.

11 Conjunctive Concepts (1) Representation by attribute-value pairs: attributes A_1,...,A_n with associated domains T_1,...,T_n. Examples are n-ary vectors (w_1, w_2,..., w_n) with w_i ∈ T_i. Equivalent logical representation: (A_1 = w_1) ∧ (A_2 = w_2) ∧ ... ∧ (A_n = w_n). A concept now is an n-ary vector (w_1, w_2,..., w_n) with w_i ∈ T_i ∪ {*} (meaning of *: don't care or don't know). The example (e_1, e_2,..., e_n) belongs to the concept (w_1, w_2,..., w_n) iff ∀ i = 1,...,n: (w_i = e_i) ∨ (w_i = *).

12 Conjunctive Concepts (2) Checking the more-general relation: (w_1,...,w_n) ≤ (w'_1,...,w'_n) iff ∀ i = 1,...,n: (w'_i = w_i) ∨ (w'_i = *). The ≤-relation is a partial ordering. Example: attributes A_1: Size, A_2: Shape; positive example: (small, circle); negative example: (large, triangle); (small, circle) ≤ (small, *) and (small, circle) ≤ (*, circle).
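
The membership test and the more-general test can both be written down directly; the following Python sketch (function names are mine, not from the slides) represents conjunctive concepts as tuples over T_i ∪ {*} and is reused by later examples in this chapter:

STAR = "*"

def covers(concept, example):
    # an example belongs to a concept iff every concept value is '*' or equal
    return all(w == STAR or w == e for w, e in zip(concept, example))

def more_general(c1, c2):
    # c2 <= c1: c1 is more general than c2 (a partial ordering)
    return all(w1 == STAR or w1 == w2 for w1, w2 in zip(c1, c2))

# the slide's example with attributes Size and Shape
assert covers(("small", STAR), ("small", "circle"))
assert more_general((STAR, "circle"), ("small", "circle"))
assert not covers((STAR, "circle"), ("large", "triangle"))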

13 Generality of Concepts Definition: A concept K_1 is more general than a concept K_2 (notation: K_2 ≤ K_1) iff for all e ∈ M we have: if K_2(e) then K_1(e) is also true. The inverse relation is "more special than". (Figure: a small hierarchy of concepts with K_2 ≤ K_1 and K_3 ≤ K_1.)

14 Most Special Generalizations Definition: A concept K is a most special generalization of a given set E of examples iff K is a complete and consistent concept description for E and for all other complete and consistent concept descriptions K' for E we have: if K' ≤ K then also K ≤ K'. (Figure: the hypothesis space with concepts that are not complete and consistent, concepts that are complete and consistent, and the most special generalizations marked.)

15 Most General Specializations Definition: A concept K is a most general specialization of a given set E of examples iff K is a complete and consistent concept description for E and for all other complete and consistent concept descriptions K' for E we have: if K ≤ K' then also K' ≤ K. (Figure: the hypothesis space with concepts that are not complete and consistent, concepts that are complete and consistent, and the most general specializations marked.)

16 Representation by Rules Examples: variable-free rules. Concepts: arbitrary rules. Definition: For rules Φ and Ψ we have Φ ≥ Ψ (Φ is more general than Ψ) iff there is a substitution σ such that Ψ = σ(Φ) holds. Example:
Has_fever(Bill) ∧ Sniffles(Bill) ∧ Coughs(Bill) → Has_cold(Bill)
Has_fever(X) ∧ Sniffles(X) ∧ Coughs(X) → Has_cold(X)
Has_fever(X) ∧ Sniffles(X) → Has_cold(X)
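
The substitution criterion can be illustrated with a very small matcher (a simplified Python sketch of my own; rules are lists of (predicate, argument) literals, variable names are declared explicitly, and only rules with the same literal structure are compared, so the shorter third rule above is not covered):

VARIABLES = {"X"}  # treat "X" as a variable, everything else as a constant

def match_rule(general, special, variables=VARIABLES):
    # return a substitution sigma with sigma(general) == special, or None
    if len(general) != len(special):
        return None
    sigma = {}
    for (pred_g, arg_g), (pred_s, arg_s) in zip(general, special):
        if pred_g != pred_s:
            return None
        if arg_g in variables:
            if sigma.setdefault(arg_g, arg_s) != arg_s:
                return None
        elif arg_g != arg_s:
            return None
    return sigma

general = [("Has_fever", "X"), ("Sniffles", "X"), ("Coughs", "X"), ("Has_cold", "X")]
special = [("Has_fever", "Bill"), ("Sniffles", "Bill"), ("Coughs", "Bill"), ("Has_cold", "Bill")]
print(match_rule(general, special))  # {'X': 'Bill'}, so the variable rule is more general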

17 Properties of Concepts Definitions: A concept K is complete for some set E of examples iff for all e ∈ E: if e is positive then K(e) is true. A concept K is consistent for some set E of examples iff for all e ∈ E: if e is negative then K(e) is false. Goal: find complete and consistent concepts. (Figure: concepts that are complete but not consistent, not complete but consistent, and complete and consistent.)
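
Both properties translate directly into code for conjunctive concepts; a minimal Python sketch (my own, repeating the covers() membership test so it runs on its own):

STAR = "*"

def covers(concept, example):
    return all(w == STAR or w == e for w, e in zip(concept, example))

def complete(concept, positives):
    # K is complete for E iff it covers every positive example
    return all(covers(concept, p) for p in positives)

def consistent(concept, negatives):
    # K is consistent for E iff it covers no negative example
    return not any(covers(concept, n) for n in negatives)

P = [("small", "circle"), ("large", "circle")]
N = [("large", "triangle")]
print(complete(("*", "circle"), P), consistent(("*", "circle"), N))  # True True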

18 Part 2 Learning Methods and Algorithms

19 Learning Concepts Determine the concept on the basis of some given classified examples (experiences). (Figure: experiences, i.e. positive and negative examples, are fed into the learning function, which outputs a concept K(X).) The generated concepts are hypotheses. Learning tells us only that something is a plausible explanation, never that it is always true.

20 Incremental vs. Non-Incremental Learning Non-incremental concept learning: given a set P of positive examples and a set N of negative examples, wanted: a concept. Incremental concept learning: given the actual concept (hypothesis) and a new example (positive or negative), wanted: the updated concept.

21 Idea of the Version Space (Figure: concepts ordered from more general at the top to more special at the bottom; the inconsistent concepts lie above the boundary set G, the incomplete concepts below the boundary set S; negative examples push G downwards, positive examples push S upwards; H, the version space of all possible solutions, lies between G and S.) Shrinking of the version space performs a search.

22 Classification with the Version Space (1) Application of the version space for classifying new elements a of the underlying set M. Definition: An element a ∈ M is classified as a positive element if and only if ∀ K ∈ H: K(a). Equivalent definition, but simpler to verify: an element a ∈ M is classified as a positive element if and only if ∀ K ∈ S: K(a).

23 Classification with the Version Space (2) Definition: An element a ∈ M is classified as a negative element if and only if ∀ K ∈ H: ¬K(a). Equivalent definition (simpler to verify): an element a ∈ M is classified as a negative element if and only if ∀ K ∈ G: ¬K(a).

24 Classification with the Version Space (3) Definition: An element a is not classified if and only if ∃ K_1 ∈ H with K_1(a) and ∃ K_2 ∈ H with ¬K_2(a). Equivalent definition (simpler to verify): an element a is not classified if and only if ∃ K_1 ∈ G with K_1(a) and ∃ K_2 ∈ S with ¬K_2(a).

25 VS Algorithm for Conjunctive Concepts (1)
Given: set of examples a_1,...,a_n, with a_1 positive
Initialize (when a_1 is presented): S := a_1, G := {(*,...,*)}
For each a_i (i = 2,...,n) DO:
  IF a_i positive THEN
    FOR each K ∈ G DO
      IF K does not include a_i THEN G := G \ {K}
    S := most special generalization of S which includes a_i
    IF G = {} OR (∃ K ∈ G with K < S, i.e. K strictly more special than S) THEN STOP: failure
    IF G = {S} THEN STOP: success with S

26 VS Algorithm for Conjunctive Concepts (2)
  IF a_i negative THEN
    IF S includes a_i THEN STOP: failure
    G' := {}
    FOR EACH K ∈ G DO
      G' := G' ∪ { K' | K' is a most general specialization of K which excludes a_i and S ≤ K' }
    G := G'
    IF G = {} THEN STOP: failure
    IF G = {S} THEN STOP: success with S
Success means the search has terminated. If no more examples exist: STOP; version space: S, G
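
A compact Python reconstruction of this algorithm (my own sketch, not the original code; the termination tests follow the standard candidate-elimination conditions, S is a single tuple, G a list of tuples over the domains extended by '*'):

STAR = "*"

def covers(concept, example):
    return all(w == STAR or w == e for w, e in zip(concept, example))

def more_general(c1, c2):
    # True iff c1 is more general than or equal to c2
    return all(w1 == STAR or w1 == w2 for w1, w2 in zip(c1, c2))

def generalize(s, example):
    # most special generalization of s that includes the positive example
    return tuple(w if w == e else STAR for w, e in zip(s, example))

def specialize(g, example, domains):
    # most general specializations of g that exclude the negative example
    return [g[:i] + (v,) + g[i + 1:]
            for i, w in enumerate(g) if w == STAR
            for v in domains[i] if v != example[i]]

def vs_conjunctive(examples, domains):
    # examples: list of (vector, is_positive); the first example must be positive
    first, _ = examples[0]
    s, g = tuple(first), [tuple(STAR for _ in first)]
    for a, positive in examples[1:]:
        if positive:
            g = [k for k in g if covers(k, a)]        # drop concepts excluding a
            s = generalize(s, a)
            if not g or not any(more_general(k, s) for k in g):
                return None                           # failure: S is no longer below G
        else:
            if covers(s, a):
                return None                           # failure: S covers a negative example
            g = [k2 for k in g for k2 in specialize(k, a, domains)
                 if more_general(k2, s)]              # keep only K' with S <= K'
            if not g:
                return None                           # failure
        if g == [s]:
            break                                     # success: S and G have converged
    return s, g

domains = [("sm", "la"), ("ci", "sq", "tr")]
examples = [(("sm", "ci"), True), (("la", "tr"), False), (("la", "ci"), True)]
print(vs_conjunctive(examples, domains))  # (('*', 'ci'), [('*', 'ci')]), as in Example 1 below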

27 Properties of the Algorithm For a finite number of examples the algorithm always terminates. Correctness: 1. If the correct concept K is in the space of hypotheses (i.e. has a conjunctive description), then the algorithm either terminates with success and K = S after sufficiently many examples have been presented, or it terminates after all examples are processed and the intended concept K is an element of the remaining version space. 2. If the algorithm terminates with failure, then the concept K is not an element of the space of hypotheses. 3. If the concept K is not an element of the space of hypotheses, nothing can be asserted. If the algorithm stops because it runs out of examples, the version space generated so far can still be used to classify many objects.

28 Example 1 Attributes: Size (small, large); Shape (circle, triangle, square). (Figure: the lattice of conjunctive concepts from (*,*) at the top down to the fully specified concepts (sm,ci), (la,ci), (sm,sq), (la,sq), (sm,tr), (la,tr).)
a_1 = (sm,ci) positive: S = (sm,ci), G = {(*,*)}
a_2 = (la,tr) negative: S = (sm,ci), G = { (*,ci), (sm,*) }
a_3 = (la,ci) positive: S = (*,ci), G = { (*,ci) }. Success!

29 Example 2 (Same concept lattice as in Example 1.) We try to learn the concept "small circle or large triangle":
a_1 = (sm,ci) positive: S = (sm,ci), G = {(*,*)}
a_2 = (la,ci) negative: S = (sm,ci), G = { (sm,*) }
a_3 = (la,tr) positive: S = (*,*), G = {}. Failure!

30 Example 3 (Same concept lattice as in Example 1.) We try again to learn the concept "small circle or large triangle", but with a different ordering of the examples:
a_1 = (sm,ci) positive: S = (sm,ci), G = {(*,*)}
a_2 = (la,tr) positive: S = (*,*), G = {(*,*)}. Success!
But the algorithm has generalized too much, because the concept cannot be expressed as a conjunction: it presents an incorrect result.

31 Remark The version space algorithm finds only concepts that are in the hypothesis space. In many applications it cannot be guaranteed a priori that the wanted concept is in the hypothesis space. What to do? Intuitively: find the best concept that is available in the hypothesis space. But how to define "best" and how to find it? This leads to approximate learning and is discussed in the section on PAC learning.

32 Example Task: generate a classifier for assigning bank customers to one of the classes { credit worthy, not credit worthy }. Traditional approach: knowledge acquisition from banking experts and programming a special classifier. Using a learning system: make use of experiences from the past: take the set of previous customers and their classifications { credit worthy, not credit worthy } and learn a classifier automatically (concept learning from positive and negative examples).

33 Algorithm: Complete Enumeration
Input: P: set of positive examples; N: set of negative examples
Output: H: set of all complete, consistent concepts
H := {}
FOR each concept K DO
  IF K contains all examples from P AND K contains no example from N THEN H := H ∪ {K}
RETURN H
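
A direct Python sketch of the enumeration (my own code; covers() repeats the membership test used earlier, and the concept space is the conjunctive star language over the given domains):

from itertools import product

STAR = "*"

def covers(concept, example):
    return all(w == STAR or w == e for w, e in zip(concept, example))

def complete_enumeration(P, N, domains):
    # all conjunctive concepts that are complete for P and consistent for N
    H = []
    for K in product(*[tuple(d) + (STAR,) for d in domains]):
        if all(covers(K, p) for p in P) and not any(covers(K, n) for n in N):
            H.append(K)
    return H

domains = [("sm", "la"), ("ci", "sq", "tr")]
print(complete_enumeration([("sm", "ci")], [("la", "tr")], domains))
# [('sm', 'ci'), ('sm', '*'), ('*', 'ci')]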

34 Properties of the Algorithm Assumptions: the representation language is finite; membership of examples in a concept is decidable. Disadvantages: very inefficient; no differentiation between the learned concepts. Advantage: very easy to realize.

35 Improving the Search Idea: make use of the more-general relation. Control of the search: from special to general concepts, from general to special concepts, or a combined search.

36 Algorithm
Input: P = {p_1,...,p_m}: set of positive examples; N: set of negative examples
(w_1,...,w_n) := p_1
FOR i = 2 to m DO
  Let p_i = (w'_1,...,w'_n)
  FOR j = 1 to n DO
    IF w_j ≠ w'_j THEN w_j := *
IF (w_1,...,w_n) contains an example from N THEN RETURN {}
ELSE RETURN { (w_1,...,w_n) }
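
The same algorithm as a short Python sketch (my own code, with the covers() test repeated so it runs on its own):

STAR = "*"

def covers(concept, example):
    return all(w == STAR or w == e for w, e in zip(concept, example))

def most_special_generalization(P, N):
    # generalize the first positive example attribute-wise over all of P,
    # then reject the result if it covers a negative example from N
    w = list(P[0])
    for p in P[1:]:
        for j in range(len(w)):
            if w[j] != p[j]:
                w[j] = STAR
    concept = tuple(w)
    return set() if any(covers(concept, n) for n in N) else {concept}

print(most_special_generalization([("sm", "ci"), ("la", "ci")], [("la", "tr")]))
# {('*', 'ci')}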

37 Properties of the Algorithm Properties: the algorithm terminates and returns the most special generalization of the examples if it exists, otherwise the empty set. Assumption: finite domains. Advantage: very efficient. Disadvantage: the algorithm works in this simple form only for conjunctive descriptions.

38 Learning by Breadth-First Search Search direction: general → special. Learning goal: learning of some concept; finding the most general generalizations. Experiences: positive and negative examples, attribute-value representation. Hypothesis space: conjunctive concept descriptions. Example presentation: non-incremental. Search strategy: breadth-first.

39 Algorithm
Input: P: set of positive examples; N: set of negative examples
C := {}; H := { (*,...,*) }
WHILE TRUE DO
  FOR ALL h ∈ H DO
    IF NOT (h contains all examples from P) THEN H := H \ {h}
    ELSE IF h contains no example from N THEN H := H \ {h}; C := C ∪ {h}
  IF H = {} THEN RETURN C
  H' := {}   (* generate specializations *)
  FOR ALL (w_1,...,w_n) ∈ H DO
    FOR ALL i ∈ {1,...,n} with w_i = * DO
      FOR ALL w'_i ∈ T_i DO
        IF NOT [ ∃ h ∈ C with h ≥ (w_1,...,w_{i-1},w'_i,w_{i+1},...,w_n) ]
        THEN H' := H' ∪ { (w_1,...,w_{i-1},w'_i,w_{i+1},...,w_n) }
  H := H'
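
A Python sketch of this breadth-first, general-to-specific search (my own reconstruction; the helper tests are repeated so the example runs on its own). It returns the set C of all most general complete and consistent conjunctive concepts:

STAR = "*"

def covers(concept, example):
    return all(w == STAR or w == e for w, e in zip(concept, example))

def more_general(c1, c2):
    return all(w1 == STAR or w1 == w2 for w1, w2 in zip(c1, c2))

def breadth_first_concepts(P, N, domains):
    C = []                                    # solutions found so far
    H = [tuple(STAR for _ in domains)]        # frontier, most general concept first
    while True:
        frontier = []
        for h in H:
            if not all(covers(h, p) for p in P):
                continue                      # not complete: drop h
            if not any(covers(h, n) for n in N):
                C.append(h)                   # complete and consistent: keep h
            else:
                frontier.append(h)            # complete but inconsistent: specialize h
        if not frontier:
            return C
        H = []
        for h in frontier:                    # generate specializations
            for i, w in enumerate(h):
                if w != STAR:
                    continue
                for v in domains[i]:
                    cand = h[:i] + (v,) + h[i + 1:]
                    if cand not in H and not any(more_general(c, cand) for c in C):
                        H.append(cand)

domains = [("sm", "la"), ("ci", "sq", "tr")]
print(breadth_first_concepts([("sm", "ci"), ("la", "ci")], [("la", "tr")], domains))
# [('*', 'ci')]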

40 Properties of the Algorithm Properties: the algorithm terminates and delivers the set of all most general generalizations of the examples. Assumption: finite domains. Disadvantage: inefficient, and therefore impractical for large numbers of attributes and large domains. But observe: the algorithm can be extended to other representations.

41 The General Version Space Method VS Concept learning from positive and negative examples. The examples are presented incrementally. Bidirectional breadth-first search, i.e. combining the two directions general → special and special → general.

42 Generalization Operator
Given: concept K = (w_1,...,w_n), positive example B = (w'_1,...,w'_n)
(Figure: K is replaced by a generalization K' ≥ K that covers B.)
Algorithm:
FOR j = 1 to n DO
  IF w_j ≠ w'_j THEN w_j := *
RETURN (w_1,...,w_n)

43 Specialization Operator
Given: concept K = (w_1,...,w_n), negative example B = (w'_1,...,w'_n)
(Figure: K is replaced by specializations K_1, K_2,..., K_m that exclude B.)
Algorithm:
IF concept K does not contain example B THEN RETURN { K }
S := {}
FOR j = 1 to n DO
  IF w_j = * THEN
    FOR EACH w ∈ T_j \ { w'_j } DO
      S := S ∪ { (w_1,...,w_{j-1},w,w_{j+1},...,w_n) }
RETURN S

44 General VS Algorithm (1)
Given: set of examples a_1,...,a_n, with a_1 positive
Initialize (when example a_1 is presented):
  S := { K } where K is the most special concept description containing a_1
  G := { A } where A is the most general concept description
Processing the examples a_i (i > 1):
IF a_i is positive THEN
  1. Remove from G all concepts that do not contain a_i.
  2. Replace the concepts s ∈ S by the most special generalizations of s that contain a_i and exclude all earlier negative examples; remove them if this is impossible or if they are more general than a concept from G.
  3. Remove each concept s ∈ S that is more general than another concept from S.

45 General VS Algorithm (2)
IF a_i is negative THEN
  1. Remove from S all concepts that contain a_i.
  2. Replace the concepts g ∈ G by the most general specializations of g that exclude a_i and contain all earlier positive examples; if this is impossible or if they are more special than some concept from S, remove them.
  3. Remove each concept g ∈ G that is more special than some other concept from G.
Termination criteria:
IF S = G and |S| = 1 THEN STOP: learning success with S
IF S = {} or G = {} THEN STOP: failure
IF all examples are processed THEN STOP; version space: S, G

46 Properties of the Algorithm Termination: the algorithm terminates if the set of examples is finite. Correctness: 1. If the wanted concept K is in the hypothesis space, then either the algorithm stops with success after sufficiently many examples and we get K = S, or the algorithm stops after processing all examples and the wanted concept K is then in the version space. 2. If the algorithm stops with failure, then the wanted concept is not in the hypothesis space. 3. If the wanted concept is not in the hypothesis space, then no assertion can be made.

47 Advantages of the Version Space Method Incremental. Correctness of the learning result is guaranteed if the concept is in the hypothesis space. The algorithm recognizes when no more examples are needed. The algorithm can in principle learn more powerful representations.

48 Disadvantages of the Version Space Method High complexity for more powerful (e.g. disjunctive) description languages. The cardinality of the sets S and G can grow exponentially with the number of examples. Convergence of S and G is lost for disjunctive description languages: S = disjunction of all positive examples, G = conjunction of the negated negative examples. Convergence only after all examples are seen!

49 Quality Criteria for Learned Concepts Learning concepts from examples is always an inductive conclusion; the correctness of the learned concept for the whole set cannot be assured. Evaluation functions (quality of the learned concept): Classification quality: percentage of correctly classified elements of the set. Costs for misclassification: assuming each misclassification causes costs, the total (or expected) sum of costs has to be considered; this is known as cost-sensitive classification.
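
Both evaluation functions are easy to state in code; a tiny Python sketch (my own, with an illustrative cost matrix that is not from the slides):

def accuracy(predictions, truths):
    # fraction of correctly classified elements
    return sum(p == t for p, t in zip(predictions, truths)) / len(truths)

def total_cost(predictions, truths, cost):
    # sum of misclassification costs; cost[(true, predicted)] for each mismatch
    return sum(cost.get((t, p), 0.0) for p, t in zip(predictions, truths) if p != t)

preds  = [1, 0, 1, 1]
truths = [1, 0, 0, 1]
cost = {(0, 1): 5.0, (1, 0): 1.0}  # a false positive is assumed costlier than a false negative
print(accuracy(preds, truths), total_cost(preds, truths, cost))  # 0.75 5.0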

50 Discussion: Correctness (1) Correctness seems to be a natural quality condition for concept learning: we want a learned concept to classify elements correctly. As a consequence, one concept is more correct than another if it classifies more elements correctly. The version space algorithm takes this up by considering only concepts that classify the elements seen so far correctly. Although this seems plausible, the question arises: does the fact that a concept classifies the seen examples correctly have an impact on its correctness for all elements?

51 Discussion: Correctness (2) We introduce a correctness measure for classifying concepts: corr(C) = { a ∈ M | C classifies a correctly }. This is a partially ordered measure (by set inclusion) that refers to all elements, not only to the seen ones. If one regards this as a kind of landscape, experience in the area of hill climbing indicates that going upwards only is often not the best way. In fact, theoretical investigations show that it is provably not the best way in general (see [Lange, Wiehagen]).

52 Classification and Diagnosis (1) Machine learning technology is well suited for inducing diagnostic and prognostic rules and for solving small and specialized diagnostic and prognostic problems. What has to be done is to enter the data, i.e. the records of patients with a known correct diagnosis, into the computer in an appropriate form and run the learning algorithm. This is of course an oversimplification, but in principle medical diagnostic knowledge can be derived automatically from the descriptions of cases solved in the past. The derived classifier can then be used either to assist the physician when diagnosing new patients, in order to improve diagnostic speed, accuracy and/or reliability, or to train students or non-specialist physicians to diagnose patients in some special diagnostic problem.

53 Classification and Diagnosis (2) In the area of diagnostics several additional aspects occur because there may be several reasons for a fault or a disease. In order to make the connection between the examples and the learned concepts explicit one often uses rules, e.g.:
Has_fever(Bill) ∧ Sniffles(Bill) ∧ Coughs(Bill) → Has_a_Cold(Bill)
Has_fever(Mary) ∧ Sniffles(Mary) ∧ Coughs(Mary) → Has_Influenza(Mary)
Learned rules:
Has_fever(Person) ∧ Sniffles(Person) ∧ Coughs(Person) → Has_a_Cold(Person)
Has_fever(Person) ∧ Sniffles(Person) ∧ Coughs(Person) → Has_Influenza(Person)

54 Comparison: Machine Learning vs. Humans Four physicians, specialists in each domain, were tested at the University Medical Center in Ljubljana; a subset of patients was randomly selected. The physicians' performance in the table is the average over the four specialists in each domain and is compared with two learning algorithms (naive Bayes, Assistant).
Classifier    Primary Tumor   Breast Cancer   Thyroid   Rheumatology
Naive Bayes   49%             78%             70%       67%
Assistant     44%             77%             73%       61%
Physicians    42%             64%             64%       66%
Both algorithms significantly outperform the diagnostic performance of the physicians in terms of classification accuracy and the average information score of the classifier. Source: Tatjana Zrimec and Igor Kononenko, University of Ljubljana.

55 Summary Different representations; completeness and correctness; the more-general relation, anti-unification; the version space and the version space algorithm; classification and diagnosis.

56 Recommended Literature
T. Mitchell: Machine Learning. McGraw Hill, 1997.
Tzung-Pei Hong, Shian-Shyong Tseng: Generalized Version Space Learning Algorithm for Noisy and Uncertain Data. CS Digital Library.
Papers with typical applications:
Hee-Woong Lim, Ji-Eun Yun, Hae-Man Jang, Young-Gyu Chai, Suk-In Yoo, and Byoung-Tak Zhang: Version Space Learning with DNA Molecules. Lecture Notes in Computer Science, Vol.
Tessa Lau, Pedro Domingos, Daniel S. Weld: Learning Programs from Traces using Version Space Algebra. ICML 2000, Stanford, CA, June 2000, pp.
