Machine Learning Alternatives to Manual Knowledge Acquisition

Machine Learning Alternatives to Manual Knowledge Acquisition

- Interactive programs which elicit knowledge from the expert during the course of a conversation at the terminal.
- Programs which learn by scanning texts.
- Programs which learn the concepts of a domain under varying degrees of supervision from a human teacher.

Inductive Learning

Inductive learning is a form of supervised learning which involves learning from examples by a process of generalization. The learning task is to identify or construct the relevant concept, i.e., the concept which includes all of the positive examples and none of the negative examples. This kind of learning is often called concept learning.

Concept Learning Problem

A concept can be conceived of as a pattern which states those properties that are common to instances of the concept. Given (i) a language of patterns for describing concepts, (ii) sets of positive and negative instances of the target concept, and (iii) a way of matching data in the form of training instances against hypothetical concept descriptions, the task is to determine concept descriptions in the language that are consistent with the training instances.

Generality and Specificity

P1: a STANDING BRICK SUPPORTS a LYING WEDGE or BRICK.
P2: an object that is not LYING, of any shape, TOUCHES an object of any orientation that is a WEDGE or BRICK.

P1 and P2 can both match the same scene, but P1 is more specific than P2.

Representation Language

Properties and values of each car in the concept space:

Origin: {Japan, USA, Britain, Germany, Italy}
Manufacturer: {Honda, Toyota, Ford, Chrysler, BMW}
Color: {Blue, Green, Red, White}
Decade: {1960, 1970, 1980, 1990, 2000}
Type: {Economy, Luxury, Sports}

A car is represented by an ordered list (x1, x2, x3, x4, x5), where a variable xi matches any value of the corresponding attribute. Thus the concept of a Japanese economy car is (Japan, x2, x3, x4, Economy).

Partial Ordering of Concepts

(x1, x2, x3, x4, x5)
  (Japan, x2, x3, x4, x5)   (x1, x2, x3, x4, Economy)   ...
    (Japan, x2, x3, x4, Economy)   (USA, x2, x3, x4, Economy)   ...
      (Japan, Honda, White, 1990, Economy)   (USA, Chrysler, Green, 1980, Economy)   ...

Each concept is more general than the concepts below it that it covers.
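
To make the matching and the more-general-than relation concrete, here is a minimal Python sketch (not from the lecture; the function names and the use of '?' for a variable slot are illustrative):

```python
def covers(pattern, example):
    """A pattern covers an example if every non-variable slot matches."""
    return all(p == "?" or p == e for p, e in zip(pattern, example))

def more_general_or_equal(p1, p2):
    """p1 is at least as general as p2 if it covers everything p2 covers;
    for this conjunctive language that can be checked slot by slot."""
    return all(a == "?" or a == b for a, b in zip(p1, p2))

japanese_economy = ("Japan", "?", "?", "?", "Economy")
car = ("Japan", "Honda", "White", "1990", "Economy")

print(covers(japanese_economy, car))                        # True
print(more_general_or_equal(("?",) * 5, japanese_economy))  # True
```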

A Training Set

origin  mfr       color  decade  type     pos/neg
Japan   Honda     Blue   1980    Economy  pos
Japan   Toyota    Green  1970    Sports   neg
Japan   Toyota    Blue   1990    Economy  pos
USA     Chrysler  Red    2000    Economy  neg
Japan   Honda     White  1980    Economy  pos

Version Space

The set of maximally general patterns (G).
The set of maximally specific patterns (S).
The version space consists of all concept descriptions which occur between these two sets in the partial ordering.

(Figure: the version space lies between the boundary of S and the boundary of G in the partial ordering.)

Candidate Elimination Algorithm

1. Initialize G to contain the most general descriptions (i.e. all features are variables).
2. Initialize S to contain the first positive example.
3. Accept a new training example.
   - If it is a positive example, remove from G any descriptions that do not cover the example. Then update S to contain the most specific set of descriptions in the version space that cover the example and the current elements of S, i.e. generalize S as little as possible.
   - If it is a negative example, remove from S any descriptions that cover the example. Then update G to contain the most general set of descriptions in the version space that do not cover the example, i.e. specialize G as little as possible.
4. If G = S and both are singletons, output their value and halt. If G and S are both singletons but G ≠ S, the training cases are inconsistent; output this result and halt. Otherwise, go to step 3.

A Search of Version Space

1st example (pos): (Japan, Honda, Blue, 1980, Economy)
G = {(x1, x2, x3, x4, x5)}
S = {(Japan, Honda, Blue, 1980, Economy)}

2nd example (neg): (Japan, Toyota, Green, 1970, Sports)
G = {(x1, Honda, x3, x4, x5), (x1, x2, Blue, x4, x5), (x1, x2, x3, 1980, x5), (x1, x2, x3, x4, Economy)}
S = {(Japan, Honda, Blue, 1980, Economy)}

3rd example (pos): (Japan, Toyota, Blue, 1990, Economy)
G = {(x1, x2, Blue, x4, x5), (x1, x2, x3, x4, Economy)}
S = {(Japan, x2, Blue, x4, Economy)}

A Search of Version Space (contd.)

4th example (neg): (USA, Chrysler, Red, 2000, Economy)
G = {(Japan, x2, Blue, x4, x5), (Japan, x2, x3, x4, Economy)}
S = {(Japan, x2, Blue, x4, Economy)}

5th example (pos): (Japan, Honda, White, 1980, Economy)
G = {(Japan, x2, x3, x4, Economy)}
S = {(Japan, x2, x3, x4, Economy)}
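
The whole search can be reproduced with a short program. Below is a minimal sketch of the candidate elimination algorithm for this conjunctive representation (illustrative code, not from the lecture; '?' stands for a variable slot). It leaves unchanged any member of G that already excludes a negative example, so an intermediate boundary may differ in detail from the trace above, but the final boundaries agree.

```python
def covers(h, x):
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def more_general_or_equal(h1, h2):
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def generalize(s, x):
    """Minimal generalization of s that also covers the positive example x."""
    return tuple(sv if sv == xv else "?" for sv, xv in zip(s, x))

def specialize(g, s, x):
    """Minimal specializations of g that exclude the negative example x
    while staying at least as general as the specific boundary s."""
    return [g[:i] + (s[i],) + g[i + 1:]
            for i, gv in enumerate(g)
            if gv == "?" and s[i] != "?" and s[i] != x[i]]

def candidate_elimination(examples):
    s = next(x for x, label in examples if label == "pos")   # step 2
    G = [("?",) * len(s)]                                     # step 1
    for x, label in examples:                                 # step 3
        if label == "pos":
            G = [g for g in G if covers(g, x)]
            s = generalize(s, x)
        else:
            G = [h for g in G
                 for h in ([g] if not covers(g, x) else specialize(g, s, x))]
            # keep only the maximally general members of G
            G = [g for g in G
                 if not any(h != g and more_general_or_equal(h, g) for h in G)]
    return s, G

training = [
    (("Japan", "Honda", "Blue", "1980", "Economy"), "pos"),
    (("Japan", "Toyota", "Green", "1970", "Sports"), "neg"),
    (("Japan", "Toyota", "Blue", "1990", "Economy"), "pos"),
    (("USA", "Chrysler", "Red", "2000", "Economy"), "neg"),
    (("Japan", "Honda", "White", "1980", "Economy"), "pos"),
]

s, G = candidate_elimination(training)
print("S =", s)   # ('Japan', '?', '?', '?', 'Economy')
print("G =", G)   # [('Japan', '?', '?', '?', 'Economy')]
```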

Meta-DENDRAL

Meta-DENDRAL is an expert system that helps chemists determine the dependence of mass spectrometric fragmentation on substructural features. It does this by discovering fragmentation rules for given classes of molecules. The system derives these rules from training instances consisting of sets of molecules with known 3-D structures and mass spectra. Meta-DENDRAL uses the candidate elimination algorithm: it first generates a set of highly specific rules, each accounting for a single fragmentation in a particular molecule, and then uses the training examples to generalize these rules.

Decision Trees as Knowledge Representation

Rules are not the only way of representing attribute-value information about concepts for the purpose of classification. Decision trees are an alternative way of structuring such information. Quinlan defines decision trees as structures that consist of leaf nodes, each representing a class, and decision nodes, each specifying some test to be carried out on a single attribute value, with one branch for each possible outcome of the test.

A Training Set: Play/Don't Play

No.  Outlook   Temperature  Humidity  Windy  Class
1    sunny     hot          high      false  N
2    sunny     hot          high      true   N
3    overcast  hot          high      false  P
4    rain      mild         high      false  P
5    rain      cool         normal    false  P
6    rain      cool         normal    true   N
7    overcast  cool         normal    true   P
8    sunny     mild         high      false  N
9    sunny     cool         normal    false  P
10   rain      mild         normal    false  P
11   sunny     mild         normal    true   P
12   overcast  mild         high      true   P
13   overcast  hot          normal    false  P
14   rain      mild         high      true   N

Decision Tree Derived from Training Set

outlook?
  sunny:    humidity?
              high:   N
              normal: P
  overcast: P
  rain:     windy?
              true:  N
              false: P

Classification Rule Based on Decision Tree

If    outlook = overcast
   or (outlook = sunny and humidity = normal)
   or (outlook = rain and windy = false)
Then  class = P
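
The tree (equivalently, the rule above) can be read off directly as executable code. A minimal sketch, with illustrative function and argument names:

```python
def play(outlook, humidity, windy):
    """Classify a day as 'P' (play) or 'N' (don't play) using the
    decision tree derived from the training set above."""
    if outlook == "overcast":
        return "P"
    if outlook == "sunny":
        return "P" if humidity == "normal" else "N"
    if outlook == "rain":
        return "N" if windy else "P"
    raise ValueError("unknown outlook: %r" % outlook)

print(play("sunny", "high", False))    # N (example 1 in the training set)
print(play("rain", "normal", False))   # P (example 10)
```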

ID3 Algorithm

Given (1) a set of disjoint target classes {C1, C2, ..., Ck} and (2) a set of training data S containing objects of more than one class, ID3 uses a series of tests to refine S into subsets that contain objects of only one class. ID3 builds a decision tree in which non-terminal nodes correspond to tests on a single attribute of the data and terminal nodes correspond to classified subsets of the data. Let T be any test on a single attribute with outcomes O1, O2, ..., On. Then T produces a partition {S1, S2, ..., Sn} of S, where

$S_i = \{\, x \in S \mid T(x) = O_i \,\}$

Tree Structure of Partitioned Objects

(Figure: a test at the root sends the objects in S down branches labelled with the outcomes O1, O2, ..., On, splitting S into the subsets S1, S2, ..., Sn.)

Information Theory

Consider a set of messages M = {m1, m2, ..., mn}. Each message mi has probability p(mi) of being received and contains an amount of information I(mi) given by

$I(m_i) = -\log_2 p(m_i)$

The uncertainty (or entropy) of the message set, U(M), is the sum of the information in the possible messages weighted by their probabilities:

$U(M) = -\sum_{i=1}^{n} p(m_i) \log_2 p(m_i)$

Building Decision Trees in ID3

Let Ni stand for the number of cases in S that belong to class Ci. Then the probability that a random case c belongs to class Ci is estimated as

$p(c \in C_i) = N_i / |S|$

Thus the amount of information in the message that c belongs to class Ci is

$I(c \in C_i) = -\log_2 p(c \in C_i) \ \text{bits}$

Consider the set of target classes as a message set {C1, C2, ..., Ck}. The uncertainty U(S) measures the average amount of information needed to determine the class of a random case c ∈ S, prior to partitioning by any test:

$U(S) = \sum_{i=1}^{k} p(c \in C_i) \, I(c \in C_i) \ \text{bits}$

Building Decision Trees in ID3 (contd.)

Consider the corresponding uncertainty measure after S has been partitioned into {S1, S2, ..., Sn} by a test T:

$U_T(S) = \sum_{i=1}^{n} \frac{|S_i|}{|S|} \, U(S_i)$

U_T(S) measures the average amount of information still needed to classify a case after the partitioning. ID3 therefore decides which attribute to branch on next by selecting the test T that gains the most information, i.e. that maximizes

$G_S(T) = U(S) - U_T(S)$
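
As a concrete check of these definitions, here is a minimal Python sketch (illustrative helper names, not lecture code) that computes U(S), U_T(S) and G_S(T) for the Play/Don't Play training set above:

```python
import math
from collections import Counter

DATA = [
    # (outlook, temperature, humidity, windy, class)
    ("sunny", "hot", "high", "false", "N"),
    ("sunny", "hot", "high", "true", "N"),
    ("overcast", "hot", "high", "false", "P"),
    ("rain", "mild", "high", "false", "P"),
    ("rain", "cool", "normal", "false", "P"),
    ("rain", "cool", "normal", "true", "N"),
    ("overcast", "cool", "normal", "true", "P"),
    ("sunny", "mild", "high", "false", "N"),
    ("sunny", "cool", "normal", "false", "P"),
    ("rain", "mild", "normal", "false", "P"),
    ("sunny", "mild", "normal", "true", "P"),
    ("overcast", "mild", "high", "true", "P"),
    ("overcast", "hot", "normal", "false", "P"),
    ("rain", "mild", "high", "true", "N"),
]
ATTRS = {"outlook": 0, "temperature": 1, "humidity": 2, "windy": 3}

def uncertainty(rows):
    """U(S): average information needed to classify a random case in S."""
    counts = Counter(r[-1] for r in rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def gain(rows, attr):
    """G_S(T) = U(S) - U_T(S) for the test T = 'branch on attr'."""
    i = ATTRS[attr]
    total = len(rows)
    remainder = 0.0                       # U_T(S)
    for value in {r[i] for r in rows}:
        subset = [r for r in rows if r[i] == value]
        remainder += (len(subset) / total) * uncertainty(subset)
    return uncertainty(rows) - remainder

for attr in ATTRS:
    print(attr, round(gain(DATA, attr), 3))
# outlook 0.247, temperature 0.029, humidity 0.152, windy 0.048
```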

Play/Don't Play Example

The target classes are {P, N}; S contains p = 9 cases of class P and n = 5 cases of class N.

U(S) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
     = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.9403

For T = Outlook, the outcomes {sunny, overcast, rain} partition S into {S1, S2, S3}:

U(sunny) = -(2/5) log2(2/5) - (3/5) log2(3/5) = 0.9710
U(overcast) = -(4/4) log2(4/4) - (0/4) log2(0/4) = 0
U(rain) = -(3/5) log2(3/5) - (2/5) log2(2/5) = 0.9710

U_Outlook(S) = (5/14)(0.9710) + (4/14)(0) + (5/14)(0.9710) = 0.6936
G_S(Outlook) = U(S) - U_Outlook(S) = 0.9403 - 0.6936 = 0.2467

Play/Don't Play Example (contd.)

Similarly,

U_Temperature(S) = 0.9111, so G_S(Temperature) = 0.9403 - 0.9111 = 0.0292
U_Humidity(S) = 0.7885, so G_S(Humidity) = 0.9403 - 0.7885 = 0.1518
U_Windy(S) = 0.8922, so G_S(Windy) = 0.9403 - 0.8922 = 0.0481

Thus T = Outlook has the highest information gain and is chosen as the root of the tree.

C4.5

C4.5 is a suite of programs that embody the ID3 algorithm. In C4.5 the gain criterion is replaced by the gain ratio H_S(T):

$H_S(T) = \frac{G_S(T)}{V(S)}, \quad \text{where} \quad V(S) = -\sum_{i=1}^{n} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}$

and {S1, ..., Sn} is the partition of S produced by the test T. The new heuristic is to select a test that maximizes the gain ratio.
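
A minimal, self-contained sketch of the gain ratio (illustrative code, not the C4.5 implementation), applied to the Windy column of the Play/Don't Play data:

```python
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def gain_ratio(column):
    """column pairs each case's attribute value with its class."""
    values = [v for v, _ in column]
    classes = [c for _, c in column]
    total = len(column)
    # U_T(S): class uncertainty remaining after splitting on the attribute
    remainder = sum(
        (values.count(v) / total) * entropy([c for vv, c in column if vv == v])
        for v in set(values))
    gain = entropy(classes) - remainder     # G_S(T)
    split_info = entropy(values)            # V(S)
    return gain / split_info

windy_column = [("false", "N"), ("true", "N"), ("false", "P"), ("false", "P"),
                ("false", "P"), ("true", "N"), ("true", "P"), ("false", "N"),
                ("false", "P"), ("false", "P"), ("true", "P"), ("true", "P"),
                ("false", "P"), ("true", "N")]
print(round(gain_ratio(windy_column), 3))   # roughly 0.049
```

Note that the split information V(S) is simply the entropy of the attribute's own value distribution, so the same entropy helper serves both purposes.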