Classification: Rules. Prof. Pier Luca Lanzi, Laurea in Ingegneria Informatica, Politecnico di Milano, Polo regionale di Como


Metodologie per Sistemi Intelligenti (Methodologies for Intelligent Systems). Classification: Rules. Prof. Pier Luca Lanzi, Laurea in Ingegneria Informatica, Politecnico di Milano, Polo regionale di Como

Lecture outline: Why rules? What are classification rules? Which type of rules? Sequential covering algorithms

Why rules? One of the most expressive and most human-readable representations for hypotheses is a set of IF-THEN rules

The Weather dataset, again!

Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No

Classification rules for the Weather dataset:
Rule 1: outlook = overcast -> class Play [70.7%]
Rule 2: outlook = rain and windy = false -> class Play [63.0%]
Rule 3: outlook = sunny and humidity = high -> class Don't Play [63.0%]
Rule 4: outlook = rain and windy = true -> class Don't Play [50.0%]
Default class: Play

What are classification rules? They are IF-THEN rules. The IF part states a condition over the data; the THEN part includes a class label. Which type of conditions? Propositional, with attribute-value comparisons; first-order Horn clauses, with variables

What are the methods? Method 1: learn a decision tree, then convert it to rules. Method 2: sequential covering algorithms

Sequential covering algorithms
Consider the set E of positive and negative examples
Repeat
    Learn one rule with high accuracy, any coverage
    Remove the positive examples covered by this rule
Until all the examples are covered
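As a rough illustration, here is a minimal Python sketch of this covering loop. It is a hypothetical example, not from the original slides: the dict-based rule representation and the learn_one_rule helper (sketched further below) are assumptions.

from typing import Optional

def covers(rule, features):
    # A rule is a dict of attribute -> required value; it covers an
    # example if every one of its tests is satisfied.
    return all(features.get(attr) == value for attr, value in rule.items())

def sequential_covering(examples, target_class, learn_one_rule):
    # examples: list of (features dict, class label) pairs
    rules = []
    remaining = list(examples)
    while any(label == target_class for _, label in remaining):
        rule: Optional[dict] = learn_one_rule(remaining, target_class)
        if rule is None:  # no rule with acceptable accuracy could be learned
            break
        rules.append(rule)
        # Remove the positive examples covered by the new rule.
        remaining = [(f, l) for f, l in remaining
                     if not (covers(rule, f) and l == target_class)]
    return rules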

Sequential covering algorithms

Exploring the Hypothesis Space. General to specific: start with the most general hypothesis and then go on through specialization steps. Specific to general: start with the set of the most specific hypotheses and then go on through generalization steps

Learn-one-rule

Example: generating a rule. (Figure: the examples of classes a and b plotted in the x-y plane.) If true then class = a

Example: generating a rule, II. (Figures: the x-y plane, then the plane split at x = 1.2.) If true then class = a; If x > 1.2 then class = a

Example: generating a rule, III. (Figures: the plane split at x = 1.2, then further split at y = 2.6.) If true then class = a; If x > 1.2 then class = a; If x > 1.2 and y > 2.6 then class = a

Example: generating a rule, IV. If true then class = a; If x > 1.2 then class = a; If x > 1.2 and y > 2.6 then class = a. Possible rule set for class b: If x <= 1.2 then class = b; If x > 1.2 and y <= 2.6 then class = b. More rules could be added for a perfect rule set

Rules vs. trees. The corresponding decision tree produces exactly the same predictions. But: rule sets can be clearer where decision trees suffer from replicated subtrees. Also: in multi-class situations, a covering algorithm concentrates on one class at a time, whereas a decision tree learner takes all classes into account

Rules vs. trees. Sequential covering generates a rule by adding tests that maximize the rule's accuracy. This is similar to the situation in decision trees: the problem of selecting an attribute to split on. But a decision tree inducer maximizes overall purity, and each new test reduces the rule's coverage. (Figure: the space of examples, the rule so far, and the rule after adding a new term.)

Learn-one-rule. Start from the most general rule, consisting of an empty condition. Add tests on single attributes as long as the performance (the accuracy) improves

Learn-one-rule

Exploring the hypothesis space. The algorithm that explores the hypothesis space is greedy and might tend to local optima. To improve the exploration of the hypothesis space, we can use beam search: at each step, k candidate hypotheses are considered (see the sketch below).
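A possible sketch of learn-one-rule with beam search follows. It is a hypothetical illustration: it reuses covers() from the earlier sketch, scores rules by the p/t accuracy measure defined on a later slide, breaks ties by coverage, and with k = 1 it reduces to plain greedy specialization.

def rule_score(rule, examples, target_class):
    covered = [label for features, label in examples if covers(rule, features)]
    if not covered:
        return (0.0, 0)
    p = sum(1 for label in covered if label == target_class)
    return (p / len(covered), len(covered))  # (p/t, coverage t)

def learn_one_rule(examples, target_class, k=3):
    attributes = sorted({a for features, _ in examples for a in features})
    values = {a: sorted({f[a] for f, _ in examples if a in f}) for a in attributes}
    beam = [{}]  # start from the most general (empty) rule
    best, best_score = None, (0.0, 0)
    while beam:
        candidates = []
        for rule in beam:
            for attr in attributes:
                if attr in rule:
                    continue
                for value in values[attr]:
                    new_rule = dict(rule, **{attr: value})
                    score = rule_score(new_rule, examples, target_class)
                    candidates.append((score, new_rule))
                    if score > best_score:  # higher p/t; ties go to coverage
                        best, best_score = new_rule, score
        # Keep only the k best specializations for the next round.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = [rule for _, rule in candidates[:k]]
        if best_score[0] == 1.0:  # a pure rule: stop specializing
            break
    return best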

Another example: contact lens data. Rule we seek: If ? then recommendation = hard. Possible tests:

Age = Young                            2/8
Age = Pre-presbyopic                   1/8
Age = Presbyopic                       1/8
Spectacle prescription = Myope         3/12
Spectacle prescription = Hypermetrope  1/12
Astigmatism = no                       0/12
Astigmatism = yes                      4/12
Tear production rate = Reduced         0/12
Tear production rate = Normal          4/12

Modified rule and resulting data. Rule with best test added: If astigmatism = yes then recommendation = hard. Instances covered by the modified rule:

Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
Young           Myope                   Yes          Reduced               None
Young           Myope                   Yes          Normal                Hard
Young           Hypermetrope            Yes          Reduced               None
Young           Hypermetrope            Yes          Normal                Hard
Pre-presbyopic  Myope                   Yes          Reduced               None
Pre-presbyopic  Myope                   Yes          Normal                Hard
Pre-presbyopic  Hypermetrope            Yes          Reduced               None
Pre-presbyopic  Hypermetrope            Yes          Normal                None
Presbyopic      Myope                   Yes          Reduced               None
Presbyopic      Myope                   Yes          Normal                Hard
Presbyopic      Hypermetrope            Yes          Reduced               None
Presbyopic      Hypermetrope            Yes          Normal                None

Further refinement. Current state: If astigmatism = yes and ? then recommendation = hard. Possible tests:

Age = Young                            2/4
Age = Pre-presbyopic                   1/4
Age = Presbyopic                       1/4
Spectacle prescription = Myope         3/6
Spectacle prescription = Hypermetrope  1/6
Tear production rate = Reduced         0/6
Tear production rate = Normal          4/6

Modified rule and resulting data. Rule with best test added: If astigmatism = yes and tear production rate = normal then recommendation = hard. Instances covered by the modified rule:

Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
Young           Myope                   Yes          Normal                Hard
Young           Hypermetrope            Yes          Normal                Hard
Pre-presbyopic  Myope                   Yes          Normal                Hard
Pre-presbyopic  Hypermetrope            Yes          Normal                None
Presbyopic      Myope                   Yes          Normal                Hard
Presbyopic      Hypermetrope            Yes          Normal                None

Further refinement. Current state: If astigmatism = yes and tear production rate = normal and ? then recommendation = hard. Possible tests:

Age = Young                            2/2
Age = Pre-presbyopic                   1/2
Age = Presbyopic                       1/2
Spectacle prescription = Myope         3/3
Spectacle prescription = Hypermetrope  1/3

Tie between the first and the fourth test: we choose the one with greater coverage

The result. Final rule: If astigmatism = yes and tear production rate = normal and spectacle prescription = myope then recommendation = hard. Second rule for recommending "hard lenses" (built from the instances not covered by the first rule): If age = young and astigmatism = yes and tear production rate = normal then recommendation = hard. These two rules cover all "hard lenses"; the process is then repeated with the other two classes
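The coverage fractions used in this walkthrough (e.g. 4/12 for astigmatism = yes in the first step) are simple counts, and can be reproduced with a few lines of Python (a sketch; the list-of-(features, label)-pairs dataset format is an assumption):

def p_over_t(examples, attribute, value, target="hard"):
    # p/t for the candidate test attribute = value: t examples covered,
    # p of them belonging to the target class.
    covered = [label for features, label in examples
               if features[attribute] == value]
    p = sum(1 for label in covered if label == target)
    return p, len(covered)  # e.g. ('astigmatism', 'yes') -> (4, 12)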

Selecting a test. Goal: maximize accuracy. t = total number of instances covered by the rule; p = positive examples of the class covered by the rule; t - p = number of errors made by the rule. Simple approach: select the test that maximizes the ratio p/t. We are finished when p/t = 1 or the set of instances can't be split any further

*Test selection criteria. Basic covering algorithm: keep adding conditions to a rule to improve its accuracy; add the condition that improves accuracy the most. Measure 1: p/t, where t is the total number of instances covered by the rule and p the number of these that are positive. Produces rules that don't cover negative instances as quickly as possible, but may produce rules with very small coverage (special cases or noise?). Measure 2: information gain, p (log(p/t) - log(P/T)), where P and T are the positive and total numbers before the new condition was added. Information gain emphasizes positive rather than negative instances. These measures interact with the pruning mechanism used
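Both measures are easy to state in code. A hedged sketch, using the slide's definitions (p, t counted after adding the candidate condition; P, T before):

from math import log2

def accuracy_measure(p, t):
    # Measure 1: the fraction of covered instances that are positive.
    return p / t if t else 0.0

def information_gain_measure(p, t, P, T):
    # Measure 2: p * (log(p/t) - log(P/T)); large when many positives
    # stay covered while the rule gets purer.
    if p == 0 or t == 0 or P == 0 or T == 0:
        return 0.0
    return p * (log2(p / t) - log2(P / T))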

*Missing values, numeric attributes. Common treatment of missing values: for any test, they fail. The algorithm must either use other tests to separate out positive instances, or leave them uncovered until later in the process. In some cases it's better to treat "missing" as a separate value. Numeric attributes are treated just as they are in decision trees

*Pruning rules. Two main strategies: incremental pruning and global pruning. Another difference: the pruning criterion: error on a hold-out set (reduced-error pruning), statistical significance, or the MDL principle. Also: post-pruning vs. pre-pruning

The RISE algorithm. It works in a specific-to-general fashion: initially, it creates one rule for each training example; then it goes on through elementary generalization steps as long as the overall accuracy does not decrease

The RISE algorithm
Input: ES is the training set
Let RS be ES
Compute Acc(RS)
Repeat
    For each rule R in RS:
        find the nearest example E not covered by R (E is of the same class as R)
        R' = MostSpecificGeneralization(R, E)
        RS' = RS with R replaced by R'
        if Acc(RS') >= Acc(RS) then RS = RS'
        if R' is identical to another rule in RS, then delete R' from RS
Until no increase in Acc(RS) is obtained
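For nominal attributes, the MostSpecificGeneralization step has a particularly simple form: drop exactly those tests that the new example fails. A minimal sketch, assuming the dict-based rule representation of the earlier sketches (numeric attributes, which RISE generalizes by widening intervals, are omitted):

def most_specific_generalization(rule, example_features):
    # Keep only the tests the example already satisfies: this is the
    # smallest generalization of the rule that covers the example.
    return {attr: value for attr, value in rule.items()
            if example_features.get(attr) == value}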

Inferring rudimentary rules. 1R learns a 1-level decision tree, i.e., a set of rules that all test one particular attribute. Basic version: one branch for each value; each branch assigns the most frequent class; error rate: proportion of instances that don't belong to the majority class of their corresponding branch; choose the attribute with the lowest error rate (assumes nominal attributes)

Pseudo-code for 1R
For each attribute,
    For each value of the attribute, make a rule as follows:
        count how often each class appears
        find the most frequent class
        make the rule assign that class to this attribute-value
    Calculate the error rate of the rules
Choose the rules with the smallest error rate
Note: "missing" is treated as a separate attribute value
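A compact Python version of this pseudo-code for nominal attributes might look as follows (a sketch; the (features dict, label) dataset format is an assumption):

from collections import Counter, defaultdict

def one_r(examples):
    # Returns the attribute with the lowest total error, together with
    # its value -> class rules and its error count.
    best = None
    for attribute in examples[0][0]:
        counts = defaultdict(Counter)  # value -> class frequencies
        for features, label in examples:
            counts[features[attribute]][label] += 1
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:
            best = (attribute, rules, errors)
    return best

On the weather data this reproduces the table on the next slide: outlook and humidity tie at 4/14 total errors.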

Evaluating the weather attributes (the weather data are as in the earlier slide):

Attribute  Rules            Errors  Total errors
Outlook    Sunny -> No      2/5     4/14
           Overcast -> Yes  0/4
           Rainy -> Yes     2/5
Temp       Hot -> No*       2/4     5/14
           Mild -> Yes      2/6
           Cool -> Yes      1/4
Humidity   High -> No       3/7     4/14
           Normal -> Yes    1/7
Windy      False -> Yes     2/8     5/14
           True -> No*      3/6

* indicates a tie

Dealing with numeric attributes. Discretize numeric attributes: divide each attribute's range into intervals; sort instances according to the attribute's values; place breakpoints where the (majority) class changes. This minimizes the total error

Dealing with numeric attributes. Example: temperature from the weather data:

Outlook   Temperature  Humidity  Windy  Play
Sunny     85           85        False  No
Sunny     80           90        True   No
Overcast  83           86        False  Yes
Rainy     75           80        False  Yes
...

Sorted temperature values with their classes:
64  65  68  69  70  71  72  72  75  75  80  81  83  85
Yes No  Yes Yes Yes No  No  Yes Yes Yes No  Yes Yes No

The problem of overfitting. This procedure is very sensitive to noise: one instance with an incorrect class label will probably produce a separate interval. Also: ID-like attributes will have zero errors. Simple solution: enforce a minimum number of instances in the majority class per interval

Discretization example. Example (with min = 3):

64  65  68  69  70 | 71  72  72  75  75 | 80  81  83  85
Yes No  Yes Yes Yes | No  No  Yes Yes Yes | No  Yes Yes No

Final result for the temperature attribute:

64  65  68  69  70  71  72  72  75  75 | 80  81  83  85
Yes No  Yes Yes Yes No  No  Yes Yes Yes | No  Yes Yes No
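A sketch of this discretization (min = 3 instances of the majority class per interval). This is a simplified illustration rather than Holte's exact procedure: ties here fall to whichever class was seen first in the interval.

from collections import Counter

def discretize(value_label_pairs, min_majority=3):
    # Grow an interval until its majority class has min_majority members
    # and the incoming label would break the run; then merge neighboring
    # intervals that predict the same majority class.
    pairs = sorted(value_label_pairs)
    intervals, current, counts = [], [], Counter()
    for value, label in pairs:
        if current:
            majority, top = counts.most_common(1)[0]
            if top >= min_majority and label != majority:
                intervals.append((current, majority))
                current, counts = [], Counter()
        current.append((value, label))
        counts[label] += 1
    if current:  # close the final interval
        intervals.append((current, counts.most_common(1)[0][0]))
    if not intervals:
        return []
    merged = [intervals[0]]
    for segment, majority in intervals[1:]:
        if majority == merged[-1][1]:
            merged[-1] = (merged[-1][0] + segment, majority)
        else:
            merged.append((segment, majority))
    return merged  # on the temperature data: a single split near 77.5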

With overfitting avoidance. Resulting rule set:

Attribute    Rules                      Errors  Total errors
Outlook      Sunny -> No                2/5     4/14
             Overcast -> Yes            0/4
             Rainy -> Yes               2/5
Temperature  <= 77.5 -> Yes             3/10    5/14
             > 77.5 -> No*              2/4
Humidity     <= 82.5 -> Yes             1/7     3/14
             > 82.5 and <= 95.5 -> No   2/6
             > 95.5 -> Yes              0/1
Windy        False -> Yes               2/8     5/14
             True -> No*                3/6

Discussion of 1R. 1R was described in a paper by Holte (1993), which contains an experimental evaluation on 16 datasets (using cross-validation so that results were representative of performance on future data). The minimum number of instances was set to 6 after some experimentation. 1R's simple rules performed not much worse than much more complex decision trees. Simplicity first pays off!

The Monks 1 dataset

A1  A2  A3  A4  A5  A6  CLASS
1   1   1   1   3   1   1
1   1   1   1   3   2   1
1   1   1   3   2   1   1
1   1   1   3   3   2   1
1   1   2   3   1   2   1
1   2   1   1   1   2   1
1   2   1   1   2   1   0
1   2   1   1   3   1   0
1   2   1   1   4   2   0
1   2   1   2   1   1   1
1   2   1   2   3   1   0
1   2   1   2   3   2   0

Monks 1: decision tree

attribute#5 = 1: 1 (29.0/1.4)
attribute#5 = 2: 0 (31.0/13.4)
attribute#5 = 3:
|   attribute#6 = 1: 0 (13.0/4.7)
|   attribute#6 = 2:
|   |   attribute#3 = 1: 1 (7.0/3.4)
|   |   attribute#3 = 2: 0 (10.0/4.6)
attribute#5 = 4:
|   attribute#1 = 1: 0 (14.0/2.5)
|   attribute#1 = 2:
|   |   attribute#2 = 1: 0 (6.0/1.2)
|   |   attribute#2 = 2: 1 (4.0/1.2)
|   |   attribute#2 = 3: 0 (1.0/0.8)
|   attribute#1 = 3:
|   |   attribute#2 = 1: 1 (0.0)
|   |   attribute#2 = 2: 0 (3.0/1.1)
|   |   attribute#2 = 3: 1 (6.0/1.2)

Monks 1: classification rules

Rule 1: attribute#5 = 1 -> class 1 [95.3%]
Rule 20: attribute#1 = 3 and attribute#2 = 3 -> class 1 [92.2%]
Rule 17: attribute#1 = 2 and attribute#2 = 2 -> class 1 [91.2%]
Rule 7: attribute#1 = 1 and attribute#2 = 1 -> class 1 [85.7%]
Rule 14: attribute#1 = 1 and attribute#5 = 4 -> class 0 [82.2%]
Rule 16: attribute#1 = 2 and attribute#2 = 1 and attribute#5 = 4 -> class 0 [79.4%]
Default class: 0

Monks 1: another solution: (A1 = A2) OR (A5 = 1). Decision trees and classification rules do not include variables, but only propositions, so they cannot express a relational condition like A1 = A2 compactly

What are Horn clauses? A Horn clause is an expression of the form: H <- L1 AND L2 AND L3 AND ... AND Ln, where H, L1 ... Ln are positive literals (predicates applied to a set of terms). H is called the head or consequent; L1 AND L2 AND L3 AND ... AND Ln is called the body or antecedents. Example: daughter(x,y) <- father(y,x) AND female(x)

Learning First-Order Rules: FOIL. This extends the typical sequential covering algorithms to the learning of first-order rules, or Horn clauses. The best-known algorithm is FOIL, which employs an approach very similar to the sequential-covering and learn-one-rule algorithms. FOIL rules are more restricted than general Horn clauses (literals cannot contain functions). FOIL rules are more expressive than Horn clauses because the literals appearing in the body may be negated

The FOIL algorithm
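FOIL greedily adds the literal with the highest information-based gain. A sketch of that standard heuristic (background on FOIL, not transcribed from the slide):

from math import log2

def foil_gain(p0, n0, p1, n1, t):
    # Standard FOIL gain for adding a literal: p0, n0 (p1, n1) count the
    # positive/negative bindings before (after) the literal is added,
    # and t is the number of positive bindings still covered afterwards.
    if p0 == 0 or p1 == 0:
        return 0.0
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))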

Monks 1: FOIL. Clause 0: is_0(A,B,C,D,E,F) :- A<>B, E<>A1. (That is, class 0 exactly when A1 differs from A2 and A5 differs from 1, matching the solution above.)

Other approaches. Sequential-covering algorithms are one of the possible approaches. Classification rules can also be derived from other representations, such as decision trees, association rules, and neural networks. Alternatively, classification rules can be derived through other search approaches, such as genetic algorithms and genetic programming

Summary. Classification rules are used because they are more expressive and more human-readable. Most of the algorithms are based on sequential covering, which can be used to derive both propositional and first-order rules. Other approaches exist: specific-to-general exploration (RISE); post-processing of neural networks, association rules, decision trees, etc.

References
Robert C. Holte. Very Simple Classification Rules Perform Well on Most Commonly Used Datasets. Computer Science Department, University of Ottawa.
Pedro Domingos. Two-Way Induction. Proceedings of the Seventh IEEE International Conference on Tools with Artificial Intelligence (pp. 182-189), 1995. Herndon, VA: IEEE Computer Society Press.
Pedro Domingos. Rule Induction and Instance-Based Learning: A Unified Approach. Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (pp. 1226-1232), 1995. Montreal, Canada: Morgan Kaufmann.
Pedro Domingos. The RISE System: Conquering Without Separating. Proceedings of the Sixth IEEE International Conference on Tools with Artificial Intelligence (pp. 704-707), 1994. New Orleans, LA: IEEE Computer Society Press.