Learning Classifiers: Performance, Rule Induction, and Decision Trees
- Bertram Brown
1  Learning Classifiers

Performance

Instances x_i in a dataset D are mapped to a feature space; classes are associated with the instances, and the classifier corresponds to a decision boundary in that space. Pipeline: Initial Model (Assumptions) → Training Process (on Training Data) → Trained Model → Testing Process (on Test Data) → Tested Model → Performance.

Classification: f(x_i) = c, where c is one of the class labels, x_i,j are the feature values, and f is the classifier; the dataset D is a multiset. Objective: learn f (supervised learning).

Performance measures:
  TP: True Positive     TN: True Negative
  FP: False Positive    FN: False Negative
Note: if N = |D|, then N = TP + TN + FP + FN.

Confusion matrix:

                          Predicted class
                          positive           negative
  Actual class  positive  true positive      false negative
                negative  false positive     true negative

Performance measures:
  Success rate σ:                   σ = (TP + TN) / N
  Error rate ε:                     ε = 1 − σ
  True Positive Rate (= recall ρ):  TPR = TP / (TP + FN)
  False Negative Rate:              FNR = 1 − TPR
  False Positive Rate:              FPR = FP / (FP + TN)
  True Negative Rate:               TNR = 1 − FPR
  Precision π:                      π = TP / (TP + FP)
  F-measure:                        F = 2ρπ / (ρ + π) = 2TP / (2TP + FP + FN)

Example: choosing contact lenses (Ast = Astigmatism; Tears = tear production rate)

  Age             Prescription   Ast  Tears    Lens
  young           myope          no   reduced  none
  young           myope          no   normal   soft
  young           myope          yes  reduced  none
  young           myope          yes  normal   hard
  young           hypermetrope   no   reduced  none
  young           hypermetrope   no   normal   soft
  young           hypermetrope   yes  reduced  none
  young           hypermetrope   yes  normal   hard
  pre-presbyopic  myope          no   reduced  none
  pre-presbyopic  myope          no   normal   soft
  pre-presbyopic  myope          yes  reduced  none
  pre-presbyopic  myope          yes  normal   hard
  pre-presbyopic  hypermetrope   no   reduced  none
  pre-presbyopic  hypermetrope   no   normal   soft
  pre-presbyopic  hypermetrope   yes  reduced  none
  pre-presbyopic  hypermetrope   yes  normal   none
  presbyopic      myope          no   reduced  none
  presbyopic      myope          no   normal   none
  presbyopic      myope          yes  reduced  none
  presbyopic      myope          yes  normal   hard
  presbyopic      hypermetrope   no   reduced  none
  presbyopic      hypermetrope   no   normal   soft
  presbyopic      hypermetrope   yes  reduced  none
  presbyopic      hypermetrope   yes  normal   none
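The measures above can be sketched in a few lines of Python; the confusion-matrix counts (TP = 15, FN = 5, FP = 10, TN = 70) are made up for illustration:

```python
# Illustrative confusion-matrix counts (not from the slides).
TP, FN, FP, TN = 15, 5, 10, 70

N = TP + TN + FP + FN
success_rate = (TP + TN) / N         # sigma
error_rate = 1 - success_rate        # epsilon
TPR = TP / (TP + FN)                 # recall rho
FNR = 1 - TPR
FPR = FP / (FP + TN)
TNR = 1 - FPR
precision = TP / (TP + FP)           # pi
F1 = 2 * TPR * precision / (TPR + precision)   # = 2TP / (2TP + FP + FN)
```

The identity F = 2TP / (2TP + FP + FN) gives the same value and is a quick sanity check on the harmonic-mean form.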
2  Rule Representation and Reasoning

Rule representation: logical implication (= rules): LHS → RHS
(LHS = left-hand side = antecedent; RHS = right-hand side = consequent).
Literals in LHS and RHS are of the form Variable ⊙ value, where ⊙ ∈ {<, ≤, =, ≥, >} (or simply Attribute = value).

Rule-based reasoning: R ∪ F ⊢ C, where
  R is a set of rules r ∈ R (the rule base),
  F is a set of facts of the form Variable = value,
  C is a set of conclusions of the same form as facts.

ZeroR

Basic idea: construct a rule that predicts the majority class. Used as baseline performance.

Example: contact lenses recommendation rule:  → Lens = none
  Total number of instances: 24
  Correctly classified instances: 15 (62.5%)
  Incorrectly classified instances: 9 (37.5%)

  === Confusion Matrix ===
   a  b  c    <-- classified as
   0  0  5  |  a = soft
   0  0  4  |  b = hard
   0  0 15  |  c = none

OneR

Construct a single-condition rule for each variable–value pair; select the rules defined for a single variable (in the condition) which perform best.

OneR(classvar) {
  R ← ∅
  for each var ∈ Vars do
    for each value ∈ Domain(var) do
      classvar-most-freq-value ← MostFreq(var = value, classvar)
      rule ← MakeRule(var = value, classvar = classvar-most-freq-value)
      R ← R ∪ {rule}
  for each r ∈ R do CalculateErrorRate(r)
  R ← SelectBestRulesForSingleVar(R)
}

Example: rules for contact-lenses recommendation:
  Tears = reduced → Lens = none
  Tears = normal → Lens = soft
17/24 instances correct:
  Correctly classified instances: 17 (70.83%)
  Incorrectly classified instances: 7 (29.17%)

  === Confusion Matrix ===
   a  b  c    <-- classified as
   5  0  0  |  a = soft
   4  0  0  |  b = hard
   3  0 12  |  c = none
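The OneR pseudocode above can be sketched in Python; the helper structure and the five-row toy dataset are my own, not from the slides:

```python
from collections import Counter

def one_r(rows, attrs, class_attr):
    """For each attribute, build one rule per value predicting the most
    frequent class; keep the attribute with the fewest training errors."""
    best = None  # (errors, attr, rules)
    for attr in attrs:
        rules, errors = {}, 0
        for value in {r[attr] for r in rows}:
            counts = Counter(r[class_attr] for r in rows if r[attr] == value)
            majority, freq = counts.most_common(1)[0]
            rules[value] = majority
            errors += sum(counts.values()) - freq  # non-majority instances
        if best is None or errors < best[0]:
            best = (errors, attr, rules)
    return best

# Toy data loosely shaped like the contact-lens example.
rows = [
    {"tears": "reduced", "age": "young", "lens": "none"},
    {"tears": "reduced", "age": "old",   "lens": "none"},
    {"tears": "normal",  "age": "young", "lens": "soft"},
    {"tears": "normal",  "age": "old",   "lens": "soft"},
    {"tears": "normal",  "age": "young", "lens": "hard"},
]
errors, attr, rules = one_r(rows, ["tears", "age"], "lens")
```

On this toy data the "tears" rules make one error while the "age" rules make three, so OneR keeps the tears attribute, mirroring the 17/24 result on the full dataset.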
3  Generalisation: Separate-and-Cover

Covering of classes: rule-set generation for each class value separately. Peeling (box compression): instances are peeled off (fall outside the box) one face at a time.

PRISM algorithm — example: choosing contact lenses. Recommended contact lenses: none, soft, hard.

General principles:
1. Choose a class value, e.g. hard.
2. Construct the rule: Condition → Lens = hard.
3. Determine the accuracy α = p/t for all possible conditions, where
   t: total number of instances covered by the rule,
   p: covered instances with the right (positive) class value.

   Condition                    α = p/t
   Age = young                  2/8
   Age = pre-presbyopic         1/8
   Age = presbyopic             1/8
   Prescription = myope         3/12
   Prescription = hypermetrope  1/12
   Astigmatism = no             0/12
   Astigmatism = yes            4/12
   Tears = reduced              0/12
   Tears = normal               4/12

4. Select the best condition: Astigmatism = yes (4/12).

Separate-and-cover algorithm

SC(classvar, D) {
  R ← ∅
  for each val ∈ Domain(classvar) do
    E ← D
    while E contains instances with val do
      rule ← MakeRule(rhs(classvar = val), lhs(∅))
      IR ← ∅
      until rule is perfect do
        for each var ∈ Vars with var ∉ rule do
          for each value ∈ Domain(var) do
            inter-rule ← Add(rule, lhs(var = value))
            IR ← IR ∪ {inter-rule}
        rule ← SelectRule(IR)
      R ← R ∪ {rule}
      RC ← InstancesCoveredBy(rule, E)
      E ← E \ RC
}

SelectRule: based on accuracy α = p/t; if α = α′ for two rules, select the one with the highest p.

SelectRule example

Rule: RHS: Lens = hard; LHS: Astigmatism = yes, with α = 4/12. Not very accurate; expanded rule: (Astigmatism = yes ∧ New-condition) → Lens = hard.

Instances with Astigmatism = yes:

  Age             Prescription   Ast  Tears    Lens
  young           myope          yes  reduced  none
  young           myope          yes  normal   hard
  young           hypermetrope   yes  reduced  none
  young           hypermetrope   yes  normal   hard
  pre-presbyopic  myope          yes  reduced  none
  pre-presbyopic  myope          yes  normal   hard
  pre-presbyopic  hypermetrope   yes  reduced  none
  pre-presbyopic  hypermetrope   yes  normal   none
  presbyopic      myope          yes  reduced  none
  presbyopic      myope          yes  normal   hard
  presbyopic      hypermetrope   yes  reduced  none
  presbyopic      hypermetrope   yes  normal   none

Candidate conditions:
  Age = young (2/4); Age = pre-presbyopic (1/4); Age = presbyopic (1/4)
  Prescription = myope (3/6); Prescription = hypermetrope (1/6)
  Tears = reduced (0/6); Tears = normal (4/6)
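One refinement step of the separate-and-cover search (scoring candidate conditions by α = p/t, ties broken by larger p, as SelectRule does) might look like this in Python; the tiny dataset is illustrative, not the full table above:

```python
def best_condition(rows, attrs, class_attr, target):
    """Score each candidate condition attr = value by alpha = p/t, where t is
    the number of covered instances and p those with the target class value;
    break ties by larger p, as in SelectRule."""
    scored = []
    for attr in attrs:
        for value in {r[attr] for r in rows}:
            covered = [r for r in rows if r[attr] == value]
            p = sum(1 for r in covered if r[class_attr] == target)
            scored.append((attr, value, p, len(covered)))
    return max(scored, key=lambda s: (s[2] / s[3], s[2]))

# Illustrative four-row dataset (my own, not the 24-row table).
rows = [
    {"ast": "yes", "tears": "normal",  "lens": "hard"},
    {"ast": "yes", "tears": "normal",  "lens": "hard"},
    {"ast": "yes", "tears": "reduced", "lens": "none"},
    {"ast": "no",  "tears": "reduced", "lens": "soft"},
]
attr, value, p, t = best_condition(rows, ["ast", "tears"], "lens", "hard")
```

Here Tears = normal covers 2 instances, both hard (α = 2/2), beating Astigmatism = yes (α = 2/3), so it is the condition a PRISM-style refinement would add next.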
4  SelectRule Example (continued)

Rule: (Astigmatism = yes ∧ Tears = normal) → Lens = hard
Expanded rule: (Astigmatism = yes ∧ Tears = normal ∧ New-condition) → Lens = hard

Instances with Astigmatism = yes and Tears = normal:

  Age             Prescription   Ast  Tears   Lens
  young           myope          yes  normal  hard
  young           hypermetrope   yes  normal  hard
  pre-presbyopic  myope          yes  normal  hard
  pre-presbyopic  hypermetrope   yes  normal  none
  presbyopic      myope          yes  normal  hard
  presbyopic      hypermetrope   yes  normal  none

Candidate conditions:
  Age = young (2/2); Age = pre-presbyopic (1/2); Age = presbyopic (1/2)
  Prescription = myope (3/3); Prescription = hypermetrope (1/3)

Final rule: (Astigmatism = yes ∧ Tears = normal ∧ Prescription = myope) → Lens = hard

SC example (continued)

Delete the 3 instances covered by this rule from E; construct a new rule New-condition → Lens = hard on the remaining 21 instances:

  Age             Prescription   Ast  Tears    Lens
  young           myope          no   reduced  none
  young           myope          no   normal   soft
  young           myope          yes  reduced  none
  young           hypermetrope   no   reduced  none
  young           hypermetrope   no   normal   soft
  young           hypermetrope   yes  reduced  none
  young           hypermetrope   yes  normal   hard
  pre-presbyopic  myope          no   reduced  none
  pre-presbyopic  myope          no   normal   soft
  pre-presbyopic  myope          yes  reduced  none
  pre-presbyopic  hypermetrope   no   reduced  none
  pre-presbyopic  hypermetrope   no   normal   soft
  pre-presbyopic  hypermetrope   yes  reduced  none
  pre-presbyopic  hypermetrope   yes  normal   none
  presbyopic      myope          no   reduced  none
  presbyopic      myope          no   normal   none
  presbyopic      myope          yes  reduced  none
  presbyopic      hypermetrope   no   reduced  none
  presbyopic      hypermetrope   no   normal   soft
  presbyopic      hypermetrope   yes  reduced  none
  presbyopic      hypermetrope   yes  normal   none

Rules: WEKA results

  If Astigmatism = no and Tears = normal and Prescription = hypermetrope then Lens = soft
  If Astigmatism = no and Tears = normal and Age = young then Lens = soft
  If Age = pre-presbyopic and Astigmatism = no and Tears = normal then Lens = soft
  If Astigmatism = yes and Tears = normal and Prescription = myope then Lens = hard
  If Age = young and Astigmatism = yes and Tears = normal then Lens = hard
  If Tears = reduced then Lens = none
  If Age = presbyopic and Tears = normal and Prescription = myope and Astigmatism = no then Lens = none
  If Prescription = hypermetrope and Astigmatism = yes and Age = pre-presbyopic then Lens = none
  If Age = presbyopic and Prescription = hypermetrope and Astigmatism = yes then Lens = none

  === Confusion Matrix ===
   a  b  c    <-- classified as
   5  0  0  |  a = soft
   0  4  0  |  b = hard
   0  0 15  |  c = none

  Correctly classified instances: 24 (100%)

Limitations

- Adding one condition at a time is greedy search (an optimal state may be missed).
- Accuracy α = p/t promotes overfitting: the more correct the rule (the higher p is compared to t), the higher α.
- The resulting rules cover all instances perfectly.

Example: consider rule r1 with accuracy α1 = 1/1 and rule r2 with accuracy α2 = 19/20; then r1 is considered superior to r2.

Alternative 1: information gain. Alternative 2: a probabilistic measure.
5  Information Gain

  I_D(r′) = p′ · ( log(p′/t′) − log(p/t) )

where α = p/t is the accuracy before adding a condition to rule r, and α′ = p′/t′ is the accuracy after the condition has been added.

Example: consider rule r′ with α′ = 1/1 and rule r″ with accuracy α″ = 19/20, both modifications of r with α = 20/200. Then r′ is considered superior to r″ according to accuracy, but

  I_D(r′) = 1 · (log(1/1) − log(20/200)) = 1
  I_D(r″) = 19 · (log(19/20) − log(20/200)) ≈ 18.6

hence r′ is inferior to r″ according to information gain.

Comparison: accuracy versus information gain

Information gain I:
  emphasis is on a large number of positive instances;
  high-coverage cases first, special cases later;
  resulting rules cover all instances perfectly.
Accuracy α:
  takes the number of positive instances into account only to break ties;
  special cases first, high-coverage cases later;
  resulting rules cover all instances perfectly.

Probabilistic measure

  N = |D|: number of instances in dataset D
  M: number of instances in D concerning the class
  t: number of instances in D on which rule r succeeds (selected by r)
  p: number of positive instances among them (those concerning the class)

Hypergeometric distribution (sampling without replacement) — the probability that k of the t selected instances belong to the class:

  f(k) = C(M, k) · C(N − M, t − k) / C(N, t)

Rule r selects t instances, of which p are positive. The probability that a randomly chosen rule r′ does as well as or better than r:

  P(r′) = Σ_{k=p}^{min(t,M)} f(k) = Σ_{k=p}^{min(t,M)} C(M, k) · C(N − M, t − k) / C(N, t)
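The worked example can be reproduced in Python; base-10 logarithms are assumed here, since they reproduce the quoted values 1 and ≈ 18.6:

```python
from math import log10

def info_gain(p_new, t_new, p_old, t_old):
    """Rule-refinement information gain I_D(r') = p' (log a' - log a),
    with base-10 logs (an assumption that matches the worked numbers)."""
    return p_new * (log10(p_new / t_new) - log10(p_old / t_old))

g1 = info_gain(1, 1, 20, 200)     # r'  with alpha' = 1/1
g2 = info_gain(19, 20, 20, 200)   # r'' with alpha'' = 19/20
```

g1 comes out as 1 and g2 as about 18.6, so information gain prefers r″ even though plain accuracy prefers r′.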
6  Approximation

  P(r′) = Σ_{k=p}^{min(t,M)} C(M, k) · C(N − M, t − k) / C(N, t)
        ≈ Σ_{k=p}^{min(t,M)} C(t, k) · (M/N)^k · (1 − M/N)^{t−k}
        = I_{M/N}(p, t − p + 1)

i.e. the hypergeometric distribution is approximated by a binomial distribution, where I_x(α, β) is the incomplete beta function:

  I_x(α, β) = (1 / B(α, β)) · ∫₀ˣ z^{α−1} (1 − z)^{β−1} dz

and B(α, β) is the beta function, defined as

  B(α, β) = ∫₀¹ z^{α−1} (1 − z)^{β−1} dz

Reduced-error pruning

The danger of overfitting to the training set can be reduced by splitting it into:
  a growing set (GS) (2/3 of the training set), and
  a pruning set (PS) (1/3 of the training set).

REP(classvar, D) {
  R ← ∅; E ← D
  (GS, PS) ← Split(E)
  while E ≠ ∅ do
    IR ← ∅
    for each val ∈ Domain(classvar) do
      if GS and PS contain a val-instance then
        rule ← BSC(classvar = val, GS)
        while P(rule⁻, PS) > P(rule, PS) do
          rule ← rule⁻
        IR ← IR ∪ {rule}
    rule ← SelectRule(IR); R ← R ∪ {rule}
    RC ← InstancesCoveredBy(rule, E)
    E ← E \ RC
    (GS, PS) ← Split(E)
}

Here BSC is the basic separate-and-cover algorithm, and rule⁻ is the rule with its last condition removed.

Divide-and-conquer: decision trees

[Figure: a decision tree for Lens = ? over the contact-lens data, with internal nodes Tears (reduced/normal), Astigmatism, Age (young, pre-presbyopic, presbyopic) and prescription (myope, hypermetrope), and leaves soft, hard, none.]

Learning decision trees:
  R. Quinlan: ID3, C4.5 and C5.0
  L. Breiman: CART (Classification and Regression Trees)

Which variable/attribute is best? Splitting dataset D on a variable X partitions it into subsets D_{X=x}, one per value x of X, with C the class variable.

  Entropy:          H_C(X = x) = − Σ_c P(C = c | X = x) · ln P(C = c | X = x)
  Expected entropy: E_{H_C}(X) = Σ_x P(X = x) · H_C(X = x)
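The exact tail probability P(r′) and its binomial approximation can be sketched directly in Python (the function names are my own):

```python
from math import comb

def p_random_as_good(N, M, t, p):
    """P(r'): probability that a random rule selecting t of N instances
    (M of them positive) gets at least p positives; hypergeometric tail."""
    return sum(comb(M, k) * comb(N - M, t - k)
               for k in range(p, min(t, M) + 1)) / comb(N, t)

def binomial_approx(N, M, t, p):
    """Binomial approximation of the same tail, with success rate M/N."""
    q = M / N
    return sum(comb(t, k) * q**k * (1 - q)**(t - k)
               for k in range(p, min(t, M) + 1))
```

For example, with N = 10, M = 5, t = 3 and p = 3, the exact value is C(5,3)/C(10,3) = 1/12, and lowering p to 0 makes both tails equal 1.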
7  Information Gain (again)

Splitting dataset D on variable X yields the subsets D_{X=x}, with

  Entropy:          H_C(X = x) = − Σ_c P(C = c | X = x) · ln P(C = c | X = x)
  Expected entropy: E_{H_C}(X) = Σ_x P(X = x) · H_C(X = x)

Without the split of the dataset D on variable X, the entropy is:

  H_C(∅) = − Σ_c P(C = c) · ln P(C = c)

Information gain G_C from X:

  G_C(X) = H_C(∅) − E_{H_C}(X)

Example: contact lenses recommendation. The class variable is Lens:

  P(Lens) = 5/24 if Lens = soft, 4/24 if Lens = hard, 15/24 if Lens = none

  H(∅) = −(5/24 · ln 5/24 + 4/24 · ln 4/24 + 15/24 · ln 15/24) ≈ 0.92

For variable Ast (Astigmatism):

  P(Lens | Ast = no) = 5/12 if soft, 0/12 if hard, 7/12 if none
  H(Ast = no) = −(5/12 · ln 5/12 + 7/12 · ln 7/12) ≈ 0.68

  P(Lens | Ast = yes) = 0/12 if soft, 4/12 if hard, 8/12 if none
  H(Ast = yes) = −(4/12 · ln 4/12 + 8/12 · ln 8/12) ≈ 0.64

  E_H(Ast) = ½ · H(Ast = no) + ½ · H(Ast = yes) ≈ 0.66

  Information gain: G(Ast) = H(∅) − E_H(Ast) ≈ 0.92 − 0.66 = 0.26

For variable Tears:

  E_H(Tears) = ½ · H(Tears = reduced) + ½ · H(Tears = normal) ≈ 0.55

  Information gain: G(Tears) = H(∅) − E_H(Tears) ≈ 0.92 − 0.55 = 0.37

Comparison: G(Tears) > G(Ast), so select Tears as the first splitting variable.
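The entropy calculations above can be checked in a few lines of Python, reading the class counts (soft, hard, none) off the dataset and using natural logs as in the slides:

```python
from math import log

def entropy(class_counts):
    """H over a class distribution given as counts; 0-count classes skipped."""
    n = sum(class_counts)
    return -sum(c / n * log(c / n) for c in class_counts if c)

# Class counts (soft, hard, none) in the 24-instance contact-lens dataset.
H_root = entropy([5, 4, 15])                                    # H(no split), ~0.92
E_ast = 0.5 * entropy([5, 0, 7]) + 0.5 * entropy([0, 4, 8])     # E_H(Ast)
E_tears = 0.5 * entropy([0, 0, 12]) + 0.5 * entropy([5, 4, 3])  # E_H(Tears)
gain_ast = H_root - E_ast      # ~0.26
gain_tears = H_root - E_tears  # ~0.38 here; the slides round to 0.37
```

Since gain_tears exceeds gain_ast, Tears is indeed chosen as the first splitting variable.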
8  Final Remarks I

A node with too many branches causes the information gain measure to break down.

Example: suppose that with each branch of a node a dataset with exactly one instance is associated. Each branch is then pure (entropy 0), so

  E_{H_C}(X) = Σ_x (1/n) · 0 = 0   if X has n values,

hence G_C(X) = H_C(∅) − 0 = H_C(∅) attains its maximum.

Solution: the gain ratio R_C.
  Split information: H_X(∅) = − Σ_x P(X = x) · ln P(X = x)
  Gain ratio: R_C(X) = G_C(X) / H_X(∅)

Final Remarks II

- Variable selection is myopic: it does not look beyond the effects of its own values; a resulting decision tree is therefore likely to be suboptimal.
- Decision trees may grow unwieldy and may need to be pruned (ID3 → C4.5): subtree replacement, subtree raising.
- Decision trees can also be used for numerical variables: regression trees.
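A small Python sketch of split information and gain ratio; the singleton-branch numbers mirror the example above:

```python
from math import log

def split_info(branch_sizes):
    """Split information H_X: entropy of the split itself, from branch sizes."""
    n = sum(branch_sizes)
    return -sum(b / n * log(b / n) for b in branch_sizes if b)

def gain_ratio(gain, branch_sizes):
    """R_C(X) = G_C(X) / H_X for a split with the given branch sizes."""
    return gain / split_info(branch_sizes)

# 24 instances split into 24 singleton branches: class entropy drops to 0,
# but split information is ln 24, so the ratio penalises the split.
si_singletons = split_info([1] * 24)   # ln 24, about 3.18
si_tears = split_info([12, 12])        # ln 2, about 0.69
```

A binary split like Tears (12/12) has small split information, so a genuine gain there survives the correction, while the pathological 24-way split is heavily discounted.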
Decision Tree Learning Mitchell, Chapter 3 CptS 570 Machine Learning School of EECS Washington State University Outline Decision tree representation ID3 learning algorithm Entropy and information gain
More informationGeneralization Error on Pruning Decision Trees
Generalization Error on Pruning Decision Trees Ryan R. Rosario Computer Science 269 Fall 2010 A decision tree is a predictive model that can be used for either classification or regression [3]. Decision
More informationDecision Tree Learning Lecture 2
Machine Learning Coms-4771 Decision Tree Learning Lecture 2 January 28, 2008 Two Types of Supervised Learning Problems (recap) Feature (input) space X, label (output) space Y. Unknown distribution D over
More informationDynamic Clustering-Based Estimation of Missing Values in Mixed Type Data
Dynamic Clustering-Based Estimation of Missing Values in Mixed Type Data Vadim Ayuyev, Joseph Jupin, Philip Harris and Zoran Obradovic Temple University, Philadelphia, USA 2009 Real Life Data is Often
More informationPattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition
Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lesson 1 4 October 2017 Outline Learning and Evaluation for Pattern Recognition Notation...2 1. The Pattern Recognition
More informationSymbolic methods in TC: Decision Trees
Symbolic methods in TC: Decision Trees ML for NLP Lecturer: Kevin Koidl Assist. Lecturer Alfredo Maldonado https://www.cs.tcd.ie/kevin.koidl/cs0/ kevin.koidl@scss.tcd.ie, maldonaa@tcd.ie 01-017 A symbolic
More informationTypical Supervised Learning Problem Setting
Typical Supervised Learning Problem Setting Given a set (database) of observations Each observation (x1,, xn, y) Xi are input variables Y is a particular output Build a model to predict y = f(x1,, xn)
More informationPerformance Evaluation and Hypothesis Testing
Performance Evaluation and Hypothesis Testing 1 Motivation Evaluating the performance of learning systems is important because: Learning systems are usually designed to predict the class of future unlabeled
More informationHoldout and Cross-Validation Methods Overfitting Avoidance
Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest
More informationClassification: Decision Trees
Classification: Decision Trees Outline Top-Down Decision Tree Construction Choosing the Splitting Attribute Information Gain and Gain Ratio 2 DECISION TREE An internal node is a test on an attribute. A
More informationReview of Lecture 1. Across records. Within records. Classification, Clustering, Outlier detection. Associations
Review of Lecture 1 This course is about finding novel actionable patterns in data. We can divide data mining algorithms (and the patterns they find) into five groups Across records Classification, Clustering,
More informationData Mining and Knowledge Discovery. Petra Kralj Novak. 2011/11/29
Data Mining and Knowledge Discovery Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2011/11/29 1 Practice plan 2011/11/08: Predictive data mining 1 Decision trees Evaluating classifiers 1: separate test set,
More informationDecision trees COMS 4771
Decision trees COMS 4771 1. Prediction functions (again) Learning prediction functions IID model for supervised learning: (X 1, Y 1),..., (X n, Y n), (X, Y ) are iid random pairs (i.e., labeled examples).
More informationJialiang Bao, Joseph Boyd, James Forkey, Shengwen Han, Trevor Hodde, Yumou Wang 10/01/2013
Simple Classifiers Jialiang Bao, Joseph Boyd, James Forkey, Shengwen Han, Trevor Hodde, Yumou Wang 1 Overview Pruning 2 Section 3.1: Simplicity First Pruning Always start simple! Accuracy can be misleading.
More informationLinear Classifiers as Pattern Detectors
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2014/2015 Lesson 16 8 April 2015 Contents Linear Classifiers as Pattern Detectors Notation...2 Linear
More informationStephen Scott.
1 / 35 (Adapted from Ethem Alpaydin and Tom Mitchell) sscott@cse.unl.edu In Homework 1, you are (supposedly) 1 Choosing a data set 2 Extracting a test set of size > 30 3 Building a tree on the training
More informationArtificial Intelligence Decision Trees
Artificial Intelligence Decision Trees Andrea Torsello Decision Trees Complex decisions can often be expressed in terms of a series of questions: What to do this Weekend? If my parents are visiting We
More informationApplied Machine Learning Annalisa Marsico
Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 22 April, SoSe 2015 Goals Feature Selection rather than Feature
More informationDecision Trees. Danushka Bollegala
Decision Trees Danushka Bollegala Rule-based Classifiers In rule-based learning, the idea is to learn a rule from train data in the form IF X THEN Y (or a combination of nested conditions) that explains
More informationMachine Learning 3. week
Machine Learning 3. week Entropy Decision Trees ID3 C4.5 Classification and Regression Trees (CART) 1 What is Decision Tree As a short description, decision tree is a data classification procedure which
More informationRule Generation using Decision Trees
Rule Generation using Decision Trees Dr. Rajni Jain 1. Introduction A DT is a classification scheme which generates a tree and a set of rules, representing the model of different classes, from a given
More informationPointwise Exact Bootstrap Distributions of Cost Curves
Pointwise Exact Bootstrap Distributions of Cost Curves Charles Dugas and David Gadoury University of Montréal 25th ICML Helsinki July 2008 Dugas, Gadoury (U Montréal) Cost curves July 8, 2008 1 / 24 Outline
More informationBagging. Ryan Tibshirani Data Mining: / April Optional reading: ISL 8.2, ESL 8.7
Bagging Ryan Tibshirani Data Mining: 36-462/36-662 April 23 2013 Optional reading: ISL 8.2, ESL 8.7 1 Reminder: classification trees Our task is to predict the class label y {1,... K} given a feature vector
More informationClassification and Regression Trees
Classification and Regression Trees Ryan P Adams So far, we have primarily examined linear classifiers and regressors, and considered several different ways to train them When we ve found the linearity
More informationStatistics and learning: Big Data
Statistics and learning: Big Data Learning Decision Trees and an Introduction to Boosting Sébastien Gadat Toulouse School of Economics February 2017 S. Gadat (TSE) SAD 2013 1 / 30 Keywords Decision trees
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More information