Performance Evaluation

Confusion Matrix:

                     Detected Positive      Detected Negative
Actual Positive      A: True Positive       B: False Negative
Actual Negative      C: False Positive      D: True Negative

Recall or Sensitivity or True Positive Rate (TPR):
o It is the proportion of positive cases that were correctly identified, as calculated using the equation:

$$\text{Recall} = \frac{A}{A + B}$$

Accuracy (AC):
o AC is the proportion of the total number of predictions that were correct. It is determined using the equation:

$$\text{Accuracy} = \frac{A + D}{A + B + C + D}$$

o Error rate (misclassification rate) $= 1 - AC$
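To make the definitions concrete, here is a minimal Python sketch (the function name, the 0/1 label encoding, and the toy data are illustrative assumptions, not from the notes) that tallies the cells A, B, C, D and computes recall and accuracy:

```python
# Minimal sketch: tally confusion-matrix cells from paired true/predicted
# labels (1 = positive, 0 = negative), then compute recall and accuracy.

def confusion_counts(y_true, y_pred):
    A = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    B = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    C = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    D = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives
    return A, B, C, D

y_true = [1, 1, 0, 0, 1, 0, 1, 0]      # made-up example
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
A, B, C, D = confusion_counts(y_true, y_pred)   # A=3, B=1, C=1, D=3

recall = A / (A + B)                    # 0.75
accuracy = (A + D) / (A + B + C + D)    # 0.75
error_rate = 1 - accuracy               # 0.25
print(recall, accuracy, error_rate)
```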

The false positive rate (FPR) is the proportion of negative cases that were incorrectly classified as positive, as calculated using the equation:

$$FPR = \frac{C}{C + D}$$

The true negative rate (TNR) or Specificity:
o It is defined as the proportion of negative cases that were classified correctly, as calculated using the equation:

$$TNR = \frac{D}{C + D}$$

The false negative rate (FNR):
o It is the proportion of positive cases that were incorrectly classified as negative, as calculated using the equation:

$$FNR = \frac{B}{A + B}$$

Precision:
o P is the proportion of the predicted positive cases that were correct, as calculated using the equation:

$$\text{Precision} = \frac{A}{A + C}$$
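Continuing the sketch above, the remaining rates follow directly from the same counts (the counts below reuse the toy example values):

```python
# Rates from confusion-matrix counts (A=TP, B=FN, C=FP, D=TN);
# example values carried over from the previous sketch.
A, B, C, D = 3, 1, 1, 3
FPR = C / (C + D)          # 0.25  false positive rate
TNR = D / (C + D)          # 0.75  specificity, equals 1 - FPR
FNR = B / (A + B)          # 0.25  equals 1 - recall
precision = A / (A + C)    # 0.75
print(FPR, TNR, FNR, precision)
```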

F-measure:
o The F-measure combines the information retrieval precision and recall metrics into a single score.
o Why F-measure? An arithmetic mean does not capture the fact that a (50%, 50%) system is often considered better than an (80%, 20%) system (see the sketch after the formulas below).
o F-measure is computed using the harmonic mean. Given n points $x_1, x_2, \ldots, x_n$, the harmonic mean $H$ satisfies:

$$\frac{n}{H} = \sum_{i=1}^{n} \frac{1}{x_i}$$

o So, for the harmonic mean $F$ of Precision $P$ and Recall $R$:

$$\frac{1}{F} = \frac{1}{2}\left(\frac{1}{R} + \frac{1}{P}\right) = \frac{P + R}{2PR}
\quad\Longrightarrow\quad F = \frac{2PR}{P + R}$$

o The computation of F-measure for a clustering:
  Each cluster is considered as if it were the result of a query, and each class as if it were the desired set of documents for that query.
  We then calculate the recall and precision of that cluster for each given class. The F-measure of cluster j and class i is defined as follows:

$$F_{ij} = \frac{2 \cdot \text{Recall}(i, j) \cdot \text{Precision}(i, j)}{\text{Precision}(i, j) + \text{Recall}(i, j)}$$

The F-measure of a given clustering is then computed as follows:

$$F\text{-measure} = \sum_{i} \frac{n_i}{n} \max_{j}\{F_{ij}\}$$

where $n$ is the number of documents in the collection and $n_i$ is the number of documents in class $i$. Note that the computed values are between 0 and 1, and a larger F-measure value indicates a higher classification/clustering quality.
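A small Python sketch of both points above: first, that the harmonic mean penalizes a lopsided (80%, 20%) system relative to a balanced (50%, 50%) one; second, the overall clustering F-measure. The function names and the contingency counts n_ij are illustrative assumptions, not from the notes:

```python
# Harmonic-mean F from precision p and recall r: F = 2pr / (p + r).
def f_measure(p, r):
    return 2 * p * r / (p + r)

print(f_measure(0.5, 0.5))   # 0.50 -- balanced system
print(f_measure(0.8, 0.2))   # 0.32 -- same arithmetic mean, lower F

# Overall clustering F-measure: F = sum_i (n_i / n) * max_j F_ij,
# where n_ij[i][j] = number of documents of class i placed in cluster j
# (counts below are made up for illustration).
def clustering_f_measure(n_ij):
    n_i = [sum(row) for row in n_ij]          # documents per class i
    n_j = [sum(col) for col in zip(*n_ij)]    # documents per cluster j
    n = sum(n_i)
    total = 0.0
    for i, row in enumerate(n_ij):
        best = 0.0
        for j, nij in enumerate(row):
            if nij:
                r = nij / n_i[i]              # Recall(i, j)
                p = nij / n_j[j]              # Precision(i, j)
                best = max(best, f_measure(p, r))
        total += (n_i[i] / n) * best
    return total

# Two classes x two clusters; cluster 0 captures most of class 0, etc.
print(clustering_f_measure([[9, 1], [2, 8]]))   # about 0.85
```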

Receiver Operating Characteristic (ROC) Curve:
o It is a graphical approach for displaying the tradeoff between the true positive rate (TPR) and the false positive rate (FPR) of a classifier:
  TPR = positives correctly classified / total positives
  FPR = negatives incorrectly classified / total negatives
o TPR is plotted along the y axis.
o FPR is plotted along the x axis.
o The performance of each classifier is represented as a point on the ROC curve.

Important points, written as (TPR, FPR):
o (0, 0): declare everything to be the negative class
o (1, 1): declare everything to be the positive class
o (1, 0): ideal
Diagonal line:
o Random guessing
Area Under Curve (AUC):
o It indicates which model is better on average.
o Ideal model: area = 1

o If the model simply performs random guessing, then its area under the curve equals 0.5.
o A model that is better than another has a larger area.
Example (two ROC curves, M1 and M2, that cross): neither model consistently outperforms the other:
o M1 is better for small FPR
o M2 is better for large FPR

Example (Kumar et al.):
o Compute P(+|A), a numeric value that represents the degree to which an instance is a member of a class; in other words, it is the probability or ranking of the predicted class of each data point.
o P(+|A) is the posterior probability as defined in a Bayesian classifier.

Instance   P(+|A)   True Class
1          0.95     +
2          0.93     +
3          0.87     -
4          0.85     -
5          0.85     -
6          0.85     +
7          0.76     -
8          0.53     +
9          0.43     -
10         0.25     +

Use a classifier that produces a posterior probability P(+|A) for each test instance:
o Sort the instances according to P(+|A) in decreasing order.
o Apply a threshold at each unique value of P(+|A).
o Count the number of TP, FP, TN, FN at each threshold:
  TP rate, TPR = TP / (TP + FN)
  FP rate, FPR = FP / (FP + TN)

Class         +     -     +     -     -     -     +     -     +     +
Threshold  0.25  0.43  0.53  0.76  0.85  0.85  0.85  0.87  0.93  0.95  1.00
TP            5     4     4     3     3     3     3     2     2     1     0
FP            5     5     4     4     3     2     1     1     0     0     0
TN            0     0     1     1     2     3     4     4     5     5     5
FN            0     1     1     2     2     2     2     3     3     4     5
TPR           1   0.8   0.8   0.6   0.6   0.6   0.6   0.4   0.4   0.2     0
FPR           1     1   0.8   0.8   0.6   0.4   0.2   0.2     0     0     0
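A Python sketch (variable names are mine, not from the notes) that reproduces the sweep above and adds an AUC computed by the trapezoidal rule. Note that with >= thresholding, the three tied 0.85 columns collapse into the single point (FPR, TPR) = (0.6, 0.6):

```python
# Predict + when P(+|A) >= t; count TP/FP at each unique threshold,
# collect (FPR, TPR) points, then accumulate AUC with the trapezoidal rule.
scores = [0.95, 0.93, 0.87, 0.85, 0.85, 0.85, 0.76, 0.53, 0.43, 0.25]
labels = ['+', '+', '-', '-', '-', '+', '-', '+', '-', '+']
P, N = labels.count('+'), labels.count('-')      # 5 positives, 5 negatives

points = [(0.0, 0.0)]                            # threshold 1.00: all negative
for t in sorted(set(scores), reverse=True):
    TP = sum(s >= t and y == '+' for s, y in zip(scores, labels))
    FP = sum(s >= t and y == '-' for s, y in zip(scores, labels))
    points.append((FP / N, TP / P))              # (FPR, TPR) at this threshold

auc = sum((x2 - x1) * (y1 + y2) / 2              # trapezoidal rule
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(points)   # ends at (1.0, 1.0): everything declared positive
print(auc)      # about 0.56 for this data
```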

Clustering Only

Intra-Cluster Similarity (ICS):
o It looks at the similarity of all the data points in a cluster to their cluster centroid.
o It is calculated as the arithmetic mean of all of the data point-to-centroid similarities.
o Given a set of k clusters, ICS is defined as follows:

$$ICS = \frac{1}{k} \sum_{i=1}^{k} \frac{1}{|C_i|} \sum_{d_j \in C_i} \text{sim}(d_j, c_i)$$

where $c_i$ is the centroid of cluster $C_i$.
o A good clustering algorithm maximizes intra-cluster similarity.

Centroid Similarity (CS):
o It computes the similarity between the centroids of all clusters.
o Given a set of k clusters, CS is defined as follows:

$$CS = \sum_{i=1}^{k} \sum_{j=1}^{k} \text{sim}(c_i, c_j)$$
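A sketch of ICS and CS under the definitions above. The notes leave sim(.,.) unspecified; cosine similarity is a common choice and is assumed here, and the vectors and clusters are made up for illustration:

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors (assumed non-zero).
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def centroid(cluster):
    # Coordinate-wise mean of the points in a cluster.
    n = len(cluster)
    return [sum(xs) / n for xs in zip(*cluster)]

def ics(clusters):
    # Mean (over clusters) of the mean point-to-centroid similarity.
    k = len(clusters)
    return sum(sum(cosine(d, centroid(c)) for d in c) / len(c)
               for c in clusters) / k

def cs(clusters):
    # Sum of pairwise centroid similarities; following the formula as
    # written, the double sum includes the i = j terms (each equal to 1).
    cents = [centroid(c) for c in clusters]
    return sum(cosine(ci, cj) for ci in cents for cj in cents)

clusters = [[[1.0, 0.1], [0.9, 0.0]],    # two tight, well-separated clusters
            [[0.0, 1.0], [0.1, 0.8]]]
print(ics(clusters), cs(clusters))
```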