Performance Measures. Sören Sonnenburg. Fraunhofer FIRST.IDA, Kekuléstr. 7, Berlin, Germany


Sören Sonnenburg, Fraunhofer FIRST.IDA, Kekuléstr. 7, 12489 Berlin, Germany

Roadmap: Contingency Table, Scores from the Contingency Table, Curves from the Contingency Table, Discussion

Contingency Table / Confusion Matrix

Given a sample x_1 ... x_N ∈ X, a 2-class classifier f(x) ∈ {-1, +1} produces outputs f(x_1) ... f(x_N); together with the true labeling (y_1 ... y_N) ∈ {-1, +1}^N, this partitions the N samples into:

  outputs \ labeling |  y = +1 |  y = -1 |  Σ
  f(x) = +1          |   TP    |   FP    |  O+
  f(x) = -1          |   FN    |   TN    |  O-
  Σ                  |   N+    |   N-    |  N
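The four table entries can be counted directly from outputs and labels. A minimal sketch in Python (the function name `contingency_table` is illustrative, not from the slides):

```python
def contingency_table(outputs, labels):
    """Return (TP, FP, FN, TN) for predictions and true labels in {-1, +1}."""
    TP = sum(1 for f, y in zip(outputs, labels) if f == +1 and y == +1)
    FP = sum(1 for f, y in zip(outputs, labels) if f == +1 and y == -1)
    FN = sum(1 for f, y in zip(outputs, labels) if f == -1 and y == +1)
    TN = sum(1 for f, y in zip(outputs, labels) if f == -1 and y == -1)
    return TP, FP, FN, TN
```

The row sums give O+ = TP + FP and O- = FN + TN; the column sums give N+ = TP + FN and N- = FP + TN.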

Scores I

  acc = (TP + TN) / N  - accuracy
  err = (FP + FN) / N  - error

Scores II

  tpr  = TP/N+ = TP/(TP+FN)  - sensitivity / recall
  tnr  = TN/N- = TN/(TN+FP)  - specificity
  fnr  = FN/N+ = FN/(FN+TP)  - 1 - sensitivity
  fpr  = FP/N- = FP/(FP+TN)  - 1 - specificity
  ppv  = TP/O+ = TP/(TP+FP)  - positive predictive value / precision
  fdr  = FP/O+ = FP/(FP+TP)  - false discovery rate
  npv  = TN/O- = TN/(TN+FN)  - negative predictive value
  nfdr = FN/O- = FN/(FN+TN)  - negative false discovery rate
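All eight rates follow from the four table entries. A sketch, assuming the `(TP, FP, FN, TN)` convention used above (the function name `rates` is illustrative):

```python
def rates(TP, FP, FN, TN):
    """Compute the eight contingency-table rates from the slide."""
    Np, Nn = TP + FN, FP + TN   # true class totals N+, N-
    Op, On = TP + FP, FN + TN   # output totals O+, O-
    return {
        "tpr": TP / Np,   # sensitivity / recall
        "tnr": TN / Nn,   # specificity
        "fnr": FN / Np,   # 1 - sensitivity
        "fpr": FP / Nn,   # 1 - specificity
        "ppv": TP / Op,   # positive predictive value / precision
        "fdr": FP / Op,   # false discovery rate
        "npv": TN / On,   # negative predictive value
        "nfdr": FN / On,  # negative false discovery rate
    }
```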

Scores III

  ber   = 1/2 (FN/(FN+TP) + FP/(FP+TN))  - balanced error rate
  WRacc = TP/(TP+FN) - FP/(FP+TN)        - weighted relative accuracy
  F1    = 2 TP / (2 TP + FP + FN)        - F1 score (harmonic mean of precision and recall)
  cc    = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))  - cross correlation
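These composite scores can be sketched the same way (function names are illustrative; the cross correlation is the quantity often called the Matthews correlation coefficient):

```python
import math

def ber(TP, FP, FN, TN):
    """Balanced error rate: mean of false negative rate and false positive rate."""
    return 0.5 * (FN / (FN + TP) + FP / (FP + TN))

def f1(TP, FP, FN):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * TP / (2 * TP + FP + FN)

def cc(TP, FP, FN, TN):
    """Cross correlation of outputs and labels."""
    denom = math.sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
    return (TP * TN - FP * FN) / denom
```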

Curves: Receiver Operator Characteristic (ROC) Curve

[Figures: ROC curve, true positive rate vs. false positive rate, shown on a linear and on a logarithmic fpr axis]

  - uses tpr/fpr obtained by varying the bias
  - independent of class skew
  - monotonically ascending
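Varying the bias amounts to sweeping a threshold over the classifier's real-valued outputs; each threshold yields one (fpr, tpr) point. A minimal sketch (the function name `roc_points` is illustrative):

```python
def roc_points(scores, labels):
    """(fpr, tpr) pairs obtained by sweeping the decision threshold (bias)."""
    Np = sum(1 for y in labels if y == +1)
    Nn = len(labels) - Np
    # sort by score descending; each prefix = examples classified positive
    order = sorted(zip(scores, labels), key=lambda p: -p[0])
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for _, y in order:
        if y == +1:
            tp += 1
        else:
            fp += 1
        points.append((fp / Nn, tp / Np))
    return points
```

Both coordinates are normalized by class totals (N-, N+), which is why the curve does not move when the class skew changes.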

Curves: Precision Recall Curve

[Figures: precision recall curve, positive predictive value (PPV) vs. true positive rate, shown on a linear and on a logarithmic axis]

  - uses ppv/tpr obtained by varying the bias
  - depends on class skew
  - not monotone

Curves: ROC vs. PRC

[Figures: ROC and PRC curves for the original data and for data with 10 times as many negatives]

Area under the Curves: auROC

Remember: tpr = TP/N+ = TP/(TP+FN) and fpr = FP/N- = FP/(FP+TN).

Properties:
  - considering the output as a ranking, auROC is the number of (positive, negative) pairs for which the output of the positive example exceeds the output of the negative example, divided by N+·N-
  - equivalent to the Wilcoxon rank test
  - Gini = 2·auROC - 1
  - independent of bias
  - independent of class skew
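The pair-counting interpretation can be computed directly, without constructing the curve. A sketch (the function name `auroc` is illustrative; counting ties as 1/2 is a common convention, not stated on the slide):

```python
def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == +1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p in pos for n in neg)
    return correct / (len(pos) * len(neg))
```

Because only the ordering of the outputs matters, adding a constant to every score (changing the bias) leaves the value unchanged.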

Area under the Curves: auPRC

Remember: ppv = TP/O+ = TP/(TP+FP) and tpr = TP/N+ = TP/(TP+FN).

Properties:
  - considering the output as a ranking: interpretation unclear (open question)
  - independent of bias
  - depends on class skew

Comparison: accuracy, auROC, auPRC

[Figure: classification performance in percent for accuracy, area under the ROC, and area under the PRC, as the number of training examples grows from 1000 to 10000000]

Extensions

  - ROC_N score: area under the ROC curve up to the first N false positives
  - others?

Summary: When to use what

  - balanced data: accuracy / auROC / ...
  - unbalanced data: BER / F1 / auPRC / CC
  - other cases / other measures?

auROC/auPRC optimized training

  - current classifiers minimize the training error (+ some complexity term)
  - we want to optimize auROC/auPRC/... directly
  - first approaches (also implemented in SHOGUN) create pairs of examples (x_i, x_j), i ∈ I = {i : y_i = +1}, j ∈ J = {j : y_j = -1}, giving N+·N-, i.e. O(N²), examples - in standard SVM learning this is unusably slow
  - recently T. Joachims (ICML 2005) proposed SVM^multi:

      min_{w, ξ≥0}  1/2 ||w||² + Cξ
      s.t.  ∀ Y' ∈ Y\{Y}:  w^T [Ψ(X, Y) - Ψ(X, Y')] ≥ Δ(Y, Y') - ξ
      with  Ψ(X, Y) = Σ_{i=1}^n y_i x_i
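The quadratic blow-up of the pairwise approach is easy to see: one training example per (positive, negative) pair. A minimal sketch of the pair construction (the function name `roc_pairs` is illustrative; in the pairwise SVM each pair (i, j) would contribute one difference example x_i - x_j):

```python
def roc_pairs(y):
    """Index pairs (i, j) with y_i = +1 and y_j = -1: N+ * N- = O(N^2) pairs."""
    I = [i for i, yi in enumerate(y) if yi == +1]
    J = [j for j, yj in enumerate(y) if yj == -1]
    return [(i, j) for i in I for j in J]
```

For a balanced sample of N examples this already yields N²/4 pairs, which is why the slide calls standard SVM learning on these pairs unusably slow.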

This has been done for auROC, F1, and BER, and is doable for any other measure based on scores in the contingency table.

AIM: do this for auPRC/ROC_N!

      min_{w, ξ≥0}  1/2 ||w||² + Cξ
      s.t.  ∀ Y' ∈ Y\{Y}:  w^T [Ψ(X, Y) - Ψ(X, Y')] ≥ Δ(Y, Y') - ξ
      with  Ψ(X, Y) = Σ_{i=1}^n y_i x_i

Anyone interested?

Outlook: multiclass, regression, ...