Performance Metrics for Machine Learning. Sargur N. Srihari

Performance Metrics for Machine Learning
Sargur N. Srihari, srihari@cedar.buffalo.edu

Topics
1. Performance Metrics
2. Default Baseline Models
3. Determining whether to gather more data
4. Selecting hyperparameters
5. Debugging strategies
6. Example: multi-digit number recognition

Topics in Performance Metrics
1. Metrics for Regression: squared error, RMS
2. Metric for Density Estimation: KL divergence
3. Metrics for Classification: accuracy
4. Metrics for Unbalanced Data: loss, specificity/sensitivity
5. Metrics for Retrieval: precision and recall
6. Combining Precision and Recall: F-measure
7. Metrics for Image Segmentation: Dice coefficient

Metrics for Regression

Sum of squares of the errors between the predictions y(x_n, w) for each test data point x_n and the target value t_n:

    E(w) = (1/2) Σ_{n=1}^{N} { y(x_n, w) − t_n }^2

where w are the M parameter weights, such as those associated with M basis functions.

RMS error:

    E_RMS = sqrt( 2 E(w) / N )
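As a minimal sketch of the two formulas above (the predictions and targets are hypothetical values for illustration):

```python
import math

def sum_squares_error(y_pred, t):
    """E(w) = (1/2) * sum of squared differences between predictions and targets."""
    return 0.5 * sum((y - tn) ** 2 for y, tn in zip(y_pred, t))

def rms_error(y_pred, t):
    """E_RMS = sqrt(2 E(w) / N): on the scale of the targets, comparable across test-set sizes."""
    return math.sqrt(2.0 * sum_squares_error(y_pred, t) / len(t))

predictions = [1.0, 2.0, 3.0]
targets = [1.0, 2.5, 2.0]
print(sum_squares_error(predictions, targets))  # 0.5 * (0 + 0.25 + 1.0) = 0.625
print(rms_error(predictions, targets))          # sqrt(1.25 / 3)
```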

Metric for Density Estimation

K-L divergence: the additional information required as a result of using q(x) in place of p(x):

    KL(p‖q) = −∫ p(x) ln q(x) dx − ( −∫ p(x) ln p(x) dx )
            = ∫ p(x) ln( p(x) / q(x) ) dx

Not a symmetric quantity: KL(p‖q) ≠ KL(q‖p).
K-L divergence satisfies KL(p‖q) ≥ 0, with equality iff p(x) = q(x).
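For discrete distributions the integral becomes a sum, which makes the two properties easy to check numerically (the distributions below are made up for illustration):

```python
import math

def kl_divergence(p, q):
    """KL(p||q) = sum_x p(x) * ln(p(x)/q(x)) for discrete distributions.
    Terms with p(x) == 0 contribute nothing (0 * ln 0 -> 0 by convention)."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))   # positive, since p != q
print(kl_divergence(q, p))   # a different value: KL is not symmetric
print(kl_divergence(p, p))   # 0.0: equality iff p == q
```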

Metric for Classification

For classification and transcription tasks we often measure the accuracy of the model.
Accuracy: the proportion of examples for which the model produces the correct output.
Error rate: the proportion of examples for which the model produces an incorrect output.
The error rate is referred to as the expected 0-1 loss: 0 if an example is correctly classified, 1 if it is not.
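The relationship between the 0-1 loss, error rate, and accuracy can be sketched as follows (labels are hypothetical):

```python
def zero_one_loss(y_true, y_pred):
    """Error rate: mean 0-1 loss, i.e. 1 for each misclassified example, 0 otherwise."""
    return sum(1 for t, p in zip(y_true, y_pred) if t != p) / len(y_true)

y_true = ["cat", "dog", "dog", "cat"]
y_pred = ["cat", "dog", "cat", "cat"]
error_rate = zero_one_loss(y_true, y_pred)   # 1 mistake out of 4 -> 0.25
accuracy = 1.0 - error_rate                  # 0.75
print(error_rate, accuracy)
```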

Loss Function

Sometimes it is more costly to make one kind of mistake than another.
Ex: email spam detection
- Incorrectly classifying a legitimate message as spam
- Incorrectly allowing a spam message to appear in the inbox
Assign a higher cost to one type of error: the cost of blocking a legitimate message is higher than that of allowing a spam message through.

Summary of Loss Functions

Given a prediction p and a label y, a loss function measures the discrepancy between the algorithm's prediction and the desired output. Squared loss is the default.
https://github.com/johnlangford/vowpal_wabbit/wiki/loss-functions
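As a sketch, here is one common formulation of three such losses for a raw prediction p and a label y ∈ {−1, +1} (the exact scaling conventions vary between libraries, so treat these as illustrative rather than as the linked page's precise definitions):

```python
import math

def squared_loss(p, y):
    """Penalizes the squared distance between prediction and label."""
    return 0.5 * (p - y) ** 2

def logistic_loss(p, y):
    """Log loss for a raw (pre-sigmoid) score; small when y*p is large and positive."""
    return math.log(1.0 + math.exp(-y * p))

def hinge_loss(p, y):
    """SVM-style loss: zero once the prediction clears the margin y*p >= 1."""
    return max(0.0, 1.0 - y * p)

# A confident correct prediction incurs little loss; a confident wrong one is penalized
for loss in (squared_loss, logistic_loss, hinge_loss):
    print(loss.__name__, loss(2.0, 1), loss(-2.0, 1))
```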

Precision and Recall

Definitions for binary classification:

    Accuracy  = (TP + TN) / (TP + TN + FP + FN)
    Precision = TP / (TP + FP)
    Recall    = TP / (TP + FN)
    F-measure = 2 / (1/P + 1/R) = 2PR / (P + R)

A false positive (FP) is a Type 1 error; a false negative (FN) is a Type 2 error.

Classification examples: 6 samples, where only sample 6 has true label T.

Classifier 1 predicts F for samples 1-4 and T for samples 5 and 6:

                         Label=T   Label=F
    Classifier Label=T   1 (TP)    1 (FP)
    Classifier Label=F   0 (FN)    4 (TN)

    Accuracy = 5/6 = 83%, Precision = 1/2 = 50%, Recall = 1/1 = 100%, F-measure = 2/3 = 66%

Classifier 2 is dumb: it always outputs F. Yet it has the same accuracy as Classifier 1:

                         Label=T   Label=F
    Classifier Label=T   0 (TP)    0 (FP)
    Classifier Label=F   1 (FN)    5 (TN)

    Accuracy = 83%, Precision = 0/0 = undefined, Recall = 0/1 = 0%, F-measure = undefined

Precision and recall are useful when the true class is rare, e.g., a rare disease. The same holds in information retrieval, where only a few of a large number of documents are relevant.
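The two classifiers above can be checked directly from their predictions (a minimal sketch; `None` stands in for the undefined cases):

```python
def binary_metrics(y_true, y_pred):
    """Confusion counts and derived metrics for binary labels 'T'/'F'.
    Returns (accuracy, precision, recall, f_measure); None where undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 'T' and p == 'T')
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 'F' and p == 'T')
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 'T' and p == 'F')
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 'F' and p == 'F')
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else None  # undefined if nothing predicted T
    recall = tp / (tp + fn) if tp + fn else None
    f = (2 * precision * recall / (precision + recall)
         if precision and recall else None)          # undefined if P or R is missing/zero
    return accuracy, precision, recall, f

labels = ['F', 'F', 'F', 'F', 'F', 'T']
clf1   = ['F', 'F', 'F', 'F', 'T', 'T']   # one false positive, one true positive
clf2   = ['F'] * 6                        # always predicts F

print(binary_metrics(labels, clf1))  # (0.833..., 0.5, 1.0, 0.666...)
print(binary_metrics(labels, clf2))  # (0.833..., None, 0.0, None)
```

Both classifiers share 83% accuracy, but precision/recall expose Classifier 2's uselessness.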

Precision-Recall in IR

Precision and recall are evaluated w.r.t. a set of queries.
A precision-recall curve is generated by varying an operating point:
- Thresh method: a threshold t on the similarity measure
- Rank method: the number of top choices presented
There is typically an inverse relationship between precision and recall.

[Figure: the objects returned for a query Q partition the database into TP, FP, FN, and TN with respect to the documents relevant and irrelevant to Q; precision (%) vs. recall (%) curves for the ideal Thresh method and ideal Rank method, where the orange curve is better than the blue curve.]

Text to Image Search

Experimental settings:
- 150 x 100 = 15,000 word images
- 10 different queries; each query has 100 relevant word images
When half the relevant words are retrieved, the system has 80% precision.

Combined Measures of Precision-Recall

F-measure: the harmonic mean of precision and recall, which takes a high value only when both P and R are high:

    F = 2 / (1/P + 1/R) = 2PR / (P + R)

E-measure: a weighted combination,

    E = 1 / ( u/P + (1−u)/R ) = PR / ( uR + (1−u)P )

where u is a measure of the relative importance of P and R, with u = 1/(v² + 1). The coefficient u has range [0,1], and E can be equivalently written as

    E = (v² + 1) PR / (v² P + R)

The E-measure reduces to the F-measure when precision and recall are equally weighted, i.e., v = 1 or u = 0.5:

    E = 2PR / (P + R) = F
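A minimal sketch of the two measures, checking the reduction at v = 1 (the P and R values are hypothetical):

```python
def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

def e_measure(p, r, v=1.0):
    """Weighted combination E = (v^2 + 1) P R / (v^2 P + R); v trades off P vs. R."""
    return (v * v + 1) * p * r / (v * v * p + r)

p, r = 0.5, 1.0
print(f_measure(p, r))          # 2/3
print(e_measure(p, r, v=1.0))   # equals the F-measure when v = 1
print(e_measure(p, r, v=2.0))   # larger here: v > 1 weights recall more heavily
```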

Example of Precision/Recall Curve and F-measure

The best F-measure value is obtained when recall = 67% and precision = 50%.

[Figure: precision/recall curve and F-measure for Arabic word spotting]

Metric for Image Segmentation

Dice coefficient:
X = ROI output by the model, a mask
Y = ROI produced by a human expert

The metric is twice the ratio of the intersection to the sum of the areas:

    Dice(X, Y) = 2 |X ∩ Y| / ( |X| + |Y| )

It is 0 for disjoint areas and 1 for perfect agreement. E.g., model performance is written as 0.82 (0.23), where the parentheses contain the standard deviation.
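On binary masks the formula reduces to counting pixels; a minimal sketch (masks are flattened to 0/1 lists, and the example masks are made up):

```python
def dice_coefficient(mask_x, mask_y):
    """Dice(X, Y) = 2|X ∩ Y| / (|X| + |Y|) for binary masks given as 0/1 lists."""
    intersection = sum(x and y for x, y in zip(mask_x, mask_y))
    total = sum(mask_x) + sum(mask_y)
    return 2.0 * intersection / total if total else 1.0  # both empty: perfect agreement

model_mask  = [1, 1, 1, 0, 0, 0]
expert_mask = [0, 1, 1, 1, 0, 0]
print(dice_coefficient(model_mask, expert_mask))  # 2*2 / (3+3) = 0.666...
print(dice_coefficient([1, 1], [0, 0]))           # disjoint -> 0.0
print(dice_coefficient(model_mask, model_mask))   # identical -> 1.0
```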