Data Privacy in Biomedicine. Lecture 11b: Performance Measures for System Evaluation
1 Data Privacy in Biomedicine. Lecture 11b: Performance Measures for System Evaluation. Bradley Malin, PhD, Professor of Biomedical Informatics, Biostatistics, & Computer Science, Vanderbilt University. February 14, 2018
2 Classification. Imagine you have a dataset of tokens D. You believe they can be classified into two different classes. ID: tokens that correspond to PHI. Non-ID: tokens that correspond to non-PHI. 2018 Bradley Malin 2
3 So Try to Classify. You have written several competing methods for classification. Now you want to know which one is better. How can you do this in a quantitative manner?
4 Notation and Models. Dataset of tokens D: d_1, d_2, d_3, d_4.
5 Notation and Models. Known underlying truth T = {t_1, ..., t_n}. d_1: t_1 = PHI; d_2: t_2 = not PHI; d_3: t_3 = PHI; d_4: t_4 = not PHI.
6 Algorithm A Predictions. Predictions A = {a_1, ..., a_n}. d_1: t_1 = PHI, a_1 = PHI; d_2: t_2 = not PHI, a_2 = not PHI; d_3: t_3 = PHI, a_3 = PHI; d_4: t_4 = not PHI, a_4 = PHI.
7 Algorithm A Predictions. Correct predictions: d_1, d_2, and d_3, where the prediction matches the truth.
8 Algorithm A Predictions. Incorrect predictions: d_4, where a_4 = PHI but t_4 = not PHI.
9 Algorithm B Predictions. Predictions B = {b_1, ..., b_n}. d_1: t_1 = PHI, b_1 = not PHI; d_2: t_2 = not PHI, b_2 = not PHI; d_3: t_3 = PHI, b_3 = PHI; d_4: t_4 = not PHI, b_4 = PHI.
10 Algorithm B Predictions. Correct predictions: d_2 and d_3, where the prediction matches the truth.
11 Algorithm B Predictions. Incorrect predictions: d_1, where b_1 = not PHI but t_1 = PHI, and d_4, where b_4 = PHI but t_4 = not PHI.
12 Enter the Contingency Table. Rows are the gold standard truth; columns are what the model predicted.

    GOLD STANDARD TRUTH    MODEL PREDICTED: It's NOT PHI    MODEL PREDICTED: It's PHI
    Was NOT PHI            A                                B
    Was PHI                C                                D

13 Contingency Terms. With NOT PHI as NO EVENT and PHI as EVENT: A = TRUE NEGATIVE, D = TRUE POSITIVE.
14 Some More Terms. B = FALSE POSITIVE (Type 1 Error), C = FALSE NEGATIVE (Type 2 Error).
15 Accuracy. What does this mean? What is the difference between accuracy and an accurate prediction? Contingency table interpretation: Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives). Is this a good measure? (Why or why not?)
16 Algorithm Comparison. Rows are the truth; columns are the prediction.

    Algorithm A            Predicted NOT PERSON    Predicted PERSON
    Truth NOT PERSON       1                       1
    Truth PERSON           0                       2

    Algorithm B            Predicted NOT PERSON    Predicted PERSON
    Truth NOT PERSON       1                       1
    Truth PERSON           1                       1

Accuracy of Algorithm A: 3 / 4 = 0.75. Accuracy of Algorithm B: 2 / 4 = 0.50.
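The accuracy comparison can be reproduced with a short sketch. The truth and prediction lists are copied from the four-token example on the earlier slides; the function names `contingency` and `accuracy` are illustrative choices, not part of the lecture.

```python
from collections import Counter

# Truth and predictions for tokens d_1..d_4, taken from the slides.
truth = ["PHI", "not PHI", "PHI", "not PHI"]
pred_a = ["PHI", "not PHI", "PHI", "PHI"]
pred_b = ["not PHI", "not PHI", "PHI", "PHI"]


def contingency(truth, pred, positive="PHI"):
    """Count the four contingency-table cells for a binary task."""
    counts = Counter()
    for t, p in zip(truth, pred):
        if t == positive:
            counts["TP" if p == positive else "FN"] += 1
        else:
            counts["FP" if p == positive else "TN"] += 1
    return counts


def accuracy(counts):
    """(TP + TN) divided by the total number of predictions."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


table_a = contingency(truth, pred_a)
table_b = contingency(truth, pred_b)
print(accuracy(table_a))  # 0.75
print(accuracy(table_b))  # 0.5
```

A `Counter` returns 0 for a cell that never occurs, so Algorithm A's empty false-negative cell needs no special handling.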
17 Note on Discrete Classes. Traditionally, you show a contingency table when reporting the predictions of a model. But probabilistic models do not provide discrete counts for the matrix cells. In other words, an algorithm does not necessarily report the number of documents that were correctly predicted; instead, it reports the probability that the output takes a certain value (e.g., PHI or not PHI).
18 What to Do if Classification is Probabilistic? Imagine you have 2 different probabilistic classification models, e.g., Algorithm A vs. Algorithm B. How do you know which one is better? How do you communicate your belief? Can you provide quantitative evidence beyond a gut feeling and subjective interpretation?
19 Which Score Should Be the Threshold? [Figure: frequency of scores for the NOT PHI (NOT Person) and PHI (Person) classes; x-axis: score, y-axis: frequency.]
20 Consider Precision-Recall. First, order your documents by score for the positive class, e.g., PHI scores from Algorithm A (the higher the score, the higher the confidence). In order: d_2 (t_2 = not PHI, PHI score 0.2), d_1 (t_1 = PHI), d_3 (t_3 = PHI), d_4 (t_4 = not PHI).
21 Recall. Now choose a threshold score and classify. Example: threshold = 0.4. Classify everything below the threshold as NOT PHI (here d_2) and everything above it as PHI (here d_1, d_3, d_4).
22 Recall. Recall is the fraction of the documents you wanted (the true PHI documents) that you classified as PHI. With the threshold at 0.4, recall = 2 / 2 = 1.0.
23 Precision. Precision is the fraction of the documents you recalled (classified as PHI) that were labeled correctly as PHI. With the threshold at 0.4, precision = 2 / 3 = 0.67.
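A minimal sketch of the threshold step follows. Only the score 0.2 for d_2 survives in the transcript; the other three scores are hypothetical values chosen to lie above the 0.4 threshold, and returning 1.0 when a denominator is empty is a convention assumed here.

```python
# PHI scores per token. Only d2 = 0.2 comes from the slides; the scores for
# d1, d3, and d4 are hypothetical placeholders above the 0.4 threshold.
scores = {"d1": 0.5, "d2": 0.2, "d3": 0.7, "d4": 0.9}
is_phi = {"d1": True, "d2": False, "d3": True, "d4": False}


def precision_recall(scores, is_phi, threshold):
    """Classify score >= threshold as PHI, then compute precision and recall."""
    predicted = {d for d, s in scores.items() if s >= threshold}
    relevant = {d for d, phi in is_phi.items() if phi}
    hits = predicted & relevant  # truly PHI and classified as PHI
    # Assumed convention: an empty denominator yields 1.0.
    precision = len(hits) / len(predicted) if predicted else 1.0
    recall = len(hits) / len(relevant) if relevant else 1.0
    return precision, recall


precision, recall = precision_recall(scores, is_phi, threshold=0.4)
print(round(precision, 2), recall)  # 0.67 1.0
```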
24 Precision Recall à la Venn. Within the set of documents D, consider two overlapping sets: the set of relevant documents in D (i.e., the PHI class) and the set of documents classified as relevant (i.e., classified as PHI).
25 Precision Recall à la Venn. Let X be the relevant documents that were not classified as relevant, Z the overlap of the two sets, and Y the documents classified as relevant that are not relevant. RECALL = Z / (X + Z). PRECISION = Z / (Z + Y).
26 Precision Recall Curve. The previous example showed recall and precision for a single threshold. Now calculate them at thresholds across the range of scores. Plot the results as <recall, precision> coordinate points, usually in the range [0,1]. The standard is an 11-point curve, i.e., 11 points plotted.
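One way to generate such points is sketched below, assuming the 11 points come from 11 evenly spaced thresholds. That is only one reading of the slide: the information-retrieval convention of interpolating precision at 11 fixed recall levels is a different construction. The scores and labels are hypothetical.

```python
def pr_curve(scores, labels, thresholds):
    """Return one (recall, precision) pair per threshold."""
    n_pos = sum(labels)
    points = []
    for th in thresholds:
        predicted = [s >= th for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        n_pred = sum(predicted)
        # Assumed convention: precision is 1.0 when nothing is predicted PHI.
        precision = tp / n_pred if n_pred else 1.0
        recall = tp / n_pos
        points.append((recall, precision))
    return points


# Hypothetical PHI scores and true labels (1 = PHI, 0 = not PHI).
scores = [0.95, 0.85, 0.70, 0.65, 0.40, 0.35, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
thresholds = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0

points = pr_curve(scores, labels, thresholds)
for recall, precision in points:
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

Plotting `points` with recall on the x-axis and precision on the y-axis gives the curve the slide describes.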
27 P-R Curve Example. Four documents are not enough for a P-R curve. Imagine you had 200 documents (100 PHI and 100 NOT PHI). [Figure: P-R curve for Algorithm A; x-axis: recall, y-axis: precision.]
28 P-R Curve Example. To compare algorithms, consider plotting both P-R curves in the same graph. Use critical points or thresholds to determine which algorithm is better in particular scenarios. From a general perspective, the area under the curve, or AUC, provides a measure of how good a classification method is.
29 Comparative Performance. [Figure: P-R curves for Algorithm A and Algorithm B; x-axis: recall, y-axis: precision.]
30 ROC Curves. Receiver operator characteristic. Summarize and present the performance of any binary classification model: the model's ability to distinguish between false and true positives.
31 Beyond Precision Recall: ROC. Originated from signal detection theory: a binary signal corrupted by Gaussian noise. What is the optimal threshold (i.e., operating point)? It depends on 3 factors: signal strength, noise variance, and personal tolerance in hit / false alarm rate.
32 Also Uses Multiple Contingency Tables. Sample contingency tables from a range of thresholds/probabilities. TRUE POSITIVE RATE (also called SENSITIVITY) = True Positives / (True Positives + False Negatives). FALSE POSITIVE RATE (also called 1 - SPECIFICITY) = False Positives / (False Positives + True Negatives). Plot sensitivity vs. (1 - specificity) for each sample and you are done.
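The sampling procedure can be sketched as follows: sweep a threshold over the observed scores and record one (false positive rate, true positive rate) pair per threshold. The scores and labels are hypothetical, and the name `roc_points` is illustrative.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per threshold, from strictest to loosest."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Start above the highest score so the curve begins at (0, 0).
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for th in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= th and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= th and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical PHI scores and true labels (1 = PHI, 0 = not PHI).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"1-specificity={fpr:.2f} sensitivity={tpr:.2f}")
```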
33 Data-Centric Example. [Table: TRUTH alongside the LOGISTIC and NEURAL model outputs.]
34 ROC Rates. [Table: TP-Rate and FP-Rate at each THRESHOLD for LOGISTIC REGRESSION and NEURAL NETWORK.]
35 ROC Point Plot. [Figure: ROC points for the LOGISTIC and NEURAL models (legend: model1, model2); axis labels: sensitivity, specificity.]
36 Sidebar: Use More Samples. [Figure: ROC curves for models RM and RM+Age41; axis labels: sensitivity, specificity.] (These are plots from a much larger dataset.)
37 ROC Quantification. Area under the ROC curve: use quadrature to calculate the area, e.g., the trapz (trapezoidal rule) function in Matlab will work. Most programs have a function you can call (Python: scikit-learn's roc_curve together with auc, or roc_auc_score; R: roc.area). [Table: AREA UNDER ROC CURVE for LOGISTIC and NEURAL.] In the example, it appears the Neural Network model is better.
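For readers without a library at hand, the trapezoidal rule itself takes only a few lines. This is a sketch over hypothetical (FPR, TPR) points, not the lecture's data; `trapezoid_auc` is an illustrative name.

```python
def trapezoid_auc(points):
    """Area under a curve given as (x, y) points, by the trapezoidal rule."""
    pts = sorted(points)  # order by x, then y, so vertical steps go upward
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical ROC points, including the (0, 0) and (1, 1) endpoints.
roc = [(0.0, 0.0), (0.0, 0.5), (0.25, 0.5), (0.25, 0.75),
       (0.5, 0.75), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_auc(roc))  # 0.8125
```

A diagonal from (0, 0) to (1, 1) gives 0.5, which corresponds to a classifier that ranks at random.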
38 Theory: Model Optimality. Classifiers on the convex hull are always optimal (e.g., Neural Net and Decision Tree). Classifiers below the convex hull are always suboptimal (e.g., Naïve Bayes). [Figure: ROC points for Neural Net, Decision Tree, and Naïve Bayes with their convex hull.]
39 Building Better Classifiers. Classifiers on the convex hull (e.g., Neural Net and Decision Tree) can be combined to form a strictly dominant hybrid classifier. An ordered sequence of classifiers can be converted into a ranker.
40 Some Statistical Insight. Curve area: take a random non-PHI record, with score X; take a random PHI record, with score Y; the area is an estimate of P[Y > X]. The slope of the curve is equal to the likelihood ratio: P(score | Signal) / P(score | Noise). The ROC graph captures all the information in the contingency table: the false negative and true negative rates are the complements of the true positive and false positive rates, respectively.
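The probabilistic reading of the area can be checked directly by comparing every (PHI, non-PHI) pair of scores. This is a sketch with hypothetical scores and labels; counting ties as one half is the usual Mann-Whitney convention and is an assumption here, since the slide does not mention ties.

```python
def pairwise_auc(scores, labels):
    """Estimate P[Y > X] over all (PHI, non-PHI) score pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # Y: PHI scores
    neg = [s for s, y in zip(scores, labels) if y == 0]  # X: non-PHI scores
    wins = 0.0
    for y_score in pos:
        for x_score in neg:
            if y_score > x_score:
                wins += 1.0
            elif y_score == x_score:
                wins += 0.5  # assumed tie convention
    return wins / (len(pos) * len(neg))


# Hypothetical PHI scores and true labels (1 = PHI, 0 = not PHI).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(pairwise_auc(scores, labels))  # 0.8125
```

Here 13 of the 16 pairs rank the PHI record above the non-PHI record, so the estimate is 13 / 16.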
41 Can Always Quantify Best Operating Point. When misclassification costs are equal, the best operating point is where a 45° tangent line touches the curve, closest to the (0,1) coordinate. Verify this mathematically (economic interpretation). Why? [Figure: ROC curves for RM and RM_AGE41 with a linear fit, Linear (RM); axis labels: sensitivity, specificity.]
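A rough sketch of both criteria on this slide follows, over hypothetical ROC points. Treating the 45° tangent point as the one maximizing TPR - FPR assumes equal class sizes as well as equal costs, which the slide does not state. The two criteria agree on this example but need not agree in general.

```python
import math


def tangent_45_point(points):
    """Point on the highest 45-degree line, i.e. the largest TPR - FPR."""
    return max(points, key=lambda p: p[1] - p[0])


def closest_to_corner(points):
    """Point with the smallest Euclidean distance to the ideal (0, 1)."""
    return min(points, key=lambda p: math.hypot(p[0], 1.0 - p[1]))


# Hypothetical (FPR, TPR) points on an ROC curve.
roc = [(0.0, 0.0), (0.0, 0.4), (0.25, 0.8), (0.5, 0.9), (1.0, 1.0)]
print(tangent_45_point(roc))   # (0.25, 0.8)
print(closest_to_corner(roc))  # (0.25, 0.8)
```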
42 Quick Question. Are ROC curves always appropriate? Subjective operating points? You must weigh the tradeoffs between false positives and false negatives. The ROC curve plot is independent of the class distribution and error costs. This leads into utility theory (not touching this today).
43 Much Much More than ROC. You should also look up and learn about: confidence intervals; iso-accuracy lines; skewed distributions and why the 45° line isn't always best; convexity vs. non-convexity vs. concavity; the Mann-Whitney-Wilcoxon sum of ranks; the Gini coefficient; calibrated thresholds; averaging ROC curves; cost curves.
Confusion matrix classifier-determined positive label classifier-determined negative label true positive a b label true negative c d label Accuracy = (a+d)/(a+b+c+d) a = true positives b = false negatives
More informationMethods and Criteria for Model Selection. CS57300 Data Mining Fall Instructor: Bruno Ribeiro
Methods and Criteria for Model Selection CS57300 Data Mining Fall 2016 Instructor: Bruno Ribeiro Goal } Introduce classifier evaluation criteria } Introduce Bias x Variance duality } Model Assessment }
More information10-810: Advanced Algorithms and Models for Computational Biology. Optimal leaf ordering and classification
10-810: Advanced Algorithms and Models for Computational Biology Optimal leaf ordering and classification Hierarchical clustering As we mentioned, its one of the most popular methods for clustering gene
More informationLecture 9: Bayesian Learning
Lecture 9: Bayesian Learning Cognitive Systems II - Machine Learning Part II: Special Aspects of Concept Learning Bayes Theorem, MAL / ML hypotheses, Brute-force MAP LEARNING, MDL principle, Bayes Optimal
More informationClassification and Pattern Recognition
Classification and Pattern Recognition Léon Bottou NEC Labs America COS 424 2/23/2010 The machine learning mix and match Goals Representation Capacity Control Operational Considerations Computational Considerations
More informationLogistic Regression. COMP 527 Danushka Bollegala
Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will
More informationArticle from. Predictive Analytics and Futurism. July 2016 Issue 13
Article from Predictive Analytics and Futurism July 2016 Issue 13 Regression and Classification: A Deeper Look By Jeff Heaton Classification and regression are the two most common forms of models fitted
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Lecture 06 - Regression & Decision Trees Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom
More informationMachine Learning, Midterm Exam
10-601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv Bar-Joseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have
More informationDay 5: Generative models, structured classification
Day 5: Generative models, structured classification Introduction to Machine Learning Summer School June 18, 2018 - June 29, 2018, Chicago Instructor: Suriya Gunasekar, TTI Chicago 22 June 2018 Linear regression
More informationPerformance Evaluation
Performance Evaluation Confusion Matrix: Detected Positive Negative Actual Positive A: True Positive B: False Negative Negative C: False Positive D: True Negative Recall or Sensitivity or True Positive
More informationA Comparison of Different ROC Measures for Ordinal Regression
A Comparison of Different ROC Measures for Ordinal Regression Willem Waegeman Willem.Waegeman@UGent.be Department of Electrical Energy, Systems and Automation, Ghent University, Technologiepark 913, B-905
More informationPerformance Evaluation and Comparison
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Cross Validation and Resampling 3 Interval Estimation
More informationIntroduction to Logistic Regression
Introduction to Logistic Regression Guy Lebanon Binary Classification Binary classification is the most basic task in machine learning, and yet the most frequent. Binary classifiers often serve as the
More informationLecture Slides for INTRODUCTION TO. Machine Learning. ETHEM ALPAYDIN The MIT Press,
Lecture Slides for INTRODUCTION TO Machine Learning ETHEM ALPAYDIN The MIT Press, 2004 alpaydin@boun.edu.tr http://www.cmpe.boun.edu.tr/~ethem/i2ml CHAPTER 14: Assessing and Comparing Classification Algorithms
More informationEngineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers
Engineering Part IIB: Module 4F0 Statistical Pattern Processing Lecture 5: Single Layer Perceptrons & Estimating Linear Classifiers Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 202 Engineering Part IIB:
More information