Learning Classification with Auxiliary Probabilistic Information. Quang Nguyen, Hamed Valizadegan, Milos Hauskrecht

Learning Classification with Auxiliary Probabilistic Information. Quang Nguyen, Hamed Valizadegan, Milos Hauskrecht. Computer Science Department, University of Pittsburgh

Outline Introduction Learning with auxiliary information Framework Noise issue Modeling pairwise order constraints Combining class and auxiliary information Experimental evaluation Conclusion 2

Outline Introduction Learning with auxiliary information Framework Noise issue Modeling pairwise order constraints Combining class and auxiliary information Experimental evaluation Conclusion 3

Building a Classification Model Training data (patient records, diagnoses) -> Learning -> Classifier; for a new patient: disease or no disease? Typically: more training data => a better classifier 4

Data Labeling and Its Cost Training data: patient records (labs, medications, notes, ...) labeled with diagnoses (class labels: disease / no disease). Labeling requires human experts; it is time consuming and costly, so training data sets stay small. How to reduce the number of examples to label? Active learning: select only the most critical examples to label. Can we obtain more information from the selected examples? 5

Our Solution Idea: ask a human expert to provide, in addition to class labels, his/her certainty in the label decision, and incorporate this information into the learning process. Certainty can be represented as a probability (e.g. probability of having the disease, p = 0.85) or as a qualitative ordinal category (e.g. strong, medium or weak belief in disease, or a discrete score from 0 to 5). 6

Outline Introduction Learning with auxiliary information Framework Noise issue Modeling pairwise order constraints Combining class and auxiliary information Experimental evaluation Conclusion 7

Training with Class Label Information Patient records (labs, medications, etc.) x_1, ..., x_N with binary class labels (disease / no disease) y_1, ..., y_N = 1/0 -> Learning (Support Vector Machines) -> Classifier 8

Training with Class Label Information Patient records (labs, medications, etc.) x_1, ..., x_N with binary class labels plus certainty scores (disease / no disease; certainty in disease): (y_1 = 1/0, p_1), ..., (y_N = 1/0, p_N) -> Learning -> Classifier 9

Learning with Auxiliary Information: Regression Patient records (labs, medications, etc.) x_1, ..., x_N with certainty scores (certainty in disease) p_1, ..., p_N -> Learning -> Regression model f: x -> log( p / (1 - p) ) 10
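
To make this regression step concrete, here is a minimal sketch (not the authors' code) of fitting a linear model to the log-odds of the expert's certainty scores; it assumes scikit-learn and uses synthetic stand-in data, and uses a plain least-squares fit where the paper's LRPR variant additionally applies lasso regularization.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative sketch: fit f(x) to the log-odds of the certainty score,
# f(x) ~ log(p / (1 - p)). X and p below are synthetic stand-ins.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                       # patient feature vectors
p = 1.0 / (1.0 + np.exp(-X @ rng.normal(size=5)))   # expert certainty scores in (0, 1)
p = np.clip(p, 1e-3, 1 - 1e-3)                      # keep the logit finite

logit_p = np.log(p / (1.0 - p))                     # certainty -> log-odds
f = LinearRegression().fit(X, logit_p)              # plain least-squares fit

scores = f.predict(X[:5])                           # larger score = stronger belief in disease
```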

Learning with Auxiliary Information: Noise Issue Human certainty estimates are often noisy: for the same patient record, the reported certainty score p may be inconsistent. Regression relies on the exact values of p, so it is sensitive to this noise. 11

Learning with Auxiliary Information: Noise Issue LR: logistic regression with binary class labels; LRPR: logistic regression with certainty labels. No noise: LRPR clearly outperforms LR. With noise: LRPR is no better than LR. Solution? 12

Modeling pairwise orders Observation: certainty scores let us order the examples. Idea: build a discriminant projection f(x) that respects this order by minimizing the number of violated pairwise order constraints. Modeling pairwise orders instead of relying on the exact values of p makes learning less sensitive to noise (see the sketch below). 13
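
As a rough illustration of what "violated pairwise order constraints" means, the following hypothetical helper (not part of the paper) counts pairs whose projection order contradicts the certainty order:

```python
import numpy as np

def violated_pairs(f_scores, p):
    """Count pairs (i, j) with p_i > p_j where the projection disagrees,
    i.e. f(x_i) <= f(x_j). Brute-force, for illustration only."""
    f_scores, p = np.asarray(f_scores), np.asarray(p)
    count = 0
    for i in range(len(p)):
        for j in range(len(p)):
            if p[i] > p[j] and f_scores[i] <= f_scores[j]:
                count += 1
    return count

# Certainty says case 0 > case 1 > case 2; the projection disagrees on one pair.
print(violated_pairs(f_scores=[0.2, 0.9, 0.1], p=[0.8, 0.6, 0.3]))  # -> 1
```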

Learning with Class and Pairwise Order Constraints Modeling pairwise orders: adapt SVM Rank (Herbrich 2000) and combine class and certainty information. Optimize:

min_w  (1/2) w^T w  +  C Σ_{i,j: p_i > p_j} ξ_{i,j}  +  B Σ_{i=1..N} η_i

where the C-term penalizes violated pairwise order constraints and the B-term penalizes violated class constraints.

Pairwise order constraints: ∀ i,j with p_i > p_j:  w^T (x_i - x_j) ≥ 1 - ξ_{i,j},  ξ_{i,j} ≥ 0
Class constraints (labels encoded as y_i = ±1): ∀ i:  y_i (w^T x_i + b) ≥ 1 - η_i,  η_i ≥ 0

Note: constants B and C control the trade-off between class and auxiliary certainty information. 14
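
Below is a hedged sketch of this combined optimization written as an explicit quadratic program with cvxpy on synthetic data. The data, the constants B and C, and the ±1 label encoding are illustrative assumptions; this is not the authors' implementation, which adapts SVM Rank.

```python
import numpy as np
import cvxpy as cp

# Sketch: combined class hinge loss + pairwise-order hinge loss as one QP.
rng = np.random.default_rng(0)
N, d = 30, 5
X = rng.normal(size=(N, d))                                  # features
y = np.where(X[:, 0] + 0.3 * rng.normal(size=N) > 0, 1, -1)  # class labels as +/-1
p = 1.0 / (1.0 + np.exp(-X[:, 0]))                           # expert certainty scores
B, C = 1.0, 1.0                                              # trade-off constants

pairs = [(i, j) for i in range(N) for j in range(N) if p[i] > p[j]]

w = cp.Variable(d)
b = cp.Variable()
xi = cp.Variable(len(pairs), nonneg=True)    # slacks for pairwise order constraints
eta = cp.Variable(N, nonneg=True)            # slacks for class constraints

constraints = [w @ (X[i] - X[j]) >= 1 - xi[k] for k, (i, j) in enumerate(pairs)]
constraints += [float(y[i]) * (w @ X[i] + b) >= 1 - eta[i] for i in range(N)]

objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi) + B * cp.sum(eta))
cp.Problem(objective, constraints).solve()

print("learned weight vector:", w.value)
```

The number of pairwise constraints grows quadratically with the number of labeled examples, which is why a dedicated ranking SVM solver is the more practical choice at scale.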

Outline Introduction Learning with auxiliary information Framework Noise issue Modeling pairwise order constraints Combining class and auxiliary information Experimental evaluation Conclusion 15

Experimental Setup
Models:
- Trained on class labels: LR (standard logistic regression), SVM (standard linear SVM)
- Trained on certainty labels: LRPR (logistic regression with lasso regularization)
- Trained on both class and certainty labels: SVM-Combo (SVM with two hinge losses, for class and pairwise order constraints)
Evaluation:
- Fixed test set; training examples randomly sampled from the training set
- Training/testing process repeated 30 times
- Average AUC (area under the ROC curve) and 95% confidence interval recorded 16
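
For concreteness, a minimal sketch of this evaluation protocol (fixed test set, 30 random training subsamples, mean AUC with a normal-approximation 95% confidence interval), using scikit-learn and synthetic data; the model, subsample size and data here are placeholders, not the paper's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)
X_pool, y_pool = X[:700], y[:700]          # pool to draw training samples from
X_test, y_test = X[700:], y[700:]          # fixed test set

aucs = []
for _ in range(30):                        # repeat training/testing 30 times
    idx = rng.choice(len(X_pool), size=100, replace=False)
    clf = LogisticRegression(max_iter=1000).fit(X_pool[idx], y_pool[idx])
    aucs.append(roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))

aucs = np.asarray(aucs)
half_width = 1.96 * aucs.std(ddof=1) / np.sqrt(len(aucs))   # 95% CI half-width
print(f"average AUC = {aucs.mean():.3f} +/- {half_width:.3f}")
```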

Experimental Setup: UCI Data UCI data sets with continuous outputs: Ailerons, Concrete, Kinematics, Puma32. Generated labels: certainty labels by normalizing the continuous outputs; binary labels by setting a threshold on the certainty labels. Ratios of positive examples: 10%, 25% and 50%.

Experimental Setup: UCI Data (Cont'd) Noise added to certainty labels at four levels: no noise, weak, moderate and strong, generated as 0.05, 0.1 and 0.2 * N(0,1), respectively. Average noise-to-signal ratios:

Data Set     Weak Noise   Moderate Noise   Strong Noise
Ailerons      5.2 %       10.3 %           39.8 %
Kinematics   10.6 %       20.8 %           38.9 %
Puma32       10.3 %       20.2 %           39.3 %
Concrete     15.2 %       29.6 %           55.1 %

18
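
A sketch of the label-generation scheme described on the two setup slides above, under the assumption that the normalization is min-max scaling and the positive-class threshold is a quantile; the exact choices in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=500)     # stand-in for a UCI data set's continuous output

# Certainty labels: normalize the continuous output to [0, 1] (min-max assumed).
p = (target - target.min()) / (target.max() - target.min())

# Binary labels: threshold the certainty labels, here for ~25% positives.
y = (p >= np.quantile(p, 0.75)).astype(int)

# Noisy certainty labels: weak / moderate / strong noise as 0.05, 0.1, 0.2 * N(0,1)
# (clipping back to [0, 1] is an extra assumption made here).
noisy_p = {s: np.clip(p + s * rng.standard_normal(p.shape), 0.0, 1.0)
           for s in (0.05, 0.1, 0.2)}
```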

Experimental Results: UCI Data [Plot: AUC vs. number of training samples (0-200) on the Concrete data set, 25% positives, moderate noise; curves for LR, LRPR, SVM and SVM-Combo (ours)] Our method (SVM-Combo) consistently outperforms both regression and standard SVM. 19

Experimental Results: UCI Data (Cont'd) Our method (SVM-Combo) consistently outperforms both regression and standard SVM. 20

Experiments: Unbalanced Data Challenge: in many applications the data are unbalanced (e.g. in medicine, positive examples are usually rare). Does certainty information help? Auxiliary information shows more benefit with unbalanced data. [Plots: results for 50%, 25% and 10% positive examples] 21

Experiments: HIT Data Heparin-induced thrombocytopenia (HIT): a life-threatening condition that may develop when patients are treated with heparin. Data: derived from the PCP database (Hauskrecht et al., AMIA 2010); 199 patient instances labeled by an expert with respect to HIT; 50 features derived from time series of labs, medications and procedures. 22

HIT Data Labeling: for each patient case we asked the expert two questions: (1) Do you agree with raising an alert on HIT or not? (yes/no); (2) How strongly do you agree? (scale 0-4, strongly disagree to strongly agree). Case review: we used an in-house EHR graphical interface to collect labels. Average time to review a patient: 247 seconds; average time to enter labels: under 10 seconds. The cost of collecting the auxiliary information is therefore low. 23

Experimental Results: HIT Data Our method (SVM-Combo) consistently outperforms both regression and standard SVM.

Outline Introduction Learning with auxiliary information Framework Noise issue Modeling pairwise order constraints Combining class and auxiliary information Experimental evaluation Conclusion 25

Conclusion Auxiliary certainty information helps to learn better classification models from smaller numbers of examples, is especially useful when the data are unbalanced, and can be obtained at little additional cost. Human subjective certainty assessments are noisy; the proposed method is robust to this noise because pairwise orders are more consistent than exact certainty estimates. 26

Thank you for your attention! Contact: Quang Nguyen, quang@cs.pitt.edu Acknowledgment: this research was supported by grants from the National Institutes of Health, 1R01LM010019 (M. Hauskrecht) and 1R01GM088224 (M. Hauskrecht). 27