INTRODUCTION TO MACHINE LEARNING FOR MEDICINE

Fall 2017 INTRODUCTION TO MACHINE LEARNING FOR MEDICINE Carla E. Brodley, Professor & Dean, College of Computer and Information Science, Northeastern University

WHAT IS MACHINE LEARNING/DATA MINING? Figure is from Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy, Advances in Knowledge Discovery and Data Mining, 1996; image found at: www2.cs.uregina.ca/~dbd/cs831/notes/kdd/kdd.gif

SUPERVISED LEARNING

SUPERVISED LEARNING Given: examples <x_1, x_2, ..., x_n, f(x_1, x_2, ..., x_n)> for some unknown function f. Find: a good approximation to f. Goal: apply f to previously unseen data. Example applications: Regression: f is a continuous variable (e.g., predicting EDSS for MS patients). Classification: f is a discrete variable (e.g., predicting whether a patient has unilateral or bilateral Meniere's).

CLASSIFICATION EXAMPLE: CITATION SCREENING FOR SYSTEMATIC REVIEWS Systematic review: an exhaustive assessment of all the published medical evidence regarding a precise clinical question, e.g., Is aspirin better than leeches in inducing more than 50% relief in patients with tension headaches? Must find all relevant studies.

TYPICAL WORKFLOW 26M PubMed citations → SEARCH → 10,000 potentially eligible → SCREEN → 500 relevant

CITATION SCREENING Doctors read these. They'd rather be doing something else.

GENERATING TRAINING DATA FOR SUPERVISED LEARNING Expert labels a random subset. Induce (train) a classifier C over the labeled subset. Apply C to the unlabeled examples.

A DETOUR INTO TEXT ENCODING Classification algorithms operate on vectors. Feature space: an n-dimensional representation. A bag-of-words example: S1 = Boston drivers are frequently aggressive; S2 = The Boston Red Sox frequently hit line drives

TEXT ENCODING: STOP WORDS S1 = Boston drivers are frequently aggressive; S2 = The Boston Red Sox frequently hit line drives

TEXT ENCODING: LOWERCASING S1 = boston drivers are frequently aggressive; S2 = the boston red sox frequently hit line drives

TEXT ENCODING: STEMMING S1 = boston drive are frequent aggressive; S2 = the boston red sox frequent hit line drive

TEXT ENCODING: VOILA

       hit  red  sox  line  boston  frequent  drive  aggressive
S1 =    0    0    0    0      1        1        1        1
S2 =    1    1    1    1      1        1        1        0

A new sentence, S3, comes along: "I hate the red sox." Which sentence is it most similar to?

S3 =    0    1    1    0      0        0        0        0
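
The whole encoding pipeline above (stop-word removal, lowercasing, stemming, binary vectors, similarity) can be sketched in a few lines. The stop-word list and suffix-stripping rules below are toy assumptions standing in for a real stemmer such as Porter's:

```python
# Toy bag-of-words encoder mirroring the slides: stop-word removal,
# lowercasing, crude suffix stemming, then binary vectors.
STOP_WORDS = {"the", "are", "a", "i", "is"}  # assumed toy stop list

def stem(word):
    # Crude stand-in for a real stemmer: strip a few common suffixes.
    for suffix in ("rs", "ly", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def tokens(sentence):
    # Lowercase, strip punctuation, drop stop words, stem.
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    return [stem(w) for w in words if w not in STOP_WORDS]

def encode(sentence, vocab):
    toks = set(tokens(sentence))
    return [1 if v in toks else 0 for v in vocab]

vocab = ["hit", "red", "sox", "line", "boston", "frequent", "drive", "aggressive"]
v1 = encode("Boston drivers are frequently aggressive", vocab)
v2 = encode("The Boston Red Sox frequently hit line drives", vocab)
v3 = encode("I hate the red sox.", vocab)

# Dot-product similarity: S3 shares two vocabulary words with S2, none with S1.
sim = lambda a, b: sum(x * y for x, y in zip(a, b))
print(v1, v2, v3, sim(v3, v1), sim(v3, v2))
```

On these sentences the vectors come out exactly as on the slide, and the dot product says S3 is most similar to S2.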

SUPPORT VECTOR MACHINES: A HAND-WAVING EXPLANATION (Figure: separating hyperplane with its margin and support vectors.) Minimize: (1/2) w·w

SUPPORT VECTOR MACHINES: THE NON-LINEARLY SEPARABLE CASE (Figure: slack variables ε_k for points inside the margin, e.g. ε_2, ε_6, ε_11.) Minimize: (1/2) w·w + C Σ_{k=1}^{R} ε_k
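
The soft-margin objective (1/2) w·w + C Σ ε_k can be minimized directly by subgradient descent on the equivalent hinge-loss form. This is an illustrative sketch, not the solver the slides assume (production code would use a library such as LIBSVM or scikit-learn), and the 2-D data are made up:

```python
# Subgradient descent on (1/2) w.w + C * sum_k max(0, 1 - y_k (w.x_k + b)).
def svm_train(points, labels, C=1.0, lr=0.01, epochs=2000):
    n, dim = len(points), len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Hinge term active: subgradient is w/n - C*y*x (and -C*y for b).
                w = [wi - lr * (wi / n - C * y * xi) for wi, xi in zip(w, x)]
                b += lr * C * y
            else:
                # Only the regularizer (1/2) w.w contributes.
                w = [wi - lr * wi / n for wi in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Two linearly separable 2-D clusters (hypothetical data).
X = [(2.0, 2.0), (3.0, 2.5), (2.5, 3.0), (0.0, 0.0), (-1.0, 0.5), (0.5, -1.0)]
y = [1, 1, 1, -1, -1, -1]
w, b = svm_train(X, y)
print(all(predict(w, b, xi) == yi for xi, yi in zip(X, y)))
```

With C large the solution approaches the hard-margin separator; shrinking C tolerates more slack, which is the trade-off the slide's second term expresses.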

SUPERVISED LEARNING Expert labels a random subset. Induce (train) a classifier C over the labeled subset. Apply C to the unlabeled examples.

SUPERVISED LEARNING What if we are clever in choosing which examples we label? Induce (train) a classifier C over the labeled subset. Apply C to the unlabeled examples.

ACTIVE LEARNING Key idea: have the expert label the examples most likely to be helpful in inducing a classifier. Need fewer labels for good classification performance = less time/work/money. Need a scoring function f: the expected value of labeling. Most popular strategy: uncertainty sampling

UNCERTAINTY SAMPLING (W/ SVMS) Which examples should we label next?

UNCERTAINTY SAMPLING (W/ SVMS) Uncertainty sampling: label the examples nearest the separating plane
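
Assuming a linear model w·x + b, "nearest the separating plane" can be scored as the normalized distance |w·x + b| / ||w||. The hyperplane and unlabeled pool below are hypothetical:

```python
# Uncertainty sampling sketch for a linear model: query the unlabeled
# points closest to the separating hyperplane w.x + b = 0.
import math

def distance_to_plane(w, b, x):
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def select_queries(w, b, unlabeled, batch_size=2):
    # Rank unlabeled examples by closeness to the decision boundary.
    ranked = sorted(unlabeled, key=lambda x: distance_to_plane(w, b, x))
    return ranked[:batch_size]

# Hypothetical hyperplane x1 + x2 - 1 = 0 and a pool of unlabeled points.
w, b = [1.0, 1.0], -1.0
pool = [(0.5, 0.5), (3.0, 3.0), (0.4, 0.7), (-2.0, -2.0)]
queries = select_queries(w, b, pool)
print(queries)  # the two points nearest the boundary
```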


WHY OFF-THE-SHELF AL DOESN'T WORK FOR CITATION SCREENING Imbalanced data; the relevant class is very small (~5%), but sensitivity to this class is paramount. (Chart: recall and accuracy for random vs. active (uncertainty) sampling.)

WHY MIGHT UNCERTAINTY SAMPLING FAIL? (Figures: random sampling vs. uncertainty sampling.) Hasty generalization: uncertainty sampling may miss clusters. Pre-clustering doesn't help: it is unreliable in high dimensions, and there are small clusters of interest.

GUIDING AL WITH DOMAIN KNOWLEDGE Labeled terms: terms or n-grams whose presence is indicative of class membership, e.g. tension headache, leeches, aspirin (relevant); migraine headache, mice (irrelevant). Is aspirin better than leeches in inducing more than 50% relief in patients with tension headaches?

CO-TESTING FRAMEWORK (MUSLEA ET AL., 2000) Model 1: F1(x); Model 2: F2(x). If model 1 disagrees with model 2 about x, then x is a good point to label.

LABELED TERMS + CO-TESTING Model 1: standard BOW (linear kernel) SVM. Model 2: ratio of #pos terms to #neg terms. Query strategy: find all documents about which the models disagree; select for labeling the items of maximum disagreement.
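
A sketch of this query strategy with both models mocked: the `svm_score` field and the labeled-term counts below are hypothetical stand-ins for Model 1 and Model 2, and "maximum disagreement" is measured as the combined confidence gap:

```python
# Co-testing sketch: two models vote on unlabeled documents; documents
# where they disagree are candidate queries, and the item with the
# largest disagreement is labeled first.
def model1(doc):
    # Stand-in for a BOW SVM decision value (sign = predicted class).
    return doc["svm_score"]

def model2(doc):
    # Ratio-of-labeled-terms model from the slide, mapped to [-1, 1].
    pos, neg = doc["pos_terms"], doc["neg_terms"]
    return (pos - neg) / max(pos + neg, 1)

def cotest_query(docs):
    disagreements = [d for d in docs
                     if (model1(d) >= 0) != (model2(d) >= 0)]
    if not disagreements:
        return None
    # Query the document about which the models disagree most strongly.
    return max(disagreements, key=lambda d: abs(model1(d)) + abs(model2(d)))

docs = [
    {"id": 1, "svm_score":  0.9, "pos_terms": 0, "neg_terms": 3},
    {"id": 2, "svm_score": -0.1, "pos_terms": 2, "neg_terms": 0},
    {"id": 3, "svm_score":  0.8, "pos_terms": 4, "neg_terms": 0},
]
print(cotest_query(docs)["id"])  # 1: both models confident, opposite signs
```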

COPD: GENETIC ASSOCIATIONS WITH COPD

MOST IMPORTANT REQUIREMENT FOR MACHINE LEARNING TO WORK: THE DATA Are the features predictive of the class? How noisy is the data (attribute noise vs. class noise)? Do you have enough (labeled) data? Are the training samples representative?

TRANSFER LEARNING A machine learning technique to improve performance by leveraging related knowledge: a primary task on dataset T and an auxiliary dataset T_aux. T and T_aux are usually related and have similar distributions. (Figure: auxiliary T_aux informing primary T.)

TRANSFER LEARNING EXAMPLES Predicting readmission to hospitals: use data from other hospitals to predict for your hospital. Predicting MS progression: combining data from multiple physicians.

UNSUPERVISED LEARNING

CLUSTERING Given a set of data points, each described by a set of attributes, find clusters such that intra-cluster similarity is maximized and inter-cluster similarity is minimized. Requires the definition of a similarity measure. (Scatter plot over features F1 and F2.)

EXAMPLE: K-MEANS (Sequence of figure-only slides illustrating k-means iterations.)
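
The iterations shown on the figure-only slides follow Lloyd's algorithm: assign each point to its nearest centroid, recompute the centroids, repeat. A minimal sketch on made-up 2-D data (initializing from the first k points is a simplification; real implementations seed randomly or with k-means++):

```python
# Minimal k-means (Lloyd's algorithm) on 2-D points.
def dist2(a, b):
    # Squared Euclidean distance.
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def mean(pts):
    # Component-wise mean of a non-empty list of points.
    return tuple(sum(p[i] for p in pts) / len(pts) for i in range(len(pts[0])))

def kmeans(points, k, iters=20):
    centroids = list(points[:k])  # simplistic deterministic initialization
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ever comes up empty.
        centroids = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

On this toy data the two visible groups are recovered after the first couple of iterations.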

EXAMPLE: CONSTRAINED K-MEANS (Sequence of figure-only slides illustrating k-means with pairwise constraints.)

CHALLENGES IN CLUSTERING MEDICAL DATA Confounding factor: one or a set of features whose effect will lead to an undesirable clustering solution if not removed. Clustering clinical data: physician subjectivity; age for neurological test scoring in MS.

EXAMPLE: VESTIBULAR DISORDERS (Scatter plot: balance function vs. age.)

CLUSTERING WITH K = 2 (Figure: two clusters in the balance function vs. age plot.)

PROPOSED SOLUTION Remove the impact of confounding factor F via constraint-based clustering: 1. Bin the data into homogeneous groups w.r.t. F. 2. Apply clustering to each group and generate pair-wise instance constraints. 3. Apply constraint-based clustering to the entire data.

STEP 1: BINNING (STRATIFICATION) Categorical F: create one bin per category (example: one bin per physician for the MS data). Numeric F: create bins using uniform ranges or uniform bin sizes, domain knowledge, or more sophisticated binning methods such as nonparametric density estimation.
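
A sketch of uniform-range binning for a numeric confounder F (the ages and the bin count are made up; a categorical F would simply group instances by category):

```python
# Stratify a numeric confounder F into bins of uniform range,
# returning instance indices per bin.
def uniform_bins(values, n_bins):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for i, v in enumerate(values):
        j = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum value
        bins[j].append(i)
    return bins

# Hypothetical patient ages as the confounder.
ages = [25, 31, 38, 42, 55, 61, 67, 70]
print(uniform_bins(ages, 3))  # [[0, 1, 2], [3], [4, 5, 6, 7]]
```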

STEP 1: BINNING (Figure: balance function vs. age, with the data split into age bins.)

STEP 2: CLUSTER IN EACH BIN AND GENERATE CONSTRAINTS In each bin: apply clustering (e.g., EM over a mixture of Gaussians); the number of clusters can be specified by domain knowledge or inferred using criteria such as BIC. Generate must-not-link constraints for pairs of instances in different clusters.
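
The constraint-generation part of this step can be sketched directly: every pair of instances that a within-bin clustering puts into different clusters becomes a must-not-link pair for step 3 (the instance ids below are hypothetical):

```python
# Generate must-not-link constraints from the clusters found inside one bin:
# any two instances placed in different clusters should stay apart later.
from itertools import combinations

def must_not_link(bin_clusters):
    # bin_clusters: list of clusters, each a list of instance ids.
    constraints = set()
    for ca, cb in combinations(bin_clusters, 2):
        for i in ca:
            for j in cb:
                constraints.add((min(i, j), max(i, j)))
    return constraints

# Two clusters found inside one age bin.
print(sorted(must_not_link([[1, 4], [7, 9]])))
# [(1, 7), (1, 9), (4, 7), (4, 9)]
```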

STEP 2: CLUSTER EACH BIN (Figure: clusters found within each age bin.)

STEP 2: GENERATE CONSTRAINTS (Figure: must-not-link constraints between clusters within each bin.)

STEP 3: APPLY CONSTRAINT-BASED CLUSTERING TO THE ENTIRE DATA (Figure: final clustering over balance function vs. age.)

ANOMALY DETECTION

ANOMALY DETECTION Given a set of data points, each described by a set of attributes, find points that are far away from most of the other points; such points are also called outliers. Requires the definition of a similarity measure. (Scatter plot over features F1 and F2.)

TYPES OF ANOMALY DETECTION Supervised: labelled normal and anomalous data; similar to rare (minority) class mining. Semi-supervised: labels available only for normal data. Unsupervised: no labelled data; assumption: anomalies are rare compared to normal data.

COMPLEXITIES OF ANOMALY DETECTION Where does the normal data come from? Feature selection. Metric. Different parts of the space may have different densities.

COMPLEXITIES OF ANOMALY DETECTION Which of p1, p2, and p3 are anomalies? (Figure: distances from p2 and p3 to their nearest neighbors.)
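
The nearest-neighbor distances in the figure suggest a simple unsupervised detector: score each point by its distance to its k-th nearest neighbor, so isolated points like p3 score highest. A sketch on made-up 2-D data:

```python
# Unsupervised anomaly scoring: distance to the k-th nearest neighbor.
def knn_score(points, k=1):
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    scores = []
    for i, p in enumerate(points):
        ds = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(ds[k - 1])  # distance to the k-th nearest neighbor
    return scores

# A dense cluster plus one far-away point (hypothetical data).
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_score(pts)
print(max(range(len(pts)), key=lambda i: scores[i]))  # 4: the outlier's index
```

Note this fixed-distance view suffers exactly the density problem the previous slide raises: a threshold that flags p3 in a sparse region may also flag normal points in a dense one.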

ANOMALY DETECTION EXAMPLE: DETECTING CORTICAL LESIONS 50 million people affected by epilepsy worldwide; one-third remain refractory to treatment. One of the most common causes of TRE (treatment-resistant epilepsy): Focal Cortical Dysplasia (FCD). Treatment: surgical resection of the abnormal cortical tissue (aka lesion). Workflow: Visual MRI Exam → Lesion Identification & Tracing → Intracranial EEG Analysis → Resective Surgery. 70-80% of histologically verified FCD cases have normal MRI. Chances of being seizure free after surgery: MRI-positive: 66%; MRI-negative: 29%.

MACHINE LEARNING CHALLENGES Input data: surfaces of FCD patients (MRI); resected tissue (MRI-negatives): histopathologically verified; generous margins to ensure complete lesion removal; exact location of the lesion is unknown. Labels: resection zones for MRI-negatives; lesion tracings by neuroradiologists for MRI-positives. False positives in training data. False negatives in training data from long untreated epilepsy, trauma, etc.

PROPOSED SOLUTION Hierarchical Conditional Random Fields for outlier detection. Discard pixel-level labels and use only image-level labels. Redefine an FCD lesion as: a cortical region which is an outlier when compared to the same region across a population of normal controls.

RESULTS Tested on fifteen MRI-negative patients with successful surgery. High detection rate (80%) for MRI-negative patients, with higher average recall and precision.

MY LAST WORDS There are many, many different learning algorithms, but the key to success is having the right training data. MLHC is a great conference.