INTRODUCTION TO MACHINE LEARNING FOR MEDICINE

Fall 2017 INTRODUCTION TO MACHINE LEARNING FOR MEDICINE Carla E. Brodley, Professor & Dean, College of Computer and Information Science, Northeastern University

WHAT IS MACHINE LEARNING/DATA MINING? Figure is from Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy, Advances in Knowledge Discovery and Data Mining, 1996; image found at: www2.cs.uregina.ca/~dbd/cs831/notes/kdd/kdd.gif

SUPERVISED LEARNING

SUPERVISED LEARNING Given: examples <x_1, x_2, ..., x_n, f(x_1, x_2, ..., x_n)> for some unknown function f. Find: a good approximation to f. Goal: apply f to previously unseen data. Example applications: Regression: f is a continuous variable (e.g., predicting EDSS for MS patients). Classification: f is a discrete variable (e.g., predicting whether a patient has unilateral or bilateral Meniere's).

CLASSIFICATION EXAMPLE: CITATION SCREENING FOR SYSTEMATIC REVIEWS Systematic review: an exhaustive assessment of all the published medical evidence regarding a precise clinical question, e.g., Is aspirin better than leeches in inducing more than 50% relief in patients with tension headaches? Must find all relevant studies.

TYPICAL WORKFLOW 26M PubMed citations → SEARCH → 10,000 potentially eligible → SCREEN → 500 relevant

CITATION SCREENING Doctors read these. They'd rather be doing something else.

GENERATING TRAINING DATA FOR SUPERVISED LEARNING Expert labels a random subset. Induce (train) a classifier C over the labeled subset. Apply C to the unlabeled examples.

A DETOUR INTO TEXT ENCODING Classification algorithms operate on vectors. Feature space: an n-dimensional representation. A bag-of-words example: S1 = Boston drivers are frequently aggressive; S2 = The Boston Red Sox frequently hit line drives

TEXT ENCODING: STOP WORDS S1 = Boston drivers are frequently aggressive; S2 = The Boston Red Sox frequently hit line drives

TEXT ENCODING: LOWERCASING S1 = boston drivers are frequently aggressive; S2 = the boston red sox frequently hit line drives

TEXT ENCODING: STEMMING S1 = boston drive are frequent aggressive; S2 = the boston red sox frequent hit line drive

TEXT ENCODING: VOILA

       hit  red  sox  line  boston  frequent  drive  aggressive
S1 =    0    0    0    0      1        1        1        1
S2 =    1    1    1    1      1        1        1        0

A new sentence, S3, comes along: "I hate the red sox." Which sentence is it most similar to?

S3 =    0    1    1    0      0        0        0        0
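
The whole encoding pipeline above (stop-word removal, lowercasing, stemming, binary vectors, similarity) can be sketched in a few lines. The stop-word list and suffix-stripping rules below are toy assumptions standing in for a real stemmer such as Porter's:

```python
# Toy bag-of-words encoder mirroring the slides: stop-word removal,
# lowercasing, crude suffix stemming, then binary vectors.
STOP_WORDS = {"the", "are", "a", "i", "is"}  # assumed toy stop list

def stem(word):
    # Crude stand-in for a real stemmer: strip a few common suffixes.
    for suffix in ("rs", "ly", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def tokens(sentence):
    # Lowercase, strip punctuation, drop stop words, stem.
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    return [stem(w) for w in words if w not in STOP_WORDS]

def encode(sentence, vocab):
    toks = set(tokens(sentence))
    return [1 if v in toks else 0 for v in vocab]

vocab = ["hit", "red", "sox", "line", "boston", "frequent", "drive", "aggressive"]
v1 = encode("Boston drivers are frequently aggressive", vocab)
v2 = encode("The Boston Red Sox frequently hit line drives", vocab)
v3 = encode("I hate the red sox.", vocab)

# Dot-product similarity: S3 shares two vocabulary words with S2, none with S1.
sim = lambda a, b: sum(x * y for x, y in zip(a, b))
print(v1, v2, v3, sim(v3, v1), sim(v3, v2))
```

On these sentences the vectors come out exactly as on the slide, and the dot product says S3 is most similar to S2.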

SUPPORT VECTOR MACHINES: A HAND-WAVING EXPLANATION (Figure: separating hyperplane with its margin and support vectors.) Minimize: (1/2) w·w

SUPPORT VECTOR MACHINES: THE NON-LINEARLY SEPARABLE CASE (Figure: slack variables ε_k for points inside the margin, e.g. ε_2, ε_6, ε_11.) Minimize: (1/2) w·w + C Σ_{k=1}^{R} ε_k
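
The soft-margin objective (1/2) w·w + C Σ ε_k can be minimized directly by subgradient descent on the equivalent hinge-loss form. This is an illustrative sketch, not the solver the slides assume (production code would use a library such as LIBSVM or scikit-learn), and the 2-D data are made up:

```python
# Subgradient descent on (1/2) w.w + C * sum_k max(0, 1 - y_k (w.x_k + b)).
def svm_train(points, labels, C=1.0, lr=0.01, epochs=2000):
    n, dim = len(points), len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Hinge term active: subgradient is w/n - C*y*x (and -C*y for b).
                w = [wi - lr * (wi / n - C * y * xi) for wi, xi in zip(w, x)]
                b += lr * C * y
            else:
                # Only the regularizer (1/2) w.w contributes.
                w = [wi - lr * wi / n for wi in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Two linearly separable 2-D clusters (hypothetical data).
X = [(2.0, 2.0), (3.0, 2.5), (2.5, 3.0), (0.0, 0.0), (-1.0, 0.5), (0.5, -1.0)]
y = [1, 1, 1, -1, -1, -1]
w, b = svm_train(X, y)
print(all(predict(w, b, xi) == yi for xi, yi in zip(X, y)))
```

With C large the solution approaches the hard-margin separator; shrinking C tolerates more slack, which is the trade-off the slide's second term expresses.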

SUPERVISED LEARNING Expert labels a random subset. Induce (train) a classifier C over the labeled subset. Apply C to the unlabeled examples.

SUPERVISED LEARNING What if we are clever in choosing which examples we label? Induce (train) a classifier C over the labeled subset. Apply C to the unlabeled examples.

ACTIVE LEARNING Key idea: have the expert label the examples most likely to be helpful in inducing a classifier. Need fewer labels for good classification performance = less time/work/money. Need a scoring function f: the expected value of labeling. Most popular strategy: uncertainty sampling

UNCERTAINTY SAMPLING (W/ SVMS) Which examples should we label next?

UNCERTAINTY SAMPLING (W/ SVMS) Uncertainty sampling: label the examples nearest the separating plane
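
Assuming a linear model w·x + b, "nearest the separating plane" can be scored as the normalized distance |w·x + b| / ||w||. The hyperplane and unlabeled pool below are hypothetical:

```python
# Uncertainty sampling sketch for a linear model: query the unlabeled
# points closest to the separating hyperplane w.x + b = 0.
import math

def distance_to_plane(w, b, x):
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def select_queries(w, b, unlabeled, batch_size=2):
    # Rank unlabeled examples by closeness to the decision boundary.
    ranked = sorted(unlabeled, key=lambda x: distance_to_plane(w, b, x))
    return ranked[:batch_size]

# Hypothetical hyperplane x1 + x2 - 1 = 0 and a pool of unlabeled points.
w, b = [1.0, 1.0], -1.0
pool = [(0.5, 0.5), (3.0, 3.0), (0.4, 0.7), (-2.0, -2.0)]
queries = select_queries(w, b, pool)
print(queries)  # the two points nearest the boundary
```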


WHY OFF-THE-SHELF AL DOESN'T WORK FOR CITATION SCREENING Imbalanced data; the relevant class is very small (~5%), but sensitivity to this class is paramount. (Chart: recall and accuracy for random vs. active (uncertainty) sampling.)

WHY MIGHT UNCERTAINTY SAMPLING FAIL? (Figures: random sampling vs. uncertainty sampling.) Hasty generalization: uncertainty sampling may miss clusters. Pre-clustering doesn't help: it is unreliable in high dimensions, and there are small clusters of interest.

GUIDING AL WITH DOMAIN KNOWLEDGE Labeled terms: terms or n-grams whose presence is indicative of class membership, e.g. tension headache, leeches, aspirin (relevant); migraine headache, mice (irrelevant). Is aspirin better than leeches in inducing more than 50% relief in patients with tension headaches?

CO-TESTING FRAMEWORK (MUSLEA ET AL., 2000) Model 1: F1(x); Model 2: F2(x). If model 1 disagrees with model 2 about x, then x is a good point to label.

LABELED TERMS + CO-TESTING Model 1: standard BOW (linear kernel) SVM. Model 2: ratio of #pos terms to #neg terms. Query strategy: find all documents about which the models disagree; select for labeling the items of maximum disagreement.
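
A sketch of this query strategy with both models mocked: the `svm_score` field and the labeled-term counts below are hypothetical stand-ins for Model 1 and Model 2, and "maximum disagreement" is measured as the combined confidence gap:

```python
# Co-testing sketch: two models vote on unlabeled documents; documents
# where they disagree are candidate queries, and the item with the
# largest disagreement is labeled first.
def model1(doc):
    # Stand-in for a BOW SVM decision value (sign = predicted class).
    return doc["svm_score"]

def model2(doc):
    # Ratio-of-labeled-terms model from the slide, mapped to [-1, 1].
    pos, neg = doc["pos_terms"], doc["neg_terms"]
    return (pos - neg) / max(pos + neg, 1)

def cotest_query(docs):
    disagreements = [d for d in docs
                     if (model1(d) >= 0) != (model2(d) >= 0)]
    if not disagreements:
        return None
    # Query the document about which the models disagree most strongly.
    return max(disagreements, key=lambda d: abs(model1(d)) + abs(model2(d)))

docs = [
    {"id": 1, "svm_score":  0.9, "pos_terms": 0, "neg_terms": 3},
    {"id": 2, "svm_score": -0.1, "pos_terms": 2, "neg_terms": 0},
    {"id": 3, "svm_score":  0.8, "pos_terms": 4, "neg_terms": 0},
]
print(cotest_query(docs)["id"])  # 1: both models confident, opposite signs
```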

COPD: GENETIC ASSOCIATIONS WITH COPD

MOST IMPORTANT REQUIREMENT FOR MACHINE LEARNING TO WORK: THE DATA Are the features predictive of the class? How noisy is the data (attribute noise vs. class noise)? Do you have enough (labeled) data? Are the training samples representative?

TRANSFER LEARNING A machine learning technique to improve performance by leveraging related knowledge: a primary task on dataset T and an auxiliary dataset T_aux. T and T_aux are usually related and have similar distributions. (Figure: auxiliary T_aux informing primary T.)

TRANSFER LEARNING EXAMPLES Predicting readmission to hospitals: use data from other hospitals to predict for your hospital. Predicting MS progression: combining data from multiple physicians.

UNSUPERVISED LEARNING

CLUSTERING Given a set of data points, each described by a set of attributes, find clusters such that intra-cluster similarity is maximized and inter-cluster similarity is minimized. Requires the definition of a similarity measure. (Scatter plot over features F1 and F2.)

EXAMPLE: K-MEANS (Sequence of figure-only slides illustrating k-means iterations.)
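
The iterations shown on the figure-only slides follow Lloyd's algorithm: assign each point to its nearest centroid, recompute the centroids, repeat. A minimal sketch on made-up 2-D data (initializing from the first k points is a simplification; real implementations seed randomly or with k-means++):

```python
# Minimal k-means (Lloyd's algorithm) on 2-D points.
def dist2(a, b):
    # Squared Euclidean distance.
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def mean(pts):
    # Component-wise mean of a non-empty list of points.
    return tuple(sum(p[i] for p in pts) / len(pts) for i in range(len(pts[0])))

def kmeans(points, k, iters=20):
    centroids = list(points[:k])  # simplistic deterministic initialization
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ever comes up empty.
        centroids = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

On this toy data the two visible groups are recovered after the first couple of iterations.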

EXAMPLE: CONSTRAINED K-MEANS (Sequence of figure-only slides illustrating k-means with pairwise constraints.)

CHALLENGES IN CLUSTERING MEDICAL DATA Confounding factor: one or a set of features whose effect will lead to an undesirable clustering solution if not removed. Clustering clinical data: physician subjectivity; age for neurological test scoring in MS.

EXAMPLE: VESTIBULAR DISORDERS (Scatter plot: balance function vs. age.)

CLUSTERING WITH K = 2 (Figure: two clusters in the balance function vs. age plot.)

PROPOSED SOLUTION Remove the impact of confounding factor F via constraint-based clustering: 1. Bin the data into homogeneous groups w.r.t. F. 2. Apply clustering to each group and generate pair-wise instance constraints. 3. Apply constraint-based clustering to the entire data.

STEP 1: BINNING (STRATIFICATION) Categorical F: create one bin per category (example: one bin per physician for the MS data). Numeric F: create bins using uniform ranges or uniform bin sizes, domain knowledge, or more sophisticated binning methods such as nonparametric density estimation.
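
A sketch of uniform-range binning for a numeric confounder F (the ages and the bin count are made up; a categorical F would simply group instances by category):

```python
# Stratify a numeric confounder F into bins of uniform range,
# returning instance indices per bin.
def uniform_bins(values, n_bins):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for i, v in enumerate(values):
        j = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum value
        bins[j].append(i)
    return bins

# Hypothetical patient ages as the confounder.
ages = [25, 31, 38, 42, 55, 61, 67, 70]
print(uniform_bins(ages, 3))  # [[0, 1, 2], [3], [4, 5, 6, 7]]
```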

STEP 1: BINNING (Figure: balance function vs. age, with the data split into age bins.)

STEP 2: CLUSTER IN EACH BIN AND GENERATE CONSTRAINTS In each bin: apply clustering (e.g., EM over a mixture of Gaussians); the number of clusters can be specified by domain knowledge or inferred using criteria such as BIC. Generate must-not-link constraints for pairs of instances in different clusters.
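
The constraint-generation part of this step can be sketched directly: every pair of instances that a within-bin clustering puts into different clusters becomes a must-not-link pair for step 3 (the instance ids below are hypothetical):

```python
# Generate must-not-link constraints from the clusters found inside one bin:
# any two instances placed in different clusters should stay apart later.
from itertools import combinations

def must_not_link(bin_clusters):
    # bin_clusters: list of clusters, each a list of instance ids.
    constraints = set()
    for ca, cb in combinations(bin_clusters, 2):
        for i in ca:
            for j in cb:
                constraints.add((min(i, j), max(i, j)))
    return constraints

# Two clusters found inside one age bin.
print(sorted(must_not_link([[1, 4], [7, 9]])))
# [(1, 7), (1, 9), (4, 7), (4, 9)]
```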

STEP 2: CLUSTER EACH BIN (Figure: clusters found within each age bin.)

STEP 2: GENERATE CONSTRAINTS (Figure: must-not-link constraints between clusters within each bin.)

STEP 3: APPLY CONSTRAINT-BASED CLUSTERING TO THE ENTIRE DATA (Figure: final clustering over balance function vs. age.)

ANOMALY DETECTION

ANOMALY DETECTION Given a set of data points, each described by a set of attributes, find points that are far away from most of the other points; such points are also called outliers. Requires the definition of a similarity measure. (Scatter plot over features F1 and F2.)

TYPES OF ANOMALY DETECTION Supervised: labelled normal and anomalous data; similar to rare (minority) class mining. Semi-supervised: labels available only for normal data. Unsupervised: no labelled data; assumption: anomalies are rare compared to normal data.

COMPLEXITIES OF ANOMALY DETECTION Where does the normal data come from? Feature selection. Metric. Different parts of the space may have different densities.

COMPLEXITIES OF ANOMALY DETECTION Which of p1, p2, and p3 are anomalies? (Figure: distances from p2 and p3 to their nearest neighbors.)
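
The nearest-neighbor distances in the figure suggest a simple unsupervised detector: score each point by its distance to its k-th nearest neighbor, so isolated points like p3 score highest. A sketch on made-up 2-D data:

```python
# Unsupervised anomaly scoring: distance to the k-th nearest neighbor.
def knn_score(points, k=1):
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    scores = []
    for i, p in enumerate(points):
        ds = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(ds[k - 1])  # distance to the k-th nearest neighbor
    return scores

# A dense cluster plus one far-away point (hypothetical data).
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_score(pts)
print(max(range(len(pts)), key=lambda i: scores[i]))  # 4: the outlier's index
```

Note this fixed-distance view suffers exactly the density problem the previous slide raises: a threshold that flags p3 in a sparse region may also flag normal points in a dense one.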

ANOMALY DETECTION EXAMPLE: DETECTING CORTICAL LESIONS 50 million people affected by epilepsy worldwide; one-third remain refractory to treatment. One of the most common causes of TRE (treatment-resistant epilepsy): Focal Cortical Dysplasia (FCD). Treatment: surgical resection of the abnormal cortical tissue (aka lesion). Workflow: Visual MRI Exam → Lesion Identification & Tracing → Intracranial EEG Analysis → Resective Surgery. 70-80% of histologically verified FCD cases have normal MRI. Chances of being seizure free after surgery: MRI-positive: 66%; MRI-negative: 29%.

MACHINE LEARNING CHALLENGES Input data: surfaces of FCD patients (MRI); resected tissue (MRI-negatives): histopathologically verified; generous margins to ensure complete lesion removal; exact location of the lesion is unknown. Labels: resection zones for MRI-negatives; lesion tracings by neuroradiologists for MRI-positives. False positives in training data. False negatives in training data from long untreated epilepsy, trauma, etc.

PROPOSED SOLUTION Hierarchical Conditional Random Fields for outlier detection. Discard pixel-level labels and use only image-level labels. Redefine an FCD lesion as: a cortical region which is an outlier when compared to the same region across a population of normal controls.

RESULTS Tested on fifteen MRI-negative patients with successful surgery. High detection rate (80%) for MRI-negative patients, with higher average recall and precision.

MY LAST WORDS There are many, many different learning algorithms, but the key to success is having the right training data. MLHC is a great conference.