FACTORIZATION MACHINES AS A TOOL FOR HEALTHCARE: CASE STUDY ON TYPE 2 DIABETES DETECTION


Ioakeim (Kimis) Perros and Jimeng Sun
perros@gatech.edu, jsun@cc.gatech.edu
SunLab, Computational Science and Engineering, Georgia Institute of Technology
Artificial Intelligence in Medicine 2015, Workshop on Matrix Computations for Biomedical Informatics

Talk Outline
- The problem: predict Type 2 diabetes using electronic health records
- The method: Factorization Machines

Healthcare Data Challenges
- Heterogeneous, high-dimensional (diagnoses, medications, genomic data)
- Noisy
- Sparse

Type 2 Diabetes Mellitus
- Currently: more than 3M cases per year in the US
- Early detection and treatment decrease the risk of developing diabetes complications
- Target of this work: accurate classification of diabetic patients, given their clinical status

Model learning from sparse data: linear regression
$$y(x) = w_0 + \sum_{i=1}^{p} w_i x_i, \qquad w_0 \in \mathbb{R},\ w \in \mathbb{R}^{p}$$
Often we are interested in feature interactions (e.g. how diagnostic conditions jointly affect the target prediction).

Model learning from sparse data: polynomial regression (d=2)
$$y(x) = w_0 + \underbrace{\sum_{i=1}^{p} w_i x_i}_{\text{linear terms}} + \underbrace{\sum_{i=1}^{p} \sum_{j>i} W_{i,j}\, x_i x_j}_{\text{non-linear terms}}, \qquad w_0 \in \mathbb{R},\ w \in \mathbb{R}^{p},\ W \in \mathbb{R}^{p \times p}$$
1) Very large number of parameters: W has p x p entries, so the count grows quadratically with the number of features and can exceed 1 million parameters.
2) Sparse dataset (few available features per patient): each interaction parameter W_{i,j} has to be estimated independently from limited data.
Similar issues arise with the polynomial-kernel SVM.
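To make the parameter growth concrete, here is a minimal sketch that counts the parameters of degree-2 polynomial regression, assuming one weight per feature pair with j > i. The function name `poly2_param_count` is hypothetical; p = 702 is the feature count of the dataset used in the experiments of this talk, and the other values of p are arbitrary illustrations.

```python
def poly2_param_count(p: int) -> int:
    """Bias + p linear weights + one interaction weight per pair (i, j) with j > i."""
    return 1 + p + p * (p - 1) // 2


if __name__ == "__main__":
    # 702 comes from the experiments; 1500 and 10_000 are illustrative only.
    for p in (702, 1500, 10_000):
        print(f"p = {p:>6}: {poly2_param_count(p):>12,} parameters")
```

Under this counting, the interaction terms dominate as soon as p reaches a few hundred features.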

Factorization Machines (d=2) [Rendle 2010, 2012]
Very successful in predicting movie ratings.
$$y(x) = w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j>i} \langle v_i, v_j \rangle\, x_i x_j, \qquad w_0 \in \mathbb{R},\ w \in \mathbb{R}^{p},\ V \in \mathbb{R}^{p \times k}$$
- Each feature i is represented by a vector v_i of length k (row i of V), which gives the low-rank matrix V.
- The interaction between features i and j is expressed through <v_i, v_j>.
- Weights are now dependent: <v_i, v_j> and <v_i, v_l> share v_i. Thus, estimating one interaction aids in estimating related interactions.
- The flexible form allows the model equation to be computed in time linear in both k and p:
$$y(x) = w_0 + \sum_{i=1}^{p} w_i x_i + \frac{1}{2} \sum_{f=1}^{k} \left( \left( \sum_{i=1}^{p} V_{i,f}\, x_i \right)^{2} - \sum_{i=1}^{p} V_{i,f}^{2}\, x_i^{2} \right)$$
- By construction, the number of parameters is much smaller than the p^2 of polynomial regression.
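The equivalence between the pairwise form and the linear-time form of the degree-2 model can be checked numerically. The following NumPy sketch is an illustration only, not the libFM implementation: the function names, the random test vector, and the sizes p = 20, k = 4 are assumptions made for the example.

```python
import numpy as np


def fm_predict_pairwise(x, w0, w, V):
    """Direct evaluation: loop over all feature pairs (i, j) with j > i."""
    p = x.shape[0]
    y = w0 + w @ x
    for i in range(p):
        for j in range(i + 1, p):
            y += (V[i] @ V[j]) * x[i] * x[j]
    return y


def fm_predict_linear(x, w0, w, V):
    """Same value, computed with O(k * p) operations via the reformulated sum."""
    s = V.T @ x                      # shape (k,): sum_i V[i, f] * x[i]
    s_sq = (V ** 2).T @ (x ** 2)     # shape (k,): sum_i V[i, f]^2 * x[i]^2
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - s_sq)


def fm_param_count(p, k):
    """Bias + p linear weights + p * k entries of V."""
    return 1 + p + p * k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, k = 20, 4                                  # illustrative sizes
    x = (rng.random(p) < 0.2).astype(float)       # sparse binary feature vector
    w0, w = 0.1, rng.normal(size=p)
    V = rng.normal(scale=0.1, size=(p, k))
    print(np.isclose(fm_predict_pairwise(x, w0, w, V),
                     fm_predict_linear(x, w0, w, V)))
    print(fm_param_count(702, 4))
```

With sparse inputs, the sums over i only need to visit the non-zero entries of x, which is what makes the linear form attractive for EHR-style data.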

Experiments
- Dataset released by Practice Fusion
- Task: diabetes classification
- 9948 patients, 1904 of them diabetic
- 702 features: diagnoses (first 3 ICD-9 digits), sex, year of birth
- Sparse dataset: 1% non-zero values
- libFM package for Factorization Machines, with MCMC as the learning method (it requires the fewest hyperparameters)
- Competing methods (Matlab built-in): SVM with Gaussian kernel, Random Forests
- All parameters selected via cross-validation
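As a rough sketch of how such a feature matrix might be prepared, the snippet below maps diagnosis codes to their first three ICD-9 digits and writes each patient as a sparse `label index:value` line, the LIBSVM-style text format that libFM reads. The helper names, the toy patients, and the 0-based column numbering are assumptions for illustration; the preprocessing actually used in this study is not described in the slides.

```python
def icd9_prefix(code):
    """Keep the first three characters of an ICD-9 code, e.g. '250.00' -> '250'."""
    return code.replace(".", "")[:3]


def to_sparse_line(label, features):
    """Format one patient as 'label idx:value ...', skipping zero-valued features."""
    parts = [str(label)]
    for idx in sorted(features):
        if features[idx] != 0:
            parts.append(f"{idx}:{features[idx]:g}")
    return " ".join(parts)


def encode_patients(patients):
    """patients: list of (label, [icd9 codes]). Returns (lines, prefix -> column)."""
    prefixes = sorted({icd9_prefix(c) for _, codes in patients for c in codes})
    column = {prefix: i for i, prefix in enumerate(prefixes)}
    lines = []
    for label, codes in patients:
        feats = {column[icd9_prefix(c)]: 1.0 for c in codes}
        lines.append(to_sparse_line(label, feats))
    return lines, column


if __name__ == "__main__":
    # Toy, made-up patients: (diabetic label, recorded diagnosis codes).
    toy = [(1, ["250.00", "401.9"]), (0, ["401.1", "786.50"])]
    for line in encode_patients(toy)[0]:
        print(line)
```

Only the non-zero entries are written, so a dataset with 1% non-zero values stays compact on disk.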

Experiments - Prediction Accuracy
[Table: F1-score, accuracy, precision, recall and AUC for Factorization Machines, SVM and Random Forests]
Results on detecting Type 2 Diabetes Mellitus, 5-fold CV, FM rank k=4.
- FMs outperform state-of-the-art classifiers in terms of F1-score
- Approximately the same performance as the best alternative in terms of accuracy and AUC
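The reported metrics can all be derived from a confusion matrix. The sketch below is a generic illustration (the function name is hypothetical and no results from the talk are reproduced). It also shows why F1-score is informative here: with 1904 diabetic patients out of 9948, a trivial classifier that labels everyone as non-diabetic reaches about 81% accuracy but an F1-score of zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Trivial baseline on the class balance of the dataset: predict "non-diabetic"
    # for all 9948 patients, so all 1904 diabetic patients are false negatives.
    baseline = classification_metrics(tp=0, fp=0, fn=1904, tn=9948 - 1904)
    print(baseline)
```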

Experiments - Selection of model dimensionality
[Figure: average precision vs. model dimensionality] Average precision of FMs of increasing model complexity (varying k = {1, 2, 4, 8, 16, 32, 64}).

Experiments - Scalability to the model dimensionality
[Figure: average time per fold (seconds) vs. model dimensionality] Effect of increasing model complexity (varying k = {1, 2, 4, 8, 16, 32, 64}) on the FM training time.

Experiments - Scalability to the sample size
[Figure: training time (seconds) vs. size of training data (#samples)] Effect of increasing dataset size on the FM training time (fixed k = 4).

Concluding remarks - Future work
- Factorization Machines: an efficient model learning method for sparse EHR datasets
- Healthcare data are inherently sparse: a patient has to visit a physician for a diagnosis to be recorded
- Next: incorporate more data sources and take domain knowledge into account
