A Bayesian Nonparametric Model for Predicting Disease Status Using Longitudinal Profiles

Jeremy Gaskins
Department of Bioinformatics & Biostatistics, University of Louisville
Joint work with Claudio Fuentes (OSU) and Rolando de la Cruz (Universidad Adolfo Ibáñez)
July 29, 2018, JSM
Running title: BNP Disease Classification from Longitudinal Data

The Motivating Question
Women undergoing assisted reproductive therapy (ART) in order to become pregnant are at heightened risk of early pregnancy loss.
The concentration of the hormone β-hCG is associated with the growth of the fetus in early pregnancy and may be predictive of abnormal pregnancy (loss of fetus or complications leading to nonterminal delivery).
[Figure: two panels, "Normal Pregnancies" and "Abnormal Pregnancies"]

The Motivating Question
If we know (a part of) a patient's β-hCG trajectory, can we use this information to predict whether she will go on to have a successful pregnancy?
Let $D = 1$ denote disease (abnormal pregnancy) and let $Y$ denote the vector of hCG values for the patient.
Fit a model $f(Y \mid D = 0)$ for the normal pregnancy patients and a model $f(Y \mid D = 1)$ for the abnormal pregnancy patients.
Use Bayes' Theorem to get a probability of abnormal pregnancy, given the hCG trajectory:
$$\mathrm{pr}(D = 1 \mid Y) = \frac{f(Y \mid D = 1)\,\mathrm{pr}(D = 1)}{f(Y \mid D = 1)\,\mathrm{pr}(D = 1) + f(Y \mid D = 0)\,\mathrm{pr}(D = 0)}.$$
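
The Bayes' Theorem step can be sketched in a few lines of Python. This is only an illustration: the function name `posterior_disease_prob` is hypothetical, and the two class-conditional densities are assumed to be supplied on the log scale by whatever trajectory model has been fitted.

```python
import math

def posterior_disease_prob(log_f1, log_f0, prior1):
    """Return pr(D = 1 | Y) given log f(Y | D = 1), log f(Y | D = 0) and pr(D = 1).

    prior1 must lie strictly between 0 and 1.
    """
    a = log_f1 + math.log(prior1)        # log of numerator
    b = log_f0 + math.log(1.0 - prior1)  # log of the D = 0 term
    m = max(a, b)                        # subtract the max for numerical stability
    num = math.exp(a - m)
    return num / (num + math.exp(b - m))

# Hypothetical inputs: a trajectory twice as likely under the abnormal model,
# with the sample prevalence 49/173 used as the prior.
p = posterior_disease_prob(math.log(2.0), 0.0, 49 / 173)
```

Working with log densities avoids underflow when a trajectory has several measurements and the raw densities become very small.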

The Motivating Question
[Figure: two panels, "Normal Pregnancies" and "Abnormal Pregnancies"]
A couple of concerns:
The abnormal pregnancy model doesn't appear to fit very well.
There may be many ways for a pregnancy to fail, and a model that tries to explain all of them in the same way won't work well.
We need a model that allows multiple types of trajectories, but we don't necessarily know how many there should be.

Contents
1 Bayesian Nonparametric Model for Longitudinal Trajectory
2 Computational Considerations
3 Simulation Study
4 Data Application: Assisted pregnancy in Chilean women

Some Notation
$N$ = number of patients. In the pregnancy data: $N = 173$.
$D_i$ = disease status (1 for disease, 0 for healthy). 49 (28.3%) abnormal pregnancies.
$t_{ij}$ = the $j$th measurement occasion for patient $i$.
$n_i$ = number of measurement occasions for patient $i$. Measurement times are not aligned. $n_i$ ranges from 1 to 6, with 30% having only 1 measurement.
$Y_{ij}$ = longitudinal biomarker measurement for patient $i$ at the $j$th observation (time $t_{ij}$). Here $Y_{ij} = \log_{10}(\beta\text{-hCG})$.
$Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{i,n_i})$ = vector of biomarkers for patient $i$.

2-Component Model
First, let's formalize the 2-component model that we compare our approach to (Marshall and Baron, 2000).
$$D_i \sim \mathrm{Bern}(\phi)$$
$$Y_{ij} = f(t_{ij}; \theta_{D_i}) + \gamma_i + \epsilon_{ij}$$
$$f(t; \theta) = \frac{\theta_1}{1 + \exp\{-\theta_2 t - \theta_3\}}$$
$$\gamma_i \sim N(0, \gamma^2), \qquad \epsilon_i \sim \mathrm{MVN}_{n_i}\big(0_{n_i}, \sigma^2 R_i(\rho)\big)$$
This model has two components:
Healthy: probability $1 - \phi$ and parameter $\theta_0 = (\theta_{01}, \theta_{02}, \theta_{03})$
Disease: probability $\phi$ and parameter $\theta_1 = (\theta_{11}, \theta_{12}, \theta_{13})$
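
As a rough illustration of the 2-component model, the sketch below evaluates the logistic mean curve and simulates one patient. Several things here are assumptions rather than part of the slides: the signs inside the exponential (the transcript lost them), independent errors ($R_i = I$) in place of the unspecified correlation structure $R_i(\rho)$, the function names, and every numeric parameter value.

```python
import math
import random

def mean_curve(t, theta):
    """Logistic mean f(t; theta) = theta1 / (1 + exp(-theta2 * t - theta3)).

    The sign convention is an assumption made for this sketch.
    """
    th1, th2, th3 = theta
    return th1 / (1.0 + math.exp(-th2 * t - th3))

def simulate_patient(times, phi, theta0, theta1, gamma_sd, sigma_sd, rng):
    """Draw (D, Y) for one patient under the 2-component model.

    Simplification: errors are independent across occasions (R_i = I).
    """
    d = 1 if rng.random() < phi else 0       # D ~ Bern(phi)
    theta = theta1 if d == 1 else theta0     # curve parameters depend on D
    g = rng.gauss(0.0, gamma_sd)             # patient-level random intercept
    y = [mean_curve(t, theta) + g + rng.gauss(0.0, sigma_sd) for t in times]
    return d, y

# Hypothetical parameter values, chosen only to make the example run.
rng = random.Random(1)
d, y = simulate_patient([10.0, 20.0, 30.0], phi=0.283,
                        theta0=(4.5, 0.15, -1.0), theta1=(3.0, 0.05, -1.0),
                        gamma_sd=0.3, sigma_sd=0.2, rng=rng)
```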

Bayesian Nonparametric Model for Longitudinal Trajectory
We extend the 2-component model to a Bayesian nonparametric version.
$$(\theta_i, \phi_i) \sim \mathrm{DP}\big(\alpha,\ \mathrm{MVN}_3(\theta, \Sigma) \times \mathrm{Beta}(a, b)\big)$$
$$D_i \sim \mathrm{Bern}(\phi_i)$$
$$Y_{ij} = f(t_{ij}; \theta_i) + \gamma_i + \epsilon_{ij}$$
$$\gamma_i \sim N(0, \gamma^2), \qquad \epsilon_i \sim \mathrm{MVN}_{n_i}\big(0_{n_i}, \sigma^2 R_i(\rho)\big)$$
Due to the DP choice, there are many ties in the $(\theta_i, \phi_i)$ across patients. In essence, we have a small number of clusters with unique values of these parameters.
We will have multiple types of clusters. Clusters with $\phi^{(c)}$ near 0 will contain mainly healthy patients, and clusters with $\phi^{(c)}$ near 1 will contain mainly diseased patients.

Bayesian Nonparametric Model for Longitudinal Trajectory
Letting $c_i \in \{1, 2, 3, \ldots\}$ be the cluster indicator for patient $i$, we can equivalently express the model as:
$$\mathrm{pr}(c_i = k) = \psi_k = V_k \prod_{j=1}^{k-1} (1 - V_j)$$
$$D_i \mid c_i = k \sim \mathrm{Bern}(\phi^{(k)})$$
$$Y_{ij} \mid c_i = k \;=\; f(t_{ij}; \theta^{(k)}) + \gamma_i + \epsilon_{ij}$$
$$\gamma_i \sim N(0, \gamma^2), \qquad \epsilon_i \sim \mathrm{MVN}_{n_i}\big(0_{n_i}, \sigma^2 R_i(\rho)\big)$$
$$\theta^{(k)} \sim \mathrm{MVN}_3(\theta, \Sigma), \qquad \phi^{(k)} \sim \mathrm{Beta}(a, b), \qquad V_k \sim \mathrm{Beta}(1, \alpha)$$
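
The stick-breaking weights $\psi_k$ can be generated as in the following sketch. Truncating at a finite number of clusters `H` and setting the last stick fraction to 1 so the weights sum to one is an assumption made here for illustration; the function name is hypothetical.

```python
import random

def stick_breaking_weights(alpha, H, rng=random):
    """Truncated stick-breaking weights psi_1..psi_H with V_k ~ Beta(1, alpha)."""
    weights = []
    remaining = 1.0  # prod_{j<k} (1 - V_j): the part of the stick not yet assigned
    for k in range(1, H + 1):
        # Assumed truncation: the last cluster takes whatever is left.
        v = 1.0 if k == H else rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= (1.0 - v)
    return weights

# Small alpha concentrates mass on a few clusters; large alpha spreads it out.
psi = stick_breaking_weights(alpha=1.0, H=20, rng=random.Random(0))
```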

Bayesian Nonparametric Model for Longitudinal Trajectory
If we observe a patient's longitudinal trajectory $y$, we can predict disease status as follows.
$$\mathrm{pr}(D = 1 \mid y) = E\{I(D = 1) \mid y\} = E\big[E\{I(D = 1) \mid C, y\} \mid y\big] = \sum_{k=1}^{H} \phi^{(k)}\, \mathrm{pr}(C = k \mid y)$$
$$= \sum_{k=1}^{H} \phi^{(k)}\, \frac{\psi_k\, \mathrm{MVN}\big(y;\ f(t; \theta^{(k)}),\ \sigma^2 R_i(\rho) + \gamma^2 \mathbf{1}\mathbf{1}^\top\big)}{\sum_{h=1}^{H} \psi_h\, \mathrm{MVN}\big(y;\ f(t; \theta^{(h)}),\ \sigma^2 R_i(\rho) + \gamma^2 \mathbf{1}\mathbf{1}^\top\big)}$$
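
For a single set of parameter values (for example, one MCMC draw), the prediction formula can be evaluated as in this NumPy sketch. The function names, the sign convention in `mean_curve`, and all numbers in the example are assumptions; the correlation matrix `R` is passed in by the caller because the slides do not state the form of $R_i(\rho)$.

```python
import numpy as np

def mean_curve(t, theta):
    """f(t; theta) = theta1 / (1 + exp(-theta2 * t - theta3)); signs are assumed."""
    return theta[0] / (1.0 + np.exp(-theta[1] * t - theta[2]))

def mvn_logpdf(y, mean, cov):
    """Log density of a multivariate normal, computed with a linear solve."""
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(y) * np.log(2.0 * np.pi) + logdet + quad)

def predict_disease(y, t, psi, phi, thetas, sigma2, gamma2, R):
    """pr(D = 1 | y) = sum_k phi_k * pr(C = k | y) for fixed parameter values."""
    n = len(y)
    cov = sigma2 * R + gamma2 * np.ones((n, n))   # sigma^2 R + gamma^2 11'
    logw = np.array([np.log(p) + mvn_logpdf(y, mean_curve(t, th), cov)
                     for p, th in zip(psi, thetas)])
    w = np.exp(logw - logw.max())                 # stabilised cluster weights
    w /= w.sum()                                  # pr(C = k | y)
    return float(np.dot(phi, w))

# Hypothetical two-cluster example with independent errors (R = I).
t = np.array([10.0, 20.0])
y = np.array([2.1, 3.6])
prob = predict_disease(y, t, psi=[0.7, 0.3], phi=[0.05, 0.95],
                       thetas=[(4.5, 0.15, -1.0), (3.0, 0.05, -1.0)],
                       sigma2=0.04, gamma2=0.09, R=np.eye(2))
```

Averaging this quantity over posterior draws would give a model-averaged prediction; the sketch covers only the inner calculation for one draw.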

Bayesian Nonparametric Model for Longitudinal Trajectory
We will also need to specify priors for the remaining parameters:
$$\alpha \sim \mathrm{Gamma}(1, 1)$$
$$\gamma^2 \sim \mathrm{InvGamma}(0.1, 0.1), \qquad \sigma^2 \sim \mathrm{InvGamma}(0.1, 0.1)$$
$$\rho \sim \mathrm{Unif}(0, 1)$$
$$\theta \sim \mathrm{MVN}_3(\mathbf{1}_3, 10^2 I_3), \qquad \Sigma \sim \mathrm{InvWish}(5, I_3)$$
$$a = b = 0.5$$


Markov chain Monte Carlo sampling
In order to obtain any posterior inference, we will need a Markov chain Monte Carlo (MCMC) sampling algorithm.
However, there are some challenges to drawing inference in our model.
Label Switching: The likelihood function is invariant to permutations of the cluster labels (Stephens, 2000).
For instance, an estimate of $\phi^{(1)}$ from the MCMC sample is nonsensical because the types of patients who define the first cluster change over the course of the sampler.
Cluster-specific parameters are therefore not identifiable, but these two quantities are estimable:
Probability that two patients are in the same cluster: $\mathrm{pr}(c_i = c_j \mid y_i, y_j)$
Disease predictions: $\mathrm{pr}(D = 1 \mid y) = \sum_{k=1}^{H} \phi^{(k)}\, \mathrm{pr}(C = k \mid y)$

Markov chain Monte Carlo sampling
If all we need is to make predictions about disease status, then label switching is not an issue.
But if we want to draw conclusions about particular trajectories, we will need posterior samples with identifiable parameters.
Based on the posterior pairwise probabilities $\mathrm{pr}(c_i = c_j \mid y_i, y_j)$, we estimate an optimal partition, the most likely clustering configuration.
Given the optimal partition, we can run a Stage II MCMC algorithm without updating the cluster membership to get a usable posterior sample.

Estimating the optimal partition
Dahl's Method (Dahl, 2006)
Find $\hat{c} = (\hat{c}_1, \hat{c}_2, \ldots, \hat{c}_N)$ from the MCMC sample that minimizes
$$\sum_{i=1}^{N} \sum_{j=1}^{N} \big[ I(\hat{c}_i = \hat{c}_j) - \mathrm{pr}(c_i = c_j \mid y) \big]^2.$$
Hierarchical Clustering
Using the pairwise clustering probabilities $\mathrm{pr}(c_i = c_j \mid y_i, y_j)$, we can obtain a dendrogram describing the clustering relationships.
[Figure: example dendrogram]
There are various choices of the linkage criterion that can be used, and we consider average linkage and Ward's method.
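
Dahl's criterion is simple to compute once the cluster labels from each MCMC iteration are stored. The sketch below, with hypothetical function names and a toy set of draws, first estimates the pairwise co-clustering probabilities by averaging over iterations and then picks the sampled partition with the smallest squared-error loss.

```python
def coclustering_matrix(draws):
    """Estimate pr(c_i = c_j | y) as the fraction of draws in which i and j share a label."""
    n = len(draws[0])
    m = len(draws)
    p = [[0.0] * n for _ in range(n)]
    for c in draws:
        for i in range(n):
            for j in range(n):
                if c[i] == c[j]:
                    p[i][j] += 1.0 / m
    return p

def dahl_partition(draws):
    """Return the sampled partition minimising sum_ij [I(c_i = c_j) - p_ij]^2."""
    p = coclustering_matrix(draws)
    n = len(p)

    def loss(c):
        return sum(((1.0 if c[i] == c[j] else 0.0) - p[i][j]) ** 2
                   for i in range(n) for j in range(n))

    return min(draws, key=loss)

# Toy draws for three patients; the third draw is the first with labels swapped.
draws = [[0, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 1]]
p_same = coclustering_matrix(draws)
best = dahl_partition(draws)
```

Because the criterion depends only on whether two patients share a label, the label-switched third draw counts as the same partition as the first two, which is why this summary is unaffected by label switching.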

Estimating the optimal partition
Hierarchical Clusterings
This gives us a partition for each number of clusters $k$, but we still have to choose the value of $k$.
Some automated methods choose $k$ by optimizing a criterion: Gamma measure, Tau measure, Silhouette index (Charrad et al., 2014).
But most approaches to selecting $k$ require an actual (rectangular) data matrix.
Under average linkage, cutting the dendrogram at height $h$ means that, for $i$ and $j$ assigned to different clusters, $\mathrm{pr}(c_i \neq c_j \mid y) \geq h$, so we can specify a value of $h$, such as 0.75 or 0.9.
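
One way to carry out the fixed-height cut is with SciPy's hierarchical clustering routines, as in the sketch below. Using $1 - \mathrm{pr}(c_i = c_j \mid y)$ as the dissimilarity and the function name `partition_from_coclustering` are assumptions made for illustration; the slides do not say which software was used.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def partition_from_coclustering(p_same, h=0.75, method="average"):
    """Cut a dendrogram built from dissimilarities 1 - pr(c_i = c_j | y) at height h."""
    dist = 1.0 - np.asarray(p_same, dtype=float)  # estimated pr(c_i != c_j | y)
    np.fill_diagonal(dist, 0.0)                   # a patient is at distance 0 from itself
    tree = linkage(squareform(dist, checks=False), method=method)
    return fcluster(tree, t=h, criterion="distance")

# Hypothetical pairwise probabilities for three patients:
# patients 0 and 1 usually cluster together, patient 2 rarely joins them.
p_same = [[1.00, 0.90, 0.05],
          [0.90, 1.00, 0.05],
          [0.05, 0.05, 1.00]]
labels = partition_from_coclustering(p_same, h=0.75)
```

The returned labels are arbitrary integers; only the grouping they induce is meaningful.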

Model Choices
Different models under consideration:
1 Two-component model
2 Bayesian Nonparametric model with label switching. Disease prediction through Bayesian model averaging (BMA).
3 Bayesian Nonparametric model with 2-stage estimation. Choose optimal clustering by Dahl's method.
4 Bayesian Nonparametric model with 2-stage estimation. Choose optimal clustering through hierarchical clustering under each method for choosing $k$.
Because the number of clusters and the membership are unknown, the BMA choice most accurately represents what we can learn from the data.
Theoretically, predictions under BMA should be best (lowest variance) by Rao-Blackwell considerations.


Simulation Study #1
[Figure: one panel for all groups combined and one panel per group]
All groups: n = 200
Group 1: n = 80, φ = 0
Group 2: n = 40, φ = 0.2
Group 3: n = 30, φ = 0.2
Group 4: n = 30, φ = 0.9
Group 5: n = 20, φ = 1

Simulation Study #2
[Figure: one panel for all groups combined and one panel per group]
All groups: n = 200
Group 1: n = 150, φ = 0
Group 2: n = 50, φ = 1
In each case, we generate 200 datasets with 200 observations and apply each of the methods.

Simulation Study Results

                             Out of sample                Within sample
Model             Clusters   % error       AUC            % error       AUC

Simulation Study #1 (5 components)
2-component       2          25.4% (0.9)   0.768 (.013)   25.0% (2.5)   0.783 (.013)
BMA               8.1 (1.1)  21.7% (0.7)   0.823 (.007)   20.1% (3.0)   0.848 (.029)
Dahl              9.0 (2.9)  22.6% (0.9)   0.813 (.009)   20.2% (2.9)   0.844 (.030)
Avg(h = 0.75)     5.0 (1.1)  22.8% (1.5)   0.809 (.022)   20.7% (3.3)   0.837 (.035)
Avg(median)       7.7 (1.1)  22.9% (1.8)   0.806 (.033)   20.6% (3.3)   0.837 (.043)
Avg(Silhouette)   4.2 (0.9)  23.0% (2.2)   0.806 (.018)   21.2% (3.2)   0.831 (.036)
...

Simulation Study #2 (2 components)
2-component       2          18.2% (1.0)   0.853 (.007)   16.0% (2.1)   0.877 (.027)
BMA               3.0 (0.6)  18.0% (1.0)   0.854 (.008)   16.0% (2.0)   0.875 (.028)
Dahl              2.5 (0.8)  18.0% (1.0)   0.856 (.008)   16.1% (2.2)   0.875 (.027)
Avg(h = 0.75)     2.1 (0.2)  18.0% (1.1)   0.857 (.009)   16.1% (2.0)   0.874 (.028)
Avg(median)       2.7 (0.7)  18.2% (1.6)   0.851 (.032)   16.3% (2.7)   0.871 (.045)
Avg(Silhouette)   2.1 (0.4)  18.0% (1.1)   0.856 (.009)   16.1% (2.2)   0.874 (.029)
...

Simulation Study Results
Comments:
If the two-component model is not correct, it produces substantially worse predictions; when the two-component model is correct, the BMA and two-stage BNP estimators do just as well.
BMA beats the two-stage BNP predictions, but not by much. The minor loss in prediction accuracy may be justified by clearer interpretations.
The Dahl method tends to produce too many clusters, even though it is among the best two-stage methods in prediction.
We recommend prediction using the BMA estimates and interpretation through the two-stage procedure under average linkage with fixed h = 0.75, followed by the Silhouette index (either average or Ward linkage) and Dahl's method.


Data Application: Assisted pregnancy in Chilean women
We estimate the accuracy using within-sample prediction from the full data (n = 173) and from a 25-fold cross-validation (n = 35, 20% test set).

                             Full data           25-fold CV
Model             Clusters   % error   AUC       % error       AUC
2-component       2          16.2%     0.863     19.5% (1.2)   0.865 (.004)
BMA               8.6        13.3%     0.900     16.2% (1.1)   0.900 (.003)
Dahl              10         12.7%     0.898     16.8% (1.0)   0.888 (.004)
Avg(h = 0.75)     5          13.3%     0.883     17.2% (1.4)   0.881 (.005)
Avg(Silhouette)   5          13.3%     0.880     16.3% (1.0)   0.890 (.005)
Ward(Silhouette)  3          14.5%     0.889     17.0% (1.0)   0.884 (.005)

Data Application: Assisted pregnancy in Chilean women
We consider the interpretable conclusions based on the Stage II model fits from the Silhouette index with average linkage.
[Figure: one panel per cluster]
Cluster A: n = 16, disease prob = 0.971 (0.86, 1)
Cluster B: n = 25, disease prob = 0.942 (0.83, 1)
Cluster C: n = 2, disease prob = 0.494 (0.06, 0.93)
Cluster D: n = 13, disease prob = 0.181 (0.03, 0.42)
Cluster E: n = 117, disease prob = 0.056 (0.02, 0.1)

References
Gaskins, J., C. Fuentes, and R. de la Cruz. (2017) A Bayesian Nonparametric Model for Predicting Pregnancy Outcomes Using Longitudinal Profiles. arXiv:1711.01512.
Charrad, M., H. Ghazzali, V. Boiteau, and A. Niknafs. (2014) NbClust: An R package for determining the relevant number of clusters in a data set. Journal of Statistical Software 61:1-36.
Dahl, D.B. (2006) Model-Based Clustering for Expression Data via a Dirichlet Process Mixture Model. In Bayesian Inference for Gene Expression and Proteomics, K.-A. Do, P. Müller, and M. Vannucci (eds.). Cambridge University Press.
Marshall, G., and A.E. Baron. (2000) Linear discriminant models for unbalanced longitudinal data. Stat in Med. 19:1969-1981.
Stephens, M. (2000) Dealing with label switching in mixture models. JRSSB 62:795-809.