Scalable Joint Modeling of Longitudinal and Point Process Data:

Size: px

Start display at page:

Download "Scalable Joint Modeling of Longitudinal and Point Process Data:"

Allyson Ball
5 years ago
Views:

Management of Chronic Kidney Disease Joe Futoma

1 Scalable Joint Modeling of Longitudinal and Point Process Data: Disease Trajectory Prediction and Improving Management of Chronic Kidney Disease Joe Futoma UAI Bayesian Applications Workshop, 2016 June 29,

2 Outline Motivation Working with EHR Data Proposed Joint Model Experiments & Results In Clinical Practice 2

3 egfr 1 st Nephrology Visit Kidney Nephrology Visit Function (GFR) PCP Visit ED Visit Hospital Admission Acute MI Age Death

4 Untreated diabetes & high blood pressure. Normal kidney function, but with evidence of kidney damage. egfr 1 st Nephrology Visit Nephrology Visit PCP Visit No regular medical care. ED Visit Hospital Admission Acute MI Death

5 egfr Age 48 A few more ER trips Kidney function falling 1 st Nephrology Visit Nephrology Visit PCP Visit ED Visit Hospital Admission Acute MI Death

6 egfr 1 st Nephrology Visit Nephrology Visit PCP Visit ED Visit Age 49 Kidney function now 50% Hospital Admission Acute MI Death

7 egfr 1 st Nephrology Visit Nephrology Visit PCP Visit ED Visit Hospital Admission Acute MI Death Age 50 Gets a Duke Primary Care doc Kidney function 30%

8 egfr 1 st Nephrology Visit Nephrology Visit PCP Visit ED Visit Hospital Admission Acute MI Death Age 51 CKD first noted on problem list

9 egfr 1 st Nephrology Visit Nephrology Visit PCP Visit ED Visit Hospital Admission Acute MI Death Referred to Nephrology

10 egfr 1 st Nephrology Visit Dialysis Begins Nephrology Visit PCP Visit ED Visit Hospital Admission Acute MI Death Three months later presents to ER with kidney failure symptoms and crash starts dialysis

11 egfr 1 st Nephrology Visit Dialysis Begins Nephrology Visit PCP Visit ED Visit Hospital Admission Acute MI Death

12 egfr Missed Opportunities: 1 st Nephrology Visit Nephrology Visit To prevent or delay kidney failure PCP Visit ED Visit Hospital Admission To prepare for kidney failure Acute MI Death

13 42% starting dialysis have no prior nephrology care 13

14 <10% with moderate CKD <50% with severe CKD even aware of illness! 14

15 Chronic Kidney Disease (CKD) Progressive loss of kidney function with significant morbidity, mortality, health system utilization, economic costs 15

16 Chronic Kidney Disease (CKD) Heart disease Progressive loss of kidney function with significant morbidity, mortality, health system utilization, economic costs 15

17 Chronic Kidney Disease (CKD) Heart disease Diabetes Progressive loss of kidney function with significant morbidity, mortality, health system utilization, economic costs 15

18 Outline Motivation Working with EHR Data Proposed Joint Model Experiments & Results In Clinical Practice 16

19 Data Expertise Statistical Expertise Clinical Expertise 17

20 Real EHR data poses many challenges! 18

ICD9) Procedures (10,000 CPT) Medications

21 Chronicles Extract, Transform, and Load Demographics Encounters Diagnoses (13,000 ICD9) Procedures (10,000 CPT) Medications Lab Results Decision Support Repository Physician Orders Culture Radiation/Oncology Notes Results Vitals Social History Problem Lists Allergies Nursing notes Geography 19 Extract, Transform, and Load Clarity

22 Data acquisition from DEDUCE 630,000 patients pulled, one encounter in previous year (winter 2015) 393,000 with at least 1 serum creatinine lab 115,000 with 10+ labs 30,000 patients age 65+ and meet criteria for moderate stage CKD 20

23 Transforming Data

24 22

25 Quantifying CKD Progression Estimated glomerular filtration rate (egfr) is an extremely noisy estimate of kidney function. Most common validated equation: egf R = 141 min(s cr /apple, 1) max(s cr /apple, 1) Age I(female) I(black) S cr : Serum creatinine (mg/dl) apple :0.7 (female), 0.9 (male) : (female), (male) 23

26 Quantifying CKD Progression Estimated glomerular filtration rate (egfr) is an extremely noisy estimate of kidney function. Most common validated equation: egf R = 141 min(s cr /apple, 1) max(s cr /apple, 1) Age I(female) I(black) S cr : Serum creatinine (mg/dl) apple :0.7 (female), 0.9 (male) : (female), (male) 23

27 Outline Motivation Working with EHR Data Proposed Joint Model Experiments & Results In Clinical Practice 24

28 Proposed Joint Model Goal: jointly model risks of future loss of kidney function, cardiac events Heart attacks (AMI), Stroke (CVA) Hierarchical latent variable model: captures dependencies between disease trajectory and event risk Submodels for longitudinal, event data with shared latent variables ~y i ~t i ~u i x i : egfrs at times ; : event times (may be none); covariates Conditional independence in joint likelihood: p(~y i,~u i z i,b i,f i,v i ; x i )=p(~y i z i,b i,f i ; x i )p( ~u i z i,b i,f i,v i ; x i ) 25

29 Longitudinal Submodel 26

30 Longitudinal Submodel Longitudinal values conditionally independent: p(~y i z i,b i,f i )= YN i j=1 p(y ij z i,b i,f i ) 26

31 Longitudinal Submodel Longitudinal values conditionally independent: p(~y i z i,b i,f i )= Values normally distributed, with mean a sum of 4 terms YN i j=1 p(y ij z i,b i,f i ) Can also view as GP with highly structured mean 26

32 Longitudinal Submodel Longitudinal values conditionally independent: p(~y i z i,b i,f i )= Values normally distributed, with mean a sum of 4 terms YN i j=1 p(y ij z i,b i,f i ) Can also view as GP with highly structured mean y i (t) =m i (t)+ i (t), i (t) iid N(0, m i (t) = p (t) > x ip + z (t) > z i + l (t) > b i + f i (t). 2 ) 26

33 Longitudinal Submodel Longitudinal values conditionally independent: p(~y i z i,b i,f i )= Values normally distributed, with mean a sum of 4 terms YN i j=1 p(y ij z i,b i,f i ) Can also view as GP with highly structured mean iid random noise y i (t) =m i (t)+ i (t), i (t) iid N(0, m i (t) = p (t) > x ip + z (t) > z i + l (t) > b i + f i (t). 2 ) 26

34 Longitudinal Submodel Longitudinal values conditionally independent: p(~y i z i,b i,f i )= Values normally distributed, with mean a sum of 4 terms YN i j=1 p(y ij z i,b i,f i ) Can also view as GP with highly structured mean y i (t) =m i (t)+ i (t), i (t) iid N(0, m i (t) = p (t) > x ip + z (t) > z i + l (t) > b i + f i (t). 2 ) 26

35 Longitudinal Submodel Longitudinal values conditionally independent: p(~y i z i,b i,f i )= Values normally distributed, with mean a sum of 4 terms YN i j=1 p(y ij z i,b i,f i ) Can also view as GP with highly structured mean y i (t) =m i (t)+ i (t), i (t) iid N(0, m i (t) = p (t) > x ip + z (t) > z i + l (t) > b i + f i (t). 2 ) Population component (fixed intercept and slope) 26

36 Longitudinal Submodel Longitudinal values conditionally independent: p(~y i z i,b i,f i )= Values normally distributed, with mean a sum of 4 terms YN i j=1 p(y ij z i,b i,f i ) Can also view as GP with highly structured mean y i (t) =m i (t)+ i (t), i (t) iid N(0, m i (t) = p (t) > x ip + z (t) > z i + l (t) > b i + f i (t). basis expansion, in practice [1,t] coefficient matrix baseline covariates Population component (fixed intercept and slope) 26 2 )

37 Longitudinal Submodel Longitudinal values conditionally independent: p(~y i z i,b i,f i )= Values normally distributed, with mean a sum of 4 terms YN i j=1 p(y ij z i,b i,f i ) Can also view as GP with highly structured mean y i (t) =m i (t)+ i (t), i (t) iid N(0, m i (t) = p (t) > x ip + z (t) > z i + l (t) > b i + f i (t). 2 ) 26

38 Longitudinal Submodel Longitudinal values conditionally independent: p(~y i z i,b i,f i )= Values normally distributed, with mean a sum of 4 terms YN i j=1 p(y ij z i,b i,f i ) Can also view as GP with highly structured mean y i (t) =m i (t)+ i (t), i (t) iid N(0, m i (t) = p (t) > x ip + z (t) > z i + l (t) > b i + f i (t). 2 ) Subpopulation component (unique disease trajectory with splines) 26

39 Longitudinal Submodel Longitudinal values conditionally independent: p(~y i z i,b i,f i )= Values normally distributed, with mean a sum of 4 terms YN i j=1 p(y ij z i,b i,f i ) Can also view as GP with highly structured mean y i (t) =m i (t)+ i (t), i (t) iid N(0, m i (t) = p (t) > x ip + z (t) > z i + l (t) > b i + f i (t). fixed B-spline expansion of time 2 ) coefficient subpopulation vector assignment (1-G), p(z i = g) / exp{w g > x iz } Subpopulation component (unique disease trajectory with splines) 26

40 Longitudinal Submodel Longitudinal values conditionally independent: p(~y i z i,b i,f i )= Values normally distributed, with mean a sum of 4 terms YN i j=1 p(y ij z i,b i,f i ) Can also view as GP with highly structured mean y i (t) =m i (t)+ i (t), i (t) iid N(0, m i (t) = p (t) > x ip + z (t) > z i + l (t) > b i + f i (t). 2 ) 26

41 Longitudinal Submodel Longitudinal values conditionally independent: p(~y i z i,b i,f i )= Values normally distributed, with mean a sum of 4 terms YN i j=1 p(y ij z i,b i,f i ) Can also view as GP with highly structured mean y i (t) =m i (t)+ i (t), i (t) iid N(0, m i (t) = p (t) > x ip + z (t) > z i + l (t) > b i + f i (t). 2 ) Individual component (random intercept and slope) 26

42 Longitudinal Submodel Longitudinal values conditionally independent: p(~y i z i,b i,f i )= Values normally distributed, with mean a sum of 4 terms YN i j=1 p(y ij z i,b i,f i ) Can also view as GP with highly structured mean y i (t) =m i (t)+ i (t), i (t) iid N(0, m i (t) = p (t) > x ip + z (t) > z i + l (t) > b i + f i (t). 2 ) basis expansion, in practice [1,t] random effect, b i N(0, b ) Individual component (random intercept and slope) 26

43 Longitudinal Submodel Longitudinal values conditionally independent: p(~y i z i,b i,f i )= Values normally distributed, with mean a sum of 4 terms YN i j=1 p(y ij z i,b i,f i ) Can also view as GP with highly structured mean y i (t) =m i (t)+ i (t), i (t) iid N(0, m i (t) = p (t) > x ip + z (t) > z i + l (t) > b i + f i (t). 2 ) 26

44 Longitudinal Submodel Longitudinal values conditionally independent: p(~y i z i,b i,f i )= Values normally distributed, with mean a sum of 4 terms YN i j=1 p(y ij z i,b i,f i ) Can also view as GP with highly structured mean y i (t) =m i (t)+ i (t), i (t) iid N(0, m i (t) = p (t) > x ip + z (t) > z i + l (t) > b i + f i (t). 2 ) Structured noise process (GP noise, transient trends) 26

45 Longitudinal Submodel Longitudinal values conditionally independent: p(~y i z i,b i,f i )= Values normally distributed, with mean a sum of 4 terms YN i j=1 p(y ij z i,b i,f i ) Can also view as GP with highly structured mean y i (t) =m i (t)+ i (t), i (t) iid N(0, m i (t) = p (t) > x ip + z (t) > z i + l (t) > b i + f i (t). 2 ) Structured noise process (GP noise, transient trends) 26 zero mean GP, OU kernel: K OU (t 1,t 2 )= f 2 exp{ t 1 t 2 } l

46 Longitudinal Submodel Longitudinal values conditionally independent: p(~y i z i,b i,f i )= Values normally distributed, with mean a sum of 4 terms YN i j=1 p(y ij z i,b i,f i ) Can also view as GP with highly structured mean y i (t) =m i (t)+ i (t), i (t) iid N(0, m i (t) = p (t) > x ip + z (t) > z i + l (t) > b i + f i (t). 2 ) 26

47 Point Process Submodel 27

48 Point Process Submodel Poisson Process model, conditional likelihood on [T i,t + i ], events at {u ik } K i k=1: 27

49 Point Process Submodel Poisson Process model, conditional likelihood on [T i,t + i ], events at {u ik } K i k=1: YK i Z T + i p( ~u i z i,b i,f i,v i )= r i (u ik )exp{ r i (t)dt} k=1 T i 27

50 Point Process Submodel Poisson Process model, conditional likelihood on [T i,t + i ], events at {u ik } K i k=1: YK i Z T + i p( ~u i z i,b i,f i,v i )= r i (u ik )exp{ r i (t)dt} k=1 Rate function: hazard function from Cox proportional hazards model T i Common choice in survival analysis 27

51 Point Process Submodel Poisson Process model, conditional likelihood on [T i,t + i ], events at {u ik } K i k=1: YK i Z T + i p( ~u i z i,b i,f i,v i )= r i (u ik )exp{ r i (t)dt} k=1 Rate function: hazard function from Cox proportional hazards model T i Common choice in survival analysis r i (t) =r 0 (t) exp{ > x ir + m i (t)+ m 0 i(t)+v i } 27

52 Point Process Submodel Poisson Process model, conditional likelihood on [T i,t + i ], events at {u ik } K i k=1: YK i Z T + i p( ~u i z i,b i,f i,v i )= r i (u ik )exp{ r i (t)dt} k=1 Rate function: hazard function from Cox proportional hazards model T i Common choice in survival analysis r i (t) =r 0 (t) exp{ > x ir + m i (t)+ m 0 i(t)+v i } piecewise constant baseline rate 27

53 Point Process Submodel Poisson Process model, conditional likelihood on [T i,t + i ], events at {u ik } K i k=1: YK i Z T + i p( ~u i z i,b i,f i,v i )= r i (u ik )exp{ r i (t)dt} k=1 Rate function: hazard function from Cox proportional hazards model T i Common choice in survival analysis r i (t) =r 0 (t) exp{ > x ir + m i (t)+ m 0 i(t)+v i } piecewise constant baseline rate coefficient vector 27

54 Point Process Submodel Poisson Process model, conditional likelihood on [T i,t + i ], events at {u ik } K i k=1: YK i Z T + i p( ~u i z i,b i,f i,v i )= r i (u ik )exp{ r i (t)dt} k=1 Rate function: hazard function from Cox proportional hazards model T i Common choice in survival analysis r i (t) =r 0 (t) exp{ > x ir + m i (t)+ m 0 i(t)+v i } piecewise constant baseline rate coefficient vector baseline covariates 27

55 Point Process Submodel Poisson Process model, conditional likelihood on [T i,t + i ], events at {u ik } K i k=1: YK i Z T + i p( ~u i z i,b i,f i,v i )= r i (u ik )exp{ r i (t)dt} k=1 Rate function: hazard function from Cox proportional hazards model T i Common choice in survival analysis r i (t) =r 0 (t) exp{ > x ir + m i (t)+ m 0 i(t)+v i } piecewise constant baseline rate coefficient vector baseline covariates association between event risk and expected mean/slope of egfr 27

56 Point Process Submodel Poisson Process model, conditional likelihood on [T i,t + i ], events at {u ik } K i k=1: YK i Z T + i p( ~u i z i,b i,f i,v i )= r i (u ik )exp{ r i (t)dt} k=1 Rate function: hazard function from Cox proportional hazards model T i Common choice in survival analysis r i (t) =r 0 (t) exp{ > x ir + m i (t)+ m 0 i(t)+v i } piecewise constant baseline rate coefficient vector baseline covariates association between event risk and expected mean/slope of egfr 27 random effect (frailty term): v i N(0, 2 v)

57 Inference Variational inference: find distribution q in approx. family close in KL to true posterior Equivalently, maximize a lower bound on marginal likelihood: p(y, u) L(q) E q [log p(y, u, z, b, f, v, ) log q(z,b,f,v, )] Mean-field assumption, fully factorized variational family: NY q(z,b,f,v, ) = q( ) q i (z i zi )q i (b i µ bi, bi )q i (v i µ vi, i=1 2 v i )q i (f i ) Variational distributions have same form as prior (multinomial, MVN, N) For f, adapt ideas from sparse GPs, use observation times as pseudo-inputs [Lloyd et al, 2014] Goal: learn optimal var. params. 2, pt. est. i = { zi,µ bi, bi,µ vi, v i,µ fi, fi } ˆ ELBO has closed form, exact gradients with automatic differentiation Stochastic optimization, subsample observations for noisy unbiased gradients wrt 28

58 Related Work Longitudinal model from [Schulam & Saria, 2015] Joint models in biostatistics: [Rizopoulos, 2012], [Proust- Lima et al., 2014] for introductions Typically fit via EM for MLE or MCMC for Bayesian setting In medicine: cross-sectional, data from single time point E.g. [Tangri et al. 2011]; no dynamic predictions, limits clinical utility 29

59 Outline Motivation Working with EHR Data Proposed Joint Model Experiments & Results In Clinical Practice 30

60 Dataset 23,450 patients with moderate stage CKD and 10+ egfr readings CKD definition: 2 egfr readings < 60mL/min, separated by 90+ days Preprocessing: mean in monthly bins egfr only valid estimate of kidney function at steady state 22.9 readings on average (std. dev. 13.6, median 19.0) Alignment: set t=0 to be first egfr reading < 60mL/min Adverse events: AMI, CVA identified using ICD9 codes. Max 1 event / month 13.4% had 1+ AMI code (mean w/ 1+: 4.1, std dev: 7.1, median: 2.0) 17.4% had 1+ AMI code (mean w/ 1+: 6.4, std dev: 13.3, median: 3.0) Baseline covariates: baseline age, race, gender; hypertension, diabetes 31

61 Experimental Setup Use first 60% of egfr trajectory/events to predict last 40% Evaluation metrics: MSE and MAE for held-out egfr values AUROC, AUPR for predicting any event in [T,T+c] as binary classification Longitudinal baseline: [Schulam & Saria, 2015] Point process baselines: Cox regression; rate: r i (t) =r 0 (t) exp{ > x ir } Time-varying Cox using observed egfr; rate: r i (t) =r 0 (t) exp{ > x ir + y i (t)} 32

62 Quantitative Results Longitudinal Submodels MSE MAE [Schulam & Saria, 2015] Joint Model (CVA) Joint Model (AMI) CVA: AUROCs 1 yr. 2 yr. 3 yr. 4 yr. 5 yr. Joint Model Cox Time-varying Cox AMI: AUROCs Joint Model Cox Time-varying Cox CVA: AUPRs Joint Model Cox Time-varying Cox AMI: AUPRs Joint Model Cox Time-varying Cox

63 Joint Model Results 34

64 Outline Motivation Working with EHR Data Proposed Joint Model Experiments & Results In Clinical Practice 35

65 In clinical practice Implementation underway: at Duke Connected Care, an Accountable Care Organization responsible for cost and quality of care of 45,000 Medicare patients 12,000 patients with at least moderate CKD, 1,000 end-stage kidney disease Cost of end-stage disease >$60k per year vs <$10k for average patient Goal: incorporate risk stratification and joint model predictions into dashboard application Identify, better manage care of high-risk patients 36

66 37

67 38

68 Conclusion Novel joint model for longitudinal, point process data First stochastic variational inference algorithm for joint models Actionable implementation in use for CKD patients! Future work: Multivariate in both longitudinal variables and events More flexible models (e.g. beyond Cox assumption) More clinically actionable metrics to evaluate models 39

69 Thank you! 40

70 Acknowledgments Contact: Joint work with: Mark Sendak, M.P.P./M.D. Candidate C. Blake Cameron, M.D. Katherine Heller, Ph.D. 41

71 References C. Lloyd, T. Gunter, M. A. Osborne, and S. J. Roberts. Variational inference for Gaussian process modulated Poisson proceses. ICML, A. Perotte, R. Ranganath, J. S. Hirsch, D. Blei, and N. Elhadad. Risk prediction for chronic kidney disease progression using heterogeneous electronic health record data and time series analysis. Journal of the American Medical Informatics Association, C. Proust-Lima, M. Sene, J. Taylor, H. Jacqmin-Gadda. Joint latent class models for longitudinal and timeto-event data: A review. Statistical Methods in Medical Research 23(1):74-90, D. Rizopoulos. Joint Models for Longitudinal and Time-to-Event Data: With Applications in R. Chapman and Hall / CRC Biostatistics Series, P. Schulam and S. Saria. A Framework for Individualizing Predictions of Disease Trajectories by Exploiting Multi-Resolution Structure. NIPS, N. Tangri, L. A. Stevens, et al. A Predictive Model for Progression of Chronic Kidney Disease to Kidney Failure. JAMA 305(15): ,

72 More Inference Details Sparse GP / pseudo-inputs: full MVN variational distribution at egfr times, true conditional p for rest 43

73 More Inference Details Sparse GP / pseudo-inputs: full MVN variational distribution at egfr times, true conditional p for rest q i (f i (~t i ),f i ( ~u i ),f i (t grid i )) = p(f i ( ~u i ),f i (t grid i ) f i (~t i ))q(f i (~t i ) µ fi, fi ) 43

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements Aasthaa Bansal PhD Pharmaceutical Outcomes Research & Policy Program University of Washington 69 Biomarkers