
1 Deep Temporal Generative Models of
Rahul Krishnan, Uri Shalit, David Sontag

2 Patient timeline
Jan 1: Blood pressure = 130; WBC count = 6×10⁹/L; Temperature = 98 °F; A1c = 6.6%; Precancerous cells = 10⁴; # flu viruses = 10⁶; Thickness of heart artery plaque = 3 mm
Feb 12: Blood pressure = 135; WBC count = 5.8×10⁹/L; Temperature = 99 °F; A1c = 7.1%; Precancerous cells = 10⁴; # flu viruses = 10⁶; Thickness of heart artery plaque = 3 mm
May 15: Blood pressure = 150; WBC count = 6.8×10⁹/L; Temperature = 98 °F; A1c = 7.7%; Precancerous cells = 10⁴; # flu viruses = 10⁷; Thickness of heart artery plaque = 3.5 mm
...

3 Patient timeline (EHR lens)
Jan 1: ???
Feb 1: Blood pressure = 135; WBC count = ?; Temperature = 99 °F; A1c = ?; Precancerous cells = ?; # flu viruses = ?; Thickness of heart artery plaque = ?
May 1: Blood pressure = 150; WBC count = 6.8×10⁹/L; Temperature = 98 °F; A1c = 7.7%; Precancerous cells = ?; # flu viruses = ?; Thickness of heart artery plaque = ?; ICD9 = Diabetes; ICD9 = Hypertension
...

4 Our goal: model the true patient state
True state: ?
What the records show:
Blood pressure = 135; Temperature = 99 °F
Blood pressure = 150; WBC count = 6.8×10⁹/L; Temperature = 98 °F; A1c = 7.7%; ICD9 = Diabetes

5 Our goal: model the true patient state
Health interventions: Prescribe insulin and Metformin; Prescribe statin
True state: ?
What the records show:
Blood pressure = 135; Temperature = 99 °F
Blood pressure = 150; WBC count = 6.8×10⁹/L; Temperature = 98 °F; A1c = 7.7%; ICD9 = Diabetes

6 Our goal: model the true patient state
Health interventions: Prescribe insulin and Metformin; Prescribe statin
True state: ?
What the records show:
Blood pressure = 135; Temperature = 99 °F
Blood pressure = 150; WBC count = 6.8×10⁹/L; Temperature = 98 °F; A1c = 7.7%; ICD9 = Diabetes
Learn how to go from the observed health-records timeline to the unobserved patient timeline, and back. This is a hard problem that requires a powerful algorithm.

7 May 1: Blood pressure = 150; A1c = 7.7%; precancerous cells = ?; # flu viruses in sinuses = ?; Thickness of heart artery plaque = ?; ICD9 = Diabetes; ICD9 = Hypertension
Prescribed insulin and Metformin. Aug 1: Blood pressure = 145; A1c = 7.0%; ICD9 = Hypertension
Could have prescribed Simvastatin and Glyburide. Aug 1: Blood pressure = 135; A1c = 6.5%; ICD9 = none

8 Outline
Deep Kalman filters
Probabilistic Model
Inference and learning
Experiments & Counterfactual reasoning
Conclusions and future work


11 Linear Kalman filters
Actions u_t (e.g., prescribing a medication, performing a surgery): u_1, …, u_{T-1}
Patient latent state z_t ∈ R^d: z_1, z_2, …, z_T
Observations x_t (lab test results, diagnosis codes, etc.): x_1, x_2, …, x_T
Action-transition: z_{t+1} = G_t z_t + B_t u_t + ε_t, with ε_t ~ N(0, Σ_t)
Emission: x_t = F_t z_t + η_t, with η_t ~ N(0, Γ_t)

12 Linear Kalman filters
Actions u_t (e.g., prescribing a medication, performing a surgery): u_1, …, u_{T-1}
Patient latent state z_t ∈ R^d: z_1, z_2, …, z_T
Observations x_t (lab test results, diagnosis codes, etc.): x_1, x_2, …, x_T
Initial state: z_1 ~ N(μ_0, Σ_0)
Action-transition: z_t ~ N(G_t z_{t-1} + B_t u_{t-1}, Σ_t)
Emission: x_t ~ N(F_t z_t, Γ_t)
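As a sanity check of the generative process above, here is a minimal simulation sketch (not from the talk; all dimensions, matrices, and noise scales are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: latent state z_t in R^2, action u_t in R^1, observation x_t in R^2.
d_z, d_u, d_x, T = 2, 1, 2, 5

G = 0.9 * np.eye(d_z)      # transition matrix G_t (time-invariant here)
B = np.ones((d_z, d_u))    # action-effect matrix B_t
F = np.eye(d_x, d_z)       # emission matrix F_t
Sigma = 0.1 * np.eye(d_z)  # transition noise covariance Sigma_t
Gamma = 0.1 * np.eye(d_x)  # emission noise covariance Gamma_t

u = rng.normal(size=(T - 1, d_u))  # an arbitrary action sequence u_1 .. u_{T-1}

z = np.zeros((T, d_z))
x = np.zeros((T, d_x))
z[0] = rng.multivariate_normal(np.zeros(d_z), np.eye(d_z))  # initial state z_1
x[0] = rng.multivariate_normal(F @ z[0], Gamma)
for t in range(1, T):
    # z_t ~ N(G z_{t-1} + B u_{t-1}, Sigma)
    z[t] = rng.multivariate_normal(G @ z[t - 1] + B @ u[t - 1], Sigma)
    # x_t ~ N(F z_t, Gamma)
    x[t] = rng.multivariate_normal(F @ z[t], Gamma)
```

The filtering and smoothing distributions of this model are Gaussian and available in closed form, which is exactly what is lost once the maps become non-linear.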

13 Linear models are not enough
Non-linear transitions; non-linear emissions
(graphical model: actions u_1 … u_{T-1}, latent states z_1 … z_T, observations x_1 … x_T)

14 Deep Kalman filters
Actions u_t (e.g., prescribing a medication, performing a surgery): u_1, …, u_{T-1}
Patient latent state z_t ∈ R^d: z_1, z_2, …, z_T
Observations x_t (lab test results, diagnosis codes, etc.): x_1, x_2, …, x_T
Action-transition: z_{t+1} = G_α(z_t, u_t) + ε_t, with ε_t ~ N(0, S_β(z_t, u_t))
Emission: x_t = F_κ(z_t)

15 Deep Kalman filters
Actions u_t (e.g., prescribing a medication, performing a surgery): u_1, …, u_{T-1}
Patient latent state z_t ∈ R^d: z_1, z_2, …, z_T
Observations x_t (lab test results, diagnosis codes, etc.): x_1, x_2, …, x_T
Initial state: z_1 ~ N(μ_0, Σ_0)
Action-transition: z_t ~ N(G_α(z_{t-1}, u_{t-1}), S_β(z_{t-1}, u_{t-1}))
Emission: x_t ~ Π(F_κ(z_t))
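A matching sketch for the deep variant, with tiny one-hidden-layer networks standing in for G_α, S_β, and F_κ (all weights and sizes are invented, and a Gaussian emission stands in for the generic distribution Π):

```python
import numpy as np

rng = np.random.default_rng(0)
d_z, d_u, d_x, T = 2, 1, 3, 5

# Random placeholder weights for the three networks.
W1 = rng.normal(scale=0.5, size=(8, d_z + d_u)); W2 = rng.normal(scale=0.5, size=(d_z, 8))
V1 = rng.normal(scale=0.5, size=(8, d_z));       V2 = rng.normal(scale=0.5, size=(d_x, 8))

def G_alpha(z, u):
    """Non-linear transition mean G_alpha(z_{t-1}, u_{t-1})."""
    return W2 @ np.tanh(W1 @ np.concatenate([z, u]))

def S_beta(z, u):
    """State-dependent diagonal transition covariance S_beta; constant here for simplicity."""
    return np.diag(0.1 + 0.0 * z)

def F_kappa(z):
    """Non-linear emission mean F_kappa(z_t)."""
    return V2 @ np.tanh(V1 @ z)

u = rng.normal(size=(T - 1, d_u))
z = np.zeros((T, d_z)); x = np.zeros((T, d_x))
z[0] = rng.normal(size=d_z)                        # initial state z_1
x[0] = F_kappa(z[0]) + 0.1 * rng.normal(size=d_x)
for t in range(1, T):
    mean = G_alpha(z[t - 1], u[t - 1])
    z[t] = rng.multivariate_normal(mean, S_beta(z[t - 1], u[t - 1]))
    x[t] = F_kappa(z[t]) + 0.1 * rng.normal(size=d_x)  # Gaussian stand-in for Pi(F_kappa(z_t))
```

Sampling is as easy as in the linear case; it is posterior inference p(z | x, u) that becomes intractable, motivating the variational approach that follows.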

16 Outline
Deep Kalman filters
Probabilistic Model
Inference and learning
Experiments & Counterfactual reasoning
Conclusions and future work


18 Deep Kalman filters: maximum likelihood
Initial state: z_1 ~ N(μ_0, Σ_0)
Action-transition: z_t ~ N(G_α(z_{t-1}, u_{t-1}), S_β(z_{t-1}, u_{t-1}))
Emission: x_t ~ Π(F_κ(z_t))
θ = (α, β, κ)
Maximum likelihood: max_θ p_θ(x_1, …, x_T | u_1, …, u_{T-1})

19 Variational inference
x = (x_1, …, x_T), u = (u_1, …, u_{T-1}), θ = (α, β, κ)
Maximum likelihood: max_θ p_θ(x | u)

20 Variational inference
x = (x_1, …, x_T), u = (u_1, …, u_{T-1}), θ = (α, β, κ)
Maximum likelihood: max_θ log p_θ(x | u)

21 Variational inference
x = (x_1, …, x_T), u = (u_1, …, u_{T-1}), θ = (α, β, κ)
Maximum likelihood: max_θ log p_θ(x | u)
log p_θ(x | u) = log ∫ p_θ(x, z | u) dz

22 Variational inference
x = (x_1, …, x_T), u = (u_1, …, u_{T-1}), θ = (α, β, κ)
Maximum likelihood: max_θ log p_θ(x | u)
Introduce variational distribution q_φ(z | x, u):
log p_θ(x | u) = log ∫ p_θ(x, z | u) dz = log ∫ q_φ(z | x, u) [p_θ(x, z | u) / q_φ(z | x, u)] dz

23 Variational inference
x = (x_1, …, x_T), u = (u_1, …, u_{T-1}), θ = (α, β, κ)
Maximum likelihood: max_θ log p_θ(x | u)
Introduce variational distribution q_φ(z | x, u):
log p_θ(x | u) = log ∫ p_θ(x, z | u) dz = log ∫ q_φ(z | x, u) [p_θ(x, z | u) / q_φ(z | x, u)] dz
≥ ∫ q_φ(z | x, u) log [p_θ(x, z | u) / q_φ(z | x, u)] dz   (Jensen's inequality)

24 Variational inference
x = (x_1, …, x_T), u = (u_1, …, u_{T-1}), θ = (α, β, κ)
Maximum likelihood: max_θ log p_θ(x | u)
Introduce variational distribution q_φ(z | x, u):
log p_θ(x | u) = log ∫ p_θ(x, z | u) dz = log ∫ q_φ(z | x, u) [p_θ(x, z | u) / q_φ(z | x, u)] dz
≥ ∫ q_φ(z | x, u) log [p_θ(x, z | u) / q_φ(z | x, u)] dz   (Jensen's inequality)
= E_{q_φ(z|x,u)}[log p_θ(x | z, u)] − KL[q_φ(z | x, u) ‖ p_θ(z | u)]
The first term is the expected log-likelihood under q_φ.

25 Variational inference
x = (x_1, …, x_T), u = (u_1, …, u_{T-1}), θ = (α, β, κ)
Maximum likelihood: max_θ log p_θ(x | u)
Introduce variational distribution q_φ(z | x, u):
log p_θ(x | u) = log ∫ p_θ(x, z | u) dz = log ∫ q_φ(z | x, u) [p_θ(x, z | u) / q_φ(z | x, u)] dz
≥ ∫ q_φ(z | x, u) log [p_θ(x, z | u) / q_φ(z | x, u)] dz   (Jensen's inequality)
= E_{q_φ(z|x,u)}[log p_θ(x | z, u)] − KL[q_φ(z | x, u) ‖ p_θ(z | u)]
The first term is the expected log-likelihood under q_φ; the KL term acts as regularization.

26 Variational inference: evidence lower bound
E_{q_φ(z|x,u)}[log p_θ(x | z, u)] − KL[q_φ(z | x, u) ‖ p_θ(z | u)] = L(x; (θ, φ)) ≤ log p_θ(x | u)
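The bound can be checked numerically on a toy conjugate model, p(z) = N(0, 1) and p(x | z) = N(z, 1), where both the ELBO and the exact log-evidence are available in closed form (a sketch with actions u dropped; the Gaussian q here is deliberately not the exact posterior, so the inequality is strict):

```python
import numpy as np

def log_gauss(y, mean, var):
    """Log-density of N(mean, var) at y."""
    return -0.5 * np.log(2 * np.pi * var) - (y - mean) ** 2 / (2 * var)

x = 1.5  # a single scalar observation

# Variational posterior q_phi(z | x) = N(mu, s2).
mu, s2 = 0.6, 0.4

# E_q[log p(x|z)] in closed form for Gaussian q and likelihood N(z, 1).
exp_loglik = -0.5 * np.log(2 * np.pi) - 0.5 * ((x - mu) ** 2 + s2)

# KL[q(z|x) || p(z)] between N(mu, s2) and the prior N(0, 1).
kl = 0.5 * (s2 + mu ** 2 - 1.0 - np.log(s2))

elbo = exp_loglik - kl                  # L(x; (theta, phi))
log_evidence = log_gauss(x, 0.0, 2.0)   # exact: marginally x ~ N(0, 2)

print(elbo <= log_evidence)  # True: the bound holds
```

The gap between the two numbers is exactly KL[q(z | x) ‖ p(z | x)], so it shrinks to zero as q approaches the true posterior.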

27 True and approximate posterior
q_φ(z | x, u) is an approximation of the true posterior p_θ(z | x, u)
Using the Markov chain structure we know that:
p_θ(z | x, u) = p_θ(z_1 | x, u) ∏_{t=2}^{T} p_θ(z_t | z_{t-1}, x_t, …, x_T, u_{t-1}, …, u_{T-1})
Use the true factorization in designing q_φ(z | x, u)
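One minimal way to realize such a structured q (a sketch, not the paper's architecture): a backward recurrence summarizes the future observations x_t, …, x_T, and each z_t is then sampled conditioned on z_{t-1} and that summary, mirroring the factorization above. Actions u are omitted and all weights are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
d_z, d_x, T = 2, 3, 4
x = rng.normal(size=(T, d_x))  # observed sequence x_1 .. x_T

# Placeholder weights for the inference network.
Wh = rng.normal(scale=0.3, size=(4, d_x + 4))    # backward recurrence cell
Wm = rng.normal(scale=0.3, size=(d_z, d_z + 4))  # combiner: (z_{t-1}, h_t) -> mean

# Backward pass: h_t summarizes x_t, ..., x_T.
h = np.zeros((T, 4))
carry = np.zeros(4)
for t in reversed(range(T)):
    carry = np.tanh(Wh @ np.concatenate([x[t], carry]))
    h[t] = carry

# Forward sampling, respecting q(z_t | z_{t-1}, x_t .. x_T).
z = np.zeros((T, d_z))
prev = np.zeros(d_z)
for t in range(T):
    mean = Wm @ np.concatenate([prev, h[t]])
    z[t] = mean + 0.1 * rng.normal(size=d_z)  # fixed variance, for the sketch
    prev = z[t]
```

The point of matching the true factorization is that q only has to learn local conditionals rather than one monolithic joint over all of z.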

28 The structured variational inference network q_φ(z | x, u)

29 Deep Kalman filter: summary
Optimize jointly over the generative model p_θ(x | u) and the variational approximation q_φ(z | x, u)
Stochastic backpropagation (Rezende et al. 2014; Kingma & Welling, 2014)
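Stochastic backpropagation can be illustrated on a one-dimensional toy model, p(z) = N(0, 1) and p(x | z) = N(z, 1), whose exact posterior N(x/2, 1/2) lets us check that reparameterized gradient ascent on the ELBO recovers it (a sketch; sample sizes and step sizes are arbitrary choices, and the entropy of q is handled analytically):

```python
import numpy as np

rng = np.random.default_rng(0)

x = 1.5            # scalar observation; exact posterior is N(x/2, 1/2)
mu, s = 0.0, 1.0   # parameters of q(z | x) = N(mu, s^2)
lr, n = 0.02, 256

for _ in range(1000):
    eps = rng.normal(size=n)
    z = mu + s * eps                      # reparameterization trick: z ~ q(z | x)
    df = (x - z) - z                      # d/dz [log p(x|z) + log p(z)]
    grad_mu = df.mean()                   # pathwise gradient w.r.t. mu
    grad_s = (df * eps).mean() + 1.0 / s  # pathwise gradient + entropy term d/ds
    mu += lr * grad_mu
    s += lr * grad_s

print(round(mu, 2), round(s ** 2, 2))  # close to 0.75 and 0.5
```

Because z is written as a deterministic function of (mu, s) and noise eps, gradients of a Monte Carlo ELBO estimate flow through the samples themselves; this is the same trick that trains the full model end to end.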

30 Outline
Deep Kalman filters
Probabilistic Model
Inference and learning
Experiments & Counterfactual reasoning
Conclusions and future work


32 Learning the effect of anti-diabetic medications
8,000 diabetic and pre-diabetic patients; 4 years of data
Actions (u_t): 9 diabetic drugs including Metformin and Insulin


34 Learning the effect of anti-diabetic medications
8,000 diabetic and pre-diabetic patients; 4 years of data
Actions (u_t): 9 diabetic drugs including Metformin and Insulin
Medication u_t; patient latent state z_t ∈ R^d; lab test results and diagnosis codes x_t

35 The importance of non-linearity
Four model variants compared, by emission/transition type:
linear emission, non-linear transition
non-linear emission, non-linear transition
linear emission, linear transition
non-linear emission, linear transition

36 Counterfactual reasoning
(graphical model: actions u_1 … u_{T-1}, latent states z_1 … z_T, observations x_1 … x_T)

37 Counterfactual reasoning
(graphical model extended one step: set u_T = Metformin, sample z_{T+1}, predict x_{T+1} = ?)

38 Counterfactual reasoning
(graphical model extended one step: set u_T = Glipizide, sample z_{T+1}, predict x_{T+1} = ?)
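The kind of query in these two slides can be sketched as follows, with made-up linear maps standing in for the learned transition and emission networks (purely illustrative and not fitted to any data; in the actual model these would be the trained G_α, S_β, F_κ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the learned networks.
A = np.array([[0.9, 0.0], [0.0, 0.8]])  # latent dynamics
B = np.array([[-0.5], [0.2]])           # effect of the action on the state
C = np.array([[1.0, 0.5]])              # emission: x = C z + noise

z_T = np.array([1.0, 0.4])              # inferred patient state at time T (from q)

def expected_next_obs(z, u, n=2000):
    """Monte Carlo estimate of E[x_{T+1} | z_T, u_T] under the toy model."""
    z_next = (A @ z + B @ u) + 0.1 * rng.normal(size=(n, 2))  # sample z_{T+1}
    x_next = z_next @ C.T + 0.1 * rng.normal(size=(n, 1))     # sample x_{T+1}
    return x_next.mean()

x_treated = expected_next_obs(z_T, np.array([1.0]))    # counterfactual: give the drug
x_untreated = expected_next_obs(z_T, np.array([0.0]))  # counterfactual: no drug
print(x_treated < x_untreated)  # True for these made-up parameters
```

Comparing the two rollouts from the same inferred state z_T is what lets the model ask "what would x_{T+1} have been under a different u_T?".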

39 Effect of diabetes treatments on glucose: sampling the future using the treatments observed in the data

40 Effect of diabetes treatments on glucose: sampling with no treatment

41 Outline
Deep Kalman filters
Probabilistic Model
Inference and learning
Experiments & Counterfactual reasoning
Conclusions and future work


43 Broad applications
Conducting virtual experiments: run numerous trials using samples from the model. Example: best expected outcome for a 2nd-line diabetes medication in an obese subpopulation.
Personalized medicine. Example: for a specific patient, estimate expected HDL/LDL levels after 6 months of taking each of three potential statins.
Finding similar patients: when a physician is faced with a patient with no clear treatment guidelines, show similar patients, how they were treated, and what the outcomes were.

44 Sample from model: synthetic patient (time series figure)


46 Future work - using prior knowledge
Non-linear transitions: e.g. explicitly modelling the transition of seasons
Non-linear emissions: e.g. a mechanistic model of lung cancer and radiation
(graphical model: actions u_1 … u_{T-1}, latent states z_1 … z_T, observations x_1 … x_T)

47 Broader Applications
Time-series model combining deep learning and probabilistic modeling, with an efficient learning algorithm
Explicit modeling of the effect of interventions on disease progression
Framework broadly applicable:
Education/MOOCs (what is learned when we ask a specific question?)
Climate modeling (what effect does decreasing emissions by x% have?)
Political science (how much does a politician influence public opinion when he/she posts to Twitter?)

48 Thank you! Questions?


More information

Hidden Markov Models. Terminology, Representation and Basic Problems

Hidden Markov Models. Terminology, Representation and Basic Problems Hidden Markov Models Terminology, Representation and Basic Problems Data analysis? Machine learning? In bioinformatics, we analyze a lot of (sequential) data (biological sequences) to learn unknown parameters

More information

Lecture Slides - Part 1

Lecture Slides - Part 1 Lecture Slides - Part 1 Bengt Holmstrom MIT February 2, 2016. Bengt Holmstrom (MIT) Lecture Slides - Part 1 February 2, 2016. 1 / 36 Going to raise the level a little because 14.281 is now taught by Juuso

More information

Probabilistic Models for Learning Data Representations. Andreas Damianou

Probabilistic Models for Learning Data Representations. Andreas Damianou Probabilistic Models for Learning Data Representations Andreas Damianou Department of Computer Science, University of Sheffield, UK IBM Research, Nairobi, Kenya, 23/06/2015 Sheffield SITraN Outline Part

More information

ECO 513 Fall 2008 C.Sims KALMAN FILTER. s t = As t 1 + ε t Measurement equation : y t = Hs t + ν t. u t = r t. u 0 0 t 1 + y t = [ H I ] u t.

ECO 513 Fall 2008 C.Sims KALMAN FILTER. s t = As t 1 + ε t Measurement equation : y t = Hs t + ν t. u t = r t. u 0 0 t 1 + y t = [ H I ] u t. ECO 513 Fall 2008 C.Sims KALMAN FILTER Model in the form 1. THE KALMAN FILTER Plant equation : s t = As t 1 + ε t Measurement equation : y t = Hs t + ν t. Var(ε t ) = Ω, Var(ν t ) = Ξ. ε t ν t and (ε t,

More information

Bayesian belief networks

Bayesian belief networks CS 2001 Lecture 1 Bayesian belief networks Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square 4-8845 Milos research interests Artificial Intelligence Planning, reasoning and optimization in the presence

More information

Survival Analysis with Time- Dependent Covariates: A Practical Example. October 28, 2016 SAS Health Users Group Maria Eberg

Survival Analysis with Time- Dependent Covariates: A Practical Example. October 28, 2016 SAS Health Users Group Maria Eberg Survival Analysis with Time- Dependent Covariates: A Practical Example October 28, 2016 SAS Health Users Group Maria Eberg Outline Why use time-dependent covariates? Things to consider in definition of

More information

Assess Assumptions and Sensitivity Analysis. Fan Li March 26, 2014

Assess Assumptions and Sensitivity Analysis. Fan Li March 26, 2014 Assess Assumptions and Sensitivity Analysis Fan Li March 26, 2014 Two Key Assumptions 1. Overlap: 0

More information

1. Poisson distribution is widely used in statistics for modeling rare events.

1. Poisson distribution is widely used in statistics for modeling rare events. Discrete probability distributions - Class 5 January 20, 2014 Debdeep Pati Poisson distribution 1. Poisson distribution is widely used in statistics for modeling rare events. 2. Ex. Infectious Disease

More information

Case Studies in Bayesian Data Science

Case Studies in Bayesian Data Science Case Studies in Bayesian Data Science 4: The Bootstrap as an Approximate BNP Method David Draper Department of Applied Mathematics and Statistics University of California, Santa Cruz draper@ucsc.edu Short

More information

Auto-Encoding Variational Bayes

Auto-Encoding Variational Bayes Auto-Encoding Variational Bayes Diederik P Kingma, Max Welling June 18, 2018 Diederik P Kingma, Max Welling Auto-Encoding Variational Bayes June 18, 2018 1 / 39 Outline 1 Introduction 2 Variational Lower

More information

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Fundamentals to Biostatistics Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Statistics collection, analysis, interpretation of data development of new

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing

More information

Bayesian Networks BY: MOHAMAD ALSABBAGH

Bayesian Networks BY: MOHAMAD ALSABBAGH Bayesian Networks BY: MOHAMAD ALSABBAGH Outlines Introduction Bayes Rule Bayesian Networks (BN) Representation Size of a Bayesian Network Inference via BN BN Learning Dynamic BN Introduction Conditional

More information

Nuclear Medicine RADIOPHARMACEUTICAL CHEMISTRY

Nuclear Medicine RADIOPHARMACEUTICAL CHEMISTRY Nuclear Medicine RADIOPHARMACEUTICAL CHEMISTRY An alpha particle consists of two protons and two neutrons Common alpha-particle emitters Radon-222 gas in the environment Uranium-234 and -238) in the environment

More information

Study of Changes in Climate Parameters at Regional Level: Indian Scenarios

Study of Changes in Climate Parameters at Regional Level: Indian Scenarios Study of Changes in Climate Parameters at Regional Level: Indian Scenarios S K Dash Centre for Atmospheric Sciences Indian Institute of Technology Delhi Climate Change and Animal Populations - The golden

More information

Dependence structures with applications to actuarial science

Dependence structures with applications to actuarial science with applications to actuarial science Department of Statistics, ITAM, Mexico Recent Advances in Actuarial Mathematics, OAX, MEX October 26, 2015 Contents Order-1 process Application in survival analysis

More information

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated

More information

BAYESIAN MACHINE LEARNING.

BAYESIAN MACHINE LEARNING. BAYESIAN MACHINE LEARNING frederic.pennerath@centralesupelec.fr What is this Bayesian Machine Learning course about? A course emphasizing the few essential theoretical ingredients Probabilistic generative

More information

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement Simon Leglaive 1 Laurent Girin 1,2 Radu Horaud 1 1: Inria Grenoble Rhône-Alpes 2: Univ. Grenoble Alpes, Grenoble INP,

More information

Causal Inference. Prediction and causation are very different. Typical questions are:

Causal Inference. Prediction and causation are very different. Typical questions are: Causal Inference Prediction and causation are very different. Typical questions are: Prediction: Predict Y after observing X = x Causation: Predict Y after setting X = x. Causation involves predicting

More information

Bayesian Updating: Discrete Priors: Spring

Bayesian Updating: Discrete Priors: Spring Bayesian Updating: Discrete Priors: 18.05 Spring 2017 http://xkcd.com/1236/ Learning from experience Which treatment would you choose? 1. Treatment 1: cured 100% of patients in a trial. 2. Treatment 2:

More information

Lecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010

Lecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010 Hidden Lecture 4: Hidden : An Introduction to Dynamic Decision Making November 11, 2010 Special Meeting 1/26 Markov Model Hidden When a dynamical system is probabilistic it may be determined by the transition

More information

Variational inference

Variational inference Simon Leglaive Télécom ParisTech, CNRS LTCI, Université Paris Saclay November 18, 2016, Télécom ParisTech, Paris, France. Outline Introduction Probabilistic model Problem Log-likelihood decomposition EM

More information

State Space and Hidden Markov Models

State Space and Hidden Markov Models State Space and Hidden Markov Models Kunsch H.R. State Space and Hidden Markov Models. ETH- Zurich Zurich; Aliaksandr Hubin Oslo 2014 Contents 1. Introduction 2. Markov Chains 3. Hidden Markov and State

More information

MODULE -4 BAYEIAN LEARNING

MODULE -4 BAYEIAN LEARNING MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities

More information

Mixture Models for Capture- Recapture Data

Mixture Models for Capture- Recapture Data Mixture Models for Capture- Recapture Data Dankmar Böhning Invited Lecture at Mixture Models between Theory and Applications Rome, September 13, 2002 How many cases n in a population? Registry identifies

More information

Variational Inference via Stochastic Backpropagation

Variational Inference via Stochastic Backpropagation Variational Inference via Stochastic Backpropagation Kai Fan February 27, 2016 Preliminaries Stochastic Backpropagation Variational Auto-Encoding Related Work Summary Outline Preliminaries Stochastic Backpropagation

More information

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

PILCO: A Model-Based and Data-Efficient Approach to Policy Search PILCO: A Model-Based and Data-Efficient Approach to Policy Search (M.P. Deisenroth and C.E. Rasmussen) CSC2541 November 4, 2016 PILCO Graphical Model PILCO Probabilistic Inference for Learning COntrol

More information

Directed and Undirected Graphical Models

Directed and Undirected Graphical Models Directed and Undirected Graphical Models Adrian Weller MLSALT4 Lecture Feb 26, 2016 With thanks to David Sontag (NYU) and Tony Jebara (Columbia) for use of many slides and illustrations For more information,

More information

Causal Discovery by Computer

Causal Discovery by Computer Causal Discovery by Computer Clark Glymour Carnegie Mellon University 1 Outline 1. A century of mistakes about causation and discovery: 1. Fisher 2. Yule 3. Spearman/Thurstone 2. Search for causes is statistical

More information

Markov Models and Hidden Markov Models

Markov Models and Hidden Markov Models Markov Models and Hidden Markov Models Robert Platt Northeastern University Some images and slides are used from: 1. CS188 UC Berkeley 2. RN, AIMA Markov Models We have already seen that an MDP provides

More information