
1 Deep Temporal Generative Models of
Rahul Krishnan, Uri Shalit, David Sontag

2 Patient timeline
Jan 1: Blood pressure = 130; WBC count = 6×10⁹/L; Temperature = 98 °F; A1c = 6.6%; Precancerous cells = 10⁴; # flu viruses = 10⁶; Thickness of heart artery plaque = 3 mm
Feb 12: Blood pressure = 135; WBC count = 5.8×10⁹/L; Temperature = 99 °F; A1c = 7.1%; Precancerous cells = 10⁴; # flu viruses = 10⁶; Thickness of heart artery plaque = 3 mm
May 15: Blood pressure = 150; WBC count = 6.8×10⁹/L; Temperature = 98 °F; A1c = 7.7%; Precancerous cells = 10⁴; # flu viruses = 10⁷; Thickness of heart artery plaque = 3.5 mm
...

3 Patient timeline (EHR lens)
Jan 1: ???
Feb 1: Blood pressure = 135; WBC count = ?; Temperature = 99 °F; A1c = ?; Precancerous cells = ?; # flu viruses = ?; Thickness of heart artery plaque = ?
May 1: Blood pressure = 150; WBC count = 6.8×10⁹/L; Temperature = 98 °F; A1c = 7.7%; Precancerous cells = ?; # flu viruses = ?; Thickness of heart artery plaque = ?; ICD9 = Diabetes; ICD9 = Hypertension
...

4 Our goal: model the true patient state
True state: ?
What the records show:
Blood pressure = 135; Temperature = 99 °F
Blood pressure = 150; WBC count = 6.8×10⁹/L; Temperature = 98 °F; A1c = 7.7%; ICD9 = Diabetes

5 Our goal: model the true patient state
Health interventions: Prescribe insulin and Metformin; Prescribe statin
True state: ?
What the records show:
Blood pressure = 135; Temperature = 99 °F
Blood pressure = 150; WBC count = 6.8×10⁹/L; Temperature = 98 °F; A1c = 7.7%; ICD9 = Diabetes

6 Our goal: model the true patient state
Health interventions: Prescribe insulin and Metformin; Prescribe statin
True state: ?
What the records show:
Blood pressure = 135; Temperature = 99 °F
Blood pressure = 150; WBC count = 6.8×10⁹/L; Temperature = 98 °F; A1c = 7.7%; ICD9 = Diabetes
Learn how to go from the observed health-records timeline to the unobserved patient timeline, and back. This is a hard problem that requires a powerful algorithm.

7 May 1: Blood pressure = 150; A1c = 7.7%; precancerous cells = ?; # flu viruses in sinuses = ?; Thickness of heart artery plaque = ?; ICD9 = Diabetes; ICD9 = Hypertension
Prescribed insulin and Metformin. Aug 1: Blood pressure = 145; A1c = 7.0%; ICD9 = Hypertension
Could have prescribed Simvastatin and Glyburide. Aug 1: Blood pressure = 135; A1c = 6.5%; ICD9 = none

8 Outline
Deep Kalman filters
Probabilistic Model
Inference and learning
Experiments & Counterfactual reasoning
Conclusions and future work


11 Linear Kalman filters
Actions u_t (e.g., prescribing a medication, performing a surgery): u_1, …, u_{T-1}
Patient latent state z_t ∈ R^d: z_1, z_2, …, z_T
Observations x_t (lab test results, diagnosis codes, etc.): x_1, x_2, …, x_T
Action-transition: z_{t+1} = G_t z_t + B_t u_t + ε_t, with ε_t ~ N(0, Σ_t)
Emission: x_t = F_t z_t + η_t, with η_t ~ N(0, Γ_t)

12 Linear Kalman filters
Actions u_t (e.g., prescribing a medication, performing a surgery): u_1, …, u_{T-1}
Patient latent state z_t ∈ R^d: z_1, z_2, …, z_T
Observations x_t (lab test results, diagnosis codes, etc.): x_1, x_2, …, x_T
Initial state: z_1 ~ N(μ_0, Σ_0)
Action-transition: z_t ~ N(G_t z_{t-1} + B_t u_{t-1}, Σ_t)
Emission: x_t ~ N(F_t z_t, Γ_t)
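As a sanity check of the generative process above, here is a minimal simulation sketch (not from the talk; all dimensions, matrices, and noise scales are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: latent state z_t in R^2, action u_t in R^1, observation x_t in R^2.
d_z, d_u, d_x, T = 2, 1, 2, 5

G = 0.9 * np.eye(d_z)      # transition matrix G_t (time-invariant here)
B = np.ones((d_z, d_u))    # action-effect matrix B_t
F = np.eye(d_x, d_z)       # emission matrix F_t
Sigma = 0.1 * np.eye(d_z)  # transition noise covariance Sigma_t
Gamma = 0.1 * np.eye(d_x)  # emission noise covariance Gamma_t

u = rng.normal(size=(T - 1, d_u))  # an arbitrary action sequence u_1 .. u_{T-1}

z = np.zeros((T, d_z))
x = np.zeros((T, d_x))
z[0] = rng.multivariate_normal(np.zeros(d_z), np.eye(d_z))  # initial state z_1
x[0] = rng.multivariate_normal(F @ z[0], Gamma)
for t in range(1, T):
    # z_t ~ N(G z_{t-1} + B u_{t-1}, Sigma)
    z[t] = rng.multivariate_normal(G @ z[t - 1] + B @ u[t - 1], Sigma)
    # x_t ~ N(F z_t, Gamma)
    x[t] = rng.multivariate_normal(F @ z[t], Gamma)
```

The filtering and smoothing distributions of this model are Gaussian and available in closed form, which is exactly what is lost once the maps become non-linear.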

13 Linear models are not enough
Non-linear transitions; non-linear emissions
(graphical model: actions u_1 … u_{T-1}, latent states z_1 … z_T, observations x_1 … x_T)

14 Deep Kalman filters
Actions u_t (e.g., prescribing a medication, performing a surgery): u_1, …, u_{T-1}
Patient latent state z_t ∈ R^d: z_1, z_2, …, z_T
Observations x_t (lab test results, diagnosis codes, etc.): x_1, x_2, …, x_T
Action-transition: z_{t+1} = G_α(z_t, u_t) + ε_t, with ε_t ~ N(0, S_β(z_t, u_t))
Emission: x_t = F_κ(z_t)

15 Deep Kalman filters
Actions u_t (e.g., prescribing a medication, performing a surgery): u_1, …, u_{T-1}
Patient latent state z_t ∈ R^d: z_1, z_2, …, z_T
Observations x_t (lab test results, diagnosis codes, etc.): x_1, x_2, …, x_T
Initial state: z_1 ~ N(μ_0, Σ_0)
Action-transition: z_t ~ N(G_α(z_{t-1}, u_{t-1}), S_β(z_{t-1}, u_{t-1}))
Emission: x_t ~ Π(F_κ(z_t))
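A matching sketch for the deep variant, with tiny one-hidden-layer networks standing in for G_α, S_β, and F_κ (all weights and sizes are invented, and a Gaussian emission stands in for the generic distribution Π):

```python
import numpy as np

rng = np.random.default_rng(0)
d_z, d_u, d_x, T = 2, 1, 3, 5

# Random placeholder weights for the three networks.
W1 = rng.normal(scale=0.5, size=(8, d_z + d_u)); W2 = rng.normal(scale=0.5, size=(d_z, 8))
V1 = rng.normal(scale=0.5, size=(8, d_z));       V2 = rng.normal(scale=0.5, size=(d_x, 8))

def G_alpha(z, u):
    """Non-linear transition mean G_alpha(z_{t-1}, u_{t-1})."""
    return W2 @ np.tanh(W1 @ np.concatenate([z, u]))

def S_beta(z, u):
    """State-dependent diagonal transition covariance S_beta; constant here for simplicity."""
    return np.diag(0.1 + 0.0 * z)

def F_kappa(z):
    """Non-linear emission mean F_kappa(z_t)."""
    return V2 @ np.tanh(V1 @ z)

u = rng.normal(size=(T - 1, d_u))
z = np.zeros((T, d_z)); x = np.zeros((T, d_x))
z[0] = rng.normal(size=d_z)                        # initial state z_1
x[0] = F_kappa(z[0]) + 0.1 * rng.normal(size=d_x)
for t in range(1, T):
    mean = G_alpha(z[t - 1], u[t - 1])
    z[t] = rng.multivariate_normal(mean, S_beta(z[t - 1], u[t - 1]))
    x[t] = F_kappa(z[t]) + 0.1 * rng.normal(size=d_x)  # Gaussian stand-in for Pi(F_kappa(z_t))
```

Sampling is as easy as in the linear case; it is posterior inference p(z | x, u) that becomes intractable, motivating the variational approach that follows.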

16 Outline
Deep Kalman filters
Probabilistic Model
Inference and learning
Experiments & Counterfactual reasoning
Conclusions and future work


18 Deep Kalman filters: maximum likelihood
Initial state: z_1 ~ N(μ_0, Σ_0)
Action-transition: z_t ~ N(G_α(z_{t-1}, u_{t-1}), S_β(z_{t-1}, u_{t-1}))
Emission: x_t ~ Π(F_κ(z_t))
θ = (α, β, κ)
Maximum likelihood: max_θ p_θ(x_1, …, x_T | u_1, …, u_{T-1})

19 Variational inference
x = (x_1, …, x_T), u = (u_1, …, u_{T-1}), θ = (α, β, κ)
Maximum likelihood: max_θ p_θ(x | u)

20 Variational inference
x = (x_1, …, x_T), u = (u_1, …, u_{T-1}), θ = (α, β, κ)
Maximum likelihood: max_θ log p_θ(x | u)

21 Variational inference
x = (x_1, …, x_T), u = (u_1, …, u_{T-1}), θ = (α, β, κ)
Maximum likelihood: max_θ log p_θ(x | u)
log p_θ(x | u) = log ∫ p_θ(x, z | u) dz

22 Variational inference
x = (x_1, …, x_T), u = (u_1, …, u_{T-1}), θ = (α, β, κ)
Maximum likelihood: max_θ log p_θ(x | u)
Introduce variational distribution q_φ(z | x, u):
log p_θ(x | u) = log ∫ p_θ(x, z | u) dz = log ∫ q_φ(z | x, u) [p_θ(x, z | u) / q_φ(z | x, u)] dz

23 Variational inference
x = (x_1, …, x_T), u = (u_1, …, u_{T-1}), θ = (α, β, κ)
Maximum likelihood: max_θ log p_θ(x | u)
Introduce variational distribution q_φ(z | x, u):
log p_θ(x | u) = log ∫ p_θ(x, z | u) dz = log ∫ q_φ(z | x, u) [p_θ(x, z | u) / q_φ(z | x, u)] dz
≥ ∫ q_φ(z | x, u) log [p_θ(x, z | u) / q_φ(z | x, u)] dz   (Jensen's inequality)

24 Variational inference
x = (x_1, …, x_T), u = (u_1, …, u_{T-1}), θ = (α, β, κ)
Maximum likelihood: max_θ log p_θ(x | u)
Introduce variational distribution q_φ(z | x, u):
log p_θ(x | u) = log ∫ p_θ(x, z | u) dz = log ∫ q_φ(z | x, u) [p_θ(x, z | u) / q_φ(z | x, u)] dz
≥ ∫ q_φ(z | x, u) log [p_θ(x, z | u) / q_φ(z | x, u)] dz   (Jensen's inequality)
= E_{q_φ(z|x,u)}[log p_θ(x | z, u)] − KL[q_φ(z | x, u) ‖ p_θ(z | u)]
The first term is the expected log-likelihood under q_φ.

25 Variational inference
x = (x_1, …, x_T), u = (u_1, …, u_{T-1}), θ = (α, β, κ)
Maximum likelihood: max_θ log p_θ(x | u)
Introduce variational distribution q_φ(z | x, u):
log p_θ(x | u) = log ∫ p_θ(x, z | u) dz = log ∫ q_φ(z | x, u) [p_θ(x, z | u) / q_φ(z | x, u)] dz
≥ ∫ q_φ(z | x, u) log [p_θ(x, z | u) / q_φ(z | x, u)] dz   (Jensen's inequality)
= E_{q_φ(z|x,u)}[log p_θ(x | z, u)] − KL[q_φ(z | x, u) ‖ p_θ(z | u)]
The first term is the expected log-likelihood under q_φ; the KL term acts as regularization.

26 Variational inference: evidence lower bound
E_{q_φ(z|x,u)}[log p_θ(x | z, u)] − KL[q_φ(z | x, u) ‖ p_θ(z | u)] = L(x; (θ, φ)) ≤ log p_θ(x | u)
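The bound can be checked numerically on a toy conjugate model, p(z) = N(0, 1) and p(x | z) = N(z, 1), where both the ELBO and the exact log-evidence are available in closed form (a sketch with actions u dropped; the Gaussian q here is deliberately not the exact posterior, so the inequality is strict):

```python
import numpy as np

def log_gauss(y, mean, var):
    """Log-density of N(mean, var) at y."""
    return -0.5 * np.log(2 * np.pi * var) - (y - mean) ** 2 / (2 * var)

x = 1.5  # a single scalar observation

# Variational posterior q_phi(z | x) = N(mu, s2).
mu, s2 = 0.6, 0.4

# E_q[log p(x|z)] in closed form for Gaussian q and likelihood N(z, 1).
exp_loglik = -0.5 * np.log(2 * np.pi) - 0.5 * ((x - mu) ** 2 + s2)

# KL[q(z|x) || p(z)] between N(mu, s2) and the prior N(0, 1).
kl = 0.5 * (s2 + mu ** 2 - 1.0 - np.log(s2))

elbo = exp_loglik - kl                  # L(x; (theta, phi))
log_evidence = log_gauss(x, 0.0, 2.0)   # exact: marginally x ~ N(0, 2)

print(elbo <= log_evidence)  # True: the bound holds
```

The gap between the two numbers is exactly KL[q(z | x) ‖ p(z | x)], so it shrinks to zero as q approaches the true posterior.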

27 True and approximate posterior
q_φ(z | x, u) is an approximation of the true posterior p_θ(z | x, u)
Using the Markov chain structure we know that:
p_θ(z | x, u) = p_θ(z_1 | x, u) ∏_{t=2}^{T} p_θ(z_t | z_{t-1}, x_t, …, x_T, u_{t-1}, …, u_{T-1})
Use the true factorization in designing q_φ(z | x, u)
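One minimal way to realize such a structured q (a sketch, not the paper's architecture): a backward recurrence summarizes the future observations x_t, …, x_T, and each z_t is then sampled conditioned on z_{t-1} and that summary, mirroring the factorization above. Actions u are omitted and all weights are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
d_z, d_x, T = 2, 3, 4
x = rng.normal(size=(T, d_x))  # observed sequence x_1 .. x_T

# Placeholder weights for the inference network.
Wh = rng.normal(scale=0.3, size=(4, d_x + 4))    # backward recurrence cell
Wm = rng.normal(scale=0.3, size=(d_z, d_z + 4))  # combiner: (z_{t-1}, h_t) -> mean

# Backward pass: h_t summarizes x_t, ..., x_T.
h = np.zeros((T, 4))
carry = np.zeros(4)
for t in reversed(range(T)):
    carry = np.tanh(Wh @ np.concatenate([x[t], carry]))
    h[t] = carry

# Forward sampling, respecting q(z_t | z_{t-1}, x_t .. x_T).
z = np.zeros((T, d_z))
prev = np.zeros(d_z)
for t in range(T):
    mean = Wm @ np.concatenate([prev, h[t]])
    z[t] = mean + 0.1 * rng.normal(size=d_z)  # fixed variance, for the sketch
    prev = z[t]
```

The point of matching the true factorization is that q only has to learn local conditionals rather than one monolithic joint over all of z.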

28 The structured variational inference network q_φ(z | x, u)

29 Deep Kalman filter: summary
Optimize jointly over the generative model p_θ(x | u) and the variational approximation q_φ(z | x, u)
Stochastic backpropagation (Rezende et al. 2014; Kingma & Welling, 2014)
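Stochastic backpropagation can be illustrated on a one-dimensional toy model, p(z) = N(0, 1) and p(x | z) = N(z, 1), whose exact posterior N(x/2, 1/2) lets us check that reparameterized gradient ascent on the ELBO recovers it (a sketch; sample sizes and step sizes are arbitrary choices, and the entropy of q is handled analytically):

```python
import numpy as np

rng = np.random.default_rng(0)

x = 1.5            # scalar observation; exact posterior is N(x/2, 1/2)
mu, s = 0.0, 1.0   # parameters of q(z | x) = N(mu, s^2)
lr, n = 0.02, 256

for _ in range(1000):
    eps = rng.normal(size=n)
    z = mu + s * eps                      # reparameterization trick: z ~ q(z | x)
    df = (x - z) - z                      # d/dz [log p(x|z) + log p(z)]
    grad_mu = df.mean()                   # pathwise gradient w.r.t. mu
    grad_s = (df * eps).mean() + 1.0 / s  # pathwise gradient + entropy term d/ds
    mu += lr * grad_mu
    s += lr * grad_s

print(round(mu, 2), round(s ** 2, 2))  # close to 0.75 and 0.5
```

Because z is written as a deterministic function of (mu, s) and noise eps, gradients of a Monte Carlo ELBO estimate flow through the samples themselves; this is the same trick that trains the full model end to end.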

30 Outline
Deep Kalman filters
Probabilistic Model
Inference and learning
Experiments & Counterfactual reasoning
Conclusions and future work


32 Learning the effect of anti-diabetic medications
8,000 diabetic and pre-diabetic patients; 4 years of data
Actions (u_t): 9 diabetic drugs including Metformin and Insulin


34 Learning the effect of anti-diabetic medications
8,000 diabetic and pre-diabetic patients; 4 years of data
Actions (u_t): 9 diabetic drugs including Metformin and Insulin
Medication u_t; patient latent state z_t ∈ R^d; lab test results and diagnosis codes x_t

35 The importance of non-linearity
Four model variants compared, by emission/transition type:
linear emission, non-linear transition
non-linear emission, non-linear transition
linear emission, linear transition
non-linear emission, linear transition

36 Counterfactual reasoning
(graphical model: actions u_1 … u_{T-1}, latent states z_1 … z_T, observations x_1 … x_T)

37 Counterfactual reasoning
(graphical model extended one step: set u_T = Metformin, sample z_{T+1}, predict x_{T+1} = ?)

38 Counterfactual reasoning
(graphical model extended one step: set u_T = Glipizide, sample z_{T+1}, predict x_{T+1} = ?)
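The kind of query in these two slides can be sketched as follows, with made-up linear maps standing in for the learned transition and emission networks (purely illustrative and not fitted to any data; in the actual model these would be the trained G_α, S_β, F_κ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the learned networks.
A = np.array([[0.9, 0.0], [0.0, 0.8]])  # latent dynamics
B = np.array([[-0.5], [0.2]])           # effect of the action on the state
C = np.array([[1.0, 0.5]])              # emission: x = C z + noise

z_T = np.array([1.0, 0.4])              # inferred patient state at time T (from q)

def expected_next_obs(z, u, n=2000):
    """Monte Carlo estimate of E[x_{T+1} | z_T, u_T] under the toy model."""
    z_next = (A @ z + B @ u) + 0.1 * rng.normal(size=(n, 2))  # sample z_{T+1}
    x_next = z_next @ C.T + 0.1 * rng.normal(size=(n, 1))     # sample x_{T+1}
    return x_next.mean()

x_treated = expected_next_obs(z_T, np.array([1.0]))    # counterfactual: give the drug
x_untreated = expected_next_obs(z_T, np.array([0.0]))  # counterfactual: no drug
print(x_treated < x_untreated)  # True for these made-up parameters
```

Comparing the two rollouts from the same inferred state z_T is what lets the model ask "what would x_{T+1} have been under a different u_T?".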

39 Effect of diabetes treatments on glucose: sampling the future using the treatments observed in the data

40 Effect of diabetes treatments on glucose: sampling with no treatment

41 Outline
Deep Kalman filters
Probabilistic Model
Inference and learning
Experiments & Counterfactual reasoning
Conclusions and future work


43 Broad applications
Conducting virtual experiments: run numerous trials using samples from the model. Example: best expected outcome for a 2nd-line diabetes medication in an obese subpopulation.
Personalized medicine. Example: for a specific patient, estimate expected HDL/LDL levels after 6 months of taking each of three potential statins.
Finding similar patients: when a physician is faced with a patient with no clear treatment guidelines, show similar patients, how they were treated, and what the outcomes were.

44 Sample from model: synthetic patient (time series figure)


46 Future work - using prior knowledge
Non-linear transitions: e.g. explicitly modelling the transition of seasons
Non-linear emissions: e.g. a mechanistic model of lung cancer and radiation
(graphical model: actions u_1 … u_{T-1}, latent states z_1 … z_T, observations x_1 … x_T)

47 Broader Applications
Time-series model combining deep learning and probabilistic modeling, with an efficient learning algorithm
Explicit modeling of the effect of interventions on disease progression
Framework broadly applicable:
Education/MOOCs (what is learned when we ask a specific question?)
Climate modeling (what effect does decreasing emissions by x% have?)
Political science (how much does a politician influence public opinion when he/she posts to Twitter?)

48 Thank you! Questions?


More information

Hidden Markov Models. Terminology, Representation and Basic Problems

Hidden Markov Models. Terminology, Representation and Basic Problems Hidden Markov Models Terminology, Representation and Basic Problems Data analysis? Machine learning? In bioinformatics, we analyze a lot of (sequential) data (biological sequences) to learn unknown parameters

More information

Lecture Slides - Part 1

Lecture Slides - Part 1 Lecture Slides - Part 1 Bengt Holmstrom MIT February 2, 2016. Bengt Holmstrom (MIT) Lecture Slides - Part 1 February 2, 2016. 1 / 36 Going to raise the level a little because 14.281 is now taught by Juuso

More information

Probabilistic Models for Learning Data Representations. Andreas Damianou

Probabilistic Models for Learning Data Representations. Andreas Damianou Probabilistic Models for Learning Data Representations Andreas Damianou Department of Computer Science, University of Sheffield, UK IBM Research, Nairobi, Kenya, 23/06/2015 Sheffield SITraN Outline Part

More information

ECO 513 Fall 2008 C.Sims KALMAN FILTER. s t = As t 1 + ε t Measurement equation : y t = Hs t + ν t. u t = r t. u 0 0 t 1 + y t = [ H I ] u t.

ECO 513 Fall 2008 C.Sims KALMAN FILTER. s t = As t 1 + ε t Measurement equation : y t = Hs t + ν t. u t = r t. u 0 0 t 1 + y t = [ H I ] u t. ECO 513 Fall 2008 C.Sims KALMAN FILTER Model in the form 1. THE KALMAN FILTER Plant equation : s t = As t 1 + ε t Measurement equation : y t = Hs t + ν t. Var(ε t ) = Ω, Var(ν t ) = Ξ. ε t ν t and (ε t,

More information

Bayesian belief networks

Bayesian belief networks CS 2001 Lecture 1 Bayesian belief networks Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square 4-8845 Milos research interests Artificial Intelligence Planning, reasoning and optimization in the presence

More information

Survival Analysis with Time- Dependent Covariates: A Practical Example. October 28, 2016 SAS Health Users Group Maria Eberg

Survival Analysis with Time- Dependent Covariates: A Practical Example. October 28, 2016 SAS Health Users Group Maria Eberg Survival Analysis with Time- Dependent Covariates: A Practical Example October 28, 2016 SAS Health Users Group Maria Eberg Outline Why use time-dependent covariates? Things to consider in definition of

More information

Assess Assumptions and Sensitivity Analysis. Fan Li March 26, 2014

Assess Assumptions and Sensitivity Analysis. Fan Li March 26, 2014 Assess Assumptions and Sensitivity Analysis Fan Li March 26, 2014 Two Key Assumptions 1. Overlap: 0

More information

1. Poisson distribution is widely used in statistics for modeling rare events.

1. Poisson distribution is widely used in statistics for modeling rare events. Discrete probability distributions - Class 5 January 20, 2014 Debdeep Pati Poisson distribution 1. Poisson distribution is widely used in statistics for modeling rare events. 2. Ex. Infectious Disease

More information

Case Studies in Bayesian Data Science

Case Studies in Bayesian Data Science Case Studies in Bayesian Data Science 4: The Bootstrap as an Approximate BNP Method David Draper Department of Applied Mathematics and Statistics University of California, Santa Cruz draper@ucsc.edu Short

More information

Auto-Encoding Variational Bayes

Auto-Encoding Variational Bayes Auto-Encoding Variational Bayes Diederik P Kingma, Max Welling June 18, 2018 Diederik P Kingma, Max Welling Auto-Encoding Variational Bayes June 18, 2018 1 / 39 Outline 1 Introduction 2 Variational Lower

More information

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Fundamentals to Biostatistics Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Statistics collection, analysis, interpretation of data development of new

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond January 18, 2017 Contents 1 Batch and Recursive Estimation 2 Towards Bayesian Filtering 3 Kalman Filter and Bayesian Filtering and Smoothing

More information

Bayesian Networks BY: MOHAMAD ALSABBAGH

Bayesian Networks BY: MOHAMAD ALSABBAGH Bayesian Networks BY: MOHAMAD ALSABBAGH Outlines Introduction Bayes Rule Bayesian Networks (BN) Representation Size of a Bayesian Network Inference via BN BN Learning Dynamic BN Introduction Conditional

More information

Nuclear Medicine RADIOPHARMACEUTICAL CHEMISTRY

Nuclear Medicine RADIOPHARMACEUTICAL CHEMISTRY Nuclear Medicine RADIOPHARMACEUTICAL CHEMISTRY An alpha particle consists of two protons and two neutrons Common alpha-particle emitters Radon-222 gas in the environment Uranium-234 and -238) in the environment

More information

Study of Changes in Climate Parameters at Regional Level: Indian Scenarios

Study of Changes in Climate Parameters at Regional Level: Indian Scenarios Study of Changes in Climate Parameters at Regional Level: Indian Scenarios S K Dash Centre for Atmospheric Sciences Indian Institute of Technology Delhi Climate Change and Animal Populations - The golden

More information

Dependence structures with applications to actuarial science

Dependence structures with applications to actuarial science with applications to actuarial science Department of Statistics, ITAM, Mexico Recent Advances in Actuarial Mathematics, OAX, MEX October 26, 2015 Contents Order-1 process Application in survival analysis

More information

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated

More information

BAYESIAN MACHINE LEARNING.

BAYESIAN MACHINE LEARNING. BAYESIAN MACHINE LEARNING frederic.pennerath@centralesupelec.fr What is this Bayesian Machine Learning course about? A course emphasizing the few essential theoretical ingredients Probabilistic generative

More information

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement Simon Leglaive 1 Laurent Girin 1,2 Radu Horaud 1 1: Inria Grenoble Rhône-Alpes 2: Univ. Grenoble Alpes, Grenoble INP,

More information

Causal Inference. Prediction and causation are very different. Typical questions are:

Causal Inference. Prediction and causation are very different. Typical questions are: Causal Inference Prediction and causation are very different. Typical questions are: Prediction: Predict Y after observing X = x Causation: Predict Y after setting X = x. Causation involves predicting

More information

Bayesian Updating: Discrete Priors: Spring

Bayesian Updating: Discrete Priors: Spring Bayesian Updating: Discrete Priors: 18.05 Spring 2017 http://xkcd.com/1236/ Learning from experience Which treatment would you choose? 1. Treatment 1: cured 100% of patients in a trial. 2. Treatment 2:

More information

Lecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010

Lecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010 Hidden Lecture 4: Hidden : An Introduction to Dynamic Decision Making November 11, 2010 Special Meeting 1/26 Markov Model Hidden When a dynamical system is probabilistic it may be determined by the transition

More information

Variational inference

Variational inference Simon Leglaive Télécom ParisTech, CNRS LTCI, Université Paris Saclay November 18, 2016, Télécom ParisTech, Paris, France. Outline Introduction Probabilistic model Problem Log-likelihood decomposition EM

More information

State Space and Hidden Markov Models

State Space and Hidden Markov Models State Space and Hidden Markov Models Kunsch H.R. State Space and Hidden Markov Models. ETH- Zurich Zurich; Aliaksandr Hubin Oslo 2014 Contents 1. Introduction 2. Markov Chains 3. Hidden Markov and State

More information

MODULE -4 BAYEIAN LEARNING

MODULE -4 BAYEIAN LEARNING MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities

More information

Mixture Models for Capture- Recapture Data

Mixture Models for Capture- Recapture Data Mixture Models for Capture- Recapture Data Dankmar Böhning Invited Lecture at Mixture Models between Theory and Applications Rome, September 13, 2002 How many cases n in a population? Registry identifies

More information

Variational Inference via Stochastic Backpropagation

Variational Inference via Stochastic Backpropagation Variational Inference via Stochastic Backpropagation Kai Fan February 27, 2016 Preliminaries Stochastic Backpropagation Variational Auto-Encoding Related Work Summary Outline Preliminaries Stochastic Backpropagation

More information

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

PILCO: A Model-Based and Data-Efficient Approach to Policy Search PILCO: A Model-Based and Data-Efficient Approach to Policy Search (M.P. Deisenroth and C.E. Rasmussen) CSC2541 November 4, 2016 PILCO Graphical Model PILCO Probabilistic Inference for Learning COntrol

More information

Directed and Undirected Graphical Models

Directed and Undirected Graphical Models Directed and Undirected Graphical Models Adrian Weller MLSALT4 Lecture Feb 26, 2016 With thanks to David Sontag (NYU) and Tony Jebara (Columbia) for use of many slides and illustrations For more information,

More information

Causal Discovery by Computer

Causal Discovery by Computer Causal Discovery by Computer Clark Glymour Carnegie Mellon University 1 Outline 1. A century of mistakes about causation and discovery: 1. Fisher 2. Yule 3. Spearman/Thurstone 2. Search for causes is statistical

More information

Markov Models and Hidden Markov Models

Markov Models and Hidden Markov Models Markov Models and Hidden Markov Models Robert Platt Northeastern University Some images and slides are used from: 1. CS188 UC Berkeley 2. RN, AIMA Markov Models We have already seen that an MDP provides

More information