Modeling Patient Mortality from Clinical Note

Priscilla Ray
5 years ago

1 Modeling Patient Mortality from Clinical Note: Combining Topic Modeling and Ontological Feature Learning with Group Regularization for Text Classification
Jeong Min Lee, Charmgil Hong, and Milos Hauskrecht

2 Outline: Background; Objective; Approach: LDA and Concept Mapping; Data Extraction; Model Building; Experiment; Conclusion

3 Background: Mortality of patients in the Intensive Care Unit (ICU) is an important index since it reflects a patient's severity. The clinical progress note, written by physicians and nurses, contains detailed information about the patient's physiology. But the note is unstructured free text, and building a model with it is a challenging task since it resides in a high-dimensional vector space.

4 Objective: Build a model that learns to predict (classify) patient mortality from text data. X: the clinical note, represented as a bag-of-words vector. Y: a binary label indicating whether the patient died during a hospital admission.

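5 Word as Vector: Words can be represented as vectors. The one-hot vector represents every word as a d × 1 vector with all 0s and a single 1, where the 1 is at the index of that word in the corpus vocabulary. Summing the one-hot vectors of all words in a document gives the document's bag-of-words feature, which counts how often each word occurs. d: number of distinct words in the corpus.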

6 Example of bag-of-words feature: Let's represent the words in the sentence "The fox jumped over the lazy dog" (vocabulary: the, fox, jumped, over, lazy, dog; d = 6): the: [1,0,0,0,0,0] fox: [0,1,0,0,0,0] jumped: [0,0,1,0,0,0] over: [0,0,0,1,0,0] the: [1,0,0,0,0,0] lazy: [0,0,0,0,1,0] dog: [0,0,0,0,0,1]. The sentence's bag-of-words vector is the sum of these, [2,1,1,1,1,1].
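The construction above can be sketched in a few lines of Python. This is a minimal illustration, assuming a simple lowercase whitespace tokenizer (the slides do not say how clinical notes were tokenized) and treating both occurrences of "the" as the same vocabulary entry; the helper names `build_vocab`, `one_hot`, and `bag_of_words` are hypothetical.

```python
# Minimal sketch: one-hot word vectors and a bag-of-words document vector.
# Assumption: lowercase + whitespace split stands in for real tokenization.

def build_vocab(tokens):
    """Assign each distinct token an index, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def one_hot(word, vocab):
    """d x 1 vector (as a list) with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec

def bag_of_words(tokens, vocab):
    """Sum of the one-hot vectors of all tokens, i.e. per-word counts."""
    vec = [0] * len(vocab)
    for tok in tokens:
        vec[vocab[tok]] += 1
    return vec

tokens = "The fox jumped over the lazy dog".lower().split()
vocab = build_vocab(tokens)
print(vocab)                        # {'the': 0, 'fox': 1, 'jumped': 2, 'over': 3, 'lazy': 4, 'dog': 5}
print(one_hot("lazy", vocab))       # [0, 0, 0, 0, 1, 0]
print(bag_of_words(tokens, vocab))  # [2, 1, 1, 1, 1, 1]
```

For a real corpus the vocabulary would be built over all notes, and sparse vectors would replace the dense lists, since d is large.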

7 Lower-Dimensional Feature Generation: Latent Dirichlet Allocation; Ontological Feature Mapping

8 Latent Dirichlet Allocation (LDA)

9 Generative Model: LDA is a generative model, best known for finding hidden topics in the documents of a corpus.

10 Generative (Probabilistic) Model: Observed random variables are assumed to be generated from some probability distribution. So knowing the type and the parameters of the probability distribution amounts to knowing the model that generated the observed data.

11 As a generative model, LDA models: topics as Dirichlet random variables; the assignment of each word to a topic as a multinomial random variable; and each document, which exhibits multiple topics, through a Dirichlet random variable over topics.

12 In other words, each element in LDA is modeled as follows: each document = a mixture of corpus-wide topics; each topic = a distribution over words; each word = drawn from one of the topics. Figure from D. Blei, et al., Latent Dirichlet Allocation (2003)
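Written as a formula, the three statements above give the probability of a word w appearing in document d. This uses $\theta_d$ for the document's topic proportions and $\beta_k$ for topic k's distribution over words (the notation of the backup slides), with K topics; it is a direct consequence of the mixture description rather than a formula quoted from the slides:

$$p(w \mid d) = \sum_{k=1}^{K} p(w \mid z = k)\, p(z = k \mid d) = \sum_{k=1}^{K} \theta_{d,k}\, \beta_{k,w}$$

Each document therefore gets its own word distribution, built as a $\theta_d$-weighted blend of K shared topic distributions.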

13 LDA Goal: Infer the underlying topics and their structure from the words and documents. Figure from D. Blei, et al., Latent Dirichlet Allocation (2003)

14 LDA on Clinical Notes: Number of topics: 50. The model was learned with Gibbs sampling.
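The slides do not say which Gibbs sampler was used. A common choice is the collapsed Gibbs sampler, which integrates out the topic proportions θ and topic-word distributions β and resamples only the per-word topic assignments z. The sketch below is a minimal NumPy version of that variant, assuming symmetric priors α and η; the function name `lda_collapsed_gibbs` and its defaults are hypothetical, not the authors' settings.

```python
import numpy as np

def lda_collapsed_gibbs(docs, n_words, n_topics, alpha=0.1, eta=0.01,
                        n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA (hypothetical helper, symmetric priors).

    docs: list of documents, each a list of word ids in [0, n_words).
    Returns (theta, beta): document-topic proportions and topic-word
    distributions estimated from the final sample.
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))  # topic counts per document
    nkw = np.zeros((n_topics, n_words))    # word counts per topic
    nk = np.zeros(n_topics)                # total words assigned to each topic
    z = []                                 # current topic of every token

    # Random initial topic assignments.
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token from the counts.
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                # Full conditional p(z = k | other assignments, words).
                p = (ndk[d] + alpha) * (nkw[:, w] + eta) / (nk + n_words * eta)
                k = rng.choice(n_topics, p=p / p.sum())
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1

    theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
    beta = (nkw + eta) / (nkw + eta).sum(axis=1, keepdims=True)
    return theta, beta

# Toy corpus: word ids 0-2 and 3-5 tend to co-occur.
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 5], [0, 2, 1, 0], [4, 5, 3, 3]]
theta, beta = lda_collapsed_gibbs(docs, n_words=6, n_topics=2, n_iter=50)
print(theta.shape, beta.shape)  # (4, 2) (2, 6)
```

For the clinical-note model, `n_topics` would be 50 and the vocabulary far larger; the pure-Python inner loop is for clarity and would be slow at that scale.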

15 Concept Feature Generation with Ontology

16 Concepts in Ontology: A concept is a knowledge entity in an ontology. We use a clinical ontology, whose concepts are medical entities such as drugs, diseases, lab tests, etc. As the ontology, we used SNOMED CT (you can think of it as a medical dictionary).

17 [Figure]

18 Concept Mapping Process: The concept feature consists of the concepts matched in the clinical text. MetaMap, the UMLS (Unified Medical Language System) concept mapping framework, is used to match concepts between the clinical text and the SNOMED CT ontology.

19 Matched concepts were restricted to specific semantic types: anatomical structure, clinical drug, diagnostic procedure, disease or syndrome, pharmacologic substance, sign or symptom, etc. Concepts that occurred fewer than 20 times were removed. The resulting number of concept features is 118.
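The two filtering rules on this slide (allowed semantic types, minimum frequency of 20) can be sketched as follows. This is a minimal illustration under stated assumptions: each note is assumed to already be a list of `(concept_id, semantic_type)` pairs extracted from concept-mapper output (the real MetaMap output format is richer), the allowed-type set copies only the types the slide names, and `concept_features` plus the concept IDs in the example are hypothetical.

```python
from collections import Counter

# Hypothetical allowed set: only the semantic types named on the slide
# (the slide's list ends with "etc.", so the real set was larger).
ALLOWED_TYPES = {
    "Anatomical Structure", "Clinical Drug", "Diagnostic Procedure",
    "Disease or Syndrome", "Pharmacologic Substance", "Sign or Symptom",
}

def concept_features(notes, allowed_types=ALLOWED_TYPES, min_count=20):
    """Build concept count vectors (hypothetical helper).

    notes: list of notes, each a list of (concept_id, semantic_type) pairs
           (assumed input format).
    Returns (vocab, vectors): sorted kept concept ids and one count vector
    per note.
    """
    # Keep only concepts whose semantic type is allowed.
    filtered = [[cid for cid, stype in note if stype in allowed_types]
                for note in notes]
    # Drop concepts occurring fewer than min_count times in the corpus.
    totals = Counter(cid for note in filtered for cid in note)
    vocab = sorted(cid for cid, n in totals.items() if n >= min_count)
    index = {cid: j for j, cid in enumerate(vocab)}
    vectors = []
    for note in filtered:
        vec = [0] * len(vocab)
        for cid in note:
            if cid in index:
                vec[index[cid]] += 1
        vectors.append(vec)
    return vocab, vectors

# Made-up concept ids: C_A passes both filters, C_B has a disallowed type,
# and C_C occurs only once.
notes = [[("C_A", "Disease or Syndrome"), ("C_B", "Organism")] for _ in range(25)]
notes.append([("C_C", "Sign or Symptom")])
vocab, X = concept_features(notes)
print(vocab)  # ['C_A']
```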

20 Data Extraction: MIMIC-III Database, a publicly available, de-identified critical care medical record dataset. It contains medical records of ~46,000 patients who were hospitalized at Beth Israel Deaconess Medical Center between 2001 and 2012.

21 Data Extraction (cont'd): Labels were created according to whether a patient died during the first hospital admission. There is a high class imbalance problem: only 10.3 percent of patients died in the first hospital admission. We subsampled the data by randomly removing negative-class samples (1:2 ratio).
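Random undersampling of the negative class can be sketched as below. This assumes "1:2 ratio" means one positive (died) for every two negatives (survived), which the slide does not spell out; the function name `undersample` is hypothetical.

```python
import random

def undersample(labels, neg_per_pos=2, seed=0):
    """Keep all positives and a random subset of negatives (hypothetical helper).

    labels: sequence of 0/1, where 1 = died during the first admission.
    Keeps at most neg_per_pos negatives per positive (1:2 by default,
    an assumed reading of the slide's ratio).
    Returns the sorted indices of the kept samples.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_neg = min(len(neg), neg_per_pos * len(pos))
    return sorted(pos + rng.sample(neg, n_neg))

labels = [1] * 103 + [0] * 897   # toy data with a 10.3% positive rate
idx = undersample(labels)
print(len(idx), sum(labels[i] for i in idx))  # 309 103
```

Undersampling discards data rather than reweighting it; class weights in the loss would be an alternative, but the slide describes removing negatives.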

22 Data Extraction (cont'd): We also limited patients to those between the ages of 18 and 99, and notes to the nursing and physician note categories. We removed all notes written on the day of death or discharge. The resulting samples consist of 5334 patients.
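The cohort and note filters on this slide can be sketched with plain dataclasses. The record layout (`Patient`, `Note`, an `end_date` holding the day of death or discharge) and the category strings `"Nursing"` and `"Physician"` are hypothetical stand-ins, not the actual MIMIC-III schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Note:               # hypothetical record layout
    category: str         # assumed category names, e.g. "Nursing", "Physician"
    chart_date: date

@dataclass
class Patient:            # hypothetical record layout
    age: int
    end_date: date        # day of death or discharge for the admission
    notes: list

KEEP_CATEGORIES = {"Nursing", "Physician"}  # assumed category strings

def select_notes(patient: Patient) -> Optional[list]:
    """Return the notes kept for a patient, or None if the patient is excluded."""
    if not 18 <= patient.age <= 99:
        return None
    # Keep nursing/physician notes not written on the day of death or discharge.
    return [n for n in patient.notes
            if n.category in KEEP_CATEGORIES and n.chart_date != patient.end_date]

p = Patient(age=67, end_date=date(2010, 5, 3), notes=[
    Note("Nursing", date(2010, 5, 1)),
    Note("Radiology", date(2010, 5, 2)),
    Note("Physician", date(2010, 5, 3)),
])
print(len(select_notes(p)))  # 1
```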

23 Experiment Classification on different feature sets

24 Classification with Regularization: We find a compact representation by applying sparse group regularization to logistic regression. We tried different regularization methods to see their effect on classification. Note that the objective functions are the negative log-likelihood loss with a class encoding of -1 and +1.

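25 L1 Regularization: L1 regularization selects features by shrinking the coefficients of some features to exactly zero while minimizing the logistic loss:
$$\text{loss} = \frac{1}{m}\sum_{i=1}^{m} \log\left(1 + \exp\left(-y_i\,(x_i^{T} w + c)\right)\right) + \lambda \lVert w \rVert_1, \quad \text{where } \lVert w \rVert_1 = \sum_j \lvert w_j \rvert$$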
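One standard way to minimize this L1-regularized logistic loss is proximal gradient descent (ISTA): take a gradient step on the smooth log-loss, then apply soft-thresholding, which is what sets coefficients exactly to zero. The slides do not say which solver the authors used, so the sketch below is an illustration of the objective, not their implementation; `l1_logistic` and `soft_threshold` are hypothetical names, and the intercept c is left unpenalized as in the formula.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1: shrink each entry toward 0 by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_logistic(X, y, lam, n_iter=500):
    """Minimize (1/m) sum log(1 + exp(-y_i (x_i^T w + c))) + lam * ||w||_1.

    y must be encoded as -1/+1. Proximal gradient descent (ISTA) with a
    fixed 1/L step; the intercept c is not penalized.
    """
    m, d = X.shape
    w, c = np.zeros(d), 0.0
    Xa = np.hstack([X, np.ones((m, 1))])
    # L = ||[X, 1]||_2^2 / (4m) bounds the Lipschitz constant of the gradient.
    step = 1.0 / (0.25 * np.linalg.norm(Xa, 2) ** 2 / m)
    for _ in range(n_iter):
        s = y * (X @ w + c)
        g = -y * 0.5 * (1.0 - np.tanh(s / 2.0))   # -y_i * sigmoid(-s_i), overflow-safe
        w = soft_threshold(w - step * (X.T @ g / m), step * lam)
        c = c - step * g.mean()
    return w, c

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = np.zeros(10)
w_true[:2] = [2.0, -2.0]            # only the first two features matter
y = np.where(X @ w_true > 0, 1.0, -1.0)
w, c = l1_logistic(X, y, lam=0.05)
print("nonzero coefficients:", np.count_nonzero(w))
```

Larger λ zeroes out more coefficients; with a large enough λ every coefficient stays at zero.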

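26 Elastic Net Regularization: Elastic net is a regularization method that uses both L1 and L2 penalties:
$$\text{loss} = \frac{1}{m}\sum_{i=1}^{m} \log\left(1 + \exp\left(-y_i\,(x_i^{T} w + c)\right)\right) + \lambda_1 \lVert w \rVert_1 + \lambda_2 \lVert w \rVert_2^2, \quad \text{where } \lVert w \rVert_2^2 = \sum_j w_j^2 \text{ and } \lVert w \rVert_1 = \sum_j \lvert w_j \rvert$$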
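If the loss is minimized by proximal gradient descent (an assumption; the slides do not name a solver), elastic net changes only the proximal step applied after each gradient step on the log-loss. For the penalty $\lambda_1 \lVert w \rVert_1 + \lambda_2 \lVert w \rVert_2^2$ that step has a closed form: soft-threshold by $t\lambda_1$, then divide by $1 + 2t\lambda_2$. The function name `prox_elastic_net` is hypothetical.

```python
import numpy as np

def prox_elastic_net(v, t, lam1, lam2):
    """Proximal operator of t * (lam1 * ||w||_1 + lam2 * ||w||_2^2).

    Per coordinate, argmin_w 0.5*(w - v)^2 + t*lam1*|w| + t*lam2*w^2
    = soft_threshold(v, t*lam1) / (1 + 2*t*lam2).
    The L1 part zeroes small coefficients; the squared-L2 part shrinks the rest.
    """
    shrunk = np.sign(v) * np.maximum(np.abs(v) - t * lam1, 0.0)
    return shrunk / (1.0 + 2.0 * t * lam2)

print(prox_elastic_net(np.array([3.0, -0.5, 1.0]), t=1.0, lam1=1.0, lam2=0.5))
```

For v = (3, -0.5, 1), t = λ1 = 1 and λ2 = 0.5, the result is (1, 0, 0): the two small entries are zeroed and the large one is both shifted and scaled.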

27 Sparse Group Regularization: It penalizes the model parameter w according to a predefined structure in which features are formed into groups, applying an L2 penalty to each group in addition to L1 regularization on each feature. Thus parameters in the same feature set are penalized together, with the same amount of penalty:
$$\text{loss} = \frac{1}{m}\sum_{i=1}^{m} \log\left(1 + \exp\left(-y_i\,(x_i^{T} w + c)\right)\right) + \lambda_1 \lVert w \rVert_1 + \lambda_2 \sum_{j=1}^{g} b_j \lVert w_{G_j} \rVert_2$$
where $G_j$ denotes non-overlapping groups of features, $b_j$ denotes the weight for group $j$, and $g$ is the number of groups.
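With non-overlapping groups, the proximal step for this penalty has a known two-stage closed form: soft-threshold each coefficient (the L1 part), then shrink each group's vector toward zero by its L2 norm (the group part), which can remove a whole feature set at once. The sketch assumes a proximal-gradient solver, where this step would follow each gradient step on the log-loss; the slides do not state the authors' solver, and `prox_sparse_group` is a hypothetical name.

```python
import numpy as np

def prox_sparse_group(v, groups, weights, t, lam1, lam2):
    """Proximal operator of t * (lam1 * ||w||_1 + lam2 * sum_j b_j ||w_{G_j}||_2).

    groups: list of index arrays, non-overlapping (e.g. one per feature set).
    weights: group weights b_j; a larger b_j penalizes that group more.
    """
    # Stage 1: elementwise soft-thresholding (L1 part).
    u = np.sign(v) * np.maximum(np.abs(v) - t * lam1, 0.0)
    w = np.zeros_like(v)
    # Stage 2: group-wise shrinkage of each group's L2 norm.
    for idx, b in zip(groups, weights):
        norm = np.linalg.norm(u[idx])
        if norm > 0:
            w[idx] = max(0.0, 1.0 - t * lam2 * b / norm) * u[idx]
    return w

v = np.array([2.0, -1.5, 0.2, 0.3, -0.2, 0.1])
groups = [np.arange(0, 3), np.arange(3, 6)]   # e.g. two feature sets
w = prox_sparse_group(v, groups, weights=[1.0, 1.0], t=1.0, lam1=0.1, lam2=0.5)
print(w)   # the second group is zeroed entirely
```

Here the first group keeps all three coefficients (shrunk), while the weak second group falls below the group threshold and is removed as a unit.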

28 Result

29 Individual Feature Sets
Table columns: Feature set, Regularization, Nonzero feature size, AUC. Each of the Word, Concept, and Topic feature sets was evaluated with (1) No regularization, (2) L1, and (3) Elastic Net (L1+L2). [Table values not recoverable]
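The result tables report two numbers per model: the nonzero feature size and the AUC. A minimal sketch of both follows, assuming AUC means the area under the ROC curve, computed here as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count one half). The helper names are hypothetical, and the pairwise comparison uses O(P·N) memory, so a rank-based version would suit large test sets better.

```python
import numpy as np

def auc(scores, labels):
    """ROC AUC via pairwise comparison; positives are labels > 0
    (works with 0/1 or -1/+1 encodings)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels > 0]
    neg = scores[labels <= 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def nonzero_feature_size(w, tol=0.0):
    """Number of coefficients whose magnitude exceeds tol."""
    return int(np.sum(np.abs(np.asarray(w)) > tol))

print(auc([0.9, 0.8, 0.3, 0.1], [1, -1, 1, -1]))   # 0.75
print(nonzero_feature_size([0.0, 1.2, 0.0, -0.3]))  # 2
```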

30 Mixed Feature Sets

33 Topic Proportion Changes over Days to Death

34 Conclusion: Features obtained from medical ontologies and topic modeling improve the modeling of patient mortality from clinical notes. This shows that combining semantically enriched concepts from an ontology with probabilistic dimensionality-reduction methods, two approaches previously used separately, has the potential to enhance such tasks. We also observed that regularization improves classification, especially when the feature dimensionality is high, by making the feature representation sparser.

35 Thanks

36 BACKUP SLIDES

37 LDA in More Detail

38 Notations: The corpus consists of D documents, and each document is represented as a vector of size N in which the value of the n-th element is the frequency of the n-th word in the d-th document. D: number of documents in the corpus, d = 1, ..., D. N: number of distinct words (vocabulary size) in the corpus, n = 1, ..., N. K: number of topics in the corpus, k = 1, ..., K.

39 We will use the following notation to describe the LDA model: $W_{d,n}$: observed word; $Z_{d,n}$: per-word topic assignment; $\theta_d$: per-document topic proportions; $\beta_k$: per-corpus topic distribution; $\alpha$: Dirichlet parameter for $\theta$; $\eta$: Dirichlet parameter for $\beta$

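40 Generative Process: The generative process of LDA is as follows. For each topic k, choose $\beta_k \sim \mathrm{Dir}(\eta)$. For each document d in the corpus: choose $\theta_d \sim \mathrm{Dir}(\alpha)$; then for each of the $N_d$ words $w_{d,n}$ in document d, choose a topic $z_{d,n} \sim \mathrm{Multinomial}(\theta_d)$ and choose a word $w_{d,n}$ from $p(w_{d,n} \mid z_{d,n}, \beta)$, a multinomial probability conditioned on the topic $z_{d,n}$. LDA thus treats each document as a mixture of corpus-wide topics $\beta$ with topic proportions $\theta_d$: each word $w_{d,n}$ in a document is generated according to that document's topic proportions $\theta_d$.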
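The generative process can be run forward to sample a synthetic corpus, which is a useful way to check one's reading of the model. The sketch below follows the steps above with NumPy, under simplifying assumptions: symmetric Dirichlet priors and a fixed document length instead of a random $N_d$. The function name `generate_corpus` is hypothetical.

```python
import numpy as np

def generate_corpus(n_docs, doc_len, n_topics, n_words, alpha, eta, seed=0):
    """Sample documents from the LDA generative process (hypothetical helper).

    Simplifications: symmetric priors alpha and eta, fixed document length.
    Returns (docs, thetas, beta).
    """
    rng = np.random.default_rng(seed)
    # beta_k ~ Dir(eta): one distribution over the vocabulary per topic.
    beta = rng.dirichlet(np.full(n_words, eta), size=n_topics)
    docs, thetas = [], []
    for _ in range(n_docs):
        theta = rng.dirichlet(np.full(n_topics, alpha))   # theta_d ~ Dir(alpha)
        z = rng.choice(n_topics, size=doc_len, p=theta)   # z_{d,n} ~ Mult(theta_d)
        # w_{d,n} ~ Mult(beta_{z_{d,n}}): each word comes from its own topic.
        words = [int(rng.choice(n_words, p=beta[k])) for k in z]
        docs.append(words)
        thetas.append(theta)
    return docs, np.array(thetas), beta

docs, theta, beta = generate_corpus(n_docs=3, doc_len=8, n_topics=2,
                                    n_words=10, alpha=0.5, eta=0.1)
print(len(docs), len(docs[0]), theta.shape, beta.shape)  # 3 8 (3, 2) (2, 10)
```

Inference (for example, Gibbs sampling) runs this process in reverse: given only the words, it recovers plausible θ and β.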

41 Results

42 Mixed Feature Sets: Topic + Concept
Regularization: (1) No regularization; (2) L1; (3) Elastic Net (L1+L2); (4) Sparse Group Regularization; (4) Sparse Group Regularization (2nd group 0.5); (4) Sparse Group Regularization (2nd group +2.0). Columns: Nonzero feature size, AUC. [Table values not recoverable]

43 Mixed Feature Sets: Word + Concept
Regularization: (1) No regularization; (2) L1; (3) Elastic Net (L1+L2); (4) Sparse Group Regularization; (4) Sparse Group Regularization (2nd group 0.5); (4) Sparse Group Regularization (2nd group +2.0). Columns: Nonzero feature size, AUC. [Table values not recoverable]

44 Mixed Feature Sets: Word + Topic
Regularization: (1) No regularization; (2) L1; (3) Elastic Net (L1+L2); (4) Sparse Group Regularization; (4) Sparse Group Regularization (2nd group 0.5); (4) Sparse Group Regularization (2nd group +2.0). Columns: Nonzero feature size, AUC. [Table values not recoverable]

45 Mixed Feature Sets: Word + Topic + Concept
Regularization: (1) No regularization; (2) L1; (3) Elastic Net (L1+L2); (4) Sparse Group Regularization; (4) Sparse Group Regularization (2nd group 0.5); (4) Sparse Group Regularization (2nd group +2.0). Columns: Nonzero feature size, AUC. [Table values not recoverable]
