Modeling Patient Mortality from Clinical Notes: Combining Topic Modeling and Ontological Feature Learning with Group Regularization for Text Classification
Jeong Min Lee, Charmgil Hong, and Milos Hauskrecht
Outline
- Background
- Objective
- Approach: LDA and Concept Mapping
- Data Extraction
- Model Building
- Experiment
- Conclusion
Background
- Mortality of patients in the Intensive Care Unit (ICU) is an important index since it reflects a patient's severity
- Clinical progress notes, written by physicians and nurses, contain detailed information about a patient's physiology
- But the notes are unstructured free text, and building a model from them is a challenging task since they reside in a high-dimensional vector space
Objective
- Build a model that learns to predict (classify) patient mortality from text data
- X: clinical notes, represented as bag-of-words vectors
- Y: binary label of whether the patient died during a hospital admission
Words as Vectors
- Words can be represented as vectors
- The bag-of-words feature (also called a one-hot vector): represent every word as a d x 1 vector of all 0s and a single 1
- The 1 is at the index of that word in the corpus vocabulary
- d: number of distinct words in the corpus
Example of the bag-of-words feature
Let's represent the words in a sentence: "The fox jumped over the lazy dog"
The vocabulary has six distinct words, so each one-hot vector has six entries:
the    : [1,0,0,0,0,0]
fox    : [0,1,0,0,0,0]
jumped : [0,0,1,0,0,0]
over   : [0,0,0,1,0,0]
lazy   : [0,0,0,0,1,0]
dog    : [0,0,0,0,0,1]
(Both occurrences of "the" map to the same vector; the bag-of-words vector for the whole sentence is the sum of these one-hot vectors.)
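As a hedged aside (not in the original slides), the same bag-of-words representation can be built with scikit-learn's CountVectorizer; the sentence is the toy example above, and everything else is illustrative:

```python
# Minimal sketch: bag-of-words counts for the toy sentence above.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["The fox jumped over the lazy dog"]

vectorizer = CountVectorizer()        # lowercases and tokenizes by default
X = vectorizer.fit_transform(docs)    # rows = documents, columns = vocabulary words

print(vectorizer.get_feature_names_out())
# ['dog' 'fox' 'jumped' 'lazy' 'over' 'the']
print(X.toarray())
# [[1 1 1 1 1 2]]  <- "the" occurs twice, so its count is 2
```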
Lower-Dimensional Feature Generation
- Latent Dirichlet Allocation
- Ontological Feature Mapping
Latent Dirichlet Allocation (LDA)
Generative Model
- LDA is a generative model
- Best known for finding hidden topics in the documents of a corpus
Generative (Probabilistic) Model
- Observed random variables are assumed to be generated from some probability distribution
- So knowing the type and the parameters of that probability distribution amounts to knowing the model that generated the observed data
As a generative model, LDA models:
- Topics as Dirichlet random variables
- The assignment of each word to a topic as a multinomial random variable
- Each document as exhibiting multiple topics: a Dirichlet random variable over topics
In other words, each element in LDA is modeled as follows:
- Each document = a mixture of corpus-wide topics
- Each topic = a distribution over words
- Each word = drawn from one of the topics
[Figure from D. Blei et al., Latent Dirichlet Allocation (2003)]
LDA
- Goal: infer the underlying topics and their structure from the words and documents of the corpus
[Figure from D. Blei et al., Latent Dirichlet Allocation (2003)]
LDA on Clinical Notes
- Number of topics: 50
- Model learned with Gibbs sampling
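A hedged sketch of this step: the slides use Gibbs sampling, while scikit-learn's LatentDirichletAllocation implements variational inference, so this reproduces only the setup (50 topic-proportion features per note), not the authors' exact inference; the placeholder notes are invented.

```python
# Sketch: 50 topic-proportion features per note (variational LDA, not Gibbs).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

notes = ["pt intubated overnight, sedated on propofol",
         "afebrile, vitals stable, tolerating oral diet"]  # placeholder notes

counts = CountVectorizer().fit_transform(notes)
lda = LatentDirichletAllocation(n_components=50, random_state=0)
topic_features = lda.fit_transform(counts)   # shape: (n_notes, 50)
```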
Concept Feature Generation with Ontology
Concepts in Ontology
- A concept is a knowledge entity in an ontology
- We use a clinical ontology, whose concepts are medical entities such as drugs, diseases, lab tests, etc.
- As the ontology we used SNOMED CT (you can think of it as a medical dictionary)
[Figure from http://data.press.net/]
Concept Mapping Process
- The concept feature consists of concepts matched against the clinical text
- MetaMap, the UMLS (Unified Medical Language System) concept-text mapping framework, is used to match concepts between the clinical text and the SNOMED CT ontology
- Matched concepts are restricted to specific semantic types: anatomical structure, clinical drug, diagnostic procedure, disease or syndrome, pharmacologic substance, sign or symptom, etc.
- Concepts that occurred fewer than 20 times were removed (see the sketch below)
- The resulting number of concept features is 118
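MetaMap itself is an external UMLS tool, so the following is only a sketch of the post-processing described above; the (concept_id, semantic_type) layout and the function name are assumptions made for illustration.

```python
# Sketch: keep concepts of allowed semantic types seen at least 20 times.
from collections import Counter

ALLOWED_TYPES = {"Anatomical Structure", "Clinical Drug", "Diagnostic Procedure",
                 "Disease or Syndrome", "Pharmacologic Substance", "Sign or Symptom"}
MIN_COUNT = 20   # concepts occurring fewer than 20 times are dropped

def build_concept_vocabulary(matched_concepts):
    """matched_concepts: iterable of (concept_id, semantic_type) pairs,
    one per MetaMap match over the whole corpus (assumed layout)."""
    in_scope = [cid for cid, stype in matched_concepts if stype in ALLOWED_TYPES]
    counts = Counter(in_scope)
    return {cid for cid, n in counts.items() if n >= MIN_COUNT}
```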
Data Extraction
- MIMIC-III database: a publicly available, de-identified critical-care medical record dataset
- It contains medical records of ~46,000 patients hospitalized at Beth Israel Deaconess Medical Center between 2001 and 2012
Data Extraction (cont'd)
- Labels were created according to whether a patient died during the first hospital admission
- High class imbalance: only 10.3 percent of patients died during the first hospital admission
- Data were subsampled by randomly removing negative-class samples (1:2 ratio)
Data Extraction (cont'd)
- We also limited patients to ages between 18 and 99, and notes to the nursing and physician categories
- All notes written on the day of death or discharge were removed
- The resulting sample consists of 5,334 patients (a sketch of this cohort construction follows)
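A hedged sketch of the cohort construction with pandas; the column names follow the public MIMIC-III schema (ADMISSIONS.csv with SUBJECT_ID, ADMITTIME, HOSPITAL_EXPIRE_FLAG), but the authors' actual extraction code is not shown in the slides, so treat this as an approximation.

```python
# Sketch: label first admissions by in-hospital death, then undersample 1:2.
import pandas as pd

admissions = pd.read_csv("ADMISSIONS.csv")   # public MIMIC-III table (assumed path)

# Keep each patient's first hospital admission.
first = (admissions.sort_values("ADMITTIME")
                   .drop_duplicates("SUBJECT_ID", keep="first"))

# HOSPITAL_EXPIRE_FLAG is 1 if the patient died during that admission.
pos = first[first["HOSPITAL_EXPIRE_FLAG"] == 1]
neg = first[first["HOSPITAL_EXPIRE_FLAG"] == 0].sample(n=2 * len(pos),
                                                       random_state=0)
cohort = pd.concat([pos, neg])               # 1:2 positive-to-negative ratio
```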
Experiment
- Classification on different feature sets
Classification with Regularization
- We find a compact representation via sparse group regularization on logistic regression
- We tried different regularization methods to see their effect on classification
- Note that the objective functions below use the negative log-likelihood loss with a class encoding of -1 and +1
L1 Regularization
L1 regularization selects features by shrinking the coefficients of some features to exactly zero while minimizing the regularized logistic loss:

$$\mathrm{loss} = \frac{1}{m}\sum_{i=1}^{m}\log\left(1+\exp\left(-y_i\left(x_i^{T}w+c\right)\right)\right) + \lambda\lVert w\rVert_1$$

where $\lVert w\rVert_1 = \sum_j \lvert w_j\rvert$.
Elastic Net Regularization
Elastic net is a regularization method combining both the L1 and L2 penalties:

$$\mathrm{loss} = \frac{1}{m}\sum_{i=1}^{m}\log\left(1+\exp\left(-y_i\left(x_i^{T}w+c\right)\right)\right) + \lambda_1\lVert w\rVert_1 + \lambda_2\lVert w\rVert_2^2$$

where $\lVert w\rVert_2^2 = \sum_j w_j^2$ and $\lVert w\rVert_1 = \sum_j \lvert w_j\rvert$.
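Both penalties are available off the shelf; below is a minimal scikit-learn sketch on synthetic data (note that scikit-learn parameterizes regularization strength as C = 1/lambda and expects 0/1 labels rather than the -1/+1 encoding above; the fitted models are equivalent).

```python
# Sketch: L1 and elastic-net logistic regression with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=50, random_state=0)

# L1 drives many coefficients exactly to zero (feature selection).
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, y)

# Elastic net blends L1 and L2; l1_ratio controls the mix.
enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=1.0, max_iter=5000).fit(X, y)

print((l1.coef_ != 0).sum(), "nonzero coefficients under L1")
```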
Sparse Group Regularization
It penalizes the model parameters w according to a pre-defined group structure over the features, with an L2 penalty on each group in addition to an L1 penalty on each individual feature. Parameters in the same feature set are thus penalized together with the same amount of penalty.

$$\mathrm{loss} = \frac{1}{m}\sum_{i=1}^{m}\log\left(1+\exp\left(-y_i\left(x_i^{T}w+c\right)\right)\right) + \lambda_1\lVert w\rVert_1 + \lambda_2\sum_{j=1}^{G} b_j \lVert w_{G_j}\rVert_2$$

where $G_j$ denotes the non-overlapping groups of features and $b_j$ denotes the weight for group $j$.
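Scikit-learn has no built-in sparse group penalty, so here is a minimal proximal-gradient (ISTA) sketch of the objective above, not the authors' solver; the group layout, step size, and penalty weights are illustrative assumptions.

```python
# Sketch: proximal-gradient (ISTA) solver for the sparse group objective above.
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding: the proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_sparse_group(w, groups, lam1, lam2, step):
    """Prox of step*(lam1*||w||_1 + lam2*sum_j b_j*||w_Gj||_2):
    soft-threshold each coordinate, then shrink each group as a block."""
    w = soft_threshold(w, step * lam1)
    for idx, b in groups:                    # idx: feature indices, b: group weight
        norm = np.linalg.norm(w[idx])
        if norm > 0.0:
            w[idx] *= max(0.0, 1.0 - step * lam2 * b / norm)
    return w

def fit_sparse_group_logreg(X, y, groups, lam1=0.01, lam2=0.01, step=0.1, n_iter=500):
    """y must use the -1/+1 encoding from the loss above."""
    w, c = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        s = 1.0 / (1.0 + np.exp(y * (X @ w + c)))   # sigmoid(-y * (Xw + c))
        gw = -(X.T @ (y * s)) / len(y)              # gradient of the logistic loss
        gc = -np.mean(y * s)
        w = prox_sparse_group(w - step * gw, groups, lam1, lam2, step)
        c -= step * gc
    return w, c

# Illustrative use: two feature groups (e.g. topic features vs. concept features).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 60))
y = np.sign(X[:, 0] - X[:, 40] + 0.1 * rng.normal(size=300))
groups = [(np.arange(0, 40), 1.0), (np.arange(40, 60), 1.0)]
w, c = fit_sparse_group_logreg(X, y, groups)
```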
Results
Individual Feature Sets

Feature set | Regularization          | Nonzero feature size | AUC
Word        | (1) No regularization   | 10229 | 0.8304
Word        | (2) L1                  | 353   | 0.8561
Word        | (3) Elastic Net (L1+L2) | 307   | 0.8561
Concept     | (1) No regularization   | 118   | 0.6488
Concept     | (2) L1                  | 52    | 0.6557
Concept     | (3) Elastic Net (L1+L2) | 50    | 0.6547
Topic       | (1) No regularization   | 50    | 0.8720
Topic       | (2) L1                  | 45    | 0.8725
Topic       | (3) Elastic Net (L1+L2) | 43    | 0.8727
Mixed Feature Sets
Topic Proportion Changes over Days to Death
Conclusion
- Features obtained from medical ontologies and topic modeling improve the modeling of patient mortality from clinical notes
- The combination of semantically enriched concepts from an ontology with probabilistic dimensionality-reduction methods has the potential to enhance tasks that were previously done separately
- We also observed that regularization improves classification, especially when the feature dimensionality is high, by making the feature representation sparser
Thanks
BACKUP SLIDES
LDA in More Detail
Notation
The corpus consists of D documents, and each document is represented as a vector of size N in which the value of the n-th element gives the frequency of the n-th word in the d-th document.
- D: number of documents, d = 1 ... D
- N: number of words in the corpus vocabulary, n = 1 ... N
- K: number of topics in the corpus, k = 1 ... K
We will use the following notation to describe the LDA model:
- $W_{d,n}$: observed word
- $Z_{d,n}$: per-word topic assignment
- $\theta_d$: per-document topic proportions
- $\beta_k$: per-corpus topic distribution
- $\alpha$: Dirichlet parameter for $\theta$
- $\eta$: Dirichlet parameter for $\beta$
Generative Process
The generative process of LDA:
- For each document d in corpus D:
  - Choose $\theta_d \sim \mathrm{Dir}(\alpha)$
  - For each of the N words $w_n$:
    - Choose a topic $z_n \sim \mathrm{Multinomial}(\theta)$
    - Choose a word $w_n$ from $p(w_n \mid z_n, \beta)$, a multinomial probability conditioned on the topic $z_n$
LDA considers each document as a mixture of corpus-wide topics $\beta$ via the topic proportions $\theta$; each word $w_n$ in a document is generated through the document's topic proportions $\theta$.
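A small numpy sketch that samples a toy document from exactly this generative story (the dimensions and hyperparameter values are made up for illustration):

```python
# Sketch: sampling one toy document from the LDA generative process.
import numpy as np

rng = np.random.default_rng(0)
K, V, N = 3, 8, 10            # topics, vocabulary size, words in the document
alpha, eta = 0.5, 0.5         # Dirichlet hyperparameters (illustrative values)

beta = rng.dirichlet(np.full(V, eta), size=K)   # beta_k: per-topic word distributions
theta = rng.dirichlet(np.full(K, alpha))        # theta_d ~ Dir(alpha)

doc = []
for _ in range(N):
    z = rng.choice(K, p=theta)                  # z_n ~ Multinomial(theta)
    w = rng.choice(V, p=beta[z])                # w_n ~ p(w | z_n, beta)
    doc.append(w)
print(doc)                                      # word indices of the sampled document
```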
Results
Mixed Feature Sets: Topic + Concept

Regularization                                   | Nonzero feature size | AUC
(1) No regularization                            | 168 | 0.8703
(2) L1                                           | 63  | 0.8766
(3) Elastic Net (L1+L2)                          | 60  | 0.8758
(4) Sparse Group Regularization                  | 63  | 0.8766
(4) Sparse Group Regularization (2nd group x0.5) | 64  | 0.8776
(4) Sparse Group Regularization (2nd group x2.0) | 63  | 0.8752
Mixed Feature Sets: Word + Concept

Regularization                                   | Nonzero feature size | AUC
(1) No regularization                            | 10317 | 0.8416
(2) L1                                           | 375   | 0.8632
(3) Elastic Net (L1+L2)                          | 328   | 0.8631
(4) Sparse Group Regularization                  | 375   | 0.8632
(4) Sparse Group Regularization (2nd group x0.5) | 602   | 0.8704
(4) Sparse Group Regularization (2nd group x2.0) | 1291  | 0.8669
Mixed Feature Sets: Word + Topic

Regularization                                   | Nonzero feature size | AUC
(1) No regularization                            | 10279 | 0.8378
(2) L1                                           | 255   | 0.8873
(3) Elastic Net (L1+L2)                          | 206   | 0.8872
(4) Sparse Group Regularization                  | 255   | 0.8873
(4) Sparse Group Regularization (2nd group x0.5) | 461   | 0.8859
(4) Sparse Group Regularization (2nd group x2.0) | 6889  | 0.8875
Mixed Feature Sets: Word + Topic + Concept

Regularization                                   | Nonzero feature size | AUC
(1) No regularization                            | 10397 | 0.8498
(2) L1                                           | 278   | 0.8925
(3) Elastic Net (L1+L2)                          | 222   | 0.8919
(4) Sparse Group Regularization                  | 278   | 0.8925
(4) Sparse Group Regularization (2nd group x0.5) | 491   | 0.8916
(4) Sparse Group Regularization (2nd group x2.0) | 285   | 0.8927