Learning Partially Observable Markov Models from First Passage Times

Jérôme Callut and Pierre Dupont
European Conference on Machine Learning (ECML), 18 September 2007
Computing science & engineering dept. (INGI), UCL Machine Learning Group

Outline
1. FPT in models and sequences
2. Partially Observable Markov Models (POMMs)
3. FPT dynamics in POMMs
4. POMM induction: POMMStruct
5. Experimental results

HMM induction: FPT in models and sequences
- Problem: estimate the model structure and its probabilistic parameters from observed sequences.
- To do what? Predict the future outcomes of the process, and predict when future events will occur.
- Special focus: First Passage Times (FPT) between events of interest. For two symbols a and b, FPT(a, b) collects the first passage times from a to b observed in a sequence s.
- Contribution: a novel induction algorithm to induce models from FPT.
- FPT statistics can be computed from models or from sequences.
- The FPT dynamics of a process denotes its FPT distributions.
- FPT features are time-unbounded features, unlike N-grams; they characterize long-term dependencies and temporal dynamics.
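As an illustration of how FPT statistics can be read off a sequence, here is a minimal Python sketch. It assumes that every occurrence of a contributes one passage time, namely the number of steps to the next occurrence of b; the slides do not spell out this convention. The function names `extract_fpt` and `empirical_distribution` are hypothetical.

```python
from collections import Counter


def extract_fpt(seq, a, b):
    """Return the first passage times from symbol a to symbol b in seq.

    Assumption: each occurrence of a yields one value, the number of
    steps to the next occurrence of b strictly to its right.
    Occurrences of a with no later b are ignored.
    """
    fpts = []
    next_b = None  # index of the nearest b strictly to the right
    for i in range(len(seq) - 1, -1, -1):  # scan right to left
        if seq[i] == a and next_b is not None:
            fpts.append(next_b - i)
        if seq[i] == b:
            next_b = i
    fpts.reverse()  # restore left-to-right order
    return fpts


def empirical_distribution(fpts):
    """Normalize a list of passage times into relative frequencies."""
    counts = Counter(fpts)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


if __name__ == "__main__":
    sample = list("acbaab")
    times = extract_fpt(sample, "a", "b")
    print(times)
    print(empirical_distribution(times))
```

A single right-to-left scan keeps the extraction linear in the sequence length, which matters when FPT are collected for many symbol pairs.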

Partially Observable Markov Models (POMMs)
- A POMM is an HMM such that any state emits a single letter with probability 1.
- The same letter can be emitted by several states.
- We have shown that for any HMM there is an equivalent POMM.
- We use POMMs as target formalism (convenient FPT computations).
- States can be gathered in blocks w.r.t. their unique emission.
- FPT observed in the sequences concern these blocks.
[Figure: example POMM with its transition probabilities.]

FPT dynamics in POMMs
- The distributions of FPT in POMMs are of phase-type.
- Merge the states of the target block into an absorbing state.
- Start in the states of the source block according to the relative proportion of time spent in these states.
- The multimodal distribution reveals the presence of dominant path lengths.
- Long-term dependencies are also reflected in the FPT dynamics.
[Figure: FPT distribution between two blocks, plotted as probability against time to absorption.]

POMM dynamics is poorly approximated by MC
- Fixed-order Markov chain (MC) modeling of the POMM's FPT distribution, estimated from sequences generated by the POMM.
[Figure: FPT distribution of the POMM compared with that of the estimated MC, plotted as probability against time to absorption.]
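The phase-type construction can be sketched in a few lines of Python: treat the target states as absorbing and record how much probability mass is absorbed at each step. This is a sketch under stated assumptions, not the authors' implementation. The function name `fpt_distribution`, the three-state example, and the choice to pass the start distribution explicitly are all hypothetical, and the source and target blocks are assumed to be distinct.

```python
def fpt_distribution(A, start, targets, max_t):
    """Return [P(FPT = 1), ..., P(FPT = max_t)] for a Markov chain.

    A       : row-stochastic transition matrix as a list of lists
    start   : start distribution over states (mass on the source block)
    targets : set of state indices forming the target block
    Assumption: source and target blocks are disjoint, so any start
    mass placed on target states is discarded.
    """
    n = len(A)
    targets = set(targets)
    # Probability mass that has not yet reached the target block.
    alive = [0.0 if q in targets else start[q] for q in range(n)]
    dist = []
    for _ in range(max_t):
        nxt = [0.0] * n
        for q in range(n):
            if alive[q] == 0.0:
                continue
            for r in range(n):
                nxt[r] += alive[q] * A[q][r]
        # Mass entering the target block at this step is absorbed.
        dist.append(sum(nxt[r] for r in targets))
        alive = [0.0 if r in targets else nxt[r] for r in range(n)]
    return dist


if __name__ == "__main__":
    # Hypothetical chain: state 0 -> state 1, state 1 loops or exits to 2.
    A = [
        [0.0, 1.0, 0.0],
        [0.0, 0.5, 0.5],
        [1.0, 0.0, 0.0],
    ]
    print(fpt_distribution(A, [1.0, 0.0, 0.0], {2}, 4))
```

In this toy chain the passage time from state 0 to state 2 is at least 2 and then decays geometrically, which is the simplest example of a phase-type shape.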

POMM induction: POMMStruct
  EP_0 ← initialize(S, r);
  FPT ← extractFPT(S);
  P ← selectDivPairs(EP_0, FPT, p);
  EP_0 ← POMMPHit(EP_0, FPT, P, n_r);
  Lik ← FPTLikelihood(EP_0, FPT);
  i ← 0;
  repeat
    Lik_last ← Lik;
    κ ← probeBlocks(EP_i, FPT);
    EP_{i+1} ← addStateInBlock(EP_i, κ);
    EP_{i+1} ← POMMPHit(EP_{i+1}, FPT, P, n_r);
    Lik ← FPTLikelihood(EP_{i+1}, FPT);
    i ← i + 1;
  until the relative change (Lik − Lik_last) / Lik_last falls below a convergence threshold
  return {EP_0, ..., EP_i}

Feature selection/weighting
- FPT pairs are filtered/weighted according to their JS divergence:
  \[ D_{JS}(P_1 \,\|\, P_2) = H(M) - \tfrac{1}{2} H(P_1) - \tfrac{1}{2} H(P_2) \]
  where \( M = \tfrac{1}{2}(P_1 + P_2) \) and \( H(\cdot) \) is the Shannon entropy.
[Figure: four panels, each plotting probability against passage time for one FPT pair, with a model curve, an empirical curve, and the corresponding JS divergence.]
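The Jensen-Shannon divergence used to filter FPT pairs is simple to compute for discrete distributions. The sketch below stores each distribution as a dictionary from passage time to probability. The helper `select_div_pairs` is hypothetical: it assumes the p pairs with the largest divergence between model and empirical FPT distributions are retained, which the slides suggest but do not state.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a distribution given as {value: prob}."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def js_divergence(p1, p2):
    """D_JS(P1 || P2) = H(M) - H(P1)/2 - H(P2)/2 with M = (P1 + P2)/2."""
    keys = set(p1) | set(p2)
    m = {k: 0.5 * (p1.get(k, 0.0) + p2.get(k, 0.0)) for k in keys}
    return entropy(m) - 0.5 * entropy(p1) - 0.5 * entropy(p2)


def select_div_pairs(model_fpt, empirical_fpt, p):
    """Return the p symbol pairs with the largest JS divergence.

    Assumption: both arguments map a pair (a, b) to an FPT distribution,
    and the most divergent pairs are the ones to keep.
    """
    scored = sorted(
        model_fpt,
        key=lambda pair: js_divergence(model_fpt[pair], empirical_fpt[pair]),
        reverse=True,
    )
    return scored[:p]


if __name__ == "__main__":
    model = {("a", "b"): {1: 1.0}, ("b", "a"): {1: 0.5, 2: 0.5}}
    empirical = {("a", "b"): {2: 1.0}, ("b", "a"): {1: 0.5, 2: 0.5}}
    print(select_div_pairs(model, empirical, 1))
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (disjoint supports), which makes it convenient as a weight.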

Parameter estimation: POMMPHit
- Given observed samples FPT(a, b) = {z_1, ..., z_l} for the selected pairs, the parameters are obtained by maximum likelihood (ML) estimation.
- POMMPHit is a novel EM-based algorithm to maximize the FPT likelihood.
- Each z_i is a partial observation of a couple (z_i, h_i), where h_i is the sequence of states reached during the FPT z_i.
- Re-estimation formulas are derived to maximize \( E[P(Z, H \mid \rho) \mid Z] \).
- Additionally, a trimming procedure removes the transitions with the lowest expected passage times.

Reestimation formula
Expectation step:
\[ S_{a,b}(q) = \sum_{k=1}^{l} \frac{\sigma^{\kappa_a}_{q}\, \beta_b(q, z_k)}{\sum_{q' \in \kappa_b} \alpha_{a,b}(q', z_k)} \]
\[ N_{a,b}(q, q') = \sum_{k=1}^{l} \frac{\sum_{t=0}^{z_k - 1} \alpha_{a,b}(q, t)\, A_{qq'}\, \beta_b(q', z_k - t - 1)}{\sum_{q'' \in \kappa_b} \alpha_{a,b}(q'', z_k)} \]
Maximization step:
\[ \sigma^{\kappa_a}_{q} = \begin{cases} \dfrac{\sum_{\{b \mid (a,b) \in P\}} S_{a,b}(q)}{\sum_{q' \in \kappa_a} \sum_{\{b \mid (a,b) \in P\}} S_{a,b}(q')} & \text{if } q \in \kappa_a \\ 0 & \text{otherwise} \end{cases} \]
\[ A_{qq'} = \frac{\sum_{(a,b) \in P} N_{a,b}(q, q')}{\sum_{q'' \in Q} \sum_{(a,b) \in P} N_{a,b}(q, q'')} \]
- Computational complexity: \( O(p\, L\, n_t) \) per iteration, where L is the longest FPT and n_t is the number of transitions.

Adding a state in a block
- Pred(κ), κ, and Succ(κ) need not be disjoint, where κ is the block receiving the new state.
- Local transitions are initialized following their type (shown by color in the figure).
- POMMPHit first estimates only the local transitions. The trimming procedure is applied to these transitions.
- The complete model is then reestimated with POMMPHit, and all transitions are candidates to be trimmed.
[Figure: a new state q added to a block, with transitions from predecessor states p_1, ..., p_k and to successor states s_1, ..., s_m.]
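The maximization step for the transition matrix amounts to summing the expected transition counts over all selected FPT pairs and normalizing each row. The Python sketch below shows only that normalization; it assumes the expected counts N_{a,b}(q, q') have already been produced by an expectation step, which is not implemented here. The function name `m_step_transitions` and the handling of rows with no expected mass (left at zero) are assumptions.

```python
def m_step_transitions(expected_counts, n_states):
    """Re-estimate A from expected transition counts.

    expected_counts : dict mapping a pair (a, b) to an n x n list of
                      lists holding N_{a,b}(q, q') from the E-step
    Returns A with A[q][r] = sum_pairs N[q][r] / sum_pairs sum_r' N[q][r'].
    Assumption: a row with zero total expected count is left as zeros.
    """
    total = [[0.0] * n_states for _ in range(n_states)]
    for counts in expected_counts.values():
        for q in range(n_states):
            for r in range(n_states):
                total[q][r] += counts[q][r]

    A = []
    for row in total:
        mass = sum(row)
        if mass > 0:
            A.append([v / mass for v in row])
        else:
            A.append([0.0] * n_states)
    return A


if __name__ == "__main__":
    # Hypothetical expected counts for two FPT pairs on a 2-state model.
    counts = {
        ("a", "b"): [[1.0, 3.0], [0.0, 0.0]],
        ("b", "a"): [[1.0, 3.0], [2.0, 2.0]],
    }
    print(m_step_transitions(counts, 2))
```

Pooling the counts before normalizing means every selected pair contributes to the same shared transition matrix, rather than each pair fitting its own.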

Conclusion and future work
- We proposed a novel approach to induce POMMs based on the FPT dynamics observed in the sample.
- The FPT are informative about the structure of the model.
- Structural induction is made by iterative state addition and by transition trimming.
- Parameter estimation is performed by POMMPHit, which maximizes the model likelihood w.r.t. the observed FPT.
Future work
- Return an HMM rather than a POMM.
- Fit FPT between higher order events.

Experimental results
[Figure: FPT divergence and perplexity as functions of the training data ratio, on the GenDep and Splice (Exon -> Intron) datasets, comparing POMMStruct, Baum-Welch and Stolcke.]

HMM induction overview
Discrete HMM induction techniques, by family:
- Parameter estimation: Baum-Welch / gradient-based, Brand
- State merging: ALERGIA / MDI, Stolcke
- State splitting/adding: Ostendorf, POMMStruct (contribution)
- Discriminative techniques: conditional likelihood, margin-based techniques
Limitations noted for existing techniques:
- Local optimization
- Strong bias towards sparse models -> not always appropriate
- Restricted to PDFA
- No clear improvement over B-W
- Restricted to left-to-right topology
- No standard method for parameter estimation
- Not always concerned with HMM topology explicitly
- Long-term dynamics badly estimated
- Difficult to estimate in practice
- Often focus on sequence labeling
- Unsupervised techniques are time consuming