Probabilistic Model for Time-series Data: Hidden Markov Model
Hiroshi Mamitsuka
Bioinformatics Center, Kyoto University

Outline
Three problems for probabilistic models in machine learning:
1. Computing likelihood
2. Learning
3. Parsing (prediction)
Define the hidden Markov model (HMM)
Three problems of HMM:
- Computing likelihood by forward probabilities
- Learning by Baum-Welch
- Parsing by Viterbi
Summary

Probabilistic Model Learning
An approach of machine learning: finding probabilistic patterns/rules from given data
Data -> Learning -> Rules/Patterns -> Prediction

Probabilistic Model Learning
A probabilistic model has probability parameters estimated from given data
Unsupervised learning
- One-class data: no labels attached to the given examples
- Model M gives a score (a likelihood) to a training example X, P(X|M), which should become higher through learning
- After learning, model M gives a score to an arbitrary example X, P(X|M), which is exactly prediction

Probabilistic Model Ex: Finite Mixture Model
Clustering: grouping examples and assigning a given example to a cluster
Two variables
- X: observable variable, corresponding to an example
- Z: latent variable, corresponding to a cluster (#clusters given)
Two probability parameters
- P(Z): probability of a cluster
- P(X|Z): probability of an example given a cluster
Likelihood of a given example, i.e. P(X|M):
P(X) = sum_Z P(X|Z) P(Z)

Probabilistic Model Ex: Finite Mixture Model
Learning: estimating P(X|Z) and P(Z)
Once learning is done, the objective of the FMM is to compute P(Z|X), i.e. the probability of a cluster assignment given an example
Question: how can we compute P(Z|X) from P(X|Z) and P(Z)?
Answer: follow Bayes' theorem:
P(Z|X) = P(X|Z) P(Z) / sum_Z' P(X|Z') P(Z')
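The Bayes-theorem computation above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the two-cluster priors and the emission table are invented numbers, and `posterior` is a name chosen here.

```python
# Sketch of the FMM cluster-posterior computation via Bayes' theorem.
# The mixture weights and per-cluster emission probabilities below are
# made-up toy numbers for illustration, not values from the slides.

def posterior(p_z, p_x_given_z, x):
    """P(z|x) = P(x|z) P(z) / sum_z' P(x|z') P(z')."""
    joint = [p_x_given_z[z][x] * p_z[z] for z in range(len(p_z))]
    total = sum(joint)              # P(x), the likelihood of the example
    return [j / total for j in joint]

p_z = [0.6, 0.4]                    # P(Z): cluster priors
p_x_given_z = [                     # P(X|Z): emission table over 3 symbols
    {"a": 0.7, "b": 0.2, "c": 0.1},
    {"a": 0.1, "b": 0.3, "c": 0.6},
]
post = posterior(p_z, p_x_given_z, "a")
print(post)                         # cluster 0 dominates for symbol "a"
```

Note that the denominator is exactly the likelihood P(X) from the previous slide, so parsing (computing P(Z|X)) reuses the likelihood computation.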
Three Problems
These must be solved by a probabilistic model to be used in real-world machine learning applications:
1. Computing likelihood: computing how likely a given example is to be generated from a model
2. Learning: estimating the probability parameters of a model from given data
3. Parsing: finding the most likely values of the latent variables for an example, given a model

Three Problems
1. Computing likelihood
Likelihood: P(X|M), the score given to an example by the model
Computing likelihood can be part of parameter estimation (learning), for example when maximum likelihood is used for learning
2. Learning
Parameter estimation, the most significant part
Typical example: maximum likelihood
3. Parsing
Prediction, and showing the reason for the prediction
Can be obtained by modifying the likelihood computation

Three Problems: Finite Mixture Model
1. Computing likelihood
Computing P(X) from the probabilistic structure:
L(X) = P(X) = sum_Z P(X|Z) P(Z)
2. Learning
Estimate the probability parameters P(X|Z) and P(Z)
3. Parsing
Show the cluster which maximizes the posterior:
Z^ = argmax_Z P(Z|X)

Markov Model
Markov property: the current state depends only on a finite number of past states
First-order Markov property: the current state depends on the previous state only
Markov model (Markov chain): generates a string with the Markov property
State transition example, generating the string: U (Up) U (Up) D (Down) U (Up)

Hidden Markov Model (HMM)
Defined by a state transition diagram, showing the possible state transitions, with
- a state transition probability at each edge
- a letter generation probability at each node
(Figure: a three-state diagram with states s1, s2, s3, transition probabilities such as 0.3 and 0.2 on edges, and letter generation probabilities such as U: 0.1 at nodes.)
It generates a string, say UUDU, by a state transition path, say s1 s1 s3 s3, with a likelihood equal to the product of the letter generation and state transition probabilities along the path.

One-to-many Correspondence between String and State Transition Paths
The same string UUDU can be generated by several state transition paths, e.g.
s1 s1 s3 s3, s1 s1 s3 s1, s1 s1 s1 s3, each with its own likelihood
The model assigns the string the sum of these path likelihoods
The most probable state transition path is hidden!
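The one-to-many correspondence can be made concrete by brute force: enumerate every state path and sum the products of probabilities. The two-state model below (states, `init`, `a`, `b`) is an invented toy example, not the three-state diagram from the slides, and an explicit initial distribution is assumed since the slides do not fix one.

```python
# Brute-force illustration: the likelihood of a string is the sum, over
# every state transition path, of the product of the letter generation
# and state transition probabilities along that path.
from itertools import product

states = [0, 1]
init = [0.5, 0.5]                       # assumed initial state probabilities
a = [[0.7, 0.3],                        # a[i][j]: transition prob. i -> j
     [0.4, 0.6]]
b = [{"U": 0.9, "D": 0.1},              # b[i][letter]: generation prob.
     {"U": 0.2, "D": 0.8}]

def path_likelihood(string, path):
    """Product of generation and transition probabilities along one path."""
    p = init[path[0]] * b[path[0]][string[0]]
    for t in range(1, len(string)):
        p *= a[path[t - 1]][path[t]] * b[path[t]][string[t]]
    return p

string = "UUDU"
total = sum(path_likelihood(string, path)
            for path in product(states, repeat=len(string)))
print(total)    # likelihood of the string: sum over all 2^4 paths
```

This enumeration is exactly the O(|M|^T) computation the next slides replace with dynamic programming.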
Define HMM More Formally
Input
- State transition diagram
- A state in the given state set: s_i in M (M: the set of states)
- Data: strings (time-series examples)
- A string in the given string set: sigma in Lambda
- Maximum length of a string: T
Two types of probability parameters
- State transition probability at an edge, from state i to state j: a_ij
- Letter generation probability at a node, for the (t+1)-th letter: b_j(sigma_{t+1})
Likelihood of a state transition path pi in Xi for a given string sigma:
L(sigma, pi) = prod_{t=1}^{T} a_{pi_{t-1} pi_t} b_{pi_t}(sigma_t)

Three Problems for HMMs
1. Computing likelihood: the likelihood given to a string by the model, equal to the sum of the likelihoods over all state transition paths
2. Learning: estimating the two types of probability parameters, given strings
3. Parsing: finding the state transition path which gives the maximum likelihood

Computing Likelihood
For the string UUDU:
s1 s1 s3 s3: L(sigma, pi_1)
s1 s1 s3 s1: L(sigma, pi_2)
s1 s1 s1 s3: L(sigma, pi_3)
The sum of the likelihoods of all possible state transition paths is the likelihood given to the string UUDU by the model:
L(sigma) = sum_{pi in Xi} L(sigma, pi)

Computing Likelihood
This needs an enumeration of all state transition paths, given a string and the probability parameters
Summing the likelihoods, each being that of one path -> combinatorial hardness: O(|M|^T)
An efficient computation manner is needed: dynamic programming!

Review: Dynamic Programming
When subproblems must be solved repeatedly, solve the simpler problems first and save the results
Ex: Fibonacci numbers: 1, 1, 2, 3, 5, 8, 13, 21, ...
A recursive algorithm for computing Fibonacci numbers, which looks brief and very nice:

    fib(n) {
        if (n <= 1) return 1;
        else return fib(n - 1) + fib(n - 2);
    }

Review: Dynamic Programming
Example: Fibonacci numbers
But this algorithm recomputes all past numbers for each number
(Figure: trace of the recursive calculation of a Fibonacci number)
This makes the complexity of fib(n) exponential!
Review: Dynamic Programming
Example: Fibonacci numbers
Solution for this problem: save results in variables (a table), instead of recursive computation!
Complexity of new_fib(n): O(n)

    new_fib(n) {
        if (n <= 1) return 1;
        last = 1; nexttolast = 1; answer = 1;
        for (i = 2; i <= n; i++) {
            answer = last + nexttolast;
            nexttolast = last;
            last = answer;
        }
        return answer;
    }

Trellis
Two dimensions: time x states
Makes it easy to understand the dynamic programming process of HMM learning
A state transition path of an HMM is a line chart on the trellis
(Figure: trellis with states s1, s2, s3 on the vertical axis and time (string positions) on the horizontal axis; legend: state, transition, label output states)

Forward Probability: alpha[i, t]
Given a string sigma, the probability that the current state is s_i and the substring sigma[1..t] has been generated, i.e. the probability covering the first part of the string
Can be computed by dynamic programming over t, due to the Markov property
Updating formula:
alpha[j, t+1] = sum_i alpha[i, t] a_ij b_j(sigma_{t+1})

Computing Likelihood with Forward Probabilities
Compute the forward probabilities, incrementing t, finally obtaining the likelihood given a string and a model:
L(sigma) = sum_{pi in Xi} L(sigma, pi) = sum_i alpha[i, T]
Complexity: O(|M|^T) -> O(|M|^2 T)
Can be computed in O(|M|^2 T), where M is the set of states and T is the string length

Training HMM (Learning)
The probability parameters of an HMM are trained (estimated) from strings (time-series examples)
A standard manner is maximum likelihood for the given strings, based on the EM (Expectation-Maximization) algorithm
Training strings example: UUDDU, DUUDDD, UDUUD, UUDDUU, DDDUUD
Parameter estimation: maximize the likelihood of the given strings
(Figure: three-state transition diagram whose parameters are to be estimated)

EM Algorithm in General
Notation
- Observable variable: X
- Latent variable: Z
- Parameter set: theta
- Distribution: P_theta
Purpose
Maximize the likelihood of the observable variables, i.e. obtain the parameters which maximize the likelihood:
theta^ = argmax_theta P_theta(X)
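The forward-probability recursion introduced above can be sketched as follows. This is a minimal illustration reusing an invented two-state toy model; the initial distribution `init` is an assumption, since the slides do not fix one.

```python
# Minimal sketch of the forward recursion
#   alpha[j, t+1] = sum_i alpha[i, t] * a[i][j] * b[j](sigma_{t+1}),
# followed by L(sigma) = sum_i alpha[i, T].
def forward(string, init, a, b):
    n = len(init)
    alpha = [init[i] * b[i][string[0]] for i in range(n)]     # t = 1
    for t in range(1, len(string)):                           # t = 2..T
        alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j][string[t]]
                 for j in range(n)]
    return sum(alpha)                 # likelihood of the whole string

init = [0.5, 0.5]                     # assumed toy model, as before
a = [[0.7, 0.3], [0.4, 0.6]]
b = [{"U": 0.9, "D": 0.1}, {"U": 0.2, "D": 0.8}]
print(forward("UUDU", init, a, b))    # O(|M|^2 T) instead of O(|M|^T)
```

Each time step only keeps one probability per state, which is exactly the table-saving idea of the Fibonacci review applied on the trellis.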
EM Algorithm in General
Notation
- Observable variable: X
- Latent variable: Z
- Parameter set: theta
- Distribution: P_theta
Q function:
Q(theta; theta') = sum_Z P_theta(X, Z) log P_theta'(X, Z)
Nice property of the Q function:
Q(theta; theta') > Q(theta; theta)  =>  P_theta'(X) > P_theta(X)
This means that if we find theta' satisfying Q(theta; theta') > Q(theta; theta), we can make P_theta'(X) > P_theta(X)

Proof:
Q(theta; theta') - Q(theta; theta)
= sum_Z P_theta(X, Z) log [ P_theta'(X, Z) / P_theta(X, Z) ]
<= sum_Z P_theta(X, Z) [ P_theta'(X, Z) / P_theta(X, Z) - 1 ]    (log x <= x - 1)
= sum_Z P_theta'(X, Z) - sum_Z P_theta(X, Z)
= P_theta'(X) - P_theta(X)
If Q(theta; theta') - Q(theta; theta) is positive, P_theta'(X) - P_theta(X) must be positive.

EM Algorithm in General
1. Choose initial parameter values
2. Repeat the following two steps alternately until convergence
E-step: compute the Q function Q(theta; theta')
M-step: choose the new theta = argmax_theta' Q(theta; theta')

EM Algorithm for HMM
Baum-Welch algorithm
Correspondence
- Observable variable <-> string: sigma
- Latent variable <-> state transition path: pi in Xi
- Distribution <-> likelihood: L
Q function:
Q(theta; theta') = sum_{pi in Xi} L_theta(sigma, pi) log L_theta'(sigma, pi)
Problem: find the new theta = argmax_theta' Q(theta; theta')

Derivation of Baum-Welch (E-step)
Assume theta = {a_ij}, meaning that we here focus on the state transition probabilities only
The Q function can then be rewritten:
Q(theta; theta') = sum_{pi in Xi} L_theta(sigma, pi) log L_theta'(sigma, pi)
= sum_{i,j} [ sum_{pi in Xi} L_theta(sigma, pi) c_ij(pi) ] log a'_ij
where c_ij(pi) denotes the number of transitions from state i to state j in path pi; the bracketed term is the expectation value of the state transitions from state i to state j

E-step of Baum-Welch
Expectation value computation needed: count the transitions from state i to state j over all paths:
E_P[#(i -> j, sigma)] = sum_{pi in Xi} L_theta(sigma, pi) c_ij(pi)
This enumerates all state transition paths having a transition from state i to state j
Is enumerating all these state transition paths possible???
Expectation Value Computation
Enumerating all possible paths having a certain state transition -> combinatorial hardness again: O(|M|^T)

Computing Expectation Value for States i to j
We want to count the paths having a transition from state i to state j
First, we fix the time t of the transition
(Figure: trellis for the string UUDU with paths s1 s1 s3 s3, s1 s1 s3 s1, s1 s1 s1 s3)

Forward Probability Again
alpha[i, t]: given a string, the probability that the current state is s_i and the substring sigma[1..t] has been generated, i.e. the probability covering the first part of the string
Can be computed by dynamic programming over t
Updating formula:
alpha[j, t+1] = sum_i alpha[i, t] a_ij b_j(sigma_{t+1})

Backward Probability: beta[i, t]
Given a string, the probability that the current state is s_i and the substring sigma[t+1..T] will be generated, i.e. the probability covering the last part of the string
Can be computed by dynamic programming over t in the reverse direction, by the following updating rule:
beta[i, t] = sum_j a_ij b_j(sigma_{t+1}) beta[j, t+1]
Can be computed in O(|M|^2 T) time

Computing Expectation Value for Transition of States i to j
Forward probabilities cover all possible state transition paths reaching state i at time t, the first part of the given string
Backward probabilities cover all possible state transition paths leaving state j at time t+1, the last part of the given string
By combining the two, we obtain the expectation value of the state transition from state i to state j at time t:
alpha[i, t] a_ij b_j(sigma_{t+1}) beta[j, t+1]

Computing Expectation Value for Transition of States i to j
We can further sum this over all possible t:
sum_{pi in Xi} L(sigma, pi) c_ij(pi) = sum_t alpha[i, t] a_ij b_j(sigma_{t+1}) beta[j, t+1]
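The forward-backward combination above can be sketched directly. As before, the two-state model and initial distribution are invented toy assumptions, and the function names (`forward_table`, `backward_table`, `expected_transitions`) are chosen here, not from the lecture.

```python
# Sketch of the forward-backward expectation computation:
#   xi_t(i, j) = alpha[i, t] * a[i][j] * b[j](sigma_{t+1}) * beta[j, t+1],
# summed over t, giving the (unnormalized) expected count of i -> j.
def forward_table(string, init, a, b):
    n = len(init)
    alpha = [[init[i] * b[i][string[0]] for i in range(n)]]
    for t in range(1, len(string)):
        alpha.append([sum(alpha[-1][i] * a[i][j] for i in range(n))
                      * b[j][string[t]] for j in range(n)])
    return alpha

def backward_table(string, a, b):
    n = len(a)
    beta = [[1.0] * n]                       # beta[i, T] = 1
    for t in range(len(string) - 2, -1, -1):
        beta.insert(0, [sum(a[i][j] * b[j][string[t + 1]] * beta[0][j]
                            for j in range(n)) for i in range(n)])
    return beta

def expected_transitions(string, init, a, b):
    n = len(init)
    al = forward_table(string, init, a, b)
    be = backward_table(string, a, b)
    return [[sum(al[t][i] * a[i][j] * b[j][string[t + 1]] * be[t + 1][j]
                 for t in range(len(string) - 1)) for j in range(n)]
            for i in range(n)]

init = [0.5, 0.5]                            # same toy model as before
a = [[0.7, 0.3], [0.4, 0.6]]
b = [{"U": 0.9, "D": 0.1}, {"U": 0.2, "D": 0.8}]
print(expected_transitions("UUDU", init, a, b))
```

A useful sanity check: summing the table over all i, j gives (T-1) times the string likelihood, since each fixed t contributes the full likelihood once.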
E-step of Baum-Welch
The E-step is to compute the Q function, but in Baum-Welch the expectation values can be computed instead
That is, the expectation values of the state transition from i to j:
sum_{pi in Xi} L(sigma, pi) c_ij(pi) = sum_t alpha[i, t] a_ij b_j(sigma_{t+1}) beta[j, t+1]

Baum-Welch Algorithm
1. Choose initial values for the probability parameters
2. Repeat the E- and M-steps alternately
E-step: computes the expectation values (#counts) for each state transition (or letter generation)
M-step: updates the probability parameters using the expectation values

Derivation of Baum-Welch (M-step)
Derived Q function:
Q(theta; theta') = sum_{i,j} [ sum_{pi in Xi} L_theta(sigma, pi) c_ij(pi) ] log a'_ij
The problem is to maximize f(x_1, ..., x_K) = sum_k c_k log x_k subject to sum_k x_k = 1
This is maximized by x_k = c_k / sum_{k'} c_{k'}
This directly derives the updating rule of the M-step:
a^_ij = sum_t alpha_t(i) a_ij b_j(sigma_{t+1}) beta_{t+1}(j) / sum_j sum_t alpha_t(i) a_ij b_j(sigma_{t+1}) beta_{t+1}(j)
where the denominator equals sum_t alpha_t(i) beta_t(i)

M-step of Baum-Welch
Update each state transition probability by using the expectation values and the likelihood:
a^_ij = sum_t alpha_t(i) a_ij b_j(sigma_{t+1}) beta_{t+1}(j) / sum_t alpha_t(i) beta_t(i)
Numerator: likelihood of all paths with the transition i -> j
Denominator: likelihood of all paths through state i

Baum-Welch Algorithm
1. Choose initial values for the probability parameters
2. Iterate the E- and M-steps alternately until convergence
E-step:
1. Compute the forward probabilities alpha[i, t]
2. Compute the backward probabilities beta[i, t]
3. Compute the expectation value of the state transition from i to j using the forward and backward probabilities:
E_P[#(i -> j, sigma)] = sum_t alpha[i, t] a_ij b_j(sigma_{t+1}) beta[j, t+1]
M-step:
1. Update the transition probability a_ij using the expectation values:
a^_ij = E_P[#(i -> j, sigma)] / sum_j E_P[#(i -> j, sigma)]

Summary of Baum-Welch
Algorithm for estimating the probability parameters of an HMM, i.e. the algorithm for training an HMM
An EM (Expectation-Maximization) algorithm, meaning that the solution is a local optimum of maximum likelihood
Makes the simple enumeration efficient by dynamic programming: O(|M|^T) -> O(|M|^2 T)
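The M-step update a^_ij = E[#(i->j)] / sum_j E[#(i->j)] is simply a row-wise normalization of the expected-count table. A minimal sketch, using an invented toy count table rather than values from the slides:

```python
# M-step of Baum-Welch for transition probabilities: normalize each row
# of the expected-count table so the outgoing probabilities sum to 1.
def m_step(E):
    return [[e / sum(row) for e in row] for row in E]

E = [[6.0, 2.0],      # toy expected counts of transitions out of state 0
     [1.0, 3.0]]      # toy expected counts of transitions out of state 1
a_new = m_step(E)
print(a_new)          # [[0.75, 0.25], [0.25, 0.75]]
```

In a full Baum-Welch iteration, `E` would be the forward-backward table from the E-step, and the same normalization idea applies to the letter generation probabilities.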
Parsing for HMM
Given a string, we can compute the likelihoods of all possible state transition paths
Among them, the state transition path which gives the maximum likelihood is called the maximum likelihood path, and it is exactly the solution of parsing
Question: how can we compute that efficiently?

Parsing for HMM
Question: how can we compute that efficiently?
If we try to enumerate all possible state transition paths, we face the combinatorial hardness again!
Solution: recall the forward probabilities; replace the sum with a max and keep the maximizing path:
alpha_{t+1}(j) = sum_i alpha_t(i) a_ij b_j(sigma_{t+1})
->  alpha~_{t+1}(j) = max_i alpha~_t(i) a_ij b_j(sigma_{t+1})

Parsing for HMM: Viterbi Algorithm
Compute the maximum at each time (letter) and remember the maximizing previous state, so that the maximum likelihood path is finally traceable
(Figure: trellis)

Three Problems for Hidden Markov Model
1. Computing likelihood: computing the forward probabilities up to the last letter of a given string
2. Learning: maximizing the likelihood by Baum-Welch, an EM (Expectation-Maximization) algorithm
3. Parsing: the Viterbi algorithm, a modification of the forward probability computation

Example: Profile HMM
Allows aligning multiple strings (amino acid sequences) to find conserved regions (called a consensus or motif)
- State transition diagram
- Trained letter generation probabilities b_j (called a profile)

Training Profile HMM
Training strings example: ADTC, WAEC, VEC, ADC, AEC
Three types of states:
- M (match): normal state, for important (conserved) amino acids
- D (delete): no letter is generated, for amino acid deletions
- I (insert): a letter is generated according to a fixed uniform distribution, for unimportant (unconserved) amino acids
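The Viterbi recursion described in the parsing slides (the forward recursion with the sum replaced by a max, plus a backpointer for traceback) can be sketched as follows, again on the invented two-state toy model with an assumed initial distribution:

```python
# Sketch of the Viterbi algorithm: delta is the max-likelihood analogue
# of the forward probabilities, and `back` stores the argmax predecessor
# at each time so the maximum likelihood path can be traced back.
def viterbi(string, init, a, b):
    n = len(init)
    delta = [init[i] * b[i][string[0]] for i in range(n)]
    back = []                                 # back[t][j]: best predecessor
    for t in range(1, len(string)):
        scores = [[delta[i] * a[i][j] * b[j][string[t]] for i in range(n)]
                  for j in range(n)]
        back.append([max(range(n), key=lambda i: scores[j][i])
                     for j in range(n)])
        delta = [max(scores[j]) for j in range(n)]
    last = max(range(n), key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(back):                # trace the path backwards
        path.insert(0, ptr[path[0]])
    return path, delta[last]

init = [0.5, 0.5]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [{"U": 0.9, "D": 0.1}, {"U": 0.2, "D": 0.8}]
path, like = viterbi("UUDU", init, a, b)
print(path, like)
```

This is the same computation a profile HMM uses to assign each letter of a training string to an M, D, or I state when building the multiple alignment.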
Consensus by Profile HMM
Find the consensus from the M states
Obtain the multiple alignment by checking the most likely state path
Ex. ADTC: parsing!
A (M1), D (M2: 0.4), T (I2: 0.05), C (M3: 0.92)
Multiple alignment -> Consensus -> Profile

Final Remark
Three problems for probabilistic models in machine learning:
1. Computing likelihood
2. Learning
3. Parsing (prediction)
Defined the hidden Markov model (HMM)
Three problems of HMM:
- Computing likelihood by forward probabilities
- Learning by Baum-Welch
- Parsing by Viterbi
Example: Profile HMM