MACHINE LEARNING. Learning Bayesian networks


1 MACHINE LEARNING. Vasant Honavar. Bioinformatics and Computational Biology Program, Center for Computational Intelligence, Learning, & Discovery, Iowa State University.

Learning Bayesian networks: Data + Prior information yield a Bayesian network (structure plus CPTs). [Figure: example network over variables E, B, R, A, L, C with a conditional probability table P(A | E, B), one row per combination of e, b.]

2 The Learning Problem

- Known structure, complete data: statistical parametric estimation (closed-form equations)
- Known structure, incomplete data: parametric optimization (EM, gradient descent, ...)
- Unknown structure, complete data: discrete optimization over structures (discrete search)
- Unknown structure, incomplete data: combined methods (Structural EM, mixture models, ...)

Known structure, complete data: given complete records over E, B, A such as <Y,N,N>, <Y,Y,Y>, <N,N,Y>, <N,Y,Y>, ..., <N,Y,Y> and the structure E -> A <- B, fill in the unknown CPT entries P(A | E, B) by parametric estimation.

3 Known structure, complete data (recap): estimate the CPT P(A | E, B) from fully observed records over E, B, A.

Known structure, incomplete data: some records contain missing values, e.g. <Y,N,?>, <Y,?,Y>, <N,N,Y>, <?,Y,Y>, ..., <N,?,Y>; parameter estimation now requires parametric optimization (EM, gradient descent, ...).

4 Learning Bayesian Networks: roadmap over the four cases (known vs. unknown structure, complete vs. incomplete data).
» Parameter learning, complete data (review): statistical parametric fitting; maximum likelihood estimation; Bayesian inference
» Parameter learning, incomplete data
» Structure learning, complete data; application: classification
» Structure learning, incomplete data

5 Estimating probabilities from data (discrete case): maximum likelihood estimation; Bayesian estimation; maximum a posteriori estimation.

Bayesian estimation: treat the unknown parameters as random variables; assume a prior distribution for the unknown parameters; update the distribution of the parameters based on data; use Bayes rule to make predictions.

6 Bayesian Networks and Bayesian Prediction. [Plate-notation figure: parameters theta_X and theta_Y|X; observed instances X[1..M], Y[1..M]; query X[M+1], Y[M+1].] Priors for each parameter group are independent; data instances are independent given the unknown parameters.

From the same network we can read off: given complete data, the posteriors on the parameters are independent.

7 Bayesian Prediction (cont.). Since posteriors on parameters for each node are independent, we can compute them separately. Posteriors for parameters within a node are also independent: in the refined model, given complete data, the posteriors on $\theta_{Y|X=0}$ and $\theta_{Y|X=1}$ are independent.

Given these observations, we can compute the posterior for each multinomial $\theta_{X_i \mid pa_i}$ independently. The posterior is Dirichlet with parameters $\alpha(x_i^1, pa_i) + N(x_i^1, pa_i), \ldots, \alpha(x_i^k, pa_i) + N(x_i^k, pa_i)$. The predictive distribution is then represented by the parameters

$$\tilde{\theta}_{x_i \mid pa_i} = \frac{\alpha(x_i, pa_i) + N(x_i, pa_i)}{\alpha(pa_i) + N(pa_i)}$$

8 Assigning Priors for Bayesian Networks. We need the $\alpha(x_i, pa_i)$ for each node $X_i$. We can use initial parameters $\Theta_0$ as prior information, together with an equivalent sample size parameter $M_0$. Then we let $\alpha(x_i, pa_i) = M_0 \cdot P(x_i, pa_i \mid \Theta_0)$. This allows updating a network in response to new data.

Learning Parameters. Comparing two distributions, P(x) (true model) vs. Q(x) (learned distribution), we measure their KL divergence:

$$KL(P \| Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}, \qquad KL(P \| Q) \ge 0, \qquad KL(P \| Q) = 0 \text{ iff } P \text{ and } Q \text{ are equal}$$

9 Learning Parameters: Summary. Estimation relies on sufficient statistics; for multinomials these are counts of the form $N(x_i, pa_i)$. Parameter estimation:

$$\hat{\theta}_{x_i \mid pa_i} = \frac{N(x_i, pa_i)}{N(pa_i)} \;\text{(MLE)} \qquad \tilde{\theta}_{x_i \mid pa_i} = \frac{\alpha(x_i, pa_i) + N(x_i, pa_i)}{\alpha(pa_i) + N(pa_i)} \;\text{(Bayesian, Dirichlet)}$$

Bayesian methods also require a choice of priors. MLE and Bayesian estimates are asymptotically equivalent and consistent, but the latter work better with small samples. Both can be implemented in an on-line manner by accumulating sufficient statistics; see the sketch below.
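To make the two estimators concrete, here is a minimal sketch for a single CPT. The toy records and the `estimate_cpt` helper are illustrative, not from the slides; the Dirichlet prior is taken to be uniform with pseudocount `alpha` per value.

```python
from collections import Counter

# Toy complete-data records over (E, B, A); values are illustrative.
data = [("Y","N","N"), ("Y","Y","Y"), ("N","N","Y"), ("N","Y","Y"), ("N","Y","Y")]

def estimate_cpt(data, alpha=1.0):
    """Estimate P(A | E, B) by MLE (alpha=0) or Dirichlet smoothing (alpha>0)."""
    joint = Counter((e, b, a) for e, b, a in data)   # N(a, pa)
    parent = Counter((e, b) for e, b, _ in data)     # N(pa)
    values = {"Y", "N"}
    cpt = {}
    for (e, b), n_pa in parent.items():
        for a in values:
            n = joint[(e, b, a)]
            # MLE: n / n_pa ; Bayesian: (alpha + n) / (len(values)*alpha + n_pa)
            cpt[(a, e, b)] = (alpha + n) / (len(values) * alpha + n_pa)
    return cpt

print(estimate_cpt(data, alpha=0.0))  # maximum likelihood
print(estimate_cpt(data, alpha=1.0))  # Dirichlet(1, ..., 1) prior
```

Because both estimators depend on the data only through the counts, they can be updated on-line by incrementing the counters as new records arrive.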

10 Why do we need accurate structure? Consider the network Earthquake -> Alarm Set <- Burglary, Alarm Set -> Sound. Missing an arc: cannot be compensated for by fitting parameters; incorrect independence assumptions. Extraneous arc: increases the number of parameters to be estimated; incorrect independence assumptions.

Approaches to BN Structure Learning. Score-based methods: assign a score to each candidate BN structure using a suitable scoring function, and search the space of candidate network structures for a BN structure with the maximum score. Independence-testing-based methods: use independence tests to determine the structure of the network.

11 Score-based BN Structure Learning. Define a scoring function that evaluates how well a structure matches the data (e.g., records over E, B, A such as <Y,N,N>, <Y,Y,Y>, <N,N,Y>, <N,Y,Y>, ..., <N,Y,Y>), then search for a structure that maximizes the score. [Figure: three candidate structures over E, A, B.]

Need for parsimony. [Figure only.]

12 Basic idea: the Minimum Description Length (MDL) principle.

$$h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)} = \arg\max_{h \in H} P(D \mid h)\,P(h)$$

$$h_{MDL} = \arg\min_{h \in H} \left( -\log P(D \mid h) - \log P(h) \right) = \arg\min_{h \in H} \left( C(D \mid h) + C(h) \right)$$

We need to design a scoring function that minimizes the description length of the hypothesis and the description length of the data given the hypothesis. In this case, the hypothesis is a Bayesian network, which represents a joint probability distribution.

Scoring function. A BN scoring function consists of: a term that corresponds to the number of bits needed to encode the BN structure and parameters, and a term that corresponds to the number of bits needed to encode the data given the BN. We proceed to specify each of these terms.

13 Encoding a Bayesian Network. It suffices to list the parents of each node and record the conditional probabilities associated with each node. Consider a BN with n variables, and a node with k parents: we need $k \log_2 n$ bits to list its parents. Suppose the node (variable $X_i$) takes $s_i$ distinct values, the j-th parent takes $s_j$ distinct values, and we use d bits to store each conditional probability.

Under the encoding scheme described, the description length of a particular Bayesian network is given by

$$\sum_{i=1}^{n} \left( k_i \log_2 n + d\,(s_i - 1) \prod_{X_j \in Parents(X_i)} s_j \right)$$

14 Encoding the Data. Suppose we have M independent observations (instantiations) of the random variables $X_1 \ldots X_n$. Let $V_i$ be the domain of random variable $X_i$. Each observation corresponds to an atomic event $e \in V_1 \times V_2 \times \cdots \times V_n$. Let $p_e$ be the probability of e. When M is large, we expect $M p_e$ occurrences of e among the M observations. Under optimal encoding, the number of bits needed to encode the data is

$$-M \sum_{e \in V_1 \times \cdots \times V_n} p_e \log_2 p_e$$

Encoding the data using a Bayesian network. But we do not know $p_e$, the probability of e! What we have instead is a Bayesian network, which assigns a probability $q_e$ to e. When we use the learned network to encode the data, the number of bits needed to encode the data using the network is

$$-M \sum_{e \in V_1 \times \cdots \times V_n} p_e \log_2 q_e$$

15 Encoding the data using a Bayesian network. Theorem (Gibbs):

$$-\sum_{e \in V_1 \times \cdots \times V_n} p_e \log_2 p_e \;\le\; -\sum_{e \in V_1 \times \cdots \times V_n} p_e \log_2 q_e$$

with equality holding if and only if $p_e = q_e$ for all e. That is, the number of bits needed to encode the data if the true probabilities of each atomic event are known is less than or equal to the number of bits needed using a code based on the estimated probabilities.

Putting the two together. The MDL principle recommends minimizing the sum of the encoding length of the model (Bayes network) and the encoding length of the data using the model:

$$\sum_{i=1}^{n} \left( k_i \log_2 n + d\,(s_i - 1) \prod_{X_j \in Parents(X_i)} s_j \right) \;-\; M \sum_{e \in V_1 \times \cdots \times V_n} p_e \log_2 q_e$$

Problems with evaluating the second term: we do not know the probabilities $p_e$, and the second term requires summation over all atomic events (all instantiations of the n random variables).

16 Kullback-Leibler divergence to the rescue! Let P and Q be two probability distributions over the same event space, such that an event e is assigned probability $p_e$ by P and $q_e$ by Q:

$$KL(P \| Q) = \sum_e p_e \left( \log p_e - \log q_e \right), \qquad KL(P \| Q) \ge 0, \qquad KL(P \| Q) = 0 \text{ iff } P = Q$$

Theorem: the encoding length of the data is a monotonically increasing function of the KL divergence between the distribution Q defined by the model and the true distribution P. Hence, we can use the estimated KL divergence as a proxy for the encoding length of the data (using the model) to score a model. We can use local computations over a Bayes network to evaluate $KL(P \| Q)$.

17 Applying the MDL Principle. Exhaustive search over the space of all networks is infeasible, and evaluating the KL divergence directly is infeasible. Hence we need to resort to a heuristic search to find a network with a near-minimal description length, and develop a more efficient method of evaluating the KL divergence of a candidate network.

Evaluating KL divergence for a network. Theorem (Chow and Liu, 1968). Suppose we define the mutual information between any two nodes $X_i$ and $X_j$ as

$$W(X_i, X_j) = \sum_{(x_i, x_j)} P(x_i, x_j) \log_2 \frac{P(x_i, x_j)}{P(x_i)\,P(x_j)}$$

Then the cross entropy $KL(P \| Q)$ over all tree-structured distributions is minimized when the graph representing $Q(X_1 \ldots X_n)$ is a maximum-weight spanning tree of the graph in which the edge between nodes $X_i$ and $X_j$ is assigned the weight $W(X_i, X_j)$. The resulting tree-structured model can be shown to correspond to the maximum likelihood model among all tree-structured models.
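A sketch of the Chow-Liu construction under this theorem's assumptions, using empirical probabilities from complete discrete data in place of the true P; the function names and the Prim-style spanning-tree loop are illustrative choices.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(data, i, j):
    """Empirical mutual information W(X_i, X_j) in bits."""
    n = len(data)
    pij = Counter((row[i], row[j]) for row in data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    w = 0.0
    for (xi, xj), c in pij.items():
        p = c / n
        w += p * math.log2(p / ((pi[xi] / n) * (pj[xj] / n)))
    return w

def chow_liu_tree(data, n_vars):
    """Edges of a maximum-weight spanning tree under MI edge weights (Prim's algorithm)."""
    w = {(i, j): mutual_information(data, i, j)
         for i, j in combinations(range(n_vars), 2)}
    in_tree, edges = {0}, []
    while len(in_tree) < n_vars:
        # Pick the heaviest edge with exactly one endpoint already in the tree.
        i, j = max(((i, j) for (i, j) in w
                    if (i in in_tree) != (j in in_tree)), key=lambda e: w[e])
        edges.append((i, j))
        in_tree |= {i, j}
    return edges
```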

18 Evaluating KL divergence for a network. Theorem (Lam and Bacchus, 1994). Suppose we define a weight measure between a node $X_i$ and an arbitrary parent set $Parents(X_i)$:

$$W(X_i, Parents(X_i)) = \sum P(X_i, Parents(X_i)) \log_2 \frac{P(X_i, Parents(X_i))}{P(X_i)\,P(Parents(X_i))}$$

Then the cross entropy $KL(P \| Q)$ for a Bayesian network representing $Q(X_1 \ldots X_n)$ is a monotonically decreasing function of $\sum_{i=1}^{n} W(X_i, Parents(X_i))$. Hence, $KL(P \| Q)$ is minimized if and only if this sum of weights is maximized.

In words: if we find a Bayes network that maximizes $\sum_{i=1}^{n} W(X_i, Parents(X_i))$, then the probability distribution Q modeled by the network will be closest, with respect to $KL(P \| Q)$, to the underlying distribution P from which the data have been sampled. Theorem: it is always possible to decrease $KL(P \| Q)$ by adding arcs to the network. Hence the need for MDL!

19 In summary, we need to find a Bayes network that maximizes

$$\sum_{i=1}^{n} W(X_i, Parents(X_i))$$

while minimizing

$$\sum_{i=1}^{n} \left( k_i \log_2 n + d\,(s_i - 1) \prod_{X_j \in Parents(X_i)} s_j \right)$$

Alternative Scoring Functions - Notation. Each $X_i$ takes $r_i$ distinct values; $\theta_{ijk}$ is the probability that $X_i$ takes the k-th value in its domain given the j-th instantiation of its parent set $Parents(X_i)$; $N_{ijk}$ are the observed counts for the corresponding instantiation; $\eta_{ijk}$ are the pseudocounts (from the Dirichlet prior); and $N_{ij} = \sum_{k=1}^{r_i} N_{ijk}$, $\eta_{ij} = \sum_{k=1}^{r_i} \eta_{ijk}$.

20 Bayesian scoring function. Let $B = (G, P_\theta)$ be a Bayesian network with graph structure G and probability distribution $P(\theta)$ over a set of n random variables, with a prior probability distribution p(B) over the networks. The posterior probability given data D is

$$p(G, \theta \mid D) = \frac{p(G, \theta, D)}{p(D)} \propto p(G, \theta, D) = p(G)\,p(\theta \mid G)\,p(D \mid G, \theta)$$

$$p(D \mid G, \theta) = \prod_{i=1}^{n} \prod_{j=1}^{s_i} \prod_{k=1}^{r_i} \theta_{ijk}^{N_{ijk}}, \qquad \hat{\theta}_{ijk} = \frac{N_{ijk} + \eta_{ijk}}{N_{ij} + \eta_{ij}}$$

where n is the number of random variables, $r_i$ is the number of distinct values of node i, $s_i$ is the number of instantiations of the parents of node i, $N_{ijk}$ are the corresponding counts estimated from D, and $\eta_{ijk}$ are the corresponding pseudocounts.

21 Geiger-Heckerman Scoring Function. The Geiger-Heckerman measure for a BN with graph G and parameters $\Theta$:

$$Q_{GH}(G, D) = \log p(G) + \log \int p(D \mid G, \Theta)\,p(\Theta \mid G)\,d\Theta = \log p(G) + \sum_{i=1}^{n} \sum_{j=1}^{s_i} \left( \log \frac{\Gamma(\eta_{ij})}{\Gamma(\eta_{ij} + N_{ij})} + \sum_{k=1}^{r_i} \log \frac{\Gamma(\eta_{ijk} + N_{ijk})}{\Gamma(\eta_{ijk})} \right)$$

Drawback: does not explicitly penalize complex networks.

Cooper-Herskovits Scoring Function. The Cooper-Herskovits measure for a BN with graph G and parameters $\Theta$:

$$Q_{CH}(G, D) = \log p(G) + \sum_{i=1}^{n} \sum_{j=1}^{s_i} \left( \log \frac{\Gamma(r_i)}{\Gamma(r_i + N_{ij})} + \sum_{k=1}^{r_i} \log \Gamma(1 + N_{ijk}) \right)$$

Drawback: does not explicitly penalize complex networks.

22 Standard Bayesian Measure. The standard Bayesian measure for a BN with graph G and parameters $\Theta$:

$$Q_{Bayes}(G, D) = \log p(G) + \sum_{i=1}^{n} \sum_{j=1}^{s_i} \sum_{k=1}^{r_i} (N_{ijk} + \eta_{ijk}) \log \frac{N_{ijk} + \eta_{ijk}}{N_{ij} + \eta_{ij}} - \frac{Dim(G)}{2} \log N$$

where Dim(G) is the number of parameters in the BN and N is the sample size; $\frac{1}{2}\log N$ is the average number of bits needed to store a number between 1 and N.

Standard Bayesian Measure, asymptotic version. The asymptotic version of the standard Bayesian measure for a BN with graph G and parameters $\Theta$:

$$Q_{AsymBayes}(G, D) = Q_{MDL}(G, D) = \log p(G) + \sum_{i=1}^{n} \sum_{j=1}^{s_i} \sum_{k=1}^{r_i} N_{ijk} \log \frac{N_{ijk}}{N_{ij}} - \frac{Dim(G)}{2} \log N$$

23 Asymptotic Information Measures:

$$Q_I(B, D) = \log p(G) + \sum_{i=1}^{n} \sum_{j=1}^{s_i} \sum_{k=1}^{r_i} N_{ijk} \log \frac{N_{ijk}}{N_{ij}} - \dim(B)\,f(N)$$

where f(N) is a non-negative penalty function: f(N) = 0 gives the maximum likelihood information criterion; f(N) = 1 gives the Akaike information criterion; f(N) = (1/2) log N gives the Schwarz information criterion. Note: MDL is a special case of this measure.

Structure Search as Optimization. Input: training data, a scoring function, and a set of possible structures. Output: a network that maximizes the score. Key computational property, decomposability: score(G) = sum over i of score(family of $X_i$ in G).

24 Tree-Structured Networks. Trees: at most one parent per variable. Why trees? Elegant mathematics: we can exactly and efficiently solve the optimization problem. Sparse parameterization: avoids overfitting. [Figure: the ALARM monitoring network (MINVOLSET, PCWP, TPR, HYPOVOLEMIA, LVEDVOLUME, CVP, PULMEMBOLUS, PAP, CO, SHUNT, ..., BP) as an example of a large structure.]

Learning Trees. Let p(i) denote the parent of $X_i$. We can write the Bayesian score as

$$Score(G : D) = \sum_i Score(X_i : Pa_i) = \sum_i \left( Score(X_i : X_{p(i)}) - Score(X_i) \right) + \sum_i Score(X_i)$$

Score = sum of edge scores + constant, where each edge score is the improvement over the empty network and the constant is the score of the empty network.

25 Learning Trees (cont.). Set $w(j \to i) = Score(X_i \mid X_j) - Score(X_i)$. Find the tree (or forest) with maximal weight using a standard maximum spanning tree algorithm, in $O(n^2 \log n)$ time. Theorem: this procedure finds the tree with the maximum score.

Beyond Trees. When we consider more complex networks, the problem is not as easy. Suppose we allow at most two parents per node: a greedy algorithm is no longer guaranteed to find the optimal network; in fact, no efficient algorithm exists. Theorem: finding the maximal scoring structure with at most k parents per node is NP-hard for k > 1.

26 Heuristic Search. Define a search space: search states are possible structures; operators make small changes to structure. Traverse the space looking for high-scoring structures. Search techniques: greedy hill-climbing, best-first search, simulated annealing, ...

K2 Algorithm (Cooper and Herskovits). Start with an ordered list of random variables. For each variable $X_i$, add to its parent set a node that is lower-numbered than $X_i$ and yields the maximum improvement in score. Repeat until the score does not improve or a complete network is obtained. Disadvantage: requires an ordered list of nodes. A sketch of the loop follows.
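A minimal sketch of K2, assuming some decomposable family scoring function `family_score(i, parents, data)` is supplied (for example, the Cooper-Herskovits family term above); that helper name and the `max_parents` cap are illustrative.

```python
def k2(order, data, family_score, max_parents=3):
    """K2: greedy parent selection for each variable under a fixed ordering."""
    parents = {i: set() for i in order}
    for pos, i in enumerate(order):
        best = family_score(i, parents[i], data)
        improved = True
        while improved and len(parents[i]) < max_parents:
            improved = False
            # Only nodes earlier in the ordering are candidate parents.
            candidates = [j for j in order[:pos] if j not in parents[i]]
            scored = [(family_score(i, parents[i] | {j}, data), j)
                      for j in candidates]
            if scored:
                s, j = max(scored)
                if s > best:              # keep the single best addition
                    best, improved = s, True
                    parents[i].add(j)
    return parents
```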

27 B Algorithm (Buntine). Start with the parent set of each random variable initialized to the empty set. At each step, add a link (a node to the parent set of some node) that does not introduce a cycle and yields the maximum improvement in score. Repeat until the score does not improve or a complete network is obtained.

Local Search. Start with a given network: the empty network, the best tree, or a random network. At each iteration, evaluate all possible changes and apply a change based on score; stop when no modification improves the score.

28 Heuristic Search. Typical operations on a structure over S, C, E, D: add an arc (e.g., C -> D), delete an arc (e.g., C -> E), reverse an arc (e.g., C -> E becomes E -> C). To update the score after a local change, only re-score the families that changed, e.g. $\Delta score = S(\{C, E\} \to D) - S(\{E\} \to D)$; see the sketch after this slide.

Learning in Practice: Alarm network. [Plot: KL divergence from the true distribution vs. number of samples, comparing "structure known, fit parameters" against "learn both structure and parameters".]
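Decomposability is what makes the Δscore of a local move cheap: only the family that changed is re-scored. A minimal sketch, reusing the hypothetical `family_score` helper from the K2 sketch above.

```python
def delta_add_edge(c, d, parents, data, family_score):
    """Score change from adding arc c -> d: only X_d's family is re-scored."""
    return (family_score(d, parents[d] | {c}, data)
            - family_score(d, parents[d], data))
```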

29 Local Search: Possible Pitfalls. Local search can get stuck in: local maxima (all one-edge changes reduce the score) and plateaux (some one-edge changes leave the score unchanged). Standard heuristics can escape both: random restarts, TABU search, simulated annealing.

Independence-Based Methods. Rely on independence tests to decide whether to add links between nodes in the structure search phase. Need to penalize complex structures: it is hard to beat a fully connected network! In the most general setting, there are too many independence tests to consider. Sometimes it is possible to infer additional independences based on known (or inferred) independences (see Bromberg et al., 2006 and references cited therein).

30 Structure Search: Summary. A discrete optimization problem. In some cases the optimization problem is easy, for example learning trees; in general it is NP-hard. We need to resort to heuristic search, or restrict connectivity (each node assumed to have no more than l parents, where l is much smaller than n), or use stochastic search, e.g. simulated annealing or genetic algorithms.

Structure Discovery. Task: discover structural properties. Is there a direct connection between X and Y? Does X separate two subsystems? Does X causally affect Y? Example: scientific data mining; disease properties and symptoms; interactions between the expression of genes.

31 Discovering Structure: P(G | D). Model selection: pick a single high-scoring model and use that model to infer domain structure. [Figure: one network over E, B, R, A, C.]

Discovering Structure (cont.). Problem: with small sample size there are many high-scoring models, so an answer based on one model is often useless; we want features common to many models. [Figure: five alternative high-scoring networks over E, B, R, A, C.]

32 Bayesian Approach. Use the posterior distribution over structures to estimate the probability of features such as an edge X -> Y or a path X -> ... -> Y:

$$P(f \mid D) = \sum_G f(G)\,P(G \mid D)$$

where $P(G \mid D)$ is the Bayesian score for G and f(G) is the indicator function for feature f.

MCMC over Networks. We cannot enumerate structures, so we sample structures. MCMC sampling: define a Markov chain over BNs and run the chain to get samples from the posterior P(G | D); then $P(f(G) \mid D) \approx \frac{1}{n} \sum_{i=1}^{n} f(G_i)$. Possible pitfalls: a huge (super-exponential) number of networks; the time for the chain to converge to the posterior is unknown; islands of high posterior, connected by low bridges.

33 Fixed Ordering. Suppose that we know the ordering of the variables, say $X_1 > X_2 > X_3 > X_4 > \cdots > X_n$, so the parents of $X_i$ must be in $X_1, \ldots, X_{i-1}$, and we limit the number of parents per node to k; this leaves at most $2^{k n \log n}$ networks. Intuition: the order decouples the choice of parents; the choice of $Pa(X_7)$ does not restrict the choice of $Pa(X_{12})$. Upshot: we can compute efficiently, in closed form, the likelihood P(D | p) and the feature probability P(f | D, p) for an ordering p.

Sample Orderings. We can write

$$P(f \mid D) = \sum_p P(f \mid p, D)\,P(p \mid D)$$

Sample orderings and approximate $P(f \mid D) \approx \frac{1}{n} \sum_{i=1}^{n} P(f \mid p_i, D)$. MCMC sampling: define a Markov chain over orderings and run the chain to get samples from the posterior P(p | D).

34 Application: Gene Expression Data Analysis (Friedman et al., 2001). Input: measurements of gene expression under different conditions (thousands of genes, hundreds of experiments). Output: models of gene interaction; uncover pathways.

Mating response substructure: SST2, KAR4, TEC1, NDJ1, KSS1, FUS1, PRM1, AGA1, YLR343W, AGA2, TOM6, FIG1, FUS3, YLR334C, MFA1, STE6, YEL059W. An automatically constructed sub-network of high-confidence edges; an almost exact reconstruction of the yeast mating pathway.

35 [The learning-problem overview slide repeats here, now with the CPT filled in: P(A | E, B) with rows (e, b): 0.9 / 0.1; (e, ~b): 0.7 / 0.3; (~e, b): 0.8 / 0.2; (~e, ~b): 0.99 / 0.01.]

Incomplete Data. Data are often incomplete: some variables of interest are not assigned values. This phenomenon occurs when we have missing values (some variables unobserved in some instances) and hidden variables (some variables are never observed; we might not even know they exist).

36 Hidden (Latent) Variables. Why should we care about hidden variables? [Figure: X1, X2, X3 -> Y1, Y2, Y3, with and without an intermediate hidden node H.] With the hidden node: 17 parameters; without it: 59 parameters.

Incomplete Data. In the presence of incomplete data, the likelihood can have multiple maxima. Example: in a network H -> Y, if H has two values, the likelihood has two maxima. In practice, there are many local maxima.

37 Expectation Maximization (EM). A general-purpose method for learning from incomplete data. Intuition: if we had true counts, we could estimate parameters; but with missing values, the counts are unknown. We "complete" the counts using probabilistic inference based on the current parameter assignment, and use the completed counts as if they were real to re-estimate the parameters.

Expectation Maximization (EM), example. Current model: P(Y=H | X=H, Z=T, Theta) = 0.3 and P(Y=H | X=T, Theta) = 0.4. Data (X, Y, Z): (H, ?, T), (T, ?, ?), (H, H, ?), (H, T, T), (T, T, H). Expected counts N(X, Y): (H, H): 1.3; (T, H): 0.4; (H, T): 1.7; (T, T): 1.6.
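The expected-counts table above can be reproduced directly from the two conditional probabilities of the current model; a small sketch (the value encoding and variable names are ours).

```python
from collections import defaultdict

# Toy E-step from the slide: complete fractional counts N(X, Y) for rows
# where Y is missing, using the current model's posteriors for Y.
posterior_Y_H = {("H", "T"): 0.3,   # P(Y=H | X=H, Z=T, theta)
                 ("T", None): 0.4}  # P(Y=H | X=T, theta), Z missing too

rows = [("H", None, "T"), ("T", None, None),
        ("H", "H", None), ("H", "T", "T"), ("T", "T", "H")]

counts = defaultdict(float)
for x, y, z in rows:
    if y is not None:
        counts[(x, y)] += 1.0            # observed: a whole count
    else:
        p_h = posterior_Y_H[(x, z)]      # inferred from current parameters
        counts[(x, "H")] += p_h          # fractional counts
        counts[(x, "T")] += 1.0 - p_h

print(dict(counts))  # {(H,H): 1.3, (H,T): 1.7, (T,H): 0.4, (T,T): 1.6}
```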

38 Expectation Maximization (EM), iteration. Start from an initial network $(G, \Theta_0)$ over X1, X2, X3, H, Y1, Y2, Y3 plus training data. E-step: compute expected counts N(X1), N(X2), N(X3), N(H, X1, X2, X3), N(Y1, H), N(Y2, H), N(Y3, H). M-step: reparameterize to obtain the updated network $(G, \Theta_1)$. Iterate.

Formal Guarantees: $L(\Theta_1 : D) \ge L(\Theta_0 : D)$; each iteration improves the likelihood. If $\Theta_1 = \Theta_0$, then $\Theta_0$ is a stationary point of $L(\Theta : D)$; usually, this means a local maximum.

39 Expectation Maximization (EM), cost. Computational bottleneck: computation of expected counts in the E-step. We need to compute a posterior for each unobserved variable in each instance of the training set; all posteriors for an instance can be derived from one pass of standard BN inference.

Summary of Parameter Learning from Incomplete Data. Incomplete data makes parameter estimation hard: the likelihood function does not have a closed form and is multimodal. Finding max likelihood parameters: EM or gradient ascent; both exploit inference procedures for Bayesian networks to compute expected sufficient statistics.

40 [The learning-problem overview slide repeats here, with example incomplete records (E, B, A) such as <Y,N,N>, <Y,?,Y>, <N,N,Y>, <?,Y,Y>, ..., <N,Y,?> and the filled-in CPT P(A | E, B).]

Incomplete Data: Structure Scores. Recall the Bayesian score:

$$P(G \mid D) \propto P(G)\,P(D \mid G) = P(G) \int P(D \mid G, \Theta)\,P(\Theta \mid G)\,d\Theta$$

With incomplete data we cannot evaluate the marginal likelihood in closed form; we have to resort to approximations, evaluating the score around MAP parameters, which requires finding the MAP parameters (e.g., by EM).

41 Structural EM. Recall that with complete data we had decomposition, hence efficient search. Idea: instead of optimizing the real score, find a decomposable alternative score such that maximizing the new score implies an improvement in the real score.

Structural EM (cont.). Idea: use the current model to help evaluate new structures. Outline: perform search in (structure, parameters) space; at each iteration, use the current model for finding either better scoring parameters (a parametric EM step) or a better scoring structure (a structural EM step).

42 Structural EM, iteration. From the current network over X1, X2, X3, H, Y1, Y2, Y3 and the training data, compute expected counts for the current structure (N(X1), N(X2), N(X3), N(H, X1, X2, X3), N(Y1, H), N(Y2, H), N(Y3, H)) and also for candidate structures (N(X2, X1), N(H, X1, X3), N(Y1, X2), N(Y2, Y1, H)); then score and parameterize, and iterate.

Some Additional Graphical Models: (finite) mixture models; graphical models for sequence data (Markov models and hidden Markov models); undirected graphical models (Markov networks, Markov random fields).

43 Finite Mixture Models:

$$p(x) = \sum_{k=1}^{K} p(x, c = k) = \sum_{k=1}^{K} p(x \mid c = k)\,p(c = k) = \sum_{k=1}^{K} p(x \mid c = k, \theta_k)\,\alpha_k$$

with component models $p(x \mid c = k, \theta_k)$, weights $\alpha_k$, and parameters $\theta_k$.

Example: Mixture of Gaussians. $p(x) = \sum_{k=1}^{K} p(x \mid c = k, \theta_k)\,\alpha_k$, where each mixture component is a multidimensional Gaussian with its own mean $\mu_k$ and covariance shape $\Sigma_k$. E.g., for K = 2 in one dimension: $\{\theta, \alpha\} = \{\mu_1, \sigma_1, \mu_2, \sigma_2, \alpha_1\}$.

44 [Plot: the component densities p(x | c = k) and the resulting mixture model p(x).]

Example: Mixture of Naive Bayes.

$$p(x) = \sum_{k=1}^{K} \alpha_k \prod_{j=1}^{d} p(x_j \mid c = k, \theta_{jk})$$

a conditional independence model for each component (often quite useful as a first-order approximation).

45 Interpretation of Mixtures. C has a direct (physical) interpretation: e.g., C = {age of fish}, C = {male, female}. C might have an interpretation: e.g., clusters of Web surfers. C is just a convenient latent variable: e.g., flexible density estimation.

Graphical Models for Mixtures. E.g., mixtures of Naive Bayes: C (discrete, hidden) with arrows to X1, X2, X3 (observed).

46 Sequential Mixtures. [Figure: hidden states C at times t-1, t, t+1, each emitting X1, X2, X3.] Markov mixtures: C has Markov dependence; this is a hidden Markov model (here with naive Bayes emissions); C is a discrete state that couples the observables.

Mixture density:

$$P(x \mid \theta) = \sum_{j=1}^{c} P(x \mid \omega_j, \theta_j)\,P(\omega_j), \qquad \theta = (\theta_1, \theta_2, \ldots, \theta_c)$$

with component densities $P(x \mid \omega_j, \theta_j)$ and mixing parameters $P(\omega_j)$. Task: use samples drawn from this mixture density to estimate the unknown parameter vector $\theta$. Once $\theta$ is known, we can decompose the mixture into its components.

47 Identifiability of a mixture density. A density $P(x \mid \theta)$ is said to be identifiable if $\theta \ne \theta'$ implies that there exists an x such that $P(x \mid \theta) \ne P(x \mid \theta')$.

Example: consider the case where x is binary and $P(x \mid \theta)$ is the mixture

$$P(x \mid \theta) = \frac{1}{2}\theta_1^x (1 - \theta_1)^{1 - x} + \frac{1}{2}\theta_2^x (1 - \theta_2)^{1 - x} = \begin{cases} \frac{1}{2}(\theta_1 + \theta_2) & \text{if } x = 1 \\ 1 - \frac{1}{2}(\theta_1 + \theta_2) & \text{if } x = 0 \end{cases}$$

Assume that $P(x = 1 \mid \theta) = 0.6$ and $P(x = 0 \mid \theta) = 0.4$, which implies $\theta_1 + \theta_2 = 1.2$. But we cannot determine the mixture (why? we learn only the sum $\theta_1 + \theta_2$, not $\theta_1$ and $\theta_2$ individually).

Identifying mixture distributions. Unidentifiability of the mixture distribution suggests impossibility of unsupervised learning. Mixtures of many commonly encountered density functions (e.g., Gaussians) are usually identifiable. Discrete distributions, especially when there are many components in the mixture, often result in more unknowns than there are independent equations, making identifiability impossible unless other additional information is available. While it can be shown that mixtures of normal densities are usually identifiable, there are scenarios where this is not the case.

48 Identifying mixture distributions (cont.). While mixtures of normal densities are usually identifiable, there are scenarios where this is not the case:

$$P(x \mid \theta) = \frac{P(\omega_1)}{\sqrt{2\pi}} \exp\left( -\frac{1}{2}(x - \theta_1)^2 \right) + \frac{P(\omega_2)}{\sqrt{2\pi}} \exp\left( -\frac{1}{2}(x - \theta_2)^2 \right)$$

cannot be uniquely identified if $P(\omega_1) = P(\omega_2)$, because $\theta = (\theta_1, \theta_2)$ and $\theta' = (\theta_2, \theta_1)$ are two possible vectors that can be interchanged without affecting $P(x \mid \theta)$: we cannot recover a unique $\theta$ even from an infinite amount of data! We focus on those cases in which the mixture distributions are identifiable.

Learning Mixtures from Data. Consider fixed K, e.g. unknown parameters $\Theta = \{\mu_1, \sigma_1, \mu_2, \sigma_2, \alpha_1, \alpha_2\}$. Given data $D = \{x_1, \ldots, x_N\}$, we want to find the parameters $\Theta$ that best fit the data.

49 Maximum Likelihood Principle. Assume a probabilistic model; the likelihood is p(data | parameters, model); find the parameters that make the data most likely:

$$L(\Theta) = p(D \mid \Theta) = \prod_{i=1}^{N} p(x_i \mid \Theta)$$

which in the case of a mixture model reduces to $\prod_{i=1}^{N} \sum_{k=1}^{K} p(x_i \mid c = k, \theta_k)\,\alpha_k$.

The EM Algorithm (Dempster, Laird, and Rubin, 1977). A general framework for likelihood-based parameter estimation with missing data: start with initial guesses of the parameters; E-step: estimate memberships given the parameters; M-step: estimate the parameters given the memberships; repeat until convergence. It converges to a (local) maximum of the likelihood; the E-step and M-step are often computationally simple; it generalizes to maximum a posteriori estimation (with priors).
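A sketch of this E-step/M-step alternation for a one-dimensional Gaussian mixture; the initialization scheme and the fixed iteration count are arbitrary choices, not part of the algorithm.

```python
import numpy as np

def em_gmm_1d(x, K=2, iters=100, seed=0):
    """EM for a 1-D Gaussian mixture: E-step (memberships), M-step (parameters)."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, K)
    sigma = np.full(K, x.std())
    alpha = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: posterior membership r[i, k] = P(c = k | x_i, theta)
        dens = alpha * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) \
               / (sigma * np.sqrt(2 * np.pi))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the weighted counts
        nk = r.sum(axis=0)
        alpha = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return alpha, mu, sigma

# Example: data resembling the two-component mixture used later in the slides.
x = np.concatenate([np.random.default_rng(1).normal(-2, 1, 300),
                    np.random.default_rng(2).normal(2, 1, 600)])
print(em_gmm_1d(x))
```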

50 The EM Algorithm for Learning the Components of a Mixture Distribution. Similar to the application of EM for handling missing attribute values in Bayesian networks: it is the class label that is missing, in the entire data set!

EM Method for Mixture Estimation. Suppose that we have a set $D = \{x_1, \ldots, x_N\}$ of unlabeled samples drawn independently from the mixture density

$$p(x \mid \Theta) = \sum_{k=1}^{K} \alpha_k\,p(x \mid \omega_k, \theta_k), \qquad \Theta = \{\theta_1, \ldots, \theta_K, \alpha_1, \ldots, \alpha_K\}$$

$$L(\Theta) = p(D \mid \Theta) = \prod_{i=1}^{N} p(x_i \mid \Theta) = \prod_{i=1}^{N} \sum_{k=1}^{K} \alpha_k\,p(x_i \mid \omega_k, \theta_k)$$

51 EM Method for Mixture Estimation (cont.). The maximum likelihood estimate is $\hat{\Theta} = \arg\max_\Theta p(D \mid \Theta)$ with $p(D \mid \Theta) = \prod_{i=1}^{N} p(x_i \mid \Theta)$. The log likelihood is

$$l = \sum_{i=1}^{N} \ln p(x_i \mid \Theta) = \sum_{i=1}^{N} \ln \sum_{k=1}^{K} \alpha_k\,p(x_i \mid \omega_k, \theta_k)$$

Because the component labels $\omega$ are unknown, we model them as a set of hidden random variables and take the expectation over their possible values $\Omega$. Unfortunately, estimating the distribution of the labels requires knowledge of $\Theta$. To break this cycle, we start with a guess $\hat{\Theta}$:

$$E[l \mid \Omega] = \sum_{i=1}^{N} \sum_{k=1}^{K} P(\omega_k \mid x_i, \hat{\Theta}) \ln \left( \alpha_k\,p(x_i \mid \omega_k, \theta_k) \right)$$

52 EM Method for Mixture Estimation (cont.). With a bit of algebra, this expression can be simplified. We pick the next guess for $\Theta$ so as to maximize the above expectation subject to the constraint $\sum_{k=1}^{K} \alpha_k = 1$, using the standard approach of Lagrange multipliers.

Update equations for $\Theta$ (maximum likelihood mixture identification):

$$\hat{\alpha}_k = \frac{1}{N} \sum_{i=1}^{N} P(\omega_k \mid x_i, \hat{\Theta}), \qquad P(\omega_k \mid x_i, \hat{\Theta}) = \frac{\hat{\alpha}_k\,p(x_i \mid \omega_k, \hat{\theta}_k)}{\sum_{j=1}^{K} \hat{\alpha}_j\,p(x_i \mid \omega_j, \hat{\theta}_j)}$$

53 Example: Mixtures of Normals. $p(x \mid \omega_k, \theta_k) \sim N(\mu_k, \Sigma_k)$. Possible cases: Case 1: $\mu_k$ unknown, with $\Sigma_k$, $P(\omega_k)$, and c known. Case 2: $\mu_k$, $\Sigma_k$, $P(\omega_k)$ unknown, c known. Case 3: $\mu_k$, $\Sigma_k$, $P(\omega_k)$, and c all unknown.

Case 1: unknown mean vectors, $\theta_k = \mu_k$, $k = 1, \ldots, c$:

$$\ln p(x \mid \omega_k, \mu_k) = -\ln\left( (2\pi)^{d/2} |\Sigma_k|^{1/2} \right) - \frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)$$

$$\hat{\mu}_k = \frac{\sum_{i=1}^{n} P(\omega_k \mid x_i, \hat{\mu})\,x_i}{\sum_{i=1}^{n} P(\omega_k \mid x_i, \hat{\mu})} \qquad (1)$$

$P(\omega_k \mid x_i, \hat{\mu})$ is the fraction of sample $x_i$ credited to the k-th class, and $\hat{\mu}_k$ is the (weighted) average of the samples coming from the k-th class.

54 Maximum likelihood mixture identification (cont.). Unfortunately, equation (1) does not give $\hat{\mu}_k$ explicitly. However, if we have some way of obtaining good initial estimates $\hat{\mu}_k(0)$ for the unknown means, equation (1) provides a way to apply the EM algorithm:

$$\hat{\mu}_k(j + 1) = \frac{\sum_{i=1}^{n} P(\omega_k \mid x_i, \hat{\mu}(j))\,x_i}{\sum_{i=1}^{n} P(\omega_k \mid x_i, \hat{\mu}(j))}$$

Gradient-based ML identification of mixtures, example. Consider the simple two-component, one-dimensional normal mixture

$$p(x \mid \mu_1, \mu_2) = \frac{1}{3\sqrt{2\pi}} \exp\left( -\frac{1}{2}(x - \mu_1)^2 \right) + \frac{2}{3\sqrt{2\pi}} \exp\left( -\frac{1}{2}(x - \mu_2)^2 \right)$$

(2 clusters!). Set $\mu_1 = -2$, $\mu_2 = 2$ and draw 25 samples sequentially from this mixture. The log likelihood function is

$$l(\mu_1, \mu_2) = \sum_{i=1}^{25} \ln p(x_i \mid \mu_1, \mu_2)$$

55 Gradient-based ML identification of mixtures (cont.). The maximum value of l occurs at estimates $\hat{\mu}_1$ and $\hat{\mu}_2$ which are not far from the true values $\mu_1 = -2$ and $\mu_2 = +2$.

Identifying mixtures of normals when all parameters are unknown. If no constraints are placed on the covariance matrix, the ML principle results in useless singular solutions, because it is possible to make the likelihood arbitrarily large. In practice, we get useful results by focusing on the largest of the finite local maxima of the likelihood function, or by applying the minimum description length principle.

56 Maximum likelihood estimation of mixtures of normals, the general case:

$$\hat{P}(\omega_k) = \frac{1}{n} \sum_{i=1}^{n} P(\omega_k \mid x_i, \hat{\theta}), \qquad \hat{\mu}_k = \frac{\sum_i P(\omega_k \mid x_i, \hat{\theta})\,x_i}{\sum_i P(\omega_k \mid x_i, \hat{\theta})}, \qquad \hat{\Sigma}_k = \frac{\sum_i P(\omega_k \mid x_i, \hat{\theta})\,(x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T}{\sum_i P(\omega_k \mid x_i, \hat{\theta})}$$

$$P(\omega_k \mid x_i, \hat{\theta}) = \frac{|\hat{\Sigma}_k|^{-1/2} \exp\left( -\frac{1}{2}(x_i - \hat{\mu}_k)^T \hat{\Sigma}_k^{-1} (x_i - \hat{\mu}_k) \right) \hat{P}(\omega_k)}{\sum_{j=1}^{c} |\hat{\Sigma}_j|^{-1/2} \exp\left( -\frac{1}{2}(x_i - \hat{\mu}_j)^T \hat{\Sigma}_j^{-1} (x_i - \hat{\mu}_j) \right) \hat{P}(\omega_j)}$$

Markov Models, Hidden Markov Models. Outline: bag of words, n-grams, and related models; Markov models; hidden Markov models; higher-order Markov models; variations on hidden Markov models; applications.

57 Applications of Sequence Classifiers: speech recognition; natural language processing; text processing; gesture recognition; biological sequence analysis (gene identification, protein classification).

Bag of words, n-grams, and related models. Map arbitrary-length sequences to fixed-length feature representations. Bag of words: represent sequences by feature vectors with as many components as there are words in the vocabulary. n-grams: short subsequences of n letters. Both ignore the relative ordering of words or n-grams along the sequence: "cat chased the mouse" and "mouse chased the cat" have identical bag-of-words representations.

58 Bag of words, n-grams, and related models (cont.). Fixed-length feature representations make it possible to apply machine learning methods that work with feature-based representations. Features may be given (as in the case of words in an English vocabulary) or discovered from data (statistics of occurrence of n-grams in the data). If variable-length n-grams are allowed, we need to take into account possible overlaps. Computation of n-gram frequencies can be made efficient using dynamic programming: if a string appears k times in a piece of text, any substring of the string appears at least k times in the text.

Markov models (Markov chains). A Markov model is a probabilistic model of symbol sequences in which the probability of the current event depends only on the immediately preceding event. Consider a sequence of random variables $X_1, X_2, \ldots, X_N$; think of the subscripts as indicating word position in a sentence or letter position in a sequence. Recall that a random variable is a function. In the case of sentences made of words, the range of the random variables is the vocabulary of the language; in the case of DNA sequences, the random variables take on values from the 4-letter alphabet {A, C, G, T}.

59 Simple Model - Markov Chains. Markov property: the state of the system at time t+1 depends only on the state of the system at time t:

$$P[X_{t+1} = x_{t+1} \mid X_t = x_t, X_{t-1} = x_{t-1}, \ldots, X_1 = x_1, X_0 = x_0] = P[X_{t+1} = x_{t+1} \mid X_t = x_t]$$

[Figure: chain X1 -> X2 -> X3 -> X4 -> X5.]

Markov chains, notation. The fact that the subscript t appears on both the X and the x in $X_t = x_t$ is a bit abusive of notation. It might be better to write $P(X_1 = s_{j_1}, X_2 = s_{j_2}, \ldots, X_t = s_{j_t})$ where each $s_j \in \{v_1, \ldots, v_L\} = Range(X)$. In what follows, we will abuse notation.

60 Markov Chains, stationarity. The probabilities are independent of t:

$$P[X_{t+1} = j \mid X_t = i] = a_{ij}$$

This means that if the system is in state i, the probability that the system will transition to state j is $a_{ij}$ regardless of the value of t.

Describing a Markov Chain. A Markov chain can be described by the transition matrix A and initial state probabilities Q:

$$a_{ij} = P(X_{t+1} = j \mid X_t = i), \qquad q_i = P(X_1 = i)$$

$$P(X_1, \ldots, X_T) = P(X_1)\,P(X_2 \mid X_1) \cdots P(X_T \mid X_{T-1}) = q_{X_1} \prod_{t=1}^{T-1} A(X_t, X_{t+1})$$

61 Two ways to represent the conditional probability table of a first-order Markov process: a state-transition diagram over the symbols A, B, C (with transition probabilities such as 0.7 and 0.5 on its arcs), or a table with the current symbol indexing the rows and the next symbol indexing the columns. Sample string: CCBBAAAAABAABACBABAAA.

The probability of generating a string: a product of probabilities, one for each term in the sequence,

$$p(\{X_t\}_{t=1}^{T}) = p(X_1) \prod_{t=2}^{T} p(X_t \mid X_{t-1})$$

where $\{X_t\}_{t=1}^{T}$ means a sequence of symbols from time 1 to time T, $p(X_1)$ comes from the table of initial probabilities, and each $p(X_t \mid X_{t-1})$ is a transition probability.

62 The fundamental questions. Likelihood: given a model $\mu = (A, Q)$, how can we efficiently compute the likelihood of an observation $P(X \mid \mu)$? For any state sequence $(X_1, \ldots, X_T)$:

$$P(X_1, \ldots, X_T) = q_{X_1}\,a_{X_1 X_2}\,a_{X_2 X_3} \cdots a_{X_{T-1} X_T}$$

Learning: given a set of observation sequences X and a generic model, how can we estimate the parameters that define the best model to describe the data? Use the standard estimation methods (maximum likelihood or Bayesian estimates) discussed earlier in the course.

Simple example of a Markov model: weather. Raining today -> rain tomorrow: $a_{rr} = 0.4$. Raining today -> no rain tomorrow: $a_{rn} = 0.6$. Not raining today -> rain tomorrow: $a_{nr} = 0.2$. Not raining today -> no rain tomorrow: $a_{nn} = 0.8$.

63 Simple example of a Markov model (cont.):

$$Q = (0.3,\; 0.7), \qquad A = \begin{pmatrix} 0.4 & 0.6 \\ 0.2 & 0.8 \end{pmatrix}$$

Note that both the transition matrix and the initial state vector are stochastic (rows sum to 1). In general, the transition probabilities between two states need not be symmetric ($a_{ij} \ne a_{ji}$), and the probability of transition from a state to itself ($a_{ii}$) need not be zero.

Types of Markov models: ergodic models. Ergodic model: there is a directed path with positive probabilities from each state i to each state j (strong connectivity), but not necessarily a complete directed graph.

64 Types of Models: LR models. Left-to-Right (LR) model: the index of the state is non-decreasing with time.

Markov models with absorbing states. At each play, the gambler wins $1 with probability p or loses $1 with probability 1-p. The game ends when the gambler goes broke, or gains a fortune of $100; both $0 and $100 are absorbing states. [Figure: chain of states 0, 1, 2, ..., N-1, N with rightward transitions p and leftward transitions 1-p; start at $10.]

65 Coke vs. Pepsi. Given that a person's last cola purchase was Coke, there is a 90% chance that her next cola purchase will also be Coke. If a person's last cola purchase was Pepsi, there is an 80% chance that her next cola purchase will also be Pepsi. [Figure: two-state chain with self-loops 0.9 (coke) and 0.8 (pepsi), and cross-transitions 0.1 and 0.2.]

Coke vs. Pepsi (cont.). Given that a person is currently a Pepsi purchaser, what is the probability that she will purchase Coke two purchases from now? The transition matrix is

$$A = \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix} \;\text{(corresponding to one purchase ahead)}, \qquad A^2 = \begin{pmatrix} 0.83 & 0.17 \\ 0.34 & 0.66 \end{pmatrix}$$

so the answer is $(A^2)_{Pepsi,\,Coke} = 0.34$.

66 Coke vs. Pepsi (cont.). Given that a person is currently a Coke drinker, what is the probability that she will purchase Pepsi three purchases from now?

$$A^3 = \begin{pmatrix} 0.781 & 0.219 \\ 0.438 & 0.562 \end{pmatrix}, \qquad (A^3)_{Coke,\,Pepsi} = 0.219$$

Coke vs. Pepsi (cont.). Assume each person makes one cola purchase per week. Suppose 60% of all people now drink Coke, and 40% drink Pepsi. What fraction of people will be drinking Coke three weeks from now? Let $(q_0, q_1) = (0.6, 0.4)$ be the initial probabilities, and denote Coke by 0 and Pepsi by 1. We want $P(X_3 = 0)$:

$$P(X_3 = 0) = q_0\,a_{00}^{(3)} + q_1\,a_{10}^{(3)} = (0.6)(0.781) + (0.4)(0.438) = 0.6438$$
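These purchase calculations are easy to verify numerically; a small sketch.

```python
import numpy as np

A = np.array([[0.9, 0.1],    # state 0 = Coke, state 1 = Pepsi
              [0.2, 0.8]])
q = np.array([0.6, 0.4])     # initial fractions of Coke and Pepsi drinkers

A3 = np.linalg.matrix_power(A, 3)
print(A3[0, 1])              # Coke -> Pepsi in three steps: 0.219
print(q @ A3)                # population three weeks out: P(Coke) = 0.6438
```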

67 Learning the conditional probability table. Naive: just observe a lot of strings and set the conditional probabilities equal to the observed probabilities:

$$p(B \mid A) = \frac{\#(\text{occurrences of } AB \text{ in the strings})}{\#(\text{occurrences of } A \text{ in the strings})}$$

Better: add 1 to the top and the number of symbols to the bottom, i.e. a weak uniform prior over the transition probabilities (see the sketch after this slide):

$$p(B \mid A) = \frac{\#AB + 1}{\#A + N_{symbols}}$$

Hidden Markov Models. In many scenarios states cannot be directly observed; we need an extension: hidden Markov models. [Figure: four states with self-transitions $a_{11}, a_{22}, a_{33}, a_{44}$, forward transitions $a_{12}, a_{23}, a_{34}$, and observation probabilities with $b_{11} + b_{12} + b_{13} + b_{14} = 1$, $b_{21} + b_{22} + b_{23} + b_{24} = 1$, etc.] The $a_{ij}$ are state transition probabilities; the $b_{ik}$ are observation (output) probabilities.
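Returning to the transition-table recipe above: a minimal sketch of the Laplace-smoothed counts, run on the sample string from the earlier slide.

```python
from collections import Counter

def transition_probs(strings, alphabet):
    """Laplace-smoothed first-order transition probabilities p(b | a)."""
    pair = Counter()
    single = Counter()
    for s in strings:
        for a, b in zip(s, s[1:]):
            pair[(a, b)] += 1
            single[a] += 1
    return {(a, b): (pair[(a, b)] + 1) / (single[a] + len(alphabet))
            for a in alphabet for b in alphabet}

print(transition_probs(["CCBBAAAAABAABACBABAAA"], "ABC"))
```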

68 Hidden Markov Models (cont.). We introduce hidden states to get a hidden Markov model: the next hidden state depends only on the current hidden state, but hidden states can carry along information from more than one time-step in the past. The current symbol depends only on the current hidden state.

Example: Dishonest Casino. What is hidden in this model? The state sequences. You are allowed to see the outcome of a die roll; you do not know which outcomes were obtained by a fair die and which outcomes were obtained by a loaded die.

69 What is an HMM? Green circles are hidden states. Each hidden state depends only on the previous state (a Markov process): "the past is independent of the future given the present."

What is an HMM? (cont.) Purple nodes are observed states. Each observed state depends only on the corresponding hidden state.

70 Specifying an HMM. [Figure: hidden chain X1 -> X2 -> ... with transition matrix A, and emission matrix B to observations O1, O2, ...] An HMM is specified by $\{X, O, \Pi, A, B\}$: $\Pi = \{\pi_i\}$ are the initial state probabilities, $A = \{a_{ij}\}$ are the state transition probabilities, and $B = \{b_{ik}\}$ are the observation state probabilities.

A hidden Markov model. [Figure: hidden nodes i and j, each with a vector of transition probabilities and a vector of output probabilities over the symbols A, B, C.] Each hidden node has a vector of transition probabilities and a vector of output probabilities.

71 Coin-Tossing Example. [Figure: two hidden states, Fair and Loaded, each a start state with probability 1/2; self-transitions 0.9 and cross-transitions 0.1; Fair emits head/tail with probability 1/2 each; Loaded emits head with probability 3/4 and tail with probability 1/4.] Over L tosses, the hidden Fair/Loaded sequence is $X_1, X_2, \ldots, X_{L-1}, X_L$ and the observed Head/Tail sequence is $O_1, O_2, \ldots, O_{L-1}, O_L$. Query: what are the most likely values in the X-nodes to generate the given data?

Fundamental problems. Likelihood: compute the probability of a given observation sequence given a model. Decoding: given an observation sequence and a model, compute the most likely hidden state sequence. Learning: given an observation sequence and a set of possible models, which model most closely fits the data?

72 Generating a string from an HMM. It is easy to generate strings if we know the parameters of the model. At each time step, make two random choices: use the transition probabilities from the current hidden node to pick the next hidden node, and use the output probabilities from the current hidden node to pick the current symbol to output.

Generating a string from an HMM (cont.). Equivalently, we can first produce a complete hidden sequence and then allow each hidden node in the sequence to produce one symbol. Hidden nodes only depend on previous hidden nodes: the probability of generating a hidden sequence does not depend on the visible sequence it generates.
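A sketch of this two-choice sampler, parameterized with the coin-tossing HMM from the earlier slide; the dictionary-based representation is just one convenient encoding.

```python
import random

def sample_hmm(pi, A, B, T):
    """Generate (hidden, visible) sequences: next state from A, symbol from B."""
    states, symbols = list(A), list(next(iter(B.values())))
    x = random.choices(states, weights=[pi[s] for s in states])[0]
    hidden, visible = [], []
    for _ in range(T):
        hidden.append(x)
        visible.append(random.choices(symbols,
                                      weights=[B[x][o] for o in symbols])[0])
        x = random.choices(states, weights=[A[x][s] for s in states])[0]
    return hidden, visible

# The coin-tossing HMM: Fair (F) vs. Loaded (L), Heads (H) vs. Tails (T).
pi = {"F": 0.5, "L": 0.5}
A = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.1, "L": 0.9}}
B = {"F": {"H": 0.5, "T": 0.5}, "L": {"H": 0.75, "T": 0.25}}
print(sample_hmm(pi, A, B, 10))
```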

73 The probability of generating a hidden sequence: a product of probabilities, one for each term in the sequence,

$$p(\{X_t\}_{t=1}^{T}) = p(X_1) \prod_{t=2}^{T} p(X_t \mid X_{t-1})$$

where $p(X_1)$ comes from the table of initial probabilities of hidden nodes and $p(X_t = j \mid X_{t-1} = i) = a_{ij}$ is a transition probability between hidden nodes.

The joint probability of generating a hidden sequence and a visible sequence:

$$p(\{X_t, O_t\}_{t=1}^{T}) = p(X_1)\,p(O_1 \mid X_1) \prod_{t=2}^{T} p(X_t \mid X_{t-1})\,p(O_t \mid X_t)$$

over a sequence of hidden states and output symbols, where $p(O_t \mid X_t)$ is the probability of outputting symbol $O_t$ from state $X_t$.

74 The probability of generating a visible sequence from an HMM:

$$p(\{O_t\}_{t=1}^{T}) = \sum_{\text{paths through hidden states}} p(\{O_t\}_{t=1}^{T} \mid \{X_t\})\,p(\{X_t\})$$

The same visible sequence can be produced by many different hidden sequences, and there are exponentially many possible hidden sequences. How can we calculate $p(\{O_t\}_{t=1}^{T})$?

Fundamental problems (recap): likelihood, decoding, and learning, as above.

75 The HMM dynamic programming trick. Dynamic programming offers an efficient way to compute a sum that has exponentially many terms: at each time $\tau$ we combine everything we need to know about the paths up to that time. Define $\lambda_i(\tau) = p(\{O_t\}_{t=1}^{\tau}, X_\tau = i)$, the probability of having produced the sequence up to time $\tau$ and being in state i at time $\tau$. This quantity can be computed recursively:

$$\lambda_i(\tau + 1) = p(O_{\tau+1} \mid X_{\tau+1} = i) \sum_j \lambda_j(\tau)\,p(X_{\tau+1} = i \mid X_\tau = j)$$

Probability of an Observation Sequence. Given an observation sequence $O = (o_1, \ldots, o_T)$ and a model $\mu = (A, B, \Pi)$, compute $P(O \mid \mu)$.

76 Probability of an observation sequence:

$$P(O \mid X, \mu) = b_{X_1 o_1}\,b_{X_2 o_2} \cdots b_{X_T o_T}, \qquad P(X \mid \mu) = \pi_{X_1}\,a_{X_1 X_2}\,a_{X_2 X_3} \cdots a_{X_{T-1} X_T}$$

$$P(O, X \mid \mu) = P(O \mid X, \mu)\,P(X \mid \mu), \qquad P(O \mid \mu) = \sum_X P(O \mid X, \mu)\,P(X \mid \mu)$$

Probability of an Observation Sequence (cont.):

$$P(O \mid \mu) = \sum_{\{X_1 \ldots X_T\}} \pi_{X_1} \prod_{t=1}^{T} a_{X_t X_{t+1}}\,b_{X_t o_t}$$

77 Probability of an observation sequence (cont.). The special structure gives us an efficient solution using dynamic programming. Intuition: we need not enumerate every (t+1)-length state sequence; the probability of the first t observations can be accumulated per state at time t+1. Define the forward variable

$$\alpha_i(t) = P(o_1 \cdots o_{t-1}, X_t = i \mid \mu)$$

Forward Procedure:

$$\alpha_j(t + 1) = P(o_1 \cdots o_t, X_{t+1} = j) = \sum_i P(o_1 \cdots o_{t-1}, X_t = i)\,P(o_t \mid X_t = i)\,P(X_{t+1} = j \mid X_t = i)$$

78 Forward Procedure (cont.):

$$\alpha_j(t + 1) = \sum_{i=1}^{N} \alpha_i(t)\,a_{ij}\,b_{i o_t}$$

Backward Procedure: the probability of the rest of the observations given the state at time t,

$$\beta_i(t) = P(o_t \cdots o_T \mid X_t = i), \qquad \beta_i(T + 1) = 1, \qquad \beta_i(t) = \sum_{j=1}^{N} a_{ij}\,b_{i o_t}\,\beta_j(t + 1)$$
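A sketch of the forward pass using the common variant $\alpha_j(t) = P(o_1 \cdots o_t, X_t = j)$, which shifts the slides' indexing by one but yields the same $P(O \mid \mu)$; the matrices follow the coin-tossing example.

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: alpha[t, j] = P(o_1..o_t, X_t = j)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

# Coin-tossing HMM: states (Fair, Loaded), symbols (H = 0, T = 1).
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.5, 0.5], [0.75, 0.25]])
alpha = forward(pi, A, B, [0, 0, 1, 0])
print(alpha[-1].sum())   # P(O | mu)
```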

79 Sequence probability:

$$P(O \mid \mu) = \sum_{i=1}^{N} \alpha_i(T + 1) \qquad \text{(forward procedure)}$$

$$P(O \mid \mu) = \sum_{i=1}^{N} \pi_i\,\beta_i(1) \qquad \text{(backward procedure)}$$

$$P(O \mid \mu) = \sum_{i=1}^{N} \alpha_i(t)\,\beta_i(t) \;\text{ for any } t \qquad \text{(combination)}$$

Fundamental problems (recap): likelihood, decoding, and learning.

80 The most probable state sequence. Find the state sequence that best explains the observations:

$$\arg\max_X P(X \mid O)$$

Viterbi Algorithm. Define

$$\delta_j(t) = \max_{X_1 \cdots X_{t-1}} P(X_1 \cdots X_{t-1},\,o_1 \cdots o_{t-1},\,X_t = j,\,o_t)$$

the probability of the state sequence which maximizes the probability of seeing the observations to time t-1, landing in state j, and seeing the observation at time t.

81 Viterbi Algorithm (cont.), recursive computation:

$$\delta_j(t + 1) = \max_i \delta_i(t)\,a_{ij}\,b_{j o_{t+1}}, \qquad \psi_j(t + 1) = \arg\max_i \delta_i(t)\,a_{ij}\,b_{j o_{t+1}}$$

Viterbi Algorithm (cont.). Compute the most likely state sequence by working backwards:

$$\hat{X}_T = \arg\max_i \delta_i(T), \qquad \hat{X}_t = \psi_{\hat{X}_{t+1}}(t + 1)$$
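A sketch of the $\delta/\psi$ recursions, computed in log space to avoid underflow; it reuses the coin-tossing parameters and the same state-emission convention as the forward sketch above.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely hidden state sequence via the delta/psi recursions."""
    T, N = len(obs), len(pi)
    logA, logB = np.log(A), np.log(B)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = np.log(pi) + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA          # scores[i, j]
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

print(viterbi(np.array([0.5, 0.5]),
              np.array([[0.9, 0.1], [0.1, 0.9]]),
              np.array([[0.5, 0.5], [0.75, 0.25]]),
              [0, 0, 0, 1, 0, 0]))   # 0 = Fair, 1 = Loaded
```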

82 Fundamental problems (recap): likelihood, decoding, and learning.

Learning the parameters of an HMM. It is easy to learn the parameters if, for each observed sequence of symbols, we can infer the posterior distribution across the sequences of hidden states. We can infer which hidden state sequence gave rise to an observed sequence by using the dynamic programming trick.

83 Learning: HMM Parameter Estimation. Given an observation sequence, find the model that is most likely to produce that sequence. Given a model and an observation sequence, update the model parameters to better fit the observations.

The probability of generating a visible sequence from an HMM:

$$p(O) = \sum_{\text{hidden paths } X} p(O \mid X)\,p(X)$$

The same visible sequence can be produced by many different hidden sequences. [Figure: output symbols A, B, C, D.]

84 The posterior probability of a hidden path given a visible sequence:

$$p(X \mid O) = \frac{p(X)\,p(O \mid X)}{\sum_{\text{hidden paths } Y} p(Y)\,p(O \mid Y)}$$

The sum in the denominator can be computed efficiently using the dynamic programming trick. But for learning we do not need to know about entire hidden paths.

Learning the parameters of an HMM. It is easy to learn the parameters if, for each observed sequence of symbols, we can infer the posterior probability for each hidden node at each time step. We can infer these posterior probabilities by using the dynamic programming trick.

85 The HMM dynamic programming trick (forward):

$$\alpha_j(t) = p(O_1 \cdots O_t,\,X_t = j), \qquad \alpha_j(t) = \sum_i \alpha_i(t - 1)\,a_{ij}\,b_{j, O_t}$$

The dynamic programming trick again (backward):

$$\beta_j(t) = p(O_{t+1} \cdots O_T \mid X_t = j), \qquad \beta_i(t) = \sum_j a_{ij}\,b_{j, O_{t+1}}\,\beta_j(t + 1)$$

86 The forward-backward algorithm (Baum-Welch). We do a forward pass along the observed string to compute the alphas at each time step for each node, and a backward pass along the observed string to compute the betas at each time step for each node. Once we have the alphas and betas at each time step, it is easy to re-estimate the output probabilities and transition probabilities.

Learning the parameters of the HMM. To learn the transition matrix we need to know the expected number of times that each transition between two hidden nodes was used when generating the observed sequence. To learn the output probabilities we need to know the expected number of times each node was used to generate each symbol. Because of the hidden states, we use the expectation maximization (EM) algorithm.

87 The re-estimation equations (the M-step of the EM procedure). For the transition probability from node i to node j:

$$a_{ij}^{new} = \frac{Count(i \to j \text{ transitions in the data})}{\sum_{k \in \text{hidden states}} Count(i \to k \text{ transitions in the data})}$$

For the probability that node i generates symbol A:

$$b_i(A) = \frac{Count(\text{state } i \text{ produces symbol } A \text{ in the data})}{\sum_{B \in \text{symbols}} Count(\text{state } i \text{ produces symbol } B \text{ in the data})}$$

Summing the expectations over time. The expected number of times that node i produces symbol A requires a summation over all the different times in the sequence when there was an A:

$$Count(\text{state } i \text{ produces symbol } A) = \sum_{t:\,O_t = A} p(X_t = i \mid O)$$

The expected number of times that the transition from i to j occurred requires a summation over all pairs of adjacent times in the sequence:

$$Count(\text{transitions from state } i \text{ to state } j) = \sum_{t=1}^{T-1} p(X_t = i,\,X_{t+1} = j \mid O)$$

88 Combine the past and the future to get the full posterior. To re-estimate the output probabilities, we need to compute the posterior probability of being at a particular hidden node at a particular time. This requires a summation of the posterior probabilities of all the paths that go through that node at that time. [Figure: trellis of hidden states over time.]

Combining past and future:

$$p(X_t = i \mid O) = \frac{p(O,\,X_t = i)}{p(O)} = \frac{\alpha_i(t)\,\beta_i(t)}{p(O)}, \qquad p(X_t = i,\,X_{t+1} = j \mid O) = \frac{\alpha_i(t)\,a_{ij}\,b_{j, O_{t+1}}\,\beta_j(t + 1)}{p(O)}$$

89 Parameter Estimation: Baum-Welch (Forward-Backward). Define the probability of traversing an arc and the probability of being in a state:

$$p_t(i, j) = \frac{\alpha_i(t)\,a_{ij}\,b_{j, o_{t+1}}\,\beta_j(t + 1)}{\sum_{m=1}^{N} \alpha_m(t)\,\beta_m(t)}, \qquad \gamma_i(t) = \sum_{j=1}^{N} p_t(i, j)$$

Parameter Estimation: Baum-Welch Algorithm (cont.). Now we can compute the new estimates of the model parameters:

$$\hat{\pi}_i = \gamma_i(1), \qquad \hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} p_t(i, j)}{\sum_{t=1}^{T-1} \gamma_i(t)}, \qquad \hat{b}_{ik} = \frac{\sum_{t:\,o_t = k} \gamma_i(t)}{\sum_{t=1}^{T} \gamma_i(t)}$$
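A sketch of one full re-estimation pass under the $\alpha_j(t) = P(o_1 \cdots o_t, X_t = j)$ convention of the forward sketch above, assuming integer-coded observations; variable names are ours.

```python
import numpy as np

def forward(pi, A, B, obs):
    """alpha[t, j] = P(o_1..o_t, X_t = j)."""
    alpha = np.zeros((len(obs), len(pi)))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(A, B, obs):
    """beta[t, i] = P(o_{t+1}..o_T | X_t = i)."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def baum_welch_step(pi, A, B, obs):
    """One EM pass: arc/state posteriors from forward-backward, then re-estimation."""
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    obs = np.asarray(obs)
    gamma = alpha * beta                             # ~ P(X_t = i, O)
    gamma /= gamma.sum(axis=1, keepdims=True)        # P(X_t = i | O)
    xi = (alpha[:-1, :, None] * A[None, :, :] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :])  # ~ P(X_t = i, X_t+1 = j, O)
    xi /= xi.sum(axis=(1, 2), keepdims=True)         # p_t(i, j)
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])],
                     axis=1) / gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```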

90 HMM parameter estimation in practice. Sparseness of data requires smoothing of estimates using Laplace estimates (as in Naive Bayes) to give suitable nonzero probability to unseen observations. Domain-specific tricks: feature decomposition (capitalized?, number?, etc. in text processing) gives a better estimate; shrinkage allows pooling of estimates over multiple states of the same type; a well-designed HMM topology helps.


More information

January Examinations 2012

January Examinations 2012 Page of 5 EC79 January Examnaons No. of Pages: 5 No. of Quesons: 8 Subjec ECONOMICS (POSTGRADUATE) Tle of Paper EC79 QUANTITATIVE METHODS FOR BUSINESS AND FINANCE Tme Allowed Two Hours ( hours) Insrucons

More information

Machine Learning 2nd Edition

Machine Learning 2nd Edition INTRODUCTION TO Lecure Sldes for Machne Learnng nd Edon ETHEM ALPAYDIN, modfed by Leonardo Bobadlla and some pars from hp://www.cs.au.ac.l/~aparzn/machnelearnng/ The MIT Press, 00 alpaydn@boun.edu.r hp://www.cmpe.boun.edu.r/~ehem/mle

More information

J i-1 i. J i i+1. Numerical integration of the diffusion equation (I) Finite difference method. Spatial Discretization. Internal nodes.

J i-1 i. J i i+1. Numerical integration of the diffusion equation (I) Finite difference method. Spatial Discretization. Internal nodes. umercal negraon of he dffuson equaon (I) Fne dfference mehod. Spaal screaon. Inernal nodes. R L V For hermal conducon le s dscree he spaal doman no small fne spans, =,,: Balance of parcles for an nernal

More information

Clustering with Gaussian Mixtures

Clustering with Gaussian Mixtures Noe o oher eachers and users of hese sldes. Andrew would be delghed f you found hs source maeral useful n gvng your own lecures. Feel free o use hese sldes verbam, or o modfy hem o f your own needs. PowerPon

More information

Comb Filters. Comb Filters

Comb Filters. Comb Filters The smple flers dscussed so far are characered eher by a sngle passband and/or a sngle sopband There are applcaons where flers wh mulple passbands and sopbands are requred Thecomb fler s an example of

More information

Math 128b Project. Jude Yuen

Math 128b Project. Jude Yuen Mah 8b Proec Jude Yuen . Inroducon Le { Z } be a sequence of observed ndependen vecor varables. If he elemens of Z have a on normal dsrbuon hen { Z } has a mean vecor Z and a varancecovarance marx z. Geomercally

More information

Volatility Interpolation

Volatility Interpolation Volaly Inerpolaon Prelmnary Verson March 00 Jesper Andreasen and Bran Huge Danse Mares, Copenhagen wan.daddy@danseban.com brno@danseban.com Elecronc copy avalable a: hp://ssrn.com/absrac=69497 Inro Local

More information

. The geometric multiplicity is dim[ker( λi. number of linearly independent eigenvectors associated with this eigenvalue.

. The geometric multiplicity is dim[ker( λi. number of linearly independent eigenvectors associated with this eigenvalue. Lnear Algebra Lecure # Noes We connue wh he dscusson of egenvalues, egenvecors, and dagonalzably of marces We wan o know, n parcular wha condons wll assure ha a marx can be dagonalzed and wha he obsrucons

More information

Mechanics Physics 151

Mechanics Physics 151 Mechancs Physcs 5 Lecure 0 Canoncal Transformaons (Chaper 9) Wha We Dd Las Tme Hamlon s Prncple n he Hamlonan formalsm Dervaon was smple δi δ Addonal end-pon consrans pq H( q, p, ) d 0 δ q ( ) δq ( ) δ

More information

DEEP UNFOLDING FOR MULTICHANNEL SOURCE SEPARATION SUPPLEMENTARY MATERIAL

DEEP UNFOLDING FOR MULTICHANNEL SOURCE SEPARATION SUPPLEMENTARY MATERIAL DEEP UNFOLDING FOR MULTICHANNEL SOURCE SEPARATION SUPPLEMENTARY MATERIAL Sco Wsdom, John Hershey 2, Jonahan Le Roux 2, and Shnj Waanabe 2 Deparmen o Elecrcal Engneerng, Unversy o Washngon, Seale, WA, USA

More information

CH.3. COMPATIBILITY EQUATIONS. Continuum Mechanics Course (MMC) - ETSECCPB - UPC

CH.3. COMPATIBILITY EQUATIONS. Continuum Mechanics Course (MMC) - ETSECCPB - UPC CH.3. COMPATIBILITY EQUATIONS Connuum Mechancs Course (MMC) - ETSECCPB - UPC Overvew Compably Condons Compably Equaons of a Poenal Vecor Feld Compably Condons for Infnesmal Srans Inegraon of he Infnesmal

More information

CS 536: Machine Learning. Nonparametric Density Estimation Unsupervised Learning - Clustering

CS 536: Machine Learning. Nonparametric Density Estimation Unsupervised Learning - Clustering CS 536: Machne Learnng Nonparamerc Densy Esmaon Unsupervsed Learnng - Cluserng Fall 2005 Ahmed Elgammal Dep of Compuer Scence Rugers Unversy CS 536 Densy Esmaon - Cluserng - 1 Oulnes Densy esmaon Nonparamerc

More information

This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore.

This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore. Ths documen s downloaded from DR-NTU, Nanyang Technologcal Unversy Lbrary, Sngapore. Tle A smplfed verb machng algorhm for word paron n vsual speech processng( Acceped verson ) Auhor(s) Foo, Say We; Yong,

More information

Computing Relevance, Similarity: The Vector Space Model

Computing Relevance, Similarity: The Vector Space Model Compung Relevance, Smlary: The Vecor Space Model Based on Larson and Hears s sldes a UC-Bereley hp://.sms.bereley.edu/courses/s0/f00/ aabase Managemen Sysems, R. Ramarshnan ocumen Vecors v ocumens are

More information

Graduate Macroeconomics 2 Problem set 5. - Solutions

Graduate Macroeconomics 2 Problem set 5. - Solutions Graduae Macroeconomcs 2 Problem se. - Soluons Queson 1 To answer hs queson we need he frms frs order condons and he equaon ha deermnes he number of frms n equlbrum. The frms frs order condons are: F K

More information

Let s treat the problem of the response of a system to an applied external force. Again,

Let s treat the problem of the response of a system to an applied external force. Again, Page 33 QUANTUM LNEAR RESPONSE FUNCTON Le s rea he problem of he response of a sysem o an appled exernal force. Agan, H() H f () A H + V () Exernal agen acng on nernal varable Hamlonan for equlbrum sysem

More information

Anomaly Detection. Lecture Notes for Chapter 9. Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar

Anomaly Detection. Lecture Notes for Chapter 9. Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar Anomaly eecon Lecure Noes for Chaper 9 Inroducon o aa Mnng, 2 nd Edon by Tan, Senbach, Karpane, Kumar 2/14/18 Inroducon o aa Mnng, 2nd Edon 1 Anomaly/Ouler eecon Wha are anomales/oulers? The se of daa

More information

Dishonest casino as an HMM

Dishonest casino as an HMM Dshnes casn as an HMM N = 2, ={F,L} M=2, O = {h,} A = F B= [. F L F L 0.95 0.0 0] h 0.5 0. L 0.05 0.90 0.5 0.9 c Deva ubramanan, 2009 63 A generave mdel fr CpG slands There are w hdden saes: CpG and nn-cpg.

More information

Including the ordinary differential of distance with time as velocity makes a system of ordinary differential equations.

Including the ordinary differential of distance with time as velocity makes a system of ordinary differential equations. Soluons o Ordnary Derenal Equaons An ordnary derenal equaon has only one ndependen varable. A sysem o ordnary derenal equaons consss o several derenal equaons each wh he same ndependen varable. An eample

More information

EEL 6266 Power System Operation and Control. Chapter 5 Unit Commitment

EEL 6266 Power System Operation and Control. Chapter 5 Unit Commitment EEL 6266 Power Sysem Operaon and Conrol Chaper 5 Un Commmen Dynamc programmng chef advanage over enumeraon schemes s he reducon n he dmensonaly of he problem n a src prory order scheme, here are only N

More information

[ ] 2. [ ]3 + (Δx i + Δx i 1 ) / 2. Δx i-1 Δx i Δx i+1. TPG4160 Reservoir Simulation 2018 Lecture note 3. page 1 of 5

[ ] 2. [ ]3 + (Δx i + Δx i 1 ) / 2. Δx i-1 Δx i Δx i+1. TPG4160 Reservoir Simulation 2018 Lecture note 3. page 1 of 5 TPG460 Reservor Smulaon 08 page of 5 DISCRETIZATIO OF THE FOW EQUATIOS As we already have seen, fne dfference appromaons of he paral dervaves appearng n he flow equaons may be obaned from Taylor seres

More information

Chapter 6: AC Circuits

Chapter 6: AC Circuits Chaper 6: AC Crcus Chaper 6: Oulne Phasors and he AC Seady Sae AC Crcus A sable, lnear crcu operang n he seady sae wh snusodal excaon (.e., snusodal seady sae. Complee response forced response naural response.

More information

Mechanics Physics 151

Mechanics Physics 151 Mechancs Physcs 5 Lecure 9 Hamlonan Equaons of Moon (Chaper 8) Wha We Dd Las Tme Consruced Hamlonan formalsm H ( q, p, ) = q p L( q, q, ) H p = q H q = p H = L Equvalen o Lagrangan formalsm Smpler, bu

More information

. The geometric multiplicity is dim[ker( λi. A )], i.e. the number of linearly independent eigenvectors associated with this eigenvalue.

. The geometric multiplicity is dim[ker( λi. A )], i.e. the number of linearly independent eigenvectors associated with this eigenvalue. Mah E-b Lecure #0 Noes We connue wh he dscusson of egenvalues, egenvecors, and dagonalzably of marces We wan o know, n parcular wha condons wll assure ha a marx can be dagonalzed and wha he obsrucons are

More information

THE PREDICTION OF COMPETITIVE ENVIRONMENT IN BUSINESS

THE PREDICTION OF COMPETITIVE ENVIRONMENT IN BUSINESS THE PREICTION OF COMPETITIVE ENVIRONMENT IN BUSINESS INTROUCTION The wo dmensonal paral dfferenal equaons of second order can be used for he smulaon of compeve envronmen n busness The arcle presens he

More information

Normal Random Variable and its discriminant functions

Normal Random Variable and its discriminant functions Noral Rando Varable and s dscrnan funcons Oulne Noral Rando Varable Properes Dscrnan funcons Why Noral Rando Varables? Analycally racable Works well when observaon coes for a corruped snle prooype 3 The

More information

Mechanics Physics 151

Mechanics Physics 151 Mechancs Physcs 5 Lecure 9 Hamlonan Equaons of Moon (Chaper 8) Wha We Dd Las Tme Consruced Hamlonan formalsm Hqp (,,) = qp Lqq (,,) H p = q H q = p H L = Equvalen o Lagrangan formalsm Smpler, bu wce as

More information

Natural Language Processing NLP Hidden Markov Models. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Natural Language Processing NLP Hidden Markov Models. Razvan C. Bunescu School of Electrical Engineering and Computer Science Naural Language rcessng NL 6840 Hdden Markv Mdels Razvan C. Bunescu Schl f Elecrcal Engneerng and Cmpuer Scence bunescu@h.edu Srucured Daa Fr many applcans he..d. assumpn des n hld: pels n mages f real

More information

Econ107 Applied Econometrics Topic 5: Specification: Choosing Independent Variables (Studenmund, Chapter 6)

Econ107 Applied Econometrics Topic 5: Specification: Choosing Independent Variables (Studenmund, Chapter 6) Econ7 Appled Economercs Topc 5: Specfcaon: Choosng Independen Varables (Sudenmund, Chaper 6 Specfcaon errors ha we wll deal wh: wrong ndependen varable; wrong funconal form. Ths lecure deals wh wrong ndependen

More information

2/20/2013. EE 101 Midterm 2 Review

2/20/2013. EE 101 Midterm 2 Review //3 EE Mderm eew //3 Volage-mplfer Model The npu ressance s he equalen ressance see when lookng no he npu ermnals of he amplfer. o s he oupu ressance. I causes he oupu olage o decrease as he load ressance

More information

Lecture 2 M/G/1 queues. M/G/1-queue

Lecture 2 M/G/1 queues. M/G/1-queue Lecure M/G/ queues M/G/-queue Posson arrval process Arbrary servce me dsrbuon Sngle server To deermne he sae of he sysem a me, we mus now The number of cusomers n he sysems N() Tme ha he cusomer currenly

More information

Testing a new idea to solve the P = NP problem with mathematical induction

Testing a new idea to solve the P = NP problem with mathematical induction Tesng a new dea o solve he P = NP problem wh mahemacal nducon Bacground P and NP are wo classes (ses) of languages n Compuer Scence An open problem s wheher P = NP Ths paper ess a new dea o compare he

More information

Appendix to Online Clustering with Experts

Appendix to Online Clustering with Experts A Appendx o Onlne Cluserng wh Expers Furher dscusson of expermens. Here we furher dscuss expermenal resuls repored n he paper. Ineresngly, we observe ha OCE (and n parcular Learn- ) racks he bes exper

More information

Introduction to Boosting

Introduction to Boosting Inroducon o Boosng Cynha Rudn PACM, Prnceon Unversy Advsors Ingrd Daubeches and Rober Schapre Say you have a daabase of news arcles, +, +, -, -, +, +, -, -, +, +, -, -, +, +, -, + where arcles are labeled

More information

Advanced time-series analysis (University of Lund, Economic History Department)

Advanced time-series analysis (University of Lund, Economic History Department) Advanced me-seres analss (Unvers of Lund, Economc Hsor Dearmen) 3 Jan-3 Februar and 6-3 March Lecure 4 Economerc echnues for saonar seres : Unvarae sochasc models wh Box- Jenns mehodolog, smle forecasng

More information

GMM parameter estimation. Xiaoye Lu CMPS290c Final Project

GMM parameter estimation. Xiaoye Lu CMPS290c Final Project GMM paraeer esaon Xaoye Lu M290c Fnal rojec GMM nroducon Gaussan ure Model obnaon of several gaussan coponens Noaon: For each Gaussan dsrbuon:, s he ean and covarance ar. A GMM h ures(coponens): p ( 2π

More information

Fitting a Conditional Linear Gaussian Distribution

Fitting a Conditional Linear Gaussian Distribution Fng a Condonal Lnear Gaussan Dsrbuon Kevn P. Murphy 28 Ocober 1998 Revsed 29 January 2003 1 Inroducon We consder he problem of fndng he maxmum lkelhood ML esmaes of he parameers of a condonal Gaussan varable

More information

Scattering at an Interface: Oblique Incidence

Scattering at an Interface: Oblique Incidence Course Insrucor Dr. Raymond C. Rumpf Offce: A 337 Phone: (915) 747 6958 E Mal: rcrumpf@uep.edu EE 4347 Appled Elecromagnecs Topc 3g Scaerng a an Inerface: Oblque Incdence Scaerng These Oblque noes may

More information

Consider processes where state transitions are time independent, i.e., System of distinct states,

Consider processes where state transitions are time independent, i.e., System of distinct states, Dgal Speech Processng Lecure 0 he Hdden Marov Model (HMM) Lecure Oulne heory of Marov Models dscree Marov processes hdden Marov processes Soluons o he hree Basc Problems of HMM s compuaon of observaon

More information

Cubic Bezier Homotopy Function for Solving Exponential Equations

Cubic Bezier Homotopy Function for Solving Exponential Equations Penerb Journal of Advanced Research n Compung and Applcaons ISSN (onlne: 46-97 Vol. 4, No.. Pages -8, 6 omoopy Funcon for Solvng Eponenal Equaons S. S. Raml *,,. Mohamad Nor,a, N. S. Saharzan,b and M.

More information

Linear Response Theory: The connection between QFT and experiments

Linear Response Theory: The connection between QFT and experiments Phys540.nb 39 3 Lnear Response Theory: The connecon beween QFT and expermens 3.1. Basc conceps and deas Q: ow do we measure he conducvy of a meal? A: we frs nroduce a weak elecrc feld E, and hen measure

More information

Single-loop System Reliability-Based Design & Topology Optimization (SRBDO/SRBTO): A Matrix-based System Reliability (MSR) Method

Single-loop System Reliability-Based Design & Topology Optimization (SRBDO/SRBTO): A Matrix-based System Reliability (MSR) Method 10 h US Naonal Congress on Compuaonal Mechancs Columbus, Oho 16-19, 2009 Sngle-loop Sysem Relably-Based Desgn & Topology Opmzaon (SRBDO/SRBTO): A Marx-based Sysem Relably (MSR) Mehod Tam Nguyen, Junho

More information

WiH Wei He

WiH Wei He Sysem Idenfcaon of onlnear Sae-Space Space Baery odels WH We He wehe@calce.umd.edu Advsor: Dr. Chaochao Chen Deparmen of echancal Engneerng Unversy of aryland, College Par 1 Unversy of aryland Bacground

More information

[Link to MIT-Lab 6P.1 goes here.] After completing the lab, fill in the following blanks: Numerical. Simulation s Calculations

[Link to MIT-Lab 6P.1 goes here.] After completing the lab, fill in the following blanks: Numerical. Simulation s Calculations Chaper 6: Ordnary Leas Squares Esmaon Procedure he Properes Chaper 6 Oulne Cln s Assgnmen: Assess he Effec of Sudyng on Quz Scores Revew o Regresson Model o Ordnary Leas Squares () Esmaon Procedure o he

More information

Density Matrix Description of NMR BCMB/CHEM 8190

Density Matrix Description of NMR BCMB/CHEM 8190 Densy Marx Descrpon of NMR BCMBCHEM 89 Operaors n Marx Noaon Alernae approach o second order specra: ask abou x magnezaon nsead of energes and ranson probables. If we say wh one bass se, properes vary

More information

(,,, ) (,,, ). In addition, there are three other consumers, -2, -1, and 0. Consumer -2 has the utility function

(,,, ) (,,, ). In addition, there are three other consumers, -2, -1, and 0. Consumer -2 has the utility function MACROECONOMIC THEORY T J KEHOE ECON 87 SPRING 5 PROBLEM SET # Conder an overlappng generaon economy le ha n queon 5 on problem e n whch conumer lve for perod The uly funcon of he conumer born n perod,

More information

2.1 Constitutive Theory

2.1 Constitutive Theory Secon.. Consuve Theory.. Consuve Equaons Governng Equaons The equaons governng he behavour of maerals are (n he spaal form) dρ v & ρ + ρdv v = + ρ = Conservaon of Mass (..a) d x σ j dv dvσ + b = ρ v& +

More information

Lecture 18: The Laplace Transform (See Sections and 14.7 in Boas)

Lecture 18: The Laplace Transform (See Sections and 14.7 in Boas) Lecure 8: The Lalace Transform (See Secons 88- and 47 n Boas) Recall ha our bg-cure goal s he analyss of he dfferenal equaon, ax bx cx F, where we emloy varous exansons for he drvng funcon F deendng on

More information

Lecture 11 SVM cont

Lecture 11 SVM cont Lecure SVM con. 0 008 Wha we have done so far We have esalshed ha we wan o fnd a lnear decson oundary whose margn s he larges We know how o measure he margn of a lnear decson oundary Tha s: he mnmum geomerc

More information

Pattern Classification (III) & Pattern Verification

Pattern Classification (III) & Pattern Verification Preare by Prof. Hu Jang CSE638 --4 CSE638 3. Seech & Language Processng o.5 Paern Classfcaon III & Paern Verfcaon Prof. Hu Jang Dearmen of Comuer Scence an Engneerng York Unversy Moel Parameer Esmaon Maxmum

More information

F-Tests and Analysis of Variance (ANOVA) in the Simple Linear Regression Model. 1. Introduction

F-Tests and Analysis of Variance (ANOVA) in the Simple Linear Regression Model. 1. Introduction ECOOMICS 35* -- OTE 9 ECO 35* -- OTE 9 F-Tess and Analyss of Varance (AOVA n he Smple Lnear Regresson Model Inroducon The smple lnear regresson model s gven by he followng populaon regresson equaon, or

More information

TSS = SST + SSE An orthogonal partition of the total SS

TSS = SST + SSE An orthogonal partition of the total SS ANOVA: Topc 4. Orhogonal conrass [ST&D p. 183] H 0 : µ 1 = µ =... = µ H 1 : The mean of a leas one reamen group s dfferen To es hs hypohess, a basc ANOVA allocaes he varaon among reamen means (SST) equally

More information

On One Analytic Method of. Constructing Program Controls

On One Analytic Method of. Constructing Program Controls Appled Mahemacal Scences, Vol. 9, 05, no. 8, 409-407 HIKARI Ld, www.m-hkar.com hp://dx.do.org/0.988/ams.05.54349 On One Analyc Mehod of Consrucng Program Conrols A. N. Kvko, S. V. Chsyakov and Yu. E. Balyna

More information

Relative controllability of nonlinear systems with delays in control

Relative controllability of nonlinear systems with delays in control Relave conrollably o nonlnear sysems wh delays n conrol Jerzy Klamka Insue o Conrol Engneerng, Slesan Techncal Unversy, 44- Glwce, Poland. phone/ax : 48 32 37227, {jklamka}@a.polsl.glwce.pl Keywor: Conrollably.

More information

Genetic Algorithm in Parameter Estimation of Nonlinear Dynamic Systems

Genetic Algorithm in Parameter Estimation of Nonlinear Dynamic Systems Genec Algorhm n Parameer Esmaon of Nonlnear Dynamc Sysems E. Paeraks manos@egnaa.ee.auh.gr V. Perds perds@vergna.eng.auh.gr Ah. ehagas kehagas@egnaa.ee.auh.gr hp://skron.conrol.ee.auh.gr/kehagas/ndex.hm

More information

Digital Speech Processing Lecture 20. The Hidden Markov Model (HMM)

Digital Speech Processing Lecture 20. The Hidden Markov Model (HMM) Dgal Speech Processng Lecure 20 The Hdden Markov Model (HMM) Lecure Oulne Theory of Markov Models dscree Markov processes hdden Markov processes Soluons o he Three Basc Problems of HMM s compuaon of observaon

More information

FI 3103 Quantum Physics

FI 3103 Quantum Physics /9/4 FI 33 Quanum Physcs Aleander A. Iskandar Physcs of Magnesm and Phooncs Research Grou Insu Teknolog Bandung Basc Conces n Quanum Physcs Probably and Eecaon Value Hesenberg Uncerany Prncle Wave Funcon

More information

Dual Approximate Dynamic Programming for Large Scale Hydro Valleys

Dual Approximate Dynamic Programming for Large Scale Hydro Valleys Dual Approxmae Dynamc Programmng for Large Scale Hydro Valleys Perre Carpener and Jean-Phlppe Chanceler 1 ENSTA ParsTech and ENPC ParsTech CMM Workshop, January 2016 1 Jon work wh J.-C. Alas, suppored

More information

Filtrage particulaire et suivi multi-pistes Carine Hue Jean-Pierre Le Cadre and Patrick Pérez

Filtrage particulaire et suivi multi-pistes Carine Hue Jean-Pierre Le Cadre and Patrick Pérez Chaînes de Markov cachées e flrage parculare 2-22 anver 2002 Flrage parculare e suv mul-pses Carne Hue Jean-Perre Le Cadre and Parck Pérez Conex Applcaons: Sgnal processng: arge rackng bearngs-onl rackng

More information

More belief propaga+on (sum- product)

More belief propaga+on (sum- product) Notes for Sec+on 5 Today More mo+va+on for graphical models A review of belief propaga+on Special- case: forward- backward algorithm From variable elimina+on to junc+on tree (mainly just intui+on) More

More information

e-journal Reliability: Theory& Applications No 2 (Vol.2) Vyacheslav Abramov

e-journal Reliability: Theory& Applications No 2 (Vol.2) Vyacheslav Abramov June 7 e-ournal Relably: Theory& Applcaons No (Vol. CONFIDENCE INTERVALS ASSOCIATED WITH PERFORMANCE ANALYSIS OF SYMMETRIC LARGE CLOSED CLIENT/SERVER COMPUTER NETWORKS Absrac Vyacheslav Abramov School

More information

arxiv: v1 [math.oc] 11 Dec 2014

arxiv: v1 [math.oc] 11 Dec 2014 Nework Newon Aryan Mokhar, Qng Lng and Alejandro Rbero Dep. of Elecrcal and Sysems Engneerng, Unversy of Pennsylvana Dep. of Auomaon, Unversy of Scence and Technology of Chna arxv:1412.374v1 [mah.oc] 11

More information

Part II CONTINUOUS TIME STOCHASTIC PROCESSES

Part II CONTINUOUS TIME STOCHASTIC PROCESSES Par II CONTINUOUS TIME STOCHASTIC PROCESSES 4 Chaper 4 For an advanced analyss of he properes of he Wener process, see: Revus D and Yor M: Connuous marngales and Brownan Moon Karazas I and Shreve S E:

More information