Molecular Evolution and Phylogeny. Based on: Durbin et al., Chapter 8.
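Phylogenetic Tree: Assumptions
[Figure: tree diagram labeling a branch, an internal node, and a leaf]
Topology T: bifurcating. Leaves: 1, ..., N. Internal nodes: numbered from N+1 upward. Lengths $t = \{t_i\}$, one for each branch. Phylogenetic tree = (Topology, Lengths) = $(T, t)$.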
Maximum Likelihood Approach
Consider the phylogenetic tree to be a stochastic process.
[Figure: tree whose internal nodes are unobserved and whose leaves are observed]
The probability of transition from character a to character b is given by the parameters $\theta_{b|a}$. The probability of letter a in the root is $q_a$. These parameters are defined via rates of change per time unit, times the time units. Given the complete tree, the probability of the data is defined by the values of the $\theta_{b|a}$ and the $q_a$.
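Maximum Likelihood Approach
Assume each site evolves independently of the others:
$$P(D \mid \text{Tree}, \theta) = \prod_i P(D^{(i)} \mid \text{Tree}, \theta)$$
Write down the likelihood of the data (the leaves' sequences) given each tree. Use EM to estimate the $\theta_{b|a}$ parameters. When the tree is not given, search for the tree that maximizes
$$P(D \mid \text{Tree}, \theta_{EM}) = \prod_i P(D^{(i)} \mid \text{Tree}, \theta_{EM})$$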
Probabilistic Methods
The phylogenetic tree represents a generative probabilistic model (like HMMs) for the observed sequences. Background probabilities: $q(a)$. Mutation probabilities: $P(a \mid b, t)$. Models for evolutionary mutations: Jukes-Cantor, Kimura 2-parameter model. Such models are used to derive the probabilities.
Jukes-Cantor model
A model for mutation rates. Mutation occurs at a constant rate. Each nucleotide is equally likely to mutate into any other nucleotide, with rate $\alpha$.
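The Jukes-Cantor model (1969)
We need to develop a formula for DNA evolution via $\mathrm{Prob}(y \mid x, t)$, where x and y are taken from {A, C, G, T} and t is the time length. Jukes-Cantor assumes equal rates of change, giving the rate matrix (rows and columns ordered A, C, G, T):
$$R = \begin{pmatrix} -3\alpha & \alpha & \alpha & \alpha \\ \alpha & -3\alpha & \alpha & \alpha \\ \alpha & \alpha & -3\alpha & \alpha \\ \alpha & \alpha & \alpha & -3\alpha \end{pmatrix}$$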
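The Jukes-Cantor model (cont.)
We denote by $S(t)$ the matrix of transition probabilities:
$$S(t) = \begin{pmatrix} P(A \mid A, t) & \cdots & P(T \mid A, t) \\ \vdots & \ddots & \vdots \\ P(A \mid T, t) & \cdots & P(T \mid T, t) \end{pmatrix}$$
We assume the matrix is multiplicative in the sense that $S(t + s) = S(t)\,S(s)$ for any time lengths s and t.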
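The Jukes-Cantor model (cont.)
For a short time period $\varepsilon$, we write:
$$S(\varepsilon) \approx I + R\varepsilon = \begin{pmatrix} 1 - 3\alpha\varepsilon & \alpha\varepsilon & \alpha\varepsilon & \alpha\varepsilon \\ \alpha\varepsilon & 1 - 3\alpha\varepsilon & \alpha\varepsilon & \alpha\varepsilon \\ \alpha\varepsilon & \alpha\varepsilon & 1 - 3\alpha\varepsilon & \alpha\varepsilon \\ \alpha\varepsilon & \alpha\varepsilon & \alpha\varepsilon & 1 - 3\alpha\varepsilon \end{pmatrix}$$
By multiplicativity: $S(t + \varepsilon) = S(t)\,S(\varepsilon) \approx S(t)(I + R\varepsilon)$. Hence $[S(t + \varepsilon) - S(t)] / \varepsilon \approx S(t)\,R$, leading to the linear differential equation
$$S'(t) = S(t)\,R$$
with the additional condition that, in the limit as t goes to infinity, every entry of $S(t)$ tends to $1/4$.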
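The Jukes-Cantor model (cont.)
By symmetry, $S(t)$ has a single diagonal value $r_t$ (no change) and a single off-diagonal value $s_t$ (a specific change). Substituting $S(t)$ into the differential equation yields:
$$r' = -3\alpha r + 3\alpha s, \qquad s' = -\alpha s + \alpha r$$
Yielding the unique solution, which is known as the Jukes-Cantor model:
$$r_t = \tfrac{1}{4}\left(1 + 3e^{-4\alpha t}\right), \qquad s_t = \tfrac{1}{4}\left(1 - e^{-4\alpha t}\right)$$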
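The closed-form solution can be evaluated directly. The following is a minimal sketch; the function names `jukes_cantor` and `jc_matrix` and the example values of `alpha` and `t` are illustrative choices, not part of the slides.

```python
import math

def jukes_cantor(alpha, t):
    """Return (r_t, s_t) for the Jukes-Cantor model.

    r_t: probability that a nucleotide is unchanged after time t.
    s_t: probability of one specific change (e.g. A -> C) after time t.
    """
    decay = math.exp(-4.0 * alpha * t)
    r = 0.25 * (1.0 + 3.0 * decay)
    s = 0.25 * (1.0 - decay)
    return r, s

def jc_matrix(alpha, t):
    """Build the 4x4 matrix S(t), rows and columns ordered A, C, G, T."""
    r, s = jukes_cantor(alpha, t)
    return [[r if x == y else s for y in range(4)] for x in range(4)]

# Hypothetical example values: rate 0.1 per time unit, branch of length 2.
S = jc_matrix(alpha=0.1, t=2.0)
for row in S:
    print(["%.4f" % p for p in row])
```

Each row sums to one because $r_t + 3 s_t = 1$, and both values approach $1/4$ as t grows, in line with the limiting condition.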
Kimura 2-parameter model
Allows a different rate for transitions and transversions.
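Kimura's K2P model (1980)
The Jukes-Cantor model does not take into account that transition rates (between the purines A↔G and between the pyrimidines C↔T) are different from transversion rates (A↔C, A↔T, C↔G, G↔T). Kimura used a different rate matrix (rows and columns ordered A, C, G, T; $\alpha$ is the transition rate and $\beta$ the transversion rate):
$$R = \begin{pmatrix} -2\beta - \alpha & \beta & \alpha & \beta \\ \beta & -2\beta - \alpha & \beta & \alpha \\ \alpha & \beta & -2\beta - \alpha & \beta \\ \beta & \alpha & \beta & -2\beta - \alpha \end{pmatrix}$$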
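Kimura's K2P model (cont.)
Leading, using similar methods, to:
$$S(t) = \begin{pmatrix} r_t & s_t & u_t & s_t \\ s_t & r_t & s_t & u_t \\ u_t & s_t & r_t & s_t \\ s_t & u_t & s_t & r_t \end{pmatrix}$$
Where:
$$s_t = \tfrac{1}{4}\left(1 - e^{-4\beta t}\right), \qquad u_t = \tfrac{1}{4}\left(1 + e^{-4\beta t} - 2e^{-2(\alpha + \beta)t}\right), \qquad r_t = 1 - 2s_t - u_t$$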
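As a rough illustration of how these probabilities depend on the type of substitution, here is a small sketch. The function name `k2p_prob`, the argument order, and the example rates are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}  # C and T are the pyrimidines

def k2p_prob(x, y, alpha, beta, t):
    """Probability of observing y after time t, starting from x, under K2P.

    alpha: transition rate (A<->G, C<->T); beta: transversion rate.
    """
    s = 0.25 * (1.0 - math.exp(-4.0 * beta * t))  # a specific transversion
    u = 0.25 * (1.0 + math.exp(-4.0 * beta * t)
                - 2.0 * math.exp(-2.0 * (alpha + beta) * t))  # the transition
    r = 1.0 - 2.0 * s - u  # no change
    if x == y:
        return r
    # Same chemical class (both purines or both pyrimidines) means transition.
    is_transition = (x in PURINES) == (y in PURINES)
    return u if is_transition else s

# Hypothetical rates: transitions twice as fast as transversions.
for y in "ACGT":
    print("P(%s | A, t=0.1) = %.4f" % (y, k2p_prob("A", y, 2.0, 1.0, 0.1)))
```

When `alpha == beta` the expression for $u_t$ collapses to $s_t$, so the model reduces to Jukes-Cantor.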
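Mutation Probabilities
Both models satisfy the following properties:
Lack of memory:
$$P(a \mid c, t + t') = \sum_b P(a \mid b, t)\,P(b \mid c, t')$$
Reversibility: there exist stationary probabilities $\{P_a\}$ such that
$$P_a\,P(b \mid a, t) = P_b\,P(a \mid b, t)$$
[Figure: the four nucleotides A, G, C, T]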
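Both properties can be checked numerically. The sketch below does so for the Jukes-Cantor matrix only, assuming the uniform stationary distribution $P_a = 1/4$; the helper names and the values of `alpha`, `t1`, and `t2` are illustrative.

```python
import math

def jc(alpha, t):
    """Jukes-Cantor transition matrix S(t) as a 4x4 list of lists."""
    decay = math.exp(-4.0 * alpha * t)
    r, s = 0.25 * (1.0 + 3.0 * decay), 0.25 * (1.0 - decay)
    return [[r if a == b else s for b in range(4)] for a in range(4)]

def matmul(A, B):
    """Plain 4x4 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

alpha, t1, t2 = 0.3, 0.5, 1.2  # hypothetical values

# Lack of memory: S(t1 + t2) should equal S(t1) S(t2).
direct = jc(alpha, t1 + t2)
composed = matmul(jc(alpha, t1), jc(alpha, t2))
max_gap = max(abs(direct[i][j] - composed[i][j])
              for i in range(4) for j in range(4))

# Reversibility with the uniform stationary distribution (assumed: 1/4 each).
pi = [0.25] * 4
S = jc(alpha, t1)
reversible = all(abs(pi[a] * S[a][b] - pi[b] * S[b][a]) < 1e-12
                 for a in range(4) for b in range(4))

print("largest gap between S(t1+t2) and S(t1)S(t2):", max_gap)
print("detailed balance holds:", reversible)
```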
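Probabilistic Approach
Given P, q, the tree topology, and the branch lengths, we can compute:
$$P(x^1, \dots, x^5 \mid T, t) = q(x^5)\,P(x^4 \mid x^5, t_4)\,P(x^3 \mid x^5, t_3)\,P(x^1 \mid x^4, t_1)\,P(x^2 \mid x^4, t_2)$$
[Figure: five-node tree with root $x^5$, internal node $x^4$, and leaves $x^1$, $x^2$, $x^3$]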
1. Calculate the likelihood for each site on a specific tree.
2. Sum up the L (log-likelihood) values for all sites on the tree.
3. Compare the L values for all possible trees.
4. Choose the tree with the highest L value.
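Computing the Tree Likelihood
We are interested in the probability of the observed data given the tree and branch lengths:
$$P(x^1, x^2, x^3 \mid T, t) = \sum_{x^4, x^5} P(x^1, \dots, x^5 \mid T, t)$$
Computed by summing over the internal nodes. This can be done efficiently using an upward traversal pass over the tree.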
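Before turning to the efficient algorithm, the sum over internal nodes can be written out naively. This sketch assumes a five-node tree in which leaves 1 and 2 hang off internal node 4, and leaf 3 and node 4 hang off the root 5; it also assumes Jukes-Cantor substitution probabilities, a uniform root distribution, and made-up branch lengths.

```python
import math
from itertools import product

NUCS = "ACGT"

def jc_prob(child, parent, t, alpha=1.0):
    """Jukes-Cantor P(child | parent, t); alpha=1.0 is an arbitrary choice."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 * (1.0 + 3.0 * decay) if child == parent else 0.25 * (1.0 - decay)

def joint(x1, x2, x3, x4, x5, t):
    """Joint probability of all five nodes; t maps child node number -> length."""
    q_root = 0.25  # assumed uniform background probability at the root
    return (q_root
            * jc_prob(x4, x5, t[4]) * jc_prob(x3, x5, t[3])
            * jc_prob(x1, x4, t[1]) * jc_prob(x2, x4, t[2]))

def likelihood_brute_force(x1, x2, x3, t):
    """Sum the joint probability over all 16 assignments of nodes 4 and 5."""
    return sum(joint(x1, x2, x3, x4, x5, t)
               for x4, x5 in product(NUCS, repeat=2))

branch_lengths = {1: 0.1, 2: 0.2, 3: 0.4, 4: 0.3}  # hypothetical values
print(likelihood_brute_force("A", "A", "G", branch_lengths))
```

The number of terms grows as $4^k$ for k internal nodes, which is why the upward traversal is preferred on larger trees.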
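Tree Likelihood Computation
Define $P(L_k \mid a)$ = probability of the leaves below node k, given that $x^k = a$.
Init: for leaves, $P(L_k \mid a) = 1$ if $x^k = a$; 0 otherwise.
Iteration: if k is a node with children i and j, then
$$P(L_k \mid a) = \sum_{b, c} P(b \mid a, t_i)\,P(L_i \mid b)\,P(c \mid a, t_j)\,P(L_j \mid c)$$
Termination: the likelihood is
$$P(x^1, \dots, x^n \mid T, t) = \sum_a P(L_{\text{root}} \mid a)\,q(a)$$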
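The recursion translates almost line for line into code. The following is a minimal sketch under several assumptions: a tree is encoded as nested tuples `(left, t_left, right, t_right)` with single-character strings as leaves, substitution probabilities come from Jukes-Cantor with `alpha=1.0`, and the root distribution is uniform unless given. None of these choices come from the slides.

```python
import math

NUCS = "ACGT"

def jc_prob(child, parent, t, alpha=1.0):
    """Jukes-Cantor P(child | parent, t); stands in for any P(b | a, t)."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 * (1.0 + 3.0 * decay) if child == parent else 0.25 * (1.0 - decay)

def conditional(node):
    """Return {a: P(L_k | a)} for the subtree rooted at `node`."""
    if isinstance(node, str):
        # Init: a leaf is compatible only with its own observed character.
        return {a: 1.0 if a == node else 0.0 for a in NUCS}
    left, t_i, right, t_j = node
    L_i, L_j = conditional(left), conditional(right)
    # Iteration: the double sum over (b, c) factorizes into two single sums.
    return {a: sum(jc_prob(b, a, t_i) * L_i[b] for b in NUCS)
               * sum(jc_prob(c, a, t_j) * L_j[c] for c in NUCS)
            for a in NUCS}

def tree_likelihood(root, q=None):
    """Termination: sum over root states weighted by q(a)."""
    q = q or {a: 0.25 for a in NUCS}
    L_root = conditional(root)
    return sum(q[a] * L_root[a] for a in NUCS)

# Leaves A and A under one internal node, joined with leaf G at the root.
tree = (("A", 0.1, "A", 0.2), 0.3, "G", 0.4)
print(tree_likelihood(tree))
```

Each node is visited once and does a constant amount of work per pair of states, so the cost is linear in the number of nodes rather than exponential in the number of internal nodes.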
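Maximum Likelihood (ML)
Score each tree by
$$P(X_1, \dots, X_n \mid T, t) = \prod_m P(x_1[m], \dots, x_n[m] \mid T, t)$$
under the assumption of independent positions m. Branch lengths t can be optimized: gradient ascent, EM. We look for the highest-scoring tree: exhaustive search, sampling methods (Metropolis).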
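To make the scoring and the branch-length optimization concrete, here is a deliberately small sketch for a tree with only two leaves. It swaps in a plain grid search for gradient ascent or EM, assumes Jukes-Cantor with `alpha=1.0` and uniform background probabilities, and uses two made-up sequences. For a reversible model the two-leaf likelihood depends only on the total path length t between the leaves, so each site contributes $q(x)\,P(y \mid x, t)$.

```python
import math

def jc_prob(y, x, t, alpha=1.0):
    """Jukes-Cantor P(y | x, t); alpha=1.0 is an arbitrary choice."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 * (1.0 + 3.0 * decay) if x == y else 0.25 * (1.0 - decay)

def log_likelihood(seq_x, seq_y, t):
    """Sum of per-site log-likelihoods, using independence of positions."""
    return sum(math.log(0.25 * jc_prob(y, x, t))
               for x, y in zip(seq_x, seq_y))

def best_branch_length(seq_x, seq_y, grid):
    """Grid search (a stand-in for gradient ascent or EM) over t."""
    return max(grid, key=lambda t: log_likelihood(seq_x, seq_y, t))

# Hypothetical aligned sequences differing at one of ten sites.
seq_x = "ACGTACGTAC"
seq_y = "ACGTACGAAC"
grid = [i * 0.001 for i in range(1, 1001)]
t_hat = best_branch_length(seq_x, seq_y, grid)
print("estimated path length:", t_hat)
```

Working in log space turns the product over positions into a sum and avoids numerical underflow on long alignments.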
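Optimal Tree Search
Perform a search over the possible topologies $T_1, T_2, \dots, T_n$.
[Figure: for each topology, a parametric optimization (EM) over the parameter space, which may end in a local maximum]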
Computational Problem
Such procedures are computationally expensive! Computing the optimal parameters for each candidate requires a non-trivial optimization step. We spend non-negligible computation on a candidate even if it is a low-scoring one. In practice, such learning procedures can only consider small sets of candidate structures.
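A quick count shows why only a small set of candidates can be examined. The sketch below relies on the standard result, not stated in these slides, that there are $(2n-5)!!$ unrooted and $(2n-3)!!$ rooted bifurcating topologies on n labeled leaves; the function names are illustrative.

```python
def num_unrooted_topologies(n):
    """(2n-5)!! = 1 * 3 * 5 * ... * (2n-5) for n >= 3 labeled leaves."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count

def num_rooted_topologies(n):
    """(2n-3)!!: placing a root on any of the 2n-3 branches of an unrooted tree."""
    return num_unrooted_topologies(n) * (2 * n - 3)

for n in (4, 5, 10, 20):
    print(n, num_unrooted_topologies(n), num_rooted_topologies(n))
```

With every candidate requiring its own branch-length optimization, exhaustive search stops being practical beyond roughly a dozen leaves.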