Likelihoods and Phylogenies


1 Likelihoods and Phylogenies
Joe Felsenstein
Department of Genome Sciences and Department of Biology, University of Washington, Seattle
Likelihoods and Phylogenies p.1/68

7 An ideal parsimony method?
Ideally, we'd like to have a parsimony method that
- Took into account less parsimonious as well as most parsimonious state reconstructions
- Weighted changes differently if they occur in a branch of different length
- Weighted different kinds of events (e.g. transitions, transversions) differently
There is such a method. It is maximum likelihood. But...
- It requires a believable model, and
- It is computationally intensive

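8 Odds ratio justification for maximum likelihood
$D$: the data; $H_1$: Hypothesis 1; $H_2$: Hypothesis 2; $\mid$: the symbol for "given"
\[
\underbrace{\frac{\operatorname{Prob}(H_1 \mid D)}{\operatorname{Prob}(H_2 \mid D)}}_{\text{posterior odds ratio}}
= \underbrace{\frac{\operatorname{Prob}(D \mid H_1)}{\operatorname{Prob}(D \mid H_2)}}_{\text{likelihood ratio}}
\; \underbrace{\frac{\operatorname{Prob}(H_1)}{\operatorname{Prob}(H_2)}}_{\text{prior odds ratio}}
\]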

13 A simple example of Bayes' Theorem
If a space probe finds no Little Green Men on Mars, when it would have a 1/3 chance of missing them if they were there:
[Figure: bar charts of the priors, likelihoods, and posteriors for "yes" (they are there) and "no" (they are not).]
With prior odds of 4 : 1 in favor of "yes", the posterior odds are
\[ \frac{4}{1} \times \frac{1/3}{1} = \frac{4}{3} \]
With prior odds of 1 : 4, they are
\[ \frac{1}{4} \times \frac{1/3}{1} = \frac{1}{12} \]

14 The likelihood ratio term ultimately dominates
If we see one Little Green Man, the likelihood calculation does the right thing:
\[ \frac{4}{1} \times \frac{2/3}{0} = \infty \]
(put this way, this is OK but not mathematically kosher)
If after n missions, we keep seeing none, the likelihood ratio term is
\[ \left(\frac{1}{3}\right)^{n} \]
It dominates the calculation, overwhelming the prior. Thus even if we don't have a prior we can believe in, we may be interested in knowing which hypothesis the likelihood ratio is recommending...
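The odds arithmetic of the Little Green Men example can be checked in a few lines. This is a minimal sketch, not part of the slides: the function name `posterior_odds` is made up for illustration, and the prior odds of 4 : 1 and 1 : 4 are used only as example inputs.

```python
from fractions import Fraction


def posterior_odds(prior_odds, likelihood_ratio, n_missions=1):
    """Posterior odds = prior odds x (likelihood ratio per mission) ** n_missions."""
    return prior_odds * likelihood_ratio ** n_missions


# Each mission that sees nothing contributes a likelihood ratio of (1/3) / 1.
ratio = Fraction(1, 3)
for n in (1, 5, 10):
    optimist = posterior_odds(Fraction(4), ratio, n)      # prior odds 4 : 1
    skeptic = posterior_odds(Fraction(1, 4), ratio, n)    # prior odds 1 : 4
    print(n, optimist, skeptic)
```

Both sets of posterior odds shrink by the same factor of 3 per mission, so after enough missions the two priors lead to the same practical conclusion.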

15 Likelihood in simple coin-tossing
Tossing a coin n times, with probability p of heads, the probability of outcome HHTHTTTHTTH is
\[ p\,p\,(1-p)\,p\,(1-p)(1-p)(1-p)\,p\,(1-p)(1-p)\,p \]
which is
\[ L = p^{5}(1-p)^{6} \]
Plotting L against p to find its maximum:
[Figure: the likelihood L plotted against p.]

16 Differentiating to find the maximum
Differentiating the expression for L with respect to p and equating the derivative to 0, the value of p that is at the peak is found (not surprisingly) to be p = 5/11:
\[ \frac{\partial L}{\partial p} = \left(\frac{5}{p} - \frac{6}{1-p}\right) p^{5}(1-p)^{6} = 0 \]
\[ 5 - 11p = 0 \]
\[ \hat{p} = \frac{5}{11} \]
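The same maximum can be found numerically, which is how it is done when no closed form exists. A minimal sketch, assuming 5 heads and 6 tails as in the formula for L; the names `log_likelihood` and `grid_mle` are illustrative only.

```python
import math


def log_likelihood(p, heads=5, tails=6):
    """ln L = heads * ln p + tails * ln(1 - p)."""
    return heads * math.log(p) + tails * math.log(1.0 - p)


def grid_mle(heads=5, tails=6, steps=10000):
    """Value of p on an evenly spaced grid that gives the highest ln L."""
    grid = [k / steps for k in range(1, steps)]
    return max(grid, key=lambda p: log_likelihood(p, heads, tails))


p_hat = grid_mle()
print(p_hat, 5 / 11)  # grid estimate next to the analytical answer
```

Maximizing ln L instead of L gives the same p, because the logarithm is monotonic.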

17 You already know many likelihood estimators
Many commonly-used estimators in statistics are actually MLEs. For example:
- The empirical average as the estimate of the mean of a normal distribution from a sample
- The correlation coefficient
- The slope of a regression of Y on X
- The observed fraction of heads as estimate of p in tossing coins

18 A likelihood curve, its maximum likelihood estimate, and the (approximate, asymptotic) confidence interval
[Figure: likelihood curve in one parameter. Horizontal axis: length of a branch in the tree. Vertical axis: Ln (Likelihood). The peak is marked as the maximum likelihood estimate (MLE), and the 95% confidence interval derived from the curve is marked on the horizontal axis.]
The 95% confidence interval contains the branch lengths whose Ln (Likelihood) is below the peak by no more than 1/2 the value of a chi-square with 1 d.f. that is significant at 95%.
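The interval construction can be sketched on a one-parameter problem. The example below reuses the coin-tossing likelihood (5 heads, 6 tails) as a stand-in for a branch length, which is an assumption made for illustration; the constant 3.841459 is the 95% point of a chi-square with 1 d.f., and the helper names are hypothetical.

```python
import math

CHI2_95_1DF = 3.841459  # 95% critical value of chi-square with 1 d.f.


def log_likelihood(p, heads=5, tails=6):
    return heads * math.log(p) + tails * math.log(1.0 - p)


def bisect(f, lo, hi, iters=100):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(lo) < 0) == (f(mid) < 0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def likelihood_interval(heads=5, tails=6):
    """Values of p whose ln L is within half the chi-square value of the peak."""
    p_hat = heads / (heads + tails)
    cutoff = log_likelihood(p_hat, heads, tails) - CHI2_95_1DF / 2.0

    def gap(p):
        return log_likelihood(p, heads, tails) - cutoff

    return bisect(gap, 1e-9, p_hat), bisect(gap, p_hat, 1.0 - 1e-9)


print(likelihood_interval())
```

The interval is not symmetric around the estimate, because the log-likelihood curve itself is not symmetric.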

21 Contours of a likelihood surface in two dimensions
[Figure: contour plot of the likelihood surface. Axes: length of branch 1 and length of branch 2. The maximum likelihood estimate (MLE) is at the peak.]

23 Likelihood-based confidence set for two variables
[Figure: the same contour plot; the shaded area is the joint confidence interval.]
The height of the bounding contour is less than at the peak by an amount equal to 1/2 the chi-square value with two degrees of freedom which is significant at the 95% level.

24 Likelihood-based confidence interval for one variable
[Figure: the same contour plot; the shaded area is the confidence interval.]
The height of the bounding contour is less than at the peak by an amount equal to 1/2 the chi-square value with one degree of freedom which is significant at the 95% level.

26 Scale-invariance of ML estimates
In the case of a tree with one branch, whose length can be expressed either by the (pseudo-)time t or the probability of base change p, the value of p which achieves the highest likelihood corresponds exactly to the value of t which achieves the highest likelihood, so it doesn't matter which scale we work on as long as one can be translated into the other.
[Figure: two curves of ln L, one plotted against t and one against p. The peak of each curve corresponds to the peak of the other, with $\hat{p} = 0.3$.]
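A small numerical check of this invariance, under assumptions that are not in the slide: two sequences of 100 sites differing at 30 of them, the Jukes-Cantor relation between t and p, and t measured in expected changes per site. All names are illustrative.

```python
import math


def p_of_t(t):
    """Jukes-Cantor probability that a site differs after branch length t."""
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))


def log_lik_p(p, n_diff=30, n_sites=100):
    """ln L on the p scale, up to an additive constant."""
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)


def log_lik_t(t, n_diff=30, n_sites=100):
    """The same ln L, expressed as a function of t."""
    return log_lik_p(p_of_t(t), n_diff, n_sites)


p_grid = [k / 10000 for k in range(1, 7500)]    # 0 < p < 0.75
t_grid = [k / 10000 for k in range(1, 30000)]   # 0 < t < 3
p_hat = max(p_grid, key=log_lik_p)
t_hat = max(t_grid, key=log_lik_t)
print(p_hat, t_hat, p_of_t(t_hat))  # translating t_hat recovers p_hat
```

Maximizing on either scale and translating gives the same answer, up to the resolution of the grids.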

27 Calculating the likelihood of a tree
If we have molecular sequences on a tree, the likelihood is the product over sites of the probabilities of the data $D^{[i]}$ for each site (if those evolve independently):
\[ L = \operatorname{Prob}(D \mid T) = \prod_{i=1}^{\text{sites}} \operatorname{Prob}(D^{[i]} \mid T) \]
With log-likelihoods, the product becomes a sum:
\[ \ln L = \ln \operatorname{Prob}(D \mid T) = \sum_{i=1}^{\text{sites}} \ln \operatorname{Prob}(D^{[i]} \mid T) \]
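One practical reason to work with the sum of logs is numerical: the product of many small site likelihoods underflows in floating point. A minimal sketch with made-up per-site values (100 sites, each with likelihood 1e-5):

```python
import math

# Hypothetical per-site values of Prob(D[i] | T).
site_likelihoods = [1e-5] * 100

product = 1.0
for value in site_likelihoods:
    product *= value  # drops below the smallest double and becomes 0.0

log_likelihood = math.fsum(math.log(value) for value in site_likelihoods)
print(product, log_likelihood)
```

The product is lost, while the log-likelihood is an ordinary number that can be compared between trees.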

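28 Calculating the likelihood for site i on a tree
[Figure: a rooted tree with five tips, interior nodes x, y, z and w, and branches labeled $t_1, \dots, t_8$. The $t_i$ are "branch lengths" (rate × time).]
Sum over all possible states (bases) at interior nodes:
\[
L^{(i)} = \sum_x \sum_y \sum_z \sum_w
\operatorname{Prob}(w)\,
\operatorname{Prob}(x \mid w, t_7)\,
\operatorname{Prob}(A \mid x, t_1)\,
\operatorname{Prob}(C \mid x, t_2)\,
\operatorname{Prob}(z \mid w, t_8)\,
\operatorname{Prob}(C \mid z, t_3)\,
\operatorname{Prob}(y \mid z, t_6)\,
\operatorname{Prob}(C \mid y, t_4)\,
\operatorname{Prob}(G \mid y, t_5)
\]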

29 Calculating the likelihood for site i on a tree
We use the conditional likelihoods:
\[ L_j^{(i)}(s) \]
These compute the probability of everything at site i at or above node j on the tree, given that node j is in state s. Thus it assumes something (s) that we don't know; in practice we compute these for all states s.
At the tips we can define these quantities: if the observed state is (say) C, the vector of L's is (0, 1, 0, 0). If we observe an ambiguity, say R (purine), they are (1, 0, 1, 0).

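30 The "pruning" algorithm
[Figure: node l with its two descendants, node j on a branch of length $v_j$ and node k on a branch of length $v_k$.]
\[
L_l^{(i)}(s) =
\left[ \sum_{s_j} \operatorname{Prob}(s_j \mid s, v_j)\, L_j^{(i)}(s_j) \right]
\left[ \sum_{s_k} \operatorname{Prob}(s_k \mid s, v_k)\, L_k^{(i)}(s_k) \right]
\]
(Felsenstein, 1973; 1981).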

31 ... and at the bottom of the tree:
\[ L^{(i)} = \sum_s \pi_s\, L_0^{(i)}(s) \]
(Felsenstein, 1973, 1981) and having gotten the likelihoods for each site:
\[ L = \prod_{i=1}^{\text{sites}} L^{(i)} \]
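The recursion and the final sum at the root can be sketched for a three-tip tree. Everything specific here is an assumption made for illustration: the Jukes-Cantor model with equal base frequencies, the tree ((1, 2), 3), the branch lengths, and the function names.

```python
import math
from itertools import product

BASES = "ACGT"


def jc_prob(i, j, at):
    """Jukes-Cantor Prob(j | i, t), with `at` the product of rate a and time t."""
    decay = math.exp(-4.0 * at)
    return 0.25 + 0.75 * decay if i == j else 0.25 - 0.25 * decay


def tip(base):
    """Conditional likelihoods at a tip: 1 for the observed base, 0 otherwise."""
    return [1.0 if b == base else 0.0 for b in BASES]


def prune(left, at_left, right, at_right):
    """Conditional likelihoods at a node from those of its two descendants."""
    result = []
    for s in range(4):
        from_left = sum(jc_prob(s, x, at_left) * left[x] for x in range(4))
        from_right = sum(jc_prob(s, x, at_right) * right[x] for x in range(4))
        result.append(from_left * from_right)
    return result


def site_likelihood(b1, b2, b3, root_split=(0.05, 0.3)):
    """Likelihood of one site on the tree ((1, 2), 3), base frequencies 1/4."""
    inner = prune(tip(b1), 0.1, tip(b2), 0.2)
    root = prune(inner, root_split[0], tip(b3), root_split[1])
    return sum(0.25 * value for value in root)


print(site_likelihood("A", "C", "C"))
# Summed over all 64 possible site patterns, the probabilities should total 1.
print(sum(site_likelihood(*pattern) for pattern in product(BASES, repeat=3)))
```

Each call to `prune` costs a fixed amount of work per node, so the total work grows linearly with the number of tips rather than with the number of combinations of interior states.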

32 The tree is effectively unrooted
[Figure: "before" and "after" views of the region around nodes 6 and 8 in the tree, when a new root (node 0) is placed in that branch. The subtrees are shown as shaded triangles.]
It is possible to show that if the base substitution model is reversible (as most of them are), these two trees have exactly the same likelihood. So we are only inferring the unrooted tree.

38 Finding the best tree
- The pruning algorithm helps us calculate likelihood quickly.
- It turns out that for unrooted trees (and reversible models of change), the likelihood is the same no matter where you root the tree.
- If we root in the middle of a branch, we can prune down to both ends of the branch and then get the likelihood of the tree really quickly for any particular length of the branch.
- This means we can quickly maximize likelihood when varying one branch length in a given topology (holding all the other branch lengths fixed).
- If we go around the tree doing this, the branch lengths quickly reach their optima (for that topology).
- For searching among topologies, the problems are the usual ones, but the pruning enables local rearrangements to be evaluated more quickly.

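39 What does "tree space" (with branch lengths) look like?
An example: three species with a clock.
[Figure: a clocklike tree of species A, B and C with times $t_1$ and $t_2$, and the ($t_1$, $t_2$) plane divided into a region marked "OK" and a region marked "not possible", with the trifurcation marked.]
When we consider all three possible topologies, the space looks like:
[Figure: the three regions drawn on axes $t_1$ and $t_2$.]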

40 For one tree topology
The space of trees varying all 2n − 3 branch lengths, each a nonnegative number, defines an "orthant" (open corner) of a (2n − 3)-dimensional real space.
[Figure: an unrooted tree with tips A to F and branch lengths $v_1, \dots, v_9$, next to an orthant drawn as a floor and two walls, with one axis labeled $v_9$.]

44 Through the looking-glass
Shrinking one of the n − 3 interior branches to 0, we arrive at a trifurcation.
[Figure: the six-species tree with branch lengths $v_1, \dots, v_9$, and the two other trees reached when one interior branch shrinks to zero and a different resolution of the resulting trifurcation is chosen.]
Here, as we pass "through the looking glass" we also touch the space for two other tree topologies, and we could decide to enter either.

45 The graph of all trees of 5 species
The space of all these orthants, one for each topology, connecting ones that share faces (looking glasses).
[Figure: the Schoenberg graph (all 15 trees of size 5 connected by NNIs).]

46 Models of DNA change
(1) Jukes-Cantor model (1969)
[Figure: the four bases with every change occurring at rate a.]
For any base $j$ other than T,
\[ P(j \mid T, t) = \frac{1}{4}\left(1 - e^{-4at}\right) \]
(2) Kimura 2-parameter model (1980) (like Jukes-Cantor model but allows for differences in rates of transitions and transversions)
[Figure: the four bases with changes occurring at rates a and b.]
(3) Felsenstein (1984) model (like Kimura model but allows for inequality of base frequencies)
(4) Hasegawa, Kishino, and Yano (1985) model (like Felsenstein model but differs in detail)
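The first two models can be compared numerically by building a rate matrix and exponentiating it. This is a sketch under stated assumptions: rate a is taken to apply to transitions (A↔G, C↔T) and rate b to transversions, and the matrix exponential is computed by a truncated Taylor series, which is adequate only for modest values of rate × time. The function names are hypothetical.

```python
import math

BASES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_rate_matrix(a, b):
    """Rate matrix with rate a for transitions and b for transversions."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                q[i][j] = a if (x, y) in TRANSITIONS else b
        q[i][i] = -sum(q[i])  # each row of a rate matrix sums to zero
    return q


def mat_mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transition_probs(q, t, terms=40):
    """P(t) = exp(Qt), summed as a truncated Taylor series."""
    p = [[float(i == j) for j in range(4)] for i in range(4)]
    term = [row[:] for row in p]
    qt = [[value * t for value in row] for row in q]
    for k in range(1, terms):
        term = [[value / k for value in row] for row in mat_mul(term, qt)]
        p = [[p[i][j] + term[i][j] for j in range(4)] for i in range(4)]
    return p


# With a == b the model reduces to Jukes-Cantor.
a, t = 0.1, 2.0
jc = transition_probs(k2p_rate_matrix(a, a), t)
print(jc[3][1], 0.25 * (1.0 - math.exp(-4.0 * a * t)))  # P(C | T, t) both ways
```

Setting a larger than b makes a transition more probable than either transversion over the same branch, which is the extra freedom the Kimura model adds.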

47 A data example: mitochondrial D-loop sequences
[Figure: aligned mitochondrial D-loop sequences for Bovine, Mouse, Gibbon, Orang, Gorilla, Chimp and Human.]

48 ... which gives the ML tree
[Figure: tree with tips Chimp, Human, Gorilla, Orang, Gibbon, Mouse and Bovine, labeled with its ln L.]
Maximum likelihood tree for the Hasegawa 232-site mitochondrial D-loop data set, with Ts/Tn set to 2, analyzed with maximum likelihood (Dnaml).

49 A pioneer of protein evolution
Margaret Dayhoff, pioneer of protein databases, protein evolution models, and gene families, about 1966

50 Models with amino acids
[Figure: a 20 × 20 matrix whose rows and columns are labeled by the one-letter codes of the amino acids.]
- Dayhoff PAM model
- Jones-Taylor-Thornton model
- Specific models for secondary structure contexts or membrane proteins
- Models adapted from Henikoff BLOSUM scoring

51 Dayhoff's PAM001 matrix
[Table: 20 × 20 matrix with rows and columns labeled by the one-letter and three-letter amino acid codes, from A ala to V val. The numerical entries are missing.]

52 Codon models (more later from Joe Bielawski)
(Muse & Gaut, MBE 1994; Goldman & Yang, MBE 1994)
[Figure: part of the genetic code table, showing codons for phe, leu, ser, stop, ile, met and val.]

53 Considerations for a protein model
Making a model for protein evolution (a not-very-practical approach):
- Use a good model of DNA evolution.
- Use the appropriate genetic code.
- When an amino acid changes, accept it with probability that declines as the amino acids become more different.
- Fit this to empirical information on protein evolution.
- Take into account variation of rate from site to site.
- Take into account correlation of rates in adjacent sites.
- How about protein structure? Secondary structure? 3D structure?
(The first four steps are the codon model of Goldman and Yang, 1994 and Muse and Gaut, 1994, both in Molecular Biology and Evolution. The next two are the rate variation machinery of Yang, 1995, 1996 and Felsenstein and Churchill, 1996.)

54 Covarion models? (Fitch and Markowitz, 1970)
[Figure: sequences at several points along a tree.]
Which sites are available for substitutions changes as one moves along the tree.

55 How to calculate likelihood with rate variation
Easy! Since branch lengths always come into transition probability formulas as the product $r \times t$ of rate and time, we can just multiply the lengths of the branches by the appropriate factor to calculate the likelihood for a site. (Branch lengths are usually scaled relative to a rate of 1.)
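A minimal sketch of this rescaling, for the simplest possible tree: two sequences joined by one branch under the Jukes-Cantor model. The rate categories and their probabilities are invented for illustration (chosen so that the mean rate is 1), and the function names are hypothetical.

```python
import math


def jc_prob_same(at):
    """Jukes-Cantor probability that a base is unchanged."""
    return 0.25 + 0.75 * math.exp(-4.0 * at)


def jc_prob_diff(at):
    """Jukes-Cantor probability of a change to one particular other base."""
    return 0.25 - 0.25 * math.exp(-4.0 * at)


def site_likelihood(same, at, rate=1.0):
    """Likelihood of one site of a two-sequence alignment at relative rate `rate`."""
    scaled = rate * at  # the rate enters only through the product r * t
    return 0.25 * (jc_prob_same(scaled) if same else jc_prob_diff(scaled))


def site_likelihood_mixture(same, at, rates, probs):
    """Average the site likelihood over rate categories with probabilities `probs`."""
    return sum(p * site_likelihood(same, at, r) for r, p in zip(rates, probs))


rates, probs = [0.25, 1.0, 2.5], [0.4, 0.4, 0.2]  # hypothetical categories
print(site_likelihood_mixture(True, 0.1, rates, probs))
print(site_likelihood_mixture(False, 0.1, rates, probs))
```

On a full tree the same idea applies branch by branch: the pruning calculation is simply repeated with every branch length multiplied by the rate for that category.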

56 Rate variation among sites
[Figure: the sites of the alignment and the phylogeny relating the sequences, with a hidden Markov chain of rates of evolution running along the sites. The same figure is built up over the next two slides, "Array of likelihoods for possible rates" and "Hidden Markov Model of rate variation among sites".]

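59 Hidden Markov Models sum up over all paths
\[ \operatorname{Prob}(\text{Data} \mid \text{tree}) = \sum_{\text{paths}} \operatorname{Prob}(\text{Data} \mid \text{tree}, \text{path})\, \operatorname{Prob}(\text{path}) \]
[Figure: the array of rates by sites, with one path and another path drawn through it.]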

60 The rate combination contributing the most
We can leave behind pointers that allow us to backtrack.
This can be done by a dynamic programming algorithm.
(Of course, this one might account for only a small fraction of the likelihood.)
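The pointer-and-backtrack idea is the Viterbi algorithm of the HMM literature. The sketch below is illustrative: the per-site likelihoods, starting probabilities and transition probabilities are invented, and the function names are hypothetical. It multiplies raw probabilities for clarity; with real data one would add logarithms instead.

```python
from itertools import product


def path_probability(path, site_liks, start, trans):
    """Joint probability of the data and one particular rate path."""
    prob = start[path[0]] * site_liks[0][path[0]]
    for i in range(1, len(path)):
        prob *= trans[path[i - 1]][path[i]] * site_liks[i][path[i]]
    return prob


def best_rate_path(site_liks, start, trans):
    """Rate path contributing the most, found by dynamic programming."""
    n_rates = len(start)
    score = [start[j] * site_liks[0][j] for j in range(n_rates)]
    pointers = []
    for liks in site_liks[1:]:
        back, new_score = [], []
        for k in range(n_rates):
            j_best = max(range(n_rates), key=lambda j: score[j] * trans[j][k])
            back.append(j_best)  # pointer left behind for backtracking
            new_score.append(score[j_best] * trans[j_best][k] * liks[k])
        pointers.append(back)
        score = new_score
    last = max(range(n_rates), key=lambda j: score[j])
    path = [last]
    for back in reversed(pointers):
        path.append(back[path[-1]])
    path.reverse()
    return path, score[last]


# Hypothetical values of Prob(D[i] | T, r_j) for 4 sites and 2 rates.
site_liks = [[0.02, 0.001], [0.0005, 0.004], [0.0004, 0.003], [0.03, 0.002]]
start = [0.5, 0.5]
trans = [[0.8, 0.2], [0.2, 0.8]]

path, prob = best_rate_path(site_liks, start, trans)
exhaustive = max(product(range(2), repeat=4),
                 key=lambda p: path_probability(p, site_liks, start, trans))
print(path, list(exhaustive))  # dynamic programming and exhaustive search agree
```

The exhaustive search is there only as a check; it visits every path, while the dynamic programming version does work proportional to the number of sites.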

61 The Forwards Algorithm
The Forwards Algorithm, well-known in the Hidden Markov model literature, updates, from last to first site (yes, I know, that's the wrong direction!), the quantity
\[ \operatorname{Prob}(D^{[i]} \mid T, r_j) \]
where $D^{[i]}$ is the data from site i + 1 to the end of the sequence. This is just like the conditional likelihood on the tree, since it is conditioned on our knowing that the rate at site i is $r_j$. We don't know that, but we compute it for all $r_j$ and do a weighted average at the front of the array of rates. The logic is the same as when adding up likelihoods over a tree.
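A minimal sketch of this last-to-first recursion, checked against explicit enumeration of every path. The inputs are invented and the names are hypothetical. As a bookkeeping choice of this sketch, `cond[j]` includes the data at the current site as well as the sites after it.

```python
from itertools import product


def total_likelihood(site_liks, start, trans):
    """Sum over all rate paths, working from the last site back to the first."""
    n_rates = len(start)
    # cond[j]: probability of the data from the current site to the end,
    # given that the current site has rate j.
    cond = list(site_liks[-1])
    for liks in reversed(site_liks[:-1]):
        cond = [liks[j] * sum(trans[j][k] * cond[k] for k in range(n_rates))
                for j in range(n_rates)]
    # Weighted average at the front of the array of rates.
    return sum(start[j] * cond[j] for j in range(n_rates))


def brute_force(site_liks, start, trans):
    """Same quantity by enumerating every path (for checking only)."""
    total = 0.0
    for path in product(range(len(start)), repeat=len(site_liks)):
        term = start[path[0]] * site_liks[0][path[0]]
        for i in range(1, len(path)):
            term *= trans[path[i - 1]][path[i]] * site_liks[i][path[i]]
        total += term
    return total


# Hypothetical values of Prob(D[i] | T, r_j) for 4 sites and 2 rates.
site_liks = [[0.02, 0.001], [0.0005, 0.004], [0.0004, 0.003], [0.03, 0.002]]
start = [0.5, 0.5]
trans = [[0.8, 0.2], [0.2, 0.8]]
print(total_likelihood(site_liks, start, trans), brute_force(site_liks, start, trans))
```

The recursion has the same shape as the pruning step on a tree: multiply by a transition probability, sum over the unknown state, and pass the result along.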

62 The Forwards Algorithm re-uses information by dynamic programming
If we can calculate the contribution to the likelihood from all paths passing through one rate at a particular site, then we can use it to calculate the same things for the previous site.
[Figure: the array of rates by sites, with the calculation for one site feeding the calculation for the previous site.]
This algorithm, the Forwards algorithm, was invented in communications applications of Hidden Markov models.

65 The pruning algorithm is just like the Forwards Algorithm
[Figure: on the tree, species 1 and species 2 descend from an ancestor that may have different bases. Along the sequence, sites 20, 21 and 22 may have different rates.]

67 Forwards-Backwards algorithm (marginal probabilities)
Going backwards (using the Forwards Algorithm), leaving information behind, then forwards (using the Backwards Algorithm), we can calculate the total probability of the data over all paths that have a particular rate at site i. Combining the two in this way, the Forwards-Backwards algorithm can calculate the contribution of one rate at a given site to the overall likelihood.
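The combination of the two passes can be sketched as follows. The inputs are invented and the names `ahead`, `behind` and `rate_posteriors` are hypothetical; dividing each contribution by the overall likelihood turns it into the posterior probability of that rate at that site.

```python
def rate_posteriors(site_liks, start, trans):
    """Posterior probability of each rate at each site, given all the data."""
    n, m = len(site_liks), len(start)
    # ahead[i][j]: Prob(data at sites i..end | rate j at site i)
    ahead = [None] * n
    ahead[-1] = list(site_liks[-1])
    for i in range(n - 2, -1, -1):
        ahead[i] = [site_liks[i][j] * sum(trans[j][k] * ahead[i + 1][k] for k in range(m))
                    for j in range(m)]
    # behind[i][j]: Prob(data at sites before i, and rate j at site i)
    behind = [list(start)]
    for i in range(1, n):
        prev = [behind[i - 1][j] * site_liks[i - 1][j] for j in range(m)]
        behind.append([sum(prev[j] * trans[j][k] for j in range(m)) for k in range(m)])
    total = sum(behind[0][j] * ahead[0][j] for j in range(m))
    return [[behind[i][j] * ahead[i][j] / total for j in range(m)] for i in range(n)]


# Hypothetical values of Prob(D[i] | T, r_j) for 4 sites and 2 rates.
site_liks = [[0.02, 0.001], [0.0005, 0.004], [0.0004, 0.003], [0.03, 0.002]]
start = [0.5, 0.5]
trans = [[0.8, 0.2], [0.2, 0.8]]
for row in rate_posteriors(site_liks, start, trans):
    print(row)
```

At every site the product of the two arrays, summed over rates, gives the same overall likelihood, which is why each printed row sums to 1.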

68 The Gamma distribution, used for rates
[Figure: Gamma densities. Horizontal axis: rate. Vertical axis: frequency. Curves are labeled α = 0.25 (cv = 2), α = 1 (cv = 1), and a third curve whose α and cv values are missing.]

69 Approximating the Gamma distribution
Integrating over all possible rates is hard. But we can approximate the Gamma distribution by the rates and probabilities in a Hidden Markov Model. Here are the rates and probabilities we might use to approximate a Gamma with a CV of 1/2:
[Table with columns State, Rate of change, and Probability in HMM. The entries are missing.]

70 A simple Hidden Markov Model
Assume that rate i has probability $p_i$. Start with one from that distribution. At each site:
- With probability 1 − λ keep the rate the same
- With probability λ choose a new one from this distribution
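These two rules define the transition probabilities between rates at adjacent sites. A minimal sketch, with invented category probabilities and a hypothetical function name; note that the "new" rate drawn with probability λ may happen to be the same as the old one.

```python
def rate_transition_matrix(probs, lam):
    """trans[j][k]: probability of rate k at the next site, given rate j here."""
    m = len(probs)
    return [[(1.0 - lam) * (j == k) + lam * probs[k] for k in range(m)]
            for j in range(m)]


probs = [0.4, 0.4, 0.2]  # hypothetical probabilities p_i of the rate categories
trans = rate_transition_matrix(probs, lam=0.25)
for row in trans:
    print(row)

# The p_i are the stationary distribution of this chain.
print([sum(probs[j] * trans[j][k] for j in range(3)) for k in range(3)])
```

Because the chain starts from the p_i and they are stationary, every site has the same marginal distribution of rates; λ only controls how strongly neighboring sites are correlated.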

71 A numerical example. Cytochrome B
We analyze 31 cytochrome B sequences, aligned by Naoko Takezaki, using the Proml protein maximum likelihood program. Assume a Hidden Markov Model with 3 states and expected block length 3.
[Table with columns category, rate, and probability. The entries are missing.]
We get a reasonable, but not perfect, tree, together with an inferred best rate combination.

72 Phylogeny for Takezaki cytochrome B
[Figure: tree with tips whalebm, whalebp, borang, sorang, hseal, gseal, gibbon, gorilla2, bovine, gorilla1, cchimp, cat, rhinocer, pchimp, platypus, dhorse, horse, african, caucasian, wallaroo, rat, opossum, mouse, chicken, seaurchin2, xenopus, loach, carp, seaurchin1, trout, lamprey.]

73 Rates inferred from Cytochrome B
[Figure, continued over two slides: the alignment of the 31 cytochrome B protein sequences, with african as the reference sequence and dots marking residues identical to it, shown with the rates inferred along the sequence.]

75 PhyloHMMs: used in the UCSC Genome Browser
The conservation scores calculated in the Genome Browser use PhyloHMMs, which are just these HMM methods.

76 References
Edwards, A. W. F. and L. L. Cavalli-Sforza. Reconstruction of evolutionary trees. In Phenetic and Phylogenetic Classification, ed. V. H. Heywood and J. McNeill. Systematics Association Publication No. 6. Systematics Association, London. [Parsimony and likelihood for phylogenies from gene frequencies]
Neyman, J. Molecular studies of evolution: a source of novel statistical problems. In Statistical Decision Theory and Related Topics, ed. S. S. Gupta and J. Yackel. New York: Academic Press. [First paper on likelihood for molecular sequences. Neyman was a famous statistician.]
Jukes, T. H. and C. Cantor. 1969. Evolution of protein molecules. In Mammalian Protein Metabolism, ed. M. N. Munro. Academic Press, New York. [The Jukes-Cantor model, in one formula and a couple of sentences]
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22. [HKY model]
Kimura, M. 1980. A simple model for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution 16. [K2P model]
Felsenstein, J. 1973. Maximum-likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Systematic Zoology 22. [The pruning algorithm, parsimony not same as likelihood]
Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution 17. [Making likelihood useable for molecular sequences]
Churchill, G. A. Stochastic models for heterogeneous DNA sequences. Bulletin of Mathematical Biology 51. [First paper to use HMMs in molecular biology]
Yang, Z. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Molecular Biology and Evolution 10. [Use of gamma distribution of rate variation in ML phylogenies]
Yang, Z. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. Journal of Molecular Evolution 39. [Approximating gamma distribution in ML phylogenies by an HMM]
Yang, Z. A space-time process model for the evolution of DNA sequences. Genetics 139. [Allowing for autocorrelated rates along the molecule using an HMM for ML phylogenies]
Felsenstein, J. and G. A. Churchill. 1996. A Hidden Markov Model approach to variation among sites in rate of evolution. Molecular Biology and Evolution 13. [HMM approach to evolutionary rate variation]
Siepel, A. and D. Haussler. Combining phylogenetic and hidden Markov models in biosequence analysis. Journal of Computational Biology 11. [Using PhyloHMMs to infer conserved sequences in comparative genomics]
Thorne, J. L., N. Goldman, and D. T. Jones. Combining protein evolution and secondary structure. Molecular Biology and Evolution. [HMM for secondary structure of proteins, with phylogenies]
Felsenstein, J. Inferring Phylogenies. Sinauer Associates, Sunderland, Massachusetts. [Book you and all your friends must rush out and buy]
Semple, C. and M. Steel. Phylogenetics. Oxford University Press, Oxford. [Introduction for mathematicians]
Yang, Z. Computational Molecular Evolution. Oxford University Press, Oxford. [Well-thought-out, concentrates on likelihood and Bayesian methods for sequences]


More information

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30 Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30 A non-phylogeny

More information

Week 7: Bayesian inference, Testing trees, Bootstraps

Week 7: Bayesian inference, Testing trees, Bootstraps Week 7: ayesian inference, Testing trees, ootstraps Genome 570 May, 2008 Week 7: ayesian inference, Testing trees, ootstraps p.1/54 ayes Theorem onditional probability of hypothesis given data is: Prob

More information

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington.

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington. Maximum Likelihood This presentation is based almost entirely on Peter G. Fosters - "The Idiot s Guide to the Zen of Likelihood in a Nutshell in Seven Days for Dummies, Unleashed. http://www.bioinf.org/molsys/data/idiots.pdf

More information

Evolutionary Analysis of Viral Genomes

Evolutionary Analysis of Viral Genomes University of Oxford, Department of Zoology Evolutionary Biology Group Department of Zoology University of Oxford South Parks Road Oxford OX1 3PS, U.K. Fax: +44 1865 271249 Evolutionary Analysis of Viral

More information

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013 Sequence Alignments Dynamic programming approaches, scoring, and significance Lucy Skrabanek ICB, WMC January 31, 213 Sequence alignment Compare two (or more) sequences to: Find regions of conservation

More information

Inference of phylogenies, with some thoughts on statistics and geometry p.1/31

Inference of phylogenies, with some thoughts on statistics and geometry p.1/31 Inference of phylogenies, with some thoughts on statistics and geometry Joe Felsenstein University of Washington Inference of phylogenies, with some thoughts on statistics and geometry p.1/31 Darwin s

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Phylogenetics. BIOL 7711 Computational Bioscience

Phylogenetics. BIOL 7711 Computational Bioscience Consortium for Comparative Genomics! University of Colorado School of Medicine Phylogenetics BIOL 7711 Computational Bioscience Biochemistry and Molecular Genetics Computational Bioscience Program Consortium

More information

Phylogenetics: Building Phylogenetic Trees

Phylogenetics: Building Phylogenetic Trees 1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should

More information

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057 Estimating Phylogenies (Evolutionary Trees) II Biol4230 Thurs, March 2, 2017 Bill Pearson wrp@virginia.edu 4-2818 Jordan 6-057 Tree estimation strategies: Parsimony?no model, simply count minimum number

More information

Taming the Beast Workshop

Taming the Beast Workshop Workshop David Rasmussen & arsten Magnus June 27, 2016 1 / 31 Outline of sequence evolution: rate matrices Markov chain model Variable rates amongst different sites: +Γ Implementation in BES2 2 / 31 genotype

More information

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University Phylogenetics: Building Phylogenetic Trees COMP 571 - Fall 2010 Luay Nakhleh, Rice University Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary

More information

Inferring Phylogenies from Protein Sequences by. Parsimony, Distance, and Likelihood Methods. Joseph Felsenstein. Department of Genetics

Inferring Phylogenies from Protein Sequences by. Parsimony, Distance, and Likelihood Methods. Joseph Felsenstein. Department of Genetics Inferring Phylogenies from Protein Sequences by Parsimony, Distance, and Likelihood Methods Joseph Felsenstein Department of Genetics University of Washington Box 357360 Seattle, Washington 98195-7360

More information

An Introduction to Bayesian Inference of Phylogeny

An Introduction to Bayesian Inference of Phylogeny n Introduction to Bayesian Inference of Phylogeny John P. Huelsenbeck, Bruce Rannala 2, and John P. Masly Department of Biology, University of Rochester, Rochester, NY 427, U.S.. 2 Department of Medical

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Statistical nonmolecular phylogenetics: can molecular phylogenies illuminate morphological evolution?

Statistical nonmolecular phylogenetics: can molecular phylogenies illuminate morphological evolution? Statistical nonmolecular phylogenetics: can molecular phylogenies illuminate morphological evolution? 30 July 2011. Joe Felsenstein Workshop on Molecular Evolution, MBL, Woods Hole Statistical nonmolecular

More information

Molecular Evolution & Phylogenetics

Molecular Evolution & Phylogenetics Molecular Evolution & Phylogenetics Heuristics based on tree alterations, maximum likelihood, Bayesian methods, statistical confidence measures Jean-Baka Domelevo Entfellner Learning Objectives know basic

More information

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods

More information

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004,

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004, Tracing the Evolution of Numerical Phylogenetics: History, Philosophy, and Significance Adam W. Ferguson Phylogenetic Systematics 26 January 2009 Inferring Phylogenies Historical endeavor Darwin- 1837

More information

MODELING EVOLUTION AT THE PROTEIN LEVEL USING AN ADJUSTABLE AMINO ACID FITNESS MODEL

MODELING EVOLUTION AT THE PROTEIN LEVEL USING AN ADJUSTABLE AMINO ACID FITNESS MODEL MODELING EVOLUTION AT THE PROTEIN LEVEL USING AN ADJUSTABLE AMINO ACID FITNESS MODEL MATTHEW W. DIMMIC*, DAVID P. MINDELL RICHARD A. GOLDSTEIN* * Biophysics Research Division Department of Biology and

More information

Probabilistic modeling and molecular phylogeny

Probabilistic modeling and molecular phylogeny Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical University of Denmark (DTU) What is a model? Mathematical

More information

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment Sequence Analysis 17: lecture 5 Substitution matrices Multiple sequence alignment Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of

More information

What Is Conservation?

What Is Conservation? What Is Conservation? Lee A. Newberg February 22, 2005 A Central Dogma Junk DNA mutates at a background rate, but functional DNA exhibits conservation. Today s Question What is this conservation? Lee A.

More information

Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences

Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences Mathematical Statistics Stockholm University Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences Bodil Svennblad Tom Britton Research Report 2007:2 ISSN 650-0377 Postal

More information

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington

More information

Bootstraps and testing trees. Alog-likelihoodcurveanditsconfidenceinterval

Bootstraps and testing trees. Alog-likelihoodcurveanditsconfidenceinterval ootstraps and testing trees Joe elsenstein epts. of Genome Sciences and of iology, University of Washington ootstraps and testing trees p.1/20 log-likelihoodcurveanditsconfidenceinterval 2620 2625 ln L

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline Phylogenetics Todd Vision iology 522 March 26, 2007 pplications of phylogenetics Studying organismal or biogeographic history Systematics ating events in the fossil record onservation biology Studying

More information

Molecular Evolution, course # Final Exam, May 3, 2006

Molecular Evolution, course # Final Exam, May 3, 2006 Molecular Evolution, course #27615 Final Exam, May 3, 2006 This exam includes a total of 12 problems on 7 pages (including this cover page). The maximum number of points obtainable is 150, and at least

More information

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species Schedule Bioinformatics and Computational Biology: History and Biological Background (JH) 0.0 he Parsimony criterion GKN.0 Stochastic Models of Sequence Evolution GKN 7.0 he Likelihood criterion GKN 0.0

More information

CSE 549: Computational Biology. Substitution Matrices

CSE 549: Computational Biology. Substitution Matrices CSE 9: Computational Biology Substitution Matrices How should we score alignments So far, we ve looked at arbitrary schemes for scoring mutations. How can we assign scores in a more meaningful way? Are

More information

Phylogenetic Inference using RevBayes

Phylogenetic Inference using RevBayes Phylogenetic Inference using RevBayes Model section using Bayes factors Sebastian Höhna 1 Overview This tutorial demonstrates some general principles of Bayesian model comparison, which is based on estimating

More information

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) Content HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University,

More information

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM).

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). 1 Bioinformatics: In-depth PROBABILITY & STATISTICS Spring Semester 2011 University of Zürich and ETH Zürich Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). Dr. Stefanie Muff

More information

Molecular evolution. Joe Felsenstein. GENOME 453, Autumn Molecular evolution p.1/49

Molecular evolution. Joe Felsenstein. GENOME 453, Autumn Molecular evolution p.1/49 Molecular evolution Joe Felsenstein GENOME 453, utumn 2009 Molecular evolution p.1/49 data example for phylogeny inference Five DN sequences, for some gene in an imaginary group of species whose names

More information

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Distance Methods COMP 571 - Spring 2015 Luay Nakhleh, Rice University Outline Evolutionary models and distance corrections Distance-based methods Evolutionary Models and Distance Correction

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Inferring Speciation Times under an Episodic Molecular Clock

Inferring Speciation Times under an Episodic Molecular Clock Syst. Biol. 56(3):453 466, 2007 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150701420643 Inferring Speciation Times under an Episodic Molecular

More information

Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood

Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood For: Prof. Partensky Group: Jimin zhu Rama Sharma Sravanthi Polsani Xin Gong Shlomit klopman April. 7. 2003 Table of Contents Introduction...3

More information

Similarity or Identity? When are molecules similar?

Similarity or Identity? When are molecules similar? Similarity or Identity? When are molecules similar? Mapping Identity A -> A T -> T G -> G C -> C or Leu -> Leu Pro -> Pro Arg -> Arg Phe -> Phe etc If we map similarity using identity, how similar are

More information

RELATING PHYSICOCHEMMICAL PROPERTIES OF AMINO ACIDS TO VARIABLE NUCLEOTIDE SUBSTITUTION PATTERNS AMONG SITES ZIHENG YANG

RELATING PHYSICOCHEMMICAL PROPERTIES OF AMINO ACIDS TO VARIABLE NUCLEOTIDE SUBSTITUTION PATTERNS AMONG SITES ZIHENG YANG RELATING PHYSICOCHEMMICAL PROPERTIES OF AMINO ACIDS TO VARIABLE NUCLEOTIDE SUBSTITUTION PATTERNS AMONG SITES ZIHENG YANG Department of Biology (Galton Laboratory), University College London, 4 Stephenson

More information

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Integrative Biology 200 PRINCIPLES OF PHYLOGENETICS Spring 2018 University of California, Berkeley Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley B.D. Mishler Feb. 14, 2018. Phylogenetic trees VI: Dating in the 21st century: clocks, & calibrations;

More information

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive. Additive distances Let T be a tree on leaf set S and let w : E R + be an edge-weighting of T, and assume T has no nodes of degree two. Let D ij = e P ij w(e), where P ij is the path in T from i to j. Then

More information

Tree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny

More information

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016 Molecular phylogeny - Using molecular sequences to infer evolutionary relationships Tore Samuelsson Feb 2016 Molecular phylogeny is being used in the identification and characterization of new pathogens,

More information

Evolutionary Models. Evolutionary Models

Evolutionary Models. Evolutionary Models Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment

More information

Bayesian Models for Phylogenetic Trees

Bayesian Models for Phylogenetic Trees Bayesian Models for Phylogenetic Trees Clarence Leung* 1 1 McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, Canada ABSTRACT Introduction: Inferring genetic ancestry of different species

More information

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution Today s topics Inferring phylogeny Introduction! Distance methods! Parsimony method!"#$%&'(!)* +,-.'/01!23454(6!7!2845*0&4'9#6!:&454(6 ;?@AB=C?DEF Overview of phylogenetic inferences Methodology Methods

More information

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz Phylogenetic Trees What They Are Why We Do It & How To Do It Presented by Amy Harris Dr Brad Morantz Overview What is a phylogenetic tree Why do we do it How do we do it Methods and programs Parallels

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

Letter to the Editor. Department of Biology, Arizona State University

Letter to the Editor. Department of Biology, Arizona State University Letter to the Editor Traditional Phylogenetic Reconstruction Methods Reconstruct Shallow and Deep Evolutionary Relationships Equally Well Michael S. Rosenberg and Sudhir Kumar Department of Biology, Arizona

More information

Maximum Likelihood Estimation on Large Phylogenies and Analysis of Adaptive Evolution in Human Influenza Virus A

Maximum Likelihood Estimation on Large Phylogenies and Analysis of Adaptive Evolution in Human Influenza Virus A J Mol Evol (2000) 51:423 432 DOI: 10.1007/s002390010105 Springer-Verlag New York Inc. 2000 Maximum Likelihood Estimation on Large Phylogenies and Analysis of Adaptive Evolution in Human Influenza Virus

More information

Substitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A

Substitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A GAGATC 3:G A 6:C T Common Ancestor ACGATC 1:A G 2:C A Substitution = Mutation followed 5:T C by Fixation GAAATT 4:A C 1:G A AAAATT GAAATT GAGCTC ACGACC Chimp Human Gorilla Gibbon AAAATT GAAATT GAGCTC ACGACC

More information

Reconstruire le passé biologique modèles, méthodes, performances, limites

Reconstruire le passé biologique modèles, méthodes, performances, limites Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire

More information

Maximum Likelihood in Phylogenetics

Maximum Likelihood in Phylogenetics Maximum Likelihood in Phylogenetics June 1, 2009 Smithsonian Workshop on Molecular Evolution Paul O. Lewis Department of Ecology & Evolutionary Biology University of Connecticut, Storrs, CT Copyright 2009

More information

The translation machinery of the cell works with triples of types of RNA bases. Any triple of RNA bases is known as a codon. The set of codons is

The translation machinery of the cell works with triples of types of RNA bases. Any triple of RNA bases is known as a codon. The set of codons is Relations Supplement to Chapter 2 of Steinhart, E. (2009) More Precisely: The Math You Need to Do Philosophy. Broadview Press. Copyright (C) 2009 Eric Steinhart. Non-commercial educational use encouraged!

More information

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

More information

BMI/CS 776 Lecture 4. Colin Dewey

BMI/CS 776 Lecture 4. Colin Dewey BMI/CS 776 Lecture 4 Colin Dewey 2007.02.01 Outline Common nucleotide substitution models Directed graphical models Ancestral sequence inference Poisson process continuous Markov process X t0 X t1 X t2

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Estimating Divergence Dates from Molecular Sequences

Estimating Divergence Dates from Molecular Sequences Estimating Divergence Dates from Molecular Sequences Andrew Rambaut and Lindell Bromham Department of Zoology, University of Oxford The ability to date the time of divergence between lineages using molecular

More information

Markov Chains and Hidden Markov Models. = stochastic, generative models

Markov Chains and Hidden Markov Models. = stochastic, generative models Markov Chains and Hidden Markov Models = stochastic, generative models (Drawing heavily from Durbin et al., Biological Sequence Analysis) BCH339N Systems Biology / Bioinformatics Spring 2016 Edward Marcotte,

More information

Proteins: Characteristics and Properties of Amino Acids

Proteins: Characteristics and Properties of Amino Acids SBI4U:Biochemistry Macromolecules Eachaminoacidhasatleastoneamineandoneacidfunctionalgroupasthe nameimplies.thedifferentpropertiesresultfromvariationsinthestructuresof differentrgroups.thergroupisoftenreferredtoastheaminoacidsidechain.

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Basic math for biology

Basic math for biology Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood

More information

M.O. Dayhoff, R.M. Schwartz, and B. C, Orcutt

M.O. Dayhoff, R.M. Schwartz, and B. C, Orcutt A Model of volutionary Change in Proteins M.O. Dayhoff, R.M. Schwartz, and B. C, Orcutt n the eight years since we last examined the amino acid exchanges seen in closely related proteins,' the information

More information

Molecular Evolution and Comparative Genomics

Molecular Evolution and Comparative Genomics Molecular Evolution and Comparative Genomics --- the phylogenetic HMM model 10-810, CMB lecture 5---Eric Xing Some important dates in history (billions of years ago) Origin of the universe 15 ±4 Formation

More information

Computational Genomics

Computational Genomics omputational Genomics Molecular Evolution: Phylogenetic trees Eric Xing Lecture, March, 2007 Reading: DTW boo, hap 2 DEKM boo, hap 7, Phylogeny In, Ernst Haecel coined the word phylogeny and presented

More information

Bioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre

Bioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre Bioinformatics Scoring Matrices David Gilbert Bioinformatics Research Centre www.brc.dcs.gla.ac.uk Department of Computing Science, University of Glasgow Learning Objectives To explain the requirement

More information

X X (2) X Pr(X = x θ) (3)

X X (2) X Pr(X = x θ) (3) Notes for 848 lecture 6: A ML basis for compatibility and parsimony Notation θ Θ (1) Θ is the space of all possible trees (and model parameters) θ is a point in the parameter space = a particular tree

More information

Preliminaries. Download PAUP* from: Tuesday, July 19, 16

Preliminaries. Download PAUP* from:   Tuesday, July 19, 16 Preliminaries Download PAUP* from: http://people.sc.fsu.edu/~dswofford/paup_test 1 A model of the Boston T System 1 Idea from Paul Lewis A simpler model? 2 Why do models matter? Model-based methods including

More information

Maximum Likelihood in Phylogenetics

Maximum Likelihood in Phylogenetics Maximum Likelihood in Phylogenetics 29 July 2014 Workshop on Molecular Evolution Woods Hole, Massachusetts Paul O. Lewis Department of Ecology & Evolutionary Biology Paul O. Lewis (2014 Woods Hole Workshop

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition David D. Pollock* and William J. Bruno* *Theoretical Biology and Biophysics, Los Alamos National

More information

Lab 9: Maximum Likelihood and Modeltest

Lab 9: Maximum Likelihood and Modeltest Integrative Biology 200A University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS" Spring 2010 Updated by Nick Matzke Lab 9: Maximum Likelihood and Modeltest In this lab we re going to use PAUP*

More information

Viewing and Analyzing Proteins, Ligands and their Complexes 2

Viewing and Analyzing Proteins, Ligands and their Complexes 2 2 Viewing and Analyzing Proteins, Ligands and their Complexes 2 Overview Viewing the accessible surface Analyzing the properties of proteins containing thousands of atoms is best accomplished by representing

More information

T R K V CCU CG A AAA GUC T R K V CCU CGG AAA GUC. T Q K V CCU C AG AAA GUC (Amino-acid

T R K V CCU CG A AAA GUC T R K V CCU CGG AAA GUC. T Q K V CCU C AG AAA GUC (Amino-acid Lecture 11 Increasing Model Complexity I. Introduction. At this point, we ve increased the complexity of models of substitution considerably, but we re still left with the assumption that rates are uniform

More information

Counting phylogenetic invariants in some simple cases. Joseph Felsenstein. Department of Genetics SK-50. University of Washington

Counting phylogenetic invariants in some simple cases. Joseph Felsenstein. Department of Genetics SK-50. University of Washington Counting phylogenetic invariants in some simple cases Joseph Felsenstein Department of Genetics SK-50 University of Washington Seattle, Washington 98195 Running Headline: Counting Phylogenetic Invariants

More information

7.012 Problem Set 1. i) What are two main differences between prokaryotic cells and eukaryotic cells?

7.012 Problem Set 1. i) What are two main differences between prokaryotic cells and eukaryotic cells? ame 7.01 Problem Set 1 Section Question 1 a) What are the four major types of biological molecules discussed in lecture? Give one important function of each type of biological molecule in the cell? b)

More information

Lecture 15: Realities of Genome Assembly Protein Sequencing

Lecture 15: Realities of Genome Assembly Protein Sequencing Lecture 15: Realities of Genome Assembly Protein Sequencing Study Chapter 8.10-8.15 1 Euler s Theorems A graph is balanced if for every vertex the number of incoming edges equals to the number of outgoing

More information

SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA

SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS 1 Prokaryotes and Eukaryotes 2 DNA and RNA 3 4 Double helix structure Codons Codons are triplets of bases from the RNA sequence. Each triplet defines an amino-acid.

More information

Introduction to Molecular Phylogeny

Introduction to Molecular Phylogeny Introduction to Molecular Phylogeny Starting point: a set of homologous, aligned DNA or protein sequences Result of the process: a tree describing evolutionary relationships between studied sequences =

More information