= 1 = 4 3. Odds ratio justification for maximum likelihood. Likelihoods, Bootstraps and Testing Trees. Prob (H 2 D) Prob (H 1 D) Prob (D H 2 )

Size: px
Start display at page:

Download "= 1 = 4 3. Odds ratio justification for maximum likelihood. Likelihoods, Bootstraps and Testing Trees. Prob (H 2 D) Prob (H 1 D) Prob (D H 2 )"

Transcription

1 4 1 1/3 1 = /3 1 = 1 12 Odds ratio justification for maximum likelihood Likelihoods, ootstraps and Testing Trees Joe elsenstein the data H 1 Hypothesis 1 H 2 Hypothesis 2 the symbol for given epts. of enome Sciences and of iology, University of Washington Prob (H 1 ) Prob (H 2 ) }{{} Prior odds ratio Prob ( H 1 ) Prob ( H 2 ) }{{} Likelihood ratio = Prob (H 1 ) Prob (H 2 ) }{{} Posterior odds ratio Likelihoods, ootstraps and Testing Trees p.1/60 Likelihoods, ootstraps and Testing Trees p.2/60 If a space probe finds no Little reen Men on Mars yes no yes no priors likelihoods no 1 yes 0 posteriors no yes no yes The likelihood ratio term ultimately dominates If we see one Little reen Man, the likelihood calculation does the right thing: 1 4 2/3 = 0 1 (put this way, this is OK but not mathematically kosher) If we send n space probes and keep seeing none, the likelihood ratio term is ( ) n 1 3 It dominates the calculation, overwhelming the prior. Thus even if we don t have a prior we can believe in, we may be interested in knowing which hypothesis the likelihood ratio is recommending...

2 Likelihood in Simple oin-tossing Tossing a coin n times, with probability p of heads, the probability of outcome HHTHTTTTHTTH is which is L = p 5 (1 p) 6 pp(1 p)p(1 p)(1 p)(1 p)(1 p)p(1 p)(1 p)p Plotting L against p to find its maximum: ifferentiating to find the maximum: ifferentiating the expression for L with respect to p and equating the derivative to 0, the value of p that is at the peak is found (not surprisingly) to be p = 5/11: L p = ( 5 p 6 ) p 5 (1 p) 6 = 0 1 p 5 11 p = 0 Likelihood ˆp = p Likelihoods, ootstraps and Testing Trees p.5/60 Likelihoods, ootstraps and Testing Trees p.6/60 log-likelihoodcurve Likelihood curve in one parameter Its maximum likelihood estimate Likelihood curve in one parameter and the maximum likelihood estimate Ln (Likelihood) Ln (Likelihood) length of a branch in the tree length of a branch in the tree maximum likelihood estimate (ML)

3 The (approximate, asymptotic) confidence interval ontours of a log-likelihood surface in two dimensions Likelihood curve in one parameter and the maximum likelihood estimate and confidence interval derived from it Ln (Likelihood) 1/2 the value of a chi square with 1 d.f. significant at 95% length of branch 2 95% confidence interval length of a branch in the tree maximum likelihood estimate (ML) Likelihoods, ootstraps and Testing Trees p.9/60 length of branch 1 Likelihoods, ootstraps and Testing Trees p.10/60 ontours of a log-likelihood surface in two dimensions Log-likelihood-based confidence set for two variables shaded area is the joint confidence interval length of branch 2 ML length of branch 2 height of this contour is less than at the peak by an amount equal to 1/2 the chi square value with two degrees of freedom which is significant at 95% level length of branch 1 length of branch 1

4 onfidence interval for one variable onfidence interval for the other variable length of branch 2 length of branch 2 height of this contour is less than at the peak by an amount equal to 1/2 the chi square value with one degree of freedom which is significant at 95% level height of this contour is less than at the peak by an amount equal to 1/2 the chi square value with one degree of freedom which is significant at 95% level length of branch 1 Likelihoods, ootstraps and Testing Trees p.13/60 length of branch 1 Likelihoods, ootstraps and Testing Trees p.14/60 alculating the likelihood of a tree If we have molecular sequences on a tree, the likelihood is the product over sites of the data [i] for each site (if those evolve independently): L = Prob( T) = sites i=1 With log-likelihoods, the product becomes a sum: ln L = ln Prob( T) = sites i=1 Prob ( [i] T) ln Prob ( [i] T) alculating the likelihood for site i on a tree t 1 x t i are t t 4 t 5 2 t 3 y t t 6 7 "branch lengths", z (rate X time) w t 8 Sum over all possible states (bases) at interior nodes: L (i) = Prob (w) Prob(x w, t 7 ) x y z w Prob ( x, t 1 )Prob( x, t 2 )Prob(z w, t 8 ) Prob ( z, t 3 )Prob(y z, t 6 )Prob( y, t 4 )Prob( y, t 5 )

5 alculating the likelihood for site i on a tree We use the conditional likelihoods: L (i) j (s) The pruning" algorithm: j k These compute the probability of everything at site i at or above node j on the tree, given that node j is in state s. Thus it assumes something (s) that we don t know in practice so we compute these for all states s. v j l v k t the tips we can define these quantities: if the observed state is (say), the vector of L s is (0, 1, 0, 0). If we observe an ambiguity, say R (purine), they are (1, 0, 1, 0), not (1/2, 0, 1/2, 0) L (i) l (s) = [ s j Prob (s j s, v j ) L (i) j (s j ) [ ] Prob (s k s, v k ) L (i) k (s k) s k ] (elsenstein, 1973; 1981). Likelihoods, ootstraps and Testing Trees p.17/60 Likelihoods, ootstraps and Testing Trees p.18/60 and at the bottom of the tree: What does tree space" (with branch lengths) look like? L (i) 0 = s π s L (i) 0 (s) an example: three species with a clock trifurcation not possible t 1 t 1 etc. (elsenstein, 1973, 1981) t 2 OK and having gotten the likelihoods for each site: L = sites i=1 L (i) 0 when we consider all three possible topologies, the space looks like: t 2 t 1 t 2

6 or one tree topology The space of trees varying all 2n 3 branch lengths, each a nonegative number, defines an orthant" (open corner) of a (2n 3)-dimensional real space: v 1 v 6 v 2 3 v v v 8 7 v 9 v 5 v 4 wall floor wall v 9 Through the looking-glass Shrinking one of the n 1 interior branches to 0, we arrive at a trifurcation: v 1 v6 v7 v 2 v3 v8 v 4 v 9 v5 v 2 v 1 v v v3 8 7 v 9 v 4 v 6 v5 v 1 v6 3 v v 8 7 v 4 v v 2 v5 v 1 v6 v 2 v3 v8 v v 5 v 9 7 v 4 Here, as we pass through the looking glass" we are also touch the space for two other tree topologies, and we could enter either. Likelihoods, ootstraps and Testing Trees p.21/60 Likelihoods, ootstraps and Testing Trees p.22/60 The graph of all trees of 5 species dataexample:mitochondrial-loopsequences The Schoenberg graph (all 15 trees of size 5 connected by NNI s) ovine Mouse ibbon Orang orilla himp Human TT T T TT T T T TT TT TTT TT T TT T T T T T TT TTT TTTT T TT T T T T TT TT TTT T T T T TT TTTT TT TTT TTT TTTT TTTTT TT TT TT TT TTTT TTT T TTTT TTT TTT TT TT T T T T TT T TT TT TTT TTT TT T TT TT T TTT T T TTTTT TTT TT TT TT TTT TTT TTT TT T TTTT T TT TTTT TTT T TT T TTT T TT TT TTT TTT TT TT TTTT TT TT TT T TT TTT TT TTT TT TTT TT TT

7 which gives the ML tree Maximum likelihood tree for the Hasegawa 232-site mitochondrial -loop data set, with Ts/Tn set to 2, analyzed with maximum likelihood (NML) ln L = Orang orilla himp Human Mouse ibbon ovine Likelihoods, ootstraps and Testing Trees p.25/60 Models with amino acids H I K L M N P Q R S T V W Y etc. H I K L M N P Q R S T V W Y ayhoff PM model Jones Taylor Thornton model specific models for secondary structure contexts or membrane proteins Models adapted from Henikoff LOSUM scoring ut... how to take N sequence into account? onstraints of code? Likelihoods, ootstraps and Testing Trees p.26/60 odon models oldman & Yang, 1994; Muse & aut, 1994) ovarion models? (itch and Markowitz, 1970) U T T T T U U phe phe leu UUU UU UU ser U stop U stop U T T T T T leu UU U leu UU T T T T T leu leu U U U leu ile U UU T T T T Which sites are available ile ile met U U U for substitutions changes as one moves along the tree U val val val UU U U T T T T val U 1 ω 0 Probabilities of change vary depending on whether amino acid is changing, and to what T T T T T

8 How to calculate likelihood with rate variation asy! Since branch lengths always come into transition probability formulas as r t,canjustmultiplylengthsofbranchesbythe appropriate factor to calculate the likelihood for a site. (ranch lengths are usually scaled by assuming a rate of 1.) Rate variation among sites Sites Phylogeny T T... Likelihoods, ootstraps and Testing Trees p.29/60 Rates at different sites: Rates of evolution Likelihoods, ootstraps and Testing Trees p.30/60 Hidden Markov Model of rate variation among sites Sites Phylogeny T T... Hidden Markov Models sum up over all paths The Hidden Markov hain method sums up likelihoods over all possible paths through the states: Prob (ata tree) = Σ Prob(ata tree, path) Prob(path) paths Hidden Markov chain that assigns rates: Rates of evolution one path another path

9 The rate combination contributing the most: orwards-ackwards algorithm (marginal probabilities) We can leave behind pointers that allow us to backtrack This can be done by a dynamic programming algorithm called the Viterbi lgorithm, well-known in the HMM literature. (Of course, this one might account for only of the likelihood) The orwards ackwards algorithm can calculate the contribution of one rate at a given site to the overall likelihood (a little different from the Viterbi calculation) Likelihoods, ootstraps and Testing Trees p.33/60 Likelihoods, ootstraps and Testing Trees p.34/60 The amma distribution, used for rates α = 1/4 V = 2 numericalexample.yochrome We analyze 31 cytochrome sequences, aligned by Naoko Takezaki, using the Proml protein maximum likelihood program. ssume ahidden Markov Model with 3 states, rates: category rate probability α = 1 V = 1 α = 4 V = 1/2 and expected block length 3. We get a reasonable, but not perfect, tree with the best rate combination inferred to be rate

10 The cytochrome tree from the above run (It s not perfect). wrong dubious seaurchin2 seaurchin1 lamprey trout loach carp xenopus chicken opossum wallaroo platypus rat mouse cat hseal gseal bovine whalebp whalebm horse dhorse rhinocer gibbon sorang borang gorilla2 gorilla1 pchimp cchimp african caucasian Likelihoods, ootstraps and Testing Trees p.37/60 Rates inferred from ytochrome african M-----TPMRK INPLMKLINH SILPTPSN ISWWNSL LLILQIT TLLMH caucasian......r t cchimp...t pchimp...t t gorilla1... T T gorilla2... T T borang... T....L I.TI... sorang...st.. T....L I gibbon...l.. T....L.....M......I... bovine...ni.. SH...IV.N.....S.....I...L... whalebm...ni.. TH...I S.....L...V..L... whalebp...ni.. TH...IV..V.....S.....L...M..L... dhorse...ni.. SH..I.I S.....I...L... horse...ni.. SH..I.I S.....I...L... rhinocer...ni.. SH..V.I S.....I...L... cat...ni.. SH..I.I V..T...L... gseal...ni.. TH...I..N I...L... hseal...ni.. TH...I..N I...L... mouse...n... TH...I S.....V..MV..I... rat...ni.. SH...I S.....V..MV..L... platypus...nnl.. TH..I.IV S.....L...I..L... wallaroo...nl.. SH..I.IV I..L... opossum...ni.. TH...I V...I..L... chicken...pni.. SH..L.M..N.L V..MT..L...L... xenopus...pni.. SH..I.I..N.....SL.....V...I... carp...-sl.. TH..I.I. LV L...T..L... loach...-sl.. TH..I.I. LV.....V.....L...T..L... trout...-nl.. TH..L.I. LV.....V.....L..T..L... lamprey.shqpsii.. TH..LS..S MLV...S......SL...I...I... seaurchin1 -...L.L.. H.IRIL.S T.V...L... L.I.....L...T..L... seaurchin2 -...L.. H.IRIL.S T.V...L... L.M... Likelihoods,..L...I.LI ootstraps and Testing..I... Trees p.38/60 Rates inferred from ytochrome can PSTSSI HITRVNY WIIRYLHN SMILL HIRLYYS LYSTWNI asian mp l... mp l....v......l... lla t hq... lla t hq... ng...t m..h......l thl... ng m..h thl... on...v l... ne S.TT...V T......M......YM.V... YTL... ebm..tm...v T....V......Y.M... HR... ebp..tt...v T Y.M... YR... se S.TT...V T I.V... YTL... e S.TT...V T I.V... YTL... ocer..tt...v T....M......I.V... YTL... S.TM...V T YM.V...M... YT... l S.TT...V T YM.V... YTT... l S.TT...V T YM.V... YTT... e S.TM...V T....L...M V... YTM... S.TM...V T....L...Q V... YTL... ypus S.T...V....L...M.....L..M.I..... YTQT... aroo S.TL...V....L..N......M....V...I... Y..K... sum S.TL...V....L..NI......M....V...I... Y..K... ken.t.l...v..t.n.q...l..n......i..... Y..K...T pus.t.m...v... LL..N... L...IY K... S.I...V T....L..NV......IYM... Y..K... h S.I...V....L..NI......Y.... Y..K... t S.I...V...S...L..NI......IYM... Y..K... rey NTL...V M...N..LM.N......IY...I... Y..K...V rchin1.i.l... S....LL.NV.....L...MY... SNKI...V rchin2.inl...v S....LL.NV.....L...MY...L TNKI...V Likelihood curve and its confidence interval ln L Transition / transversion ratio (This is for the 14-species primates data available for download).

11 onstraints on a tree for a clock Likelihood-ratio test of molecular clock onstraints for a clock Mouse v 1 v 2 v 4 v 5 v 1 = v 2 Human himp v 6 v 3 v v 4 = 5 orilla Orang v 8 v 1 + v v 6 = 3 ibbon ovine v 7 v v = v 4 + v 8 log likelihood Without clock parameters 11 With clock ifference χ 2 = df = 5 (This is for this 7-species subset of the primates data). (non significant) Likelihoods, ootstraps and Testing Trees p.41/60 Likelihoods, ootstraps and Testing Trees p.42/60 Likelihood surface for three clocklike trees Two trees to be tested using KHT test 204 Mouse ln Likelihood x x Tree I ovine ibbon Orang orilla himp Human x Mouse Tree II ovine ibbon x Orang (These are profile likelihoods" as they show the largest likelihood for that value of x,maximizingovertheotherbranchlengthinthetree.) orilla himp Human

12 Table of differences in log-likelihood Histogram of those differences Tree I II site ln L iff ifference in log likelihood at site o sign test, or t-test, or similar nonparametric tests. Likelihoods, ootstraps and Testing Trees p.45/60 Likelihoods, ootstraps and Testing Trees p.46/60 ootstrap sampling (with mixtures of normals) estimate of θ (unknown) true value of θ empirical distribution of sample (unknown) true distribution ootstrap replicates ootstrap sampling To infer the error in a quantity, θ, estimatedfromasampleofpoints x 1, x 2,...,x n we can o the following R times (R =1000or so) raw a bootstrap sample" by sampling n times with replacement from the sample. all these x 1, x 2,...,x n.notethatsomeofthe original points are represented more than once in the bootstrap sample, some once, some not at all. stimate θ from each of the bootstrap samples, call these ˆθ k (k = 1, 2,...,R) istribution of estimates of parameters When all R bootstrap samples have been done, the distribution of ˆθ i estimates the distribution one would get if one were able to draw repeated samples of n data points from the unknown true distribution.

13 ootstrap sampling of phylogenies nalyzing bootstraps with phylogenies Original ata sequences sites The sites are assumed to have evolved independently given the tree. They are the entities that are sampled (the x i ). The trees play the role of the parameter. One ends up with a cloud of R sampled trees. ootstrap sample #1 sequences sites sample same number of sites, with replacement stimate of the tree To summarize this cloud, we ask, for each branch in the tree, how frequently it appears among the cloud of trees. We make a tree that summarizes this for all the most frequently occurring branches. This is the majority rule consensus tree of the bootstrap estimates of the tree. ootstrap sample #2 sites ootstrap estimate of the tree, #1 sequences sample same number of sites, with replacement (and so on) ootstrap estimate of the tree, #2 Likelihoods, ootstraps and Testing Trees p.49/60 Likelihoods, ootstraps and Testing Trees p.50/60 Partitions from branches in an (unrooted) tree The majority-rule consensus tree and so on for all the other external (tip) branches 1

14 The majority-rule consensus tree The majority-rule consensus tree 1 1 Likelihoods, ootstraps and Testing Trees p.52/ Likelihoods, ootstraps and Testing Trees p.52/60 The majority-rule consensus tree The majority-rule consensus tree

15 The majority-rule consensus tree The majority-rule consensus tree Likelihoods, ootstraps and Testing Trees p.52/ Likelihoods, ootstraps and Testing Trees p.52/60 The majority-rule consensus tree The majority-rule consensus tree Majority rule consensus tree of the unrooted trees:

16 ootstrap sampling of a phylogeny ovine Mouse Squir Monk himp Human orilla Orang ibbon Rhesus Mac Jpn Macaq rab.mac Potential problems with the bootstrap Sites may not evolve independently Sites may not come from a common distribution (but you can consider them to be sampled from a mixture of possible distributions) If do not know which branch is of interest at the outset, a multiple-tests" problem means that the most extreme P values are overstated Pvaluesarebiased(tooconservative) ootstrapping does not correct biases in phylogeny methods arbmacaq 49 Tarsier Lemur In this example, parsimony was used to infer the tree. Likelihoods, ootstraps and Testing Trees p.53/60 Likelihoods, ootstraps and Testing Trees p.54/60 elete-half jackknife P values diagramoftheparametricbootstrap ovine Mouse computer simulation estimation of tree Squir Monk himp Human orilla Orang ibbon Rhesus Mac Jpn Macaq rab.mac arbmacaq original data estimate of tree data set #1 data set #2 data set #3 data set #100 T 1 T 2 T 3 T Tarsier Lemur In this example, parsimony was used to infer the tree.

17 References Likelihood dwards,. W.. and L. L. avalli-sforza Reconstruction of evolutionary trees. pp in Phenetic and Phylogenetic lassification, ed. V. H. Heywood and J. McNeill. Systematics ssociation Publication No. 6. Systematics ssociation, London. [The founding paper for parsimony and likelihood for phylogenies, using gene frequencies] Jukes, T. H. and. antor volution of protein molecules. pp in Mammalian Protein Metabolism, ed. M. N. Munro. cademic Press, New York. [The Jukes-antor model, in one formula and a couple of sentences] Neyman, J Molecular studies of evolution: a source of novel statistical problems. In Statistical ecision Theory and Related Topics, ed. S. S. upta and J. Yackel, pp New York: cademic Press. [irst paper on likelihood for molecular sequences. Neyman was a famous statistician.] elsenstein, J Maximum-likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Systematic Zoology 22: [The pruning algorithm, parsimony is not same as likelihood] elsenstein, J volutionary trees from N sequences: a maximum likelihood approach. Journal of Molecular volution 17: [Making likelihood useable for molecular sequences] Likelihoods, ootstraps and Testing Trees p.57/60 (more references) Yang, Z Maximum-likelihood estimation of phylogeny from N sequences when substitution rates differ over sites. Molecular iology and volution 10: [Use of gamma distribution of rate variation in ML phylogenies] Yang, Z Maximum likelihood phylogenetic estimation from N sequences with variable rates over sites: approximate methods. Journal of Molecular volution 39: [pproximating gamma distribution in ML phylogenies by an HMM] Yang, Z space-time process model for the evolution of N sequences. enetics 139: [llowing for autocorrelated rates along the molecule using an HMM for ML phylogenies] elsenstein, J. and.. hurchill Hidden Markov Model approach to variation among sites in rate of evolution Molecular iology and volution 13: [HMM approach to evolutionary rate variation] Thorne, J. L., N. oldman, and. T. Jones ombining protein evolution and secondary structure. Molecular iology and volution [HMM for secondary structure of proteins, with phylogenies] ootstraps etc. fron, ootstrap methods: another look at the jackknife. nnals of Statistics 7: [The original bootstrap paper] Likelihoods, ootstraps and Testing Trees p.58/60 (more references) Margush, T. and. R. McMorris onsensus n-trees. ulletin of Mathematical iology 43: i. [Majority-rule consensus trees] elsenstein, J onfidence limits on phylogenies: an approach using the bootstrap. volution 39: [The bootstrap first applied to phylogenies] Zharkikh,., and W.-H. Li Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences. I. our taxa with a molecular clock. Molecular iology and volution 9: [iscovery and explanation of bias in P values] Künsch, H. R The jackknife and the bootstrap for general stationary observations. nnals of Statistics 17: [The block-bootstrap] Wu,.. J Jackknife, bootstrap and other resampling plans in regression analysis. nnals of Statistics 14: [The delete-half jackknife] fron, ootstrap confidence intervals for a class of parametric problems. iometrika 72: [The parametric bootstrap] Templeton,. R Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. volution 37: [The first paper on the KHT test] (more references) oldman, N Statistical tests of models of N substitution. Journal of Molecular volution 36: [Parametric bootstrapping for testing models] Shimodaira, H. and M. Hasegawa Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Molecular iology and volution 16: [orrection of KHT test for multiple hypothesis] Prager,. M. and.. Wilson ncient origin of lactalbumin from lysozyme: analysis of N and amino acid sequences. Journal of Molecular volution 27: [winning-sites test] Hasegawa, M. and H. Kishino ccuracies of the simple methods for estimating the bootstrap probability of a maximum-likelihood tree. Molecular iology and volution 11: [RLL probabilities] eneral reading elsenstein, J Inferring Phylogenies. Sinauerssociates, Sunderland, Massachusetts. [ook you and all your friends must rush out and buy] Yang, Z omputational Molecular volution. OxfordUniversityPress, Oxford. [Well-thought-out book on molecular phylogenies] Semple,. and M. Steel Phylogenetics. Oxford University Press,

Bootstraps and testing trees. Alog-likelihoodcurveanditsconfidenceinterval

Bootstraps and testing trees. Alog-likelihoodcurveanditsconfidenceinterval ootstraps and testing trees Joe elsenstein epts. of Genome Sciences and of iology, University of Washington ootstraps and testing trees p.1/20 log-likelihoodcurveanditsconfidenceinterval 2620 2625 ln L

More information

Likelihoods and Phylogenies

Likelihoods and Phylogenies Likelihoods and Phylogenies Joe Felsenstein Department of enome Sciences and Department of Biology University of Washington, Seattle Likelihoods and Phylogenies p.1/68 n ideal parsimony method? Ideally,

More information

Week 8: Testing trees, Bootstraps, jackknifes, gene frequencies

Week 8: Testing trees, Bootstraps, jackknifes, gene frequencies Week 8: Testing trees, ootstraps, jackknifes, gene frequencies Genome 570 ebruary, 2016 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.1/69 density e log (density) Normal distribution:

More information

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30 Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30 A non-phylogeny

More information

Week 6: Restriction sites, RAPDs, microsatellites, likelihood, hidden Markov models

Week 6: Restriction sites, RAPDs, microsatellites, likelihood, hidden Markov models Week 6: Restriction sites, RAPDs, microsatellites, likelihood, hidden Markov models Genome 570 February, 2012 Week 6: Restriction sites, RAPDs, microsatellites, likelihood, hidden Markov models p.1/63

More information

Week 7: Bayesian inference, Testing trees, Bootstraps

Week 7: Bayesian inference, Testing trees, Bootstraps Week 7: ayesian inference, Testing trees, ootstraps Genome 570 May, 2008 Week 7: ayesian inference, Testing trees, ootstraps p.1/54 ayes Theorem onditional probability of hypothesis given data is: Prob

More information

Week 6: Protein sequence models, likelihood, hidden Markov models

Week 6: Protein sequence models, likelihood, hidden Markov models Week 6: Protein sequence models, likelihood, hidden Markov models Genome 570 February, 2016 Week 6: Protein sequence models, likelihood, hidden Markov models p.1/57 Variation of rates of evolution across

More information

Lecture 4. Models of DNA and protein change. Likelihood methods

Lecture 4. Models of DNA and protein change. Likelihood methods Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/36

More information

Inference of phylogenies, with some thoughts on statistics and geometry p.1/31

Inference of phylogenies, with some thoughts on statistics and geometry p.1/31 Inference of phylogenies, with some thoughts on statistics and geometry Joe Felsenstein University of Washington Inference of phylogenies, with some thoughts on statistics and geometry p.1/31 Darwin s

More information

Lecture 4. Models of DNA and protein change. Likelihood methods

Lecture 4. Models of DNA and protein change. Likelihood methods Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/39

More information

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26 Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 4 (Models of DNA and

More information

Inferring Molecular Phylogeny

Inferring Molecular Phylogeny Dr. Walter Salzburger he tree of life, ustav Klimt (1907) Inferring Molecular Phylogeny Inferring Molecular Phylogeny 55 Maximum Parsimony (MP): objections long branches I!! B D long branch attraction

More information

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline Phylogenetics Todd Vision iology 522 March 26, 2007 pplications of phylogenetics Studying organismal or biogeographic history Systematics ating events in the fossil record onservation biology Studying

More information

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22 Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 24. Phylogeny methods, part 4 (Models of DNA and

More information

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018 Maximum Likelihood Tree Estimation Carrie Tribble IB 200 9 Feb 2018 Outline 1. Tree building process under maximum likelihood 2. Key differences between maximum likelihood and parsimony 3. Some fancy extras

More information

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

More information

Why IQ-TREE? Bui Quang Minh Australian National University. Workshop on Molecular Evolution Woods Hole, 24 July 2018

Why IQ-TREE? Bui Quang Minh Australian National University. Workshop on Molecular Evolution Woods Hole, 24 July 2018 Why IQ-R? ui Quang Minh ustralian National University Workshop on Molecular volution Woods Hole, 24 July 2018 hanks to plenty of users for feedback and bug reports! ypical phylogenetic analysis under maximum

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe?

How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe? How should we go about modeling this? gorilla GAAGTCCTTGAGAAATAAACTGCACACACTGG orangutan GGACTCCTTGAGAAATAAACTGCACACACTGG Model parameters? Time Substitution rate Can we observe time or subst. rate? What

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Evolutionary Models. Evolutionary Models

Evolutionary Models. Evolutionary Models Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment

More information

Inferring Molecular Phylogeny

Inferring Molecular Phylogeny r. Walter Salzburger The tree of life, ustav Klimt (1907) Inferring Molecular Phylogeny Inferring Molecular Phylogeny 2 1. Molecular Markers Inferring Molecular Phylogeny 3 Immunological comparisons! Nuttall

More information

An Introduction to Bayesian Inference of Phylogeny

An Introduction to Bayesian Inference of Phylogeny n Introduction to Bayesian Inference of Phylogeny John P. Huelsenbeck, Bruce Rannala 2, and John P. Masly Department of Biology, University of Rochester, Rochester, NY 427, U.S.. 2 Department of Medical

More information

Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences

Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences Mathematical Statistics Stockholm University Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences Bodil Svennblad Tom Britton Research Report 2007:2 ISSN 650-0377 Postal

More information

Tree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

Phylogenetics. BIOL 7711 Computational Bioscience

Phylogenetics. BIOL 7711 Computational Bioscience Consortium for Comparative Genomics! University of Colorado School of Medicine Phylogenetics BIOL 7711 Computational Bioscience Biochemistry and Molecular Genetics Computational Bioscience Program Consortium

More information

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution Today s topics Inferring phylogeny Introduction! Distance methods! Parsimony method!"#$%&'(!)* +,-.'/01!23454(6!7!2845*0&4'9#6!:&454(6 ;?@AB=C?DEF Overview of phylogenetic inferences Methodology Methods

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

Letter to the Editor. Department of Biology, Arizona State University

Letter to the Editor. Department of Biology, Arizona State University Letter to the Editor Traditional Phylogenetic Reconstruction Methods Reconstruct Shallow and Deep Evolutionary Relationships Equally Well Michael S. Rosenberg and Sudhir Kumar Department of Biology, Arizona

More information

Phylogenetic Inference using RevBayes

Phylogenetic Inference using RevBayes Phylogenetic Inference using RevBayes Model section using Bayes factors Sebastian Höhna 1 Overview This tutorial demonstrates some general principles of Bayesian model comparison, which is based on estimating

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

Molecular Evolution and Comparative Genomics

Molecular Evolution and Comparative Genomics Molecular Evolution and Comparative Genomics --- the phylogenetic HMM model 10-810, CMB lecture 5---Eric Xing Some important dates in history (billions of years ago) Origin of the universe 15 ±4 Formation

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood

Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood For: Prof. Partensky Group: Jimin zhu Rama Sharma Sravanthi Polsani Xin Gong Shlomit klopman April. 7. 2003 Table of Contents Introduction...3

More information

Computational Genomics

Computational Genomics omputational Genomics Molecular Evolution: Phylogenetic trees Eric Xing Lecture, March, 2007 Reading: DTW boo, hap 2 DEKM boo, hap 7, Phylogeny In, Ernst Haecel coined the word phylogeny and presented

More information

Taming the Beast Workshop

Taming the Beast Workshop Workshop David Rasmussen & arsten Magnus June 27, 2016 1 / 31 Outline of sequence evolution: rate matrices Markov chain model Variable rates amongst different sites: +Γ Implementation in BES2 2 / 31 genotype

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Phylogenetics: Building Phylogenetic Trees

Phylogenetics: Building Phylogenetic Trees 1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should

More information

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution Massachusetts Institute of Technology 6.877 Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution 1. Rates of amino acid replacement The initial motivation for the neutral

More information

Molecular Evolution & Phylogenetics

Molecular Evolution & Phylogenetics Molecular Evolution & Phylogenetics Heuristics based on tree alterations, maximum likelihood, Bayesian methods, statistical confidence measures Jean-Baka Domelevo Entfellner Learning Objectives know basic

More information

Estimating Divergence Dates from Molecular Sequences

Estimating Divergence Dates from Molecular Sequences Estimating Divergence Dates from Molecular Sequences Andrew Rambaut and Lindell Bromham Department of Zoology, University of Oxford The ability to date the time of divergence between lineages using molecular

More information

Probabilistic modeling and molecular phylogeny

Probabilistic modeling and molecular phylogeny Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical University of Denmark (DTU) What is a model? Mathematical

More information

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Integrative Biology 200 PRINCIPLES OF PHYLOGENETICS Spring 2018 University of California, Berkeley Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley B.D. Mishler Feb. 14, 2018. Phylogenetic trees VI: Dating in the 21st century: clocks, & calibrations;

More information

Today s Lecture: HMMs

Today s Lecture: HMMs Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models

More information

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University Phylogenetics: Building Phylogenetic Trees COMP 571 - Fall 2010 Luay Nakhleh, Rice University Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary

More information

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Chapter 12 (Strikberger) Molecular Phylogenies and Evolution METHODS FOR DETERMINING PHYLOGENY In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Modern

More information

What Is Conservation?

What Is Conservation? What Is Conservation? Lee A. Newberg February 22, 2005 A Central Dogma Junk DNA mutates at a background rate, but functional DNA exhibits conservation. Today s Question What is this conservation? Lee A.

More information

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington.

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington. Maximum Likelihood This presentation is based almost entirely on Peter G. Fosters - "The Idiot s Guide to the Zen of Likelihood in a Nutshell in Seven Days for Dummies, Unleashed. http://www.bioinf.org/molsys/data/idiots.pdf

More information

RELATING PHYSICOCHEMMICAL PROPERTIES OF AMINO ACIDS TO VARIABLE NUCLEOTIDE SUBSTITUTION PATTERNS AMONG SITES ZIHENG YANG

RELATING PHYSICOCHEMMICAL PROPERTIES OF AMINO ACIDS TO VARIABLE NUCLEOTIDE SUBSTITUTION PATTERNS AMONG SITES ZIHENG YANG RELATING PHYSICOCHEMMICAL PROPERTIES OF AMINO ACIDS TO VARIABLE NUCLEOTIDE SUBSTITUTION PATTERNS AMONG SITES ZIHENG YANG Department of Biology (Galton Laboratory), University College London, 4 Stephenson

More information

Reconstruire le passé biologique modèles, méthodes, performances, limites

Reconstruire le passé biologique modèles, méthodes, performances, limites Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire

More information

Statistical nonmolecular phylogenetics: can molecular phylogenies illuminate morphological evolution?

Statistical nonmolecular phylogenetics: can molecular phylogenies illuminate morphological evolution? Statistical nonmolecular phylogenetics: can molecular phylogenies illuminate morphological evolution? 30 July 2011. Joe Felsenstein Workshop on Molecular Evolution, MBL, Woods Hole Statistical nonmolecular

More information

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington

More information

Copyright notice. Molecular Phylogeny and Evolution. Goals of the lecture. Introduction. Introduction. December 15, 2008

Copyright notice. Molecular Phylogeny and Evolution. Goals of the lecture. Introduction. Introduction. December 15, 2008 opyright notice Molecular Phylogeny and volution ecember 5, 008 ioinformatics J. Pevsner pevsner@kennedykrieger.org Many of the images in this powerpoint presentation are from ioinformatics and Functional

More information

A (short) introduction to phylogenetics

A (short) introduction to phylogenetics A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

More information

Inferring Phylogenies from Protein Sequences by. Parsimony, Distance, and Likelihood Methods. Joseph Felsenstein. Department of Genetics

Inferring Phylogenies from Protein Sequences by. Parsimony, Distance, and Likelihood Methods. Joseph Felsenstein. Department of Genetics Inferring Phylogenies from Protein Sequences by Parsimony, Distance, and Likelihood Methods Joseph Felsenstein Department of Genetics University of Washington Box 357360 Seattle, Washington 98195-7360

More information

Using algebraic geometry for phylogenetic reconstruction

Using algebraic geometry for phylogenetic reconstruction Using algebraic geometry for phylogenetic reconstruction Marta Casanellas i Rius (joint work with Jesús Fernández-Sánchez) Departament de Matemàtica Aplicada I Universitat Politècnica de Catalunya IMA

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200B Spring 2009 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley B.D. Mishler Jan. 22, 2009. Trees I. Summary of previous lecture: Hennigian

More information

Is the equal branch length model a parsimony model?

Is the equal branch length model a parsimony model? Table 1: n approximation of the probability of data patterns on the tree shown in figure?? made by dropping terms that do not have the minimal exponent for p. Terms that were dropped are shown in red;

More information

Lecture Notes: Markov chains

Lecture Notes: Markov chains Computational Genomics and Molecular Biology, Fall 5 Lecture Notes: Markov chains Dannie Durand At the beginning of the semester, we introduced two simple scoring functions for pairwise alignments: a similarity

More information

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM).

Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). 1 Bioinformatics: In-depth PROBABILITY & STATISTICS Spring Semester 2011 University of Zürich and ETH Zürich Lecture 4: Evolutionary models and substitution matrices (PAM and BLOSUM). Dr. Stefanie Muff

More information

PHYLOGENY ESTIMATION AND HYPOTHESIS TESTING USING MAXIMUM LIKELIHOOD

PHYLOGENY ESTIMATION AND HYPOTHESIS TESTING USING MAXIMUM LIKELIHOOD Annu. Rev. Ecol. Syst. 1997. 28:437 66 Copyright c 1997 by Annual Reviews Inc. All rights reserved PHYLOGENY ESTIMATION AND HYPOTHESIS TESTING USING MAXIMUM LIKELIHOOD John P. Huelsenbeck Department of

More information

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

More information

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) Content HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University,

More information

Theory of Evolution Charles Darwin

Theory of Evolution Charles Darwin Theory of Evolution Charles arwin 858-59: Origin of Species 5 year voyage of H.M.S. eagle (83-36) Populations have variations. Natural Selection & Survival of the fittest: nature selects best adapted varieties

More information

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods

More information

Emily Blanton Phylogeny Lab Report May 2009

Emily Blanton Phylogeny Lab Report May 2009 Introduction It is suggested through scientific research that all living organisms are connected- that we all share a common ancestor and that, through time, we have all evolved from the same starting

More information

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

More information

X X (2) X Pr(X = x θ) (3)

X X (2) X Pr(X = x θ) (3) Notes for 848 lecture 6: A ML basis for compatibility and parsimony Notation θ Θ (1) Θ is the space of all possible trees (and model parameters) θ is a point in the parameter space = a particular tree

More information

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013 Sequence Alignments Dynamic programming approaches, scoring, and significance Lucy Skrabanek ICB, WMC January 31, 213 Sequence alignment Compare two (or more) sequences to: Find regions of conservation

More information

Counting phylogenetic invariants in some simple cases. Joseph Felsenstein. Department of Genetics SK-50. University of Washington

Counting phylogenetic invariants in some simple cases. Joseph Felsenstein. Department of Genetics SK-50. University of Washington Counting phylogenetic invariants in some simple cases Joseph Felsenstein Department of Genetics SK-50 University of Washington Seattle, Washington 98195 Running Headline: Counting Phylogenetic Invariants

More information

1 ATGGGTCTC 2 ATGAGTCTC

1 ATGGGTCTC 2 ATGAGTCTC We need an optimality criterion to choose a best estimate (tree) Other optimality criteria used to choose a best estimate (tree) Parsimony: begins with the assumption that the simplest hypothesis that

More information

Plan: Evolutionary trees, characters. Perfect phylogeny Methods: NJ, parsimony, max likelihood, Quartet method

Plan: Evolutionary trees, characters. Perfect phylogeny Methods: NJ, parsimony, max likelihood, Quartet method Phylogeny 1 Plan: Phylogeny is an important subject. We have 2.5 hours. So I will teach all the concepts via one example of a chain letter evolution. The concepts we will discuss include: Evolutionary

More information

Phylogenetic Trees. How do the changes in gene sequences allow us to reconstruct the evolutionary relationships between related species?

Phylogenetic Trees. How do the changes in gene sequences allow us to reconstruct the evolutionary relationships between related species? Why? Phylogenetic Trees How do the changes in gene sequences allow us to reconstruct the evolutionary relationships between related species? The saying Don t judge a book by its cover. could be applied

More information

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Distance Methods COMP 571 - Spring 2015 Luay Nakhleh, Rice University Outline Evolutionary models and distance corrections Distance-based methods Evolutionary Models and Distance Correction

More information

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057 Estimating Phylogenies (Evolutionary Trees) II Biol4230 Thurs, March 2, 2017 Bill Pearson wrp@virginia.edu 4-2818 Jordan 6-057 Tree estimation strategies: Parsimony?no model, simply count minimum number

More information

Fundamental Probability and Statistics

Fundamental Probability and Statistics Fundamental Probability and Statistics "There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are

More information

Consistency Index (CI)

Consistency Index (CI) Consistency Index (CI) minimum number of changes divided by the number required on the tree. CI=1 if there is no homoplasy negatively correlated with the number of species sampled Retention Index (RI)

More information

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Bayesian Phylogenetic Analysis COMP 571 - Spring 2015 Luay Nakhleh, Rice University Bayes Rule P(X = x Y = y) = P(X = x, Y = y) P(Y = y) = P(X = x)p(y = y X = x) P x P(X = x 0 )P(Y = y X

More information

Principles of Phylogeny Reconstruction How do we reconstruct the tree of life? Basic Terminology. Looking at Trees. Basic Terminology.

Principles of Phylogeny Reconstruction How do we reconstruct the tree of life? Basic Terminology. Looking at Trees. Basic Terminology. Principles of Phylogeny Reconstruction How do we reconstruct the tree of life? Phylogeny: asic erminology Outline: erminology Phylogenetic tree: Methods Problems parsimony maximum likelihood bootstrapping

More information

Consensus Methods. * You are only responsible for the first two

Consensus Methods. * You are only responsible for the first two Consensus Trees * consensus trees reconcile clades from different trees * consensus is a conservative estimate of phylogeny that emphasizes points of agreement * philosophy: agreement among data sets is

More information

Bootstrap confidence levels for phylogenetic trees B. Efron, E. Halloran, and S. Holmes, 1996

Bootstrap confidence levels for phylogenetic trees B. Efron, E. Halloran, and S. Holmes, 1996 Bootstrap confidence levels for phylogenetic trees B. Efron, E. Halloran, and S. Holmes, 1996 Following Confidence limits on phylogenies: an approach using the bootstrap, J. Felsenstein, 1985 1 I. Short

More information

Concepts and Methods in Molecular Divergence Time Estimation

Concepts and Methods in Molecular Divergence Time Estimation Concepts and Methods in Molecular Divergence Time Estimation 26 November 2012 Prashant P. Sharma American Museum of Natural History Overview 1. Why do we date trees? 2. The molecular clock 3. Local clocks

More information

Inferring Speciation Times under an Episodic Molecular Clock

Inferring Speciation Times under an Episodic Molecular Clock Syst. Biol. 56(3):453 466, 2007 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150701420643 Inferring Speciation Times under an Episodic Molecular

More information

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Phylogenetic Analysis Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center Outline Basic Concepts Tree Construction Methods Distance-based methods

More information

Message Passing Algorithms and Junction Tree Algorithms

Message Passing Algorithms and Junction Tree Algorithms Message Passing lgorithms and Junction Tree lgorithms Le Song Machine Learning II: dvanced Topics S 8803ML, Spring 2012 Inference in raphical Models eneral form of the inference problem P X 1,, X n Ψ(

More information

Phylogenetics Todd Vision Spring Some applications. Uncultured microbial diversity

Phylogenetics Todd Vision Spring Some applications. Uncultured microbial diversity Phylogenetics Todd Vision Spring 2008 Tree basics Sequence alignment Inferring a phylogeny Neighbor joining Maximum parsimony Maximum likelihood Rooting trees and measuring confidence Software and file

More information

This does not cover everything on the final. Look at the posted practice problems for other topics.

This does not cover everything on the final. Look at the posted practice problems for other topics. Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry

More information

Thanks to Paul Lewis and Joe Felsenstein for the use of slides

Thanks to Paul Lewis and Joe Felsenstein for the use of slides Thanks to Paul Lewis and Joe Felsenstein for the use of slides Review Hennigian logic reconstructs the tree if we know polarity of characters and there is no homoplasy UPGMA infers a tree from a distance

More information

Hidden Markov Models. Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from:

Hidden Markov Models. Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from: Hidden Markov Models Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from: www.ioalgorithms.info Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm

More information

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Ziheng Yang Department of Biology, University College, London An excess of nonsynonymous substitutions

More information

Phylogeny Tree Algorithms

Phylogeny Tree Algorithms Phylogeny Tree lgorithms Jianlin heng, PhD School of Electrical Engineering and omputer Science University of entral Florida 2006 Free for academic use. opyright @ Jianlin heng & original sources for some

More information

Branch-Length Prior Influences Bayesian Posterior Probability of Phylogeny

Branch-Length Prior Influences Bayesian Posterior Probability of Phylogeny Syst. Biol. 54(3):455 470, 2005 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150590945313 Branch-Length Prior Influences Bayesian Posterior Probability

More information

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies 1 What is phylogeny? Essay written for the course in Markov Chains 2004 Torbjörn Karfunkel Phylogeny is the evolutionary development

More information