Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition

Size: px
Start display at page:

Download "Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition"

Transcription

1 Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition David D. Pollock* and William J. Bruno* *Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, New Mexico; and Department of Biological Sciences, Louisiana State University at Baton Rouge Assessment of the evolutionary process is crucial for understanding the effect of protein structure and function on sequence evolution and for many other analyses in molecular evolution. Here, we used simulations to study how taxon sampling affects accuracy of parameter estimation and topological inference in the absence of branch length asymmetry. With maximum-likelihood analysis, we find that adding taxa dramatically improves both support for the evolutionary model and accurate assessment of its parameters when compared with increasing the sequence length. Using a method we call doppelgänger trees, we distinguish the contributions of two sources of improved topological inference: greater knowledge about internal nodes and greater knowledge of site-specific rate parameters. Surprisingly, highly significant support for the correct general model does not lead directly to improved topological inference. Instead, substantial improvement occurs only with accurate assessment of the evolutionary process at individual sites. Although these results are based on a simplified model of the evolutionary process, they indicate that in general, assuming processes are not independent and identically distributed among sites, more extensive sampling of taxonomic biodiversity will greatly improve analytical results in many current sequence data sets with moderate sequence lengths. Introduction Understanding the evolutionary process in proteins and other macromolecules is central to the pursuit of evolutionary functional genomics (the use of evolutionary information to predict and better understand the structure, function, and interaction of genome components), but accurate inferences of both the topology of taxon relationships and the rates of substitution at different sites can be elusive. Many previous studies theoretically examined the question of topology assessment with known models of evolution, particularly for simple four-taxon situations (Gaut and Lewis 1995; Hillis 1995; Huelsenbeck 1995a, Huelsenbeck 1995b; Pollock and Goldstein 1995; Yang 1996, 1998; Graybeal 1998; Kim 1998; Rannala et al. 1998), but the crucial question remains: What is best to do in situations where the evolutionary process is unknown? Here we show, using a simple process for varying evolutionary rates among sites, that adding taxa dramatically improves the ability to accurately assess the evolutionary model. We introduce a technique we call doppelgänger trees, or shadowlike doubles of the tree of interest. The doppelgänger sequences are homologous to the sequences of interest but evolve independently. Thus, the phy- Abbreviations:, Gamma shape parameter; lnl, double the difference in log likelihoods between models; 1 and 2, the rate parameters for each of the two site categories; 1 / 2, the parameter for the ratio of the site-specific rates; EML, maximum likelihood with equal rates among sites; GML, maximum likelihood with rates evolving according to a two-category gamma model; ML, maximum likelihood; Pars, parsimony; SSML, maximum likelihood with a site-specific two-rate model where the rate category of each site was correctly specified prior to evaluation. Key words: evolutionary models, maximum likelihood, rate variation, taxon addition, phylogenetic inference. Address for correspondence and reprints: David D. Pollock, Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana Phone: , Fax: daviddpollock@yahoo.com. Mol. Biol. Evol. 17(12): by the Society for Molecular Biology and Evolution. ISSN: logenetic relationships within each doppelgänger tree are exactly the same as the tree of interest, but the trees are connected by a branch of effectively infinite length. Incorporating these trees allows us to add site-specific information about rates of evolution in a controlled manner without adding information about the state of internal nodes. We conclude that, using maximum likelihood (ML), phylogenetic reconstruction and assessment of an unknown evolutionary process are often improved more efficiently by adding taxa than by increasing the length of existing sequences. Materials and Methods Branch reconstruction percentages, parameter estimates, and likelihood estimates were all obtained using PAUP* (Swofford 1998). Simulated sequences were created using a simple program written by W.J.B. and A. Halpern. In order to study only broad and robust effects, we pursued our question by simulating a simple two-state model of evolution with sites evolving at two different substitution rates. Three types of ML analysis were performed: equal rates among sites (EML), rates evolving according to a two-category gamma model (GML) (Yang 1994), and a site-specific two-rate model where the rate category of each site was correctly specified prior to evaluation (SSML). Since prespecification of categories can lead to better phylogenetic reconstruction (Pollock 1998), SSML is a reasonable benchmark for the reconstruction potential of these data. SSML is unrealistic, however, in that generally one would not know precisely which rate category each site was in, so the performance of GML is of the greatest interest. EML serves for comparison as the lower limit of the likelihood reconstruction potential where nothing is inferred about variable rates among sites. Since EML is a special case of both GML and SSML, these models are nested, and double the difference in log likelihoods between them ( lnl) can be used to determine levels of support for the more complicated models (Huelsenbeck and 1854

2 Assessing an Unknown Evolutionary Process 1855 Rannala 1997). We also performed parsimony (Pars) analysis for comparison of topological inference capabilities. Compared with the equal-rates model, the gamma model has one more degree of freedom, in the form of the shape parameter ( ), which is estimated in the ML procedure. The ratio of rates for the two different rate categories is directly calculable from, and the likelihood at each site in GML is estimated as the sum of likelihoods for each of the two rate categories. When, the underlying model in GML is the same as the equal-rates model. In addition to the correct and absolute prior on the rate category for each site, SSML also has one degree of freedom difference from EML, which is the ratio of the site-specific rates ( 1 / 2, where 2 and 1 are the rates for the two categories). When 1, SSML is equivalent to EML. For most trees, the entire topology was reconstructed, but only the existence of the innermost branch was assessed. For doppelgänger trees, simulations were run independently for two or three eight-taxon trees and combined into one alignment. Reconstruction probabilities for the innermost branch were assessed for one of these trees (the focus tree), while the topology for the remaining branches was given prior to ML and Pars evaluation; this was necessary for sufficient speed in performing the repetitions. For the same reason, the attachment point for the branch between the focus tree and the doppelgänger trees was arbitrarily fixed on one of the internal branches other than the innermost branch in order to allow timely evaluation of the replicates. Analysis of eight-taxon trees where the topology of the terminal tips was fixed showed that fixing branches other than the innermost branch made only a slight difference in parameter estimation (, ), log likelihood differences, and reconstruction probabilities for the innermost branch (data not shown). Simulations with doppelgänger trees were replicated 300 times, while other simulations were replicated 1,000 times. The significances of log likelihood differences were calculated by assuming that the lnl statistic was chisquare distributed with one degree of freedom (Huelsenbeck and Rannala 1997). For example, the 5% significance level is thus Results In an initial analysis, we began with a four-taxon tree and compared the effect of doubling the number of taxa with that of doubling the sequence length (fig. 1). Regardless of the number of taxa added, the topological question evaluated was always that of the unrooted reconstruction of the initial four-taxon tree. We found that in the four-taxon case, lnl values between GML and EML did not significantly support GML in the majority of replicates, even though this model was in fact correct. With double the sequence length, slightly more than 75% of the replicates supported GML at the 5% significance level (table 1). For the eight-taxon case, however, all of the replicates consistently gave extremely significant support (P K 0.001) for the gamma model. The FIG. 1. Percentage of the innermost branch correctly reconstructed from two-rate simulations. Trees simulated were the four-taxon (4T) and eight-taxon (8T) trees from figure 3 with 1,000 sites and the fourtaxon tree with 2,000 sites (4T/2N). The reconstruction methods used were maximum-likelihood using a model with equal rates at all sites (EML), a two-category gamma model (GML), or a model with correctly specified site-specific categories (SSML), and parsimony (Pars). One thousand replicates of each four- and eight-taxon tree were simulated. Four-taxon trees had a short internal branch and four equallength terminal branches, which were 10 times as long as the internal branch. Eight-taxon trees also had a similar short innermost internal branch, while the four other internal and the eight terminal branches were five times as long, thus maintaining the same distance from the nodes at the tips to the nodes of the innermost branch. Half of the sites had one rate, and half of the sites had a rate nine times as large, such that the average tip-to-innermost-node distance was 0.5 substitutions per site for all four- and eight-taxon trees. Reconstruction percentages are given for the innermost branch of the focus tree, although the rest of the topology was also free to vary. Thus, the random reconstruction probability was always 33.3%. Tree reconstruction percentages, parameter estimates, and likelihood estimates were all obtained using PAUP* (Swofford 1998). shape parameter,, had a variance approximately 100- fold lower for the eight-taxon case than for the fourtaxon cases and, as a consequence, was also less biased. This reduction in the sampling variance of when the number of taxa was increased is consistent with previous work comparing results for three and four taxa (Gu, Fu, and Li 1995). It is surprising that, despite the high support levels and accurate parameter estimates, GML had topology reconstruction probabilities that were essentially the same as those of EML in these three cases (fig. 1). SSML reconstructed trees more efficiently in all cases, with reconstruction percentages up to 20% better than those of EML (fig. 1), so there was clearly room for improvement in the GML results. It is also quite surprising that the innermost branch was reconstructed somewhat better in the context of eight taxa than when the sequence length was doubled; the results were the same for EML, GML, and Pars and contradicted earlier results obtained for parsimony under identical conditions but using a single rate at all sites (Poe and Swofford 1999). For the eight-taxon case, there are two plausible effects which could cause improvement in tree recon-

3 1856 Pollock and Bruno Table 1 Mean Log Likelihood Values for EML, SSML, and GML, Along with Twice the Mean Log Likelihood Differences ( lnl) for the Different Models and Means and Standard Deviations (SD) of MLE Parameter Estimates 4 Taxa 4 Taxa/2N 8 Taxa 16 Taxa 24 Taxa EML... 2, , , , ,885.5 Perkb SSML... 2, , , , ,539.0 SSML-EML lnl ,346.5 Perkb Mean... SD... GML , , , , ,134.7 GML-EML lnl Perkb... Per site Mean SD NOTE. EML maximum likelihood with equal rates among sites; GML maximum likelihood with rates evolving according to a two-category gamma model; SSML maximum likelihood with a site-specific two-rate model where the rate category of each site was correctly specified prior to evaluation. EML log likelihood values and lnl means are also shown per kilobase of sequence. GML-EML lnl values are also shown per site. struction capability relative to the four-taxon case: the increase of information about the state of the internal nodes (seen in the reconstruction improvement for EML with eight taxa), and information about which rate is in effect at each site (which, when known completely, yields the improved performance of SSML relative to EML). For GML, it appears plausible that despite more FIG. 2. Percentage of the innermost branch of the eight-taxon tree correctly reconstructed with and without doppelgänger trees. Trees simulated were the eight-taxon tree from figure 1 alone (8T) or with single (16TD) or double (24TD) doppelgänger trees. The reconstruction methods used were the same as in figure 1. There were 300 replicates of each doppelgänger tree. Rates were the same as in figure 1. Reconstruction percentages are given for the innermost branch of the eight-taxon focus tree, while the topology of the doppelgänger trees was ignored. The rest of the topology was free to vary except for doppelgänger attachment points. Thus, the random reconstruction probability was always 33.3%. Tree reconstruction percentages, parameter estimates, and likelihood estimates were all obtained using PAUP* (Swofford 1998). accurate knowledge of the global parameters of the model with the eight-taxon tree, the indeterminate placement of sites into rate categories lowers tree reconstruction success. In order to test this hypothesis, we added site-specific information to the eight-taxon tree by using doppelgänger trees. The doppelgänger trees were duplicates of the eight-taxon focus tree with sites evolving at the same rates but independently of that tree. These trees had the same topology and branch lengths as the tree of interest (the focus tree), but evolution was simulated independently; this is equivalent to the sequences from these trees being related by a branch of infinite length. The doppelgänger sequences were added to the alignment, and phylogenetic analyses were performed for the combined data sets. Thus, the doppelgänger trees provided additional information about the probable site-specific rate category of each site, but no information about the state of internal nodes on the focus tree. The doppelgänger results confirmed that with more site-specific information derived from the data, GML can approach the performance of SSML. We tested a 16-taxon single doppelgänger (16TD) and a 24-taxon double doppelgänger (24TD), and for EML and SSML the likelihood values per eight-taxon tree were almost identical to previous results, as expected (table 1). Also, topology reconstruction rates were essentially unchanged (fig. 2), which indicates that the doppelgänger trees had little effect when these models were used. In contrast, GML had twice the lnl improvement per eight-taxon tree for 16TD, and for 24TD it was 2.6 times as high per eight-taxon tree as without doppelgängers, indicating nonlinear improvement in support for GML. The shape parameter was also better estimated; the variance of for 16TD was about one third that for the eight-taxon case, and for 24TD it was about one fifth. The changes in topology reconstruction probabilities for GML were also dramatic (fig. 2). For 16TD, GML made up half the difference with SSML, while for 24TD,

4 Assessing an Unknown Evolutionary Process 1857 GML reconstruction probability was at nearly the same level as for SSML. Reconstruction rates for parsimony in the 16TD and 24TD cases decreased to the same rates as in the original four-taxon case. Discussion It appears that more useful site-specific information can be obtained by adding taxa to a data set than by increasing sequence length. This information can increase phylogenetic reconstruction probabilities both by increasing knowledge of the state of internal nodes and by increasing knowledge of the rate at individual sites. Taxon addition also dramatically improves the accuracy of global parameter estimation, but this has little independent effect on phylogenetic reconstruction for the conditions of this study. This complements earlier observations that it is important to add taxa when reconstructing site-specific interactions (Pollock and Taylor 1997; Pollock, Taylor, and Goldman 1999). For the gamma model, reconstruction probability due to site-specific knowledge did not improve initially, despite high levels of support for a two-rate model and accurate estimation of the shape parameter. Instead of a dramatic improvement once the general model was strongly supported, reconstruction rates did not increase until the EML- GML lnl levels approached 1.0 per site. Although adding sequence length can be useful if the rate category for each site is specified (as in SSML), for the more general case where each site may belong to any of the possible rate categories, a large portion of the improvement in reconstruction capability can come only through taxon addition. The number of taxa required to gain most of the potential improvement in this situation (24) is an obtainable number for most evolutionary researchers, although we note that actual benefits will vary depending on how added sequences are related to the initial sequences (Goldman 1998; Rannala et al. 1998). Although we used a simple model here to evaluate general principles, we expect that these principles will hold qualitatively for the more complicated models needed to describe protein evolution, which take into account codons, differential rates of exchange between the 20 amino acids, and varying rates and other parameters among many more site categories. When the evolutionary process is unknown, it is best to increase sampling of taxonomic biodiversity in order to get as much information as possible about site-specific substitution rates. This will lead to improved topological reconstruction and support for models that more accurately reflect the underlying complexity, and will in turn allow better understanding of the effect of structure and function on the evolutionary process. Our results appear to conflict with some previous studies which have ascribed better results to increased sequence length rather than increased taxonomic sampling, or recommended avoidance of additional sequences outside the clade under consideration. These conflicts are the result of using Pars rather than ML. In order to understand the difference with regard to the question of FIG. 3. Percentage of the innermost branch correctly reconstructed with different evolutionary rates. Reconstruction rates for four-taxon and eight-taxon trees using parsimony and maximum likelihood (EML)., EML with four taxa;, EML with eight taxa;, parsimony with eight taxa. Other than there being a single rate for all sites rather than two, conditions are the same as in figure 1. Parsimony reconstruction percentages for the four-taxon tree are nearly identical to those of EML and are not shown in order to maintain clarity. Tree structures used are shown in the inset, and sequences were of length 1,000. taxon addition, we simulated a single rate at all sites, with different rates over a series of simulations. We found a broad zone in which parsimony fails to make efficient use of the information available in the eighttaxon tree (fig. 3). Pars was equivalent to ML for slow rates, but it underperformed for all larger evolutionary rates up to the point where all methods performed equally poorly. This effect is different from the well-known problem of long-branch attraction (the Felsenstein Zone [Felsenstein 1978; Hendy and Penny 1989]), as in this case the tree was entirely symmetrical and all branches outside of the innermost branch were equal in length. The effect is surprising in that many phylogenetic researchers would expect parsimony to perform well in this situation (Hillis 1996, 1998). While the average rate in our previous two-rate simulations was situated in the center of this zone, where the discrepancy between ML and Pars was greatest (as was the rate used by Poe and Swofford [1999]), the individual rates were on either end of the zone, where the discrepancy was small. Parsimony appears to take on the average characteristics of the underlying rates rather than the characteristics of a single rate equal to the average, and the apparent conflict is thus explained. For four-taxon trees with double the sequence length, reconstruction using parsimony or ML was slightly less accurate than that for eight taxa using ML (data not shown). In addition to this zone, we have shown that Pars is confounded by additional data from distant taxa (even without long-branch attraction), while ML is not distracted and can make use of the information about sitespecific rates. We note that although the behavior of Pars appears somewhat pathological in our simulations, the

5 1858 Pollock and Bruno situation is extreme in that the doppelgänger trees evolved independently from the focus tree, and this is not a realistic assumption for alignable sequences from the natural world. We did not specifically address (and in fact intentionally avoided) long-branch attraction (Felsenstein 1978; Hendy and Penny 1989; Huelsenbeck 1997; Bruno and Halpern 1999). We note, however, that our results indicate that increasing the number of taxa quickly increases confidence in the correct model and quickly increases the accuracy of parameter estimates. Since using the correct model greatly reduces the problems of long-branch attraction with ML (Huelsenbeck 1995b, 1995a; Bruno and Halpern 1999), taxon addition should diminish this concern. The results of previous studies that used parsimony to evaluate the question of how taxa should be added to improve node reconstruction should probably be reconsidered. In particular, the notion that added taxa can decrease accuracy (Kim 1996; Hillis 1998; Poe and Swofford 1999) should be abandoned as an artifact of parsimony. In contrast, the use of ML and log likelihood differences allows for careful evaluation of support for complex models under consideration and a means of evaluating when model parameters are well described and gives clear support for the usefulness of increasing sequence biodiversity. Acknowledgments We thank A. L. Halpern, B. Korber, M. Lachmann, and C. Macken for comments on the manuscript. D.D.P. was supported by a Los Alamos National Laboratory Director s Fellowship, and W.J.B. was supported by a grant from the Department of Energy. LITERATURE CITED BRUNO, W. J., and A. L. HALPERN Topological bias and inconsistency of maximum likelihood using wrong models. Mol. Biol. Evol. 16: FELSENSTEIN, J Cases in which parsimony and compatibility methods will be positively misleading. Syst. Zool. 27: GAUT, B. S., and P. O. LEWIS Success of maximum likelihood phylogeny inference in the four-taxon case. Mol. Biol. Evol. 12: GOLDMAN, N Phylogenetic information and experimental design in molecular systematics. Proc. R. Soc. Lond. B Biol. Sci. 265: GRAYBEAL, A Is it better to add taxa or characters to a difficult phylogenetic problem? Syst. Biol. 47:9 17. GU, X., Y.-X. FU, and W.-H. LI Maximum likelihood estimation of the heterogeneity of substitution rate among nucleotide sites. Mol. Biol. Evol. 12: HENDY, M. D., and D. PENNY A framework for the quantitative study of evolutionary trees. Syst. Zool. 38: HILLIS, D. M Approaches for assessing phylogenetic accuracy. Syst. Biol. 44: Inferring complex phylogenies. Nature 383: Taxonomic sampling, phylogenetic accuracy, and investigator bias. Syst. Biol. 47:3 8. HUELSENBECK, J. P. 1995a. The performance of phylogenetic methods in simulation. Syst. Biol. 44: b. The robustness of two phylogenetic methods: four-taxon simulations reveal a slight superiority of maximum likelihood over neighbor joining. Mol. Biol. Evol. 12: Is the Felsenstein Zone a fly trap? Syst. Biol. 46: HUELSENBECK, J. P., and B. RANNALA Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science 276: KIM, J General inconsistency conditions for maximum parsimony: effects of branch lengths and increasing numbers of taxa. Syst. Biol. 45: Large-scale phylogenies and measuring the performance of phylogenetic estimators. Syst. Biol. 47: POE, S., and D. L. SWOFFORD Taxon sampling revisited. Nature 398: POLLOCK, D. D Increased accuracy in analytical molecular distance estimation. Theor. Popul. Biol. 54: POLLOCK, D. D., and D. B. GOLDSTEIN A comparison of two methods for constructing evolutionary distances from a weighted contribution of transition and transversion differences. Mol. Biol. Evol. 12: POLLOCK, D. D., and W. R. TAYLOR Effectiveness of correlation analysis in identifying protein residues undergoing correlated evolution. Protein Eng. 10: POLLOCK, D. D., W. R. TAYLOR, and N. GOLDMAN Coevolving protein residues: maximum likelihood identification and relationship to structure. J. Mol. Biol. 287: RANNALA, B., J. P. HUELSENBECK, Z.YANG, and R. NIELSEN Taxon sampling and the accuracy of large phylogenies. Syst. Biol. 47: SWOFFORD, D. L Phylogenetic analysis using parsimony (*and other methods). Sinauer, Sunderland, Mass. YANG, Z Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39: Phylogenetic analysis using parsimony and likelihood methods. J. Mol. Evol. 42: On the best evolutionary rate for phylogenetic analysis. Syst. Biol. 47: ANTONY DEAN, reviewing editor Accepted August 15, 2000

Letter to the Editor. Department of Biology, Arizona State University

Letter to the Editor. Department of Biology, Arizona State University Letter to the Editor Traditional Phylogenetic Reconstruction Methods Reconstruct Shallow and Deep Evolutionary Relationships Equally Well Michael S. Rosenberg and Sudhir Kumar Department of Biology, Arizona

More information

Letter to the Editor. The Effect of Taxonomic Sampling on Accuracy of Phylogeny Estimation: Test Case of a Known Phylogeny Steven Poe 1

Letter to the Editor. The Effect of Taxonomic Sampling on Accuracy of Phylogeny Estimation: Test Case of a Known Phylogeny Steven Poe 1 Letter to the Editor The Effect of Taxonomic Sampling on Accuracy of Phylogeny Estimation: Test Case of a Known Phylogeny Steven Poe 1 Department of Zoology and Texas Memorial Museum, University of Texas

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Ziheng Yang Department of Biology, University College, London An excess of nonsynonymous substitutions

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Thanks to Paul Lewis and Joe Felsenstein for the use of slides

Thanks to Paul Lewis and Joe Felsenstein for the use of slides Thanks to Paul Lewis and Joe Felsenstein for the use of slides Review Hennigian logic reconstructs the tree if we know polarity of characters and there is no homoplasy UPGMA infers a tree from a distance

More information

Phylogenomics. Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University. Tuesday, January 29, 13

Phylogenomics. Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University. Tuesday, January 29, 13 Phylogenomics Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University How may we improve our inferences? How may we improve our inferences? Inferences Data How may we improve

More information

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley. Parsimony & Likelihood [draft]

Integrative Biology 200 PRINCIPLES OF PHYLOGENETICS Spring 2016 University of California, Berkeley. Parsimony & Likelihood [draft] Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley K.W. Will Parsimony & Likelihood [draft] 1. Hennig and Parsimony: Hennig was not concerned with parsimony

More information

Points of View JACK SULLIVAN 1 AND DAVID L. SWOFFORD 2

Points of View JACK SULLIVAN 1 AND DAVID L. SWOFFORD 2 Points of View Syst. Biol. 50(5):723 729, 2001 Should We Use Model-Based Methods for Phylogenetic Inference When We Know That Assumptions About Among-Site Rate Variation and Nucleotide Substitution Pattern

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

Consistency Index (CI)

Consistency Index (CI) Consistency Index (CI) minimum number of changes divided by the number required on the tree. CI=1 if there is no homoplasy negatively correlated with the number of species sampled Retention Index (RI)

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

Estimating Divergence Dates from Molecular Sequences

Estimating Divergence Dates from Molecular Sequences Estimating Divergence Dates from Molecular Sequences Andrew Rambaut and Lindell Bromham Department of Zoology, University of Oxford The ability to date the time of divergence between lineages using molecular

More information

X X (2) X Pr(X = x θ) (3)

X X (2) X Pr(X = x θ) (3) Notes for 848 lecture 6: A ML basis for compatibility and parsimony Notation θ Θ (1) Θ is the space of all possible trees (and model parameters) θ is a point in the parameter space = a particular tree

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200 Spring 2018 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200 Spring 2018 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200 Spring 2018 University of California, Berkeley D.D. Ackerly Feb. 26, 2018 Maximum Likelihood Principles, and Applications to

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200B Spring 2011 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley B.D. Mishler Feb. 1, 2011. Qualitative character evolution (cont.) - comparing

More information

Effects of Gap Open and Gap Extension Penalties

Effects of Gap Open and Gap Extension Penalties Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See

More information

Phylogenetics. BIOL 7711 Computational Bioscience

Phylogenetics. BIOL 7711 Computational Bioscience Consortium for Comparative Genomics! University of Colorado School of Medicine Phylogenetics BIOL 7711 Computational Bioscience Biochemistry and Molecular Genetics Computational Bioscience Program Consortium

More information

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz Phylogenetic Trees What They Are Why We Do It & How To Do It Presented by Amy Harris Dr Brad Morantz Overview What is a phylogenetic tree Why do we do it How do we do it Methods and programs Parallels

More information

Distances that Perfectly Mislead

Distances that Perfectly Mislead Syst. Biol. 53(2):327 332, 2004 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150490423809 Distances that Perfectly Mislead DANIEL H. HUSON 1 AND

More information

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies 1 What is phylogeny? Essay written for the course in Markov Chains 2004 Torbjörn Karfunkel Phylogeny is the evolutionary development

More information

Non-independence in Statistical Tests for Discrete Cross-species Data

Non-independence in Statistical Tests for Discrete Cross-species Data J. theor. Biol. (1997) 188, 507514 Non-independence in Statistical Tests for Discrete Cross-species Data ALAN GRAFEN* AND MARK RIDLEY * St. John s College, Oxford OX1 3JP, and the Department of Zoology,

More information

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

Efficiencies of maximum likelihood methods of phylogenetic inferences when different substitution models are used

Efficiencies of maximum likelihood methods of phylogenetic inferences when different substitution models are used Molecular Phylogenetics and Evolution 31 (2004) 865 873 MOLECULAR PHYLOGENETICS AND EVOLUTION www.elsevier.com/locate/ympev Efficiencies of maximum likelihood methods of phylogenetic inferences when different

More information

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression) Using phylogenetics to estimate species divergence times... More accurately... Basics and basic issues for Bayesian inference of divergence times (plus some digression) "A comparison of the structures

More information

Weighted Neighbor Joining: A Likelihood-Based Approach to Distance-Based Phylogeny Reconstruction

Weighted Neighbor Joining: A Likelihood-Based Approach to Distance-Based Phylogeny Reconstruction Weighted Neighbor Joining: A Likelihood-Based Approach to Distance-Based Phylogeny Reconstruction William J. Bruno,* Nicholas D. Socci, and Aaron L. Halpern *Theoretical Biology and Biophysics, Los Alamos

More information

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30 Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30 A non-phylogeny

More information

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

More information

Maximum Likelihood Estimation on Large Phylogenies and Analysis of Adaptive Evolution in Human Influenza Virus A

Maximum Likelihood Estimation on Large Phylogenies and Analysis of Adaptive Evolution in Human Influenza Virus A J Mol Evol (2000) 51:423 432 DOI: 10.1007/s002390010105 Springer-Verlag New York Inc. 2000 Maximum Likelihood Estimation on Large Phylogenies and Analysis of Adaptive Evolution in Human Influenza Virus

More information

Minimum evolution using ordinary least-squares is less robust than neighbor-joining

Minimum evolution using ordinary least-squares is less robust than neighbor-joining Minimum evolution using ordinary least-squares is less robust than neighbor-joining Stephen J. Willson Department of Mathematics Iowa State University Ames, IA 50011 USA email: swillson@iastate.edu November

More information

Phylogenetic methods in molecular systematics

Phylogenetic methods in molecular systematics Phylogenetic methods in molecular systematics Niklas Wahlberg Stockholm University Acknowledgement Many of the slides in this lecture series modified from slides by others www.dbbm.fiocruz.br/james/lectures.html

More information

Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2008

Integrative Biology 200A PRINCIPLES OF PHYLOGENETICS Spring 2008 Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2008 University of California, Berkeley B.D. Mishler March 18, 2008. Phylogenetic Trees I: Reconstruction; Models, Algorithms & Assumptions

More information

Homoplasy. Selection of models of molecular evolution. Evolutionary correction. Saturation

Homoplasy. Selection of models of molecular evolution. Evolutionary correction. Saturation Homoplasy Selection of models of molecular evolution David Posada Homoplasy indicates identity not produced by descent from a common ancestor. Graduate class in Phylogenetics, Campus Agrário de Vairão,

More information

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

More information

Consensus Methods. * You are only responsible for the first two

Consensus Methods. * You are only responsible for the first two Consensus Trees * consensus trees reconcile clades from different trees * consensus is a conservative estimate of phylogeny that emphasizes points of agreement * philosophy: agreement among data sets is

More information

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057 Estimating Phylogenies (Evolutionary Trees) II Biol4230 Thurs, March 2, 2017 Bill Pearson wrp@virginia.edu 4-2818 Jordan 6-057 Tree estimation strategies: Parsimony?no model, simply count minimum number

More information

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution Today s topics Inferring phylogeny Introduction! Distance methods! Parsimony method!"#$%&'(!)* +,-.'/01!23454(6!7!2845*0&4'9#6!:&454(6 ;?@AB=C?DEF Overview of phylogenetic inferences Methodology Methods

More information

A (short) introduction to phylogenetics

A (short) introduction to phylogenetics A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

More information

Molecular Clocks. The Holy Grail. Rate Constancy? Protein Variability. Evidence for Rate Constancy in Hemoglobin. Given

Molecular Clocks. The Holy Grail. Rate Constancy? Protein Variability. Evidence for Rate Constancy in Hemoglobin. Given Molecular Clocks Rose Hoberman The Holy Grail Fossil evidence is sparse and imprecise (or nonexistent) Predict divergence times by comparing molecular data Given a phylogenetic tree branch lengths (rt)

More information

Bootstrapping and Tree reliability. Biol4230 Tues, March 13, 2018 Bill Pearson Pinn 6-057

Bootstrapping and Tree reliability. Biol4230 Tues, March 13, 2018 Bill Pearson Pinn 6-057 Bootstrapping and Tree reliability Biol4230 Tues, March 13, 2018 Bill Pearson wrp@virginia.edu 4-2818 Pinn 6-057 Rooting trees (outgroups) Bootstrapping given a set of sequences sample positions randomly,

More information

Are Guinea Pigs Rodents? The Importance of Adequate Models in Molecular Phylogenetics

Are Guinea Pigs Rodents? The Importance of Adequate Models in Molecular Phylogenetics Journal of Mammalian Evolution, Vol. 4, No. 2, 1997 Are Guinea Pigs Rodents? The Importance of Adequate Models in Molecular Phylogenetics Jack Sullivan1'2 and David L. Swofford1 The monophyly of Rodentia

More information

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9 Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic

More information

The statistical and informatics challenges posed by ascertainment biases in phylogenetic data collection

The statistical and informatics challenges posed by ascertainment biases in phylogenetic data collection The statistical and informatics challenges posed by ascertainment biases in phylogenetic data collection Mark T. Holder and Jordan M. Koch Department of Ecology and Evolutionary Biology, University of

More information

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

More information

Systematics - Bio 615

Systematics - Bio 615 Bayesian Phylogenetic Inference 1. Introduction, history 2. Advantages over ML 3. Bayes Rule 4. The Priors 5. Marginal vs Joint estimation 6. MCMC Derek S. Sikes University of Alaska 7. Posteriors vs Bootstrap

More information

Phylogenetics: Building Phylogenetic Trees

Phylogenetics: Building Phylogenetic Trees 1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Distance Methods Character Methods

More information

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline Phylogenetics Todd Vision iology 522 March 26, 2007 pplications of phylogenetics Studying organismal or biogeographic history Systematics ating events in the fossil record onservation biology Studying

More information

T R K V CCU CG A AAA GUC T R K V CCU CGG AAA GUC. T Q K V CCU C AG AAA GUC (Amino-acid

T R K V CCU CG A AAA GUC T R K V CCU CGG AAA GUC. T Q K V CCU C AG AAA GUC (Amino-acid Lecture 11 Increasing Model Complexity I. Introduction. At this point, we ve increased the complexity of models of substitution considerably, but we re still left with the assumption that rates are uniform

More information

PhyQuart-A new algorithm to avoid systematic bias & phylogenetic incongruence

PhyQuart-A new algorithm to avoid systematic bias & phylogenetic incongruence PhyQuart-A new algorithm to avoid systematic bias & phylogenetic incongruence Are directed quartets the key for more reliable supertrees? Patrick Kück Department of Life Science, Vertebrates Division,

More information

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Supplementary Note S2 Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Phylogenetic trees reconstructed by a variety of methods from either single-copy orthologous loci (Class

More information

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University Phylogenetics: Building Phylogenetic Trees COMP 571 - Fall 2010 Luay Nakhleh, Rice University Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary

More information

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

CHAPTERS 24-25: Evidence for Evolution and Phylogeny CHAPTERS 24-25: Evidence for Evolution and Phylogeny 1. For each of the following, indicate how it is used as evidence of evolution by natural selection or shown as an evolutionary trend: a. Paleontology

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200B Spring 2009 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley B.D. Mishler Jan. 22, 2009. Trees I. Summary of previous lecture: Hennigian

More information

The Importance of Proper Model Assumption in Bayesian Phylogenetics

The Importance of Proper Model Assumption in Bayesian Phylogenetics Syst. Biol. 53(2):265 277, 2004 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150490423520 The Importance of Proper Model Assumption in Bayesian

More information

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Integrative Biology 200 PRINCIPLES OF PHYLOGENETICS Spring 2018 University of California, Berkeley Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley B.D. Mishler Feb. 14, 2018. Phylogenetic trees VI: Dating in the 21st century: clocks, & calibrations;

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5. Five Sami Khuri Department of Computer Science San José State University San José, California, USA sami.khuri@sjsu.edu v Distance Methods v Character Methods v Molecular Clock v UPGMA v Maximum Parsimony

More information

Accuracy and Power of the Likelihood Ratio Test in Detecting Adaptive Molecular Evolution

Accuracy and Power of the Likelihood Ratio Test in Detecting Adaptive Molecular Evolution Accuracy and Power of the Likelihood Ratio Test in Detecting Adaptive Molecular Evolution Maria Anisimova, Joseph P. Bielawski, and Ziheng Yang Department of Biology, Galton Laboratory, University College

More information

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Why are these disciplines important in evolutionary biology and how are they related to each other? Phylogeny and systematics Phylogeny: the evolutionary history of a species

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Multiple sequence alignment accuracy and phylogenetic inference

Multiple sequence alignment accuracy and phylogenetic inference Utah Valley University From the SelectedWorks of T. Heath Ogden 2006 Multiple sequence alignment accuracy and phylogenetic inference T. Heath Ogden, Utah Valley University Available at: https://works.bepress.com/heath_ogden/6/

More information

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei"

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS Masatoshi Nei" Abstract: Phylogenetic trees: Recent advances in statistical methods for phylogenetic reconstruction and genetic diversity analysis were

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Concepts and Methods in Molecular Divergence Time Estimation

Concepts and Methods in Molecular Divergence Time Estimation Concepts and Methods in Molecular Divergence Time Estimation 26 November 2012 Prashant P. Sharma American Museum of Natural History Overview 1. Why do we date trees? 2. The molecular clock 3. Local clocks

More information

Kei Takahashi and Masatoshi Nei

Kei Takahashi and Masatoshi Nei Efficiencies of Fast Algorithms of Phylogenetic Inference Under the Criteria of Maximum Parsimony, Minimum Evolution, and Maximum Likelihood When a Large Number of Sequences Are Used Kei Takahashi and

More information

Phylogenetic Networks, Trees, and Clusters

Phylogenetic Networks, Trees, and Clusters Phylogenetic Networks, Trees, and Clusters Luay Nakhleh 1 and Li-San Wang 2 1 Department of Computer Science Rice University Houston, TX 77005, USA nakhleh@cs.rice.edu 2 Department of Biology University

More information

Preliminaries. Download PAUP* from: Tuesday, July 19, 16

Preliminaries. Download PAUP* from:   Tuesday, July 19, 16 Preliminaries Download PAUP* from: http://people.sc.fsu.edu/~dswofford/paup_test 1 A model of the Boston T System 1 Idea from Paul Lewis A simpler model? 2 Why do models matter? Model-based methods including

More information

RELATING PHYSICOCHEMMICAL PROPERTIES OF AMINO ACIDS TO VARIABLE NUCLEOTIDE SUBSTITUTION PATTERNS AMONG SITES ZIHENG YANG

RELATING PHYSICOCHEMMICAL PROPERTIES OF AMINO ACIDS TO VARIABLE NUCLEOTIDE SUBSTITUTION PATTERNS AMONG SITES ZIHENG YANG RELATING PHYSICOCHEMMICAL PROPERTIES OF AMINO ACIDS TO VARIABLE NUCLEOTIDE SUBSTITUTION PATTERNS AMONG SITES ZIHENG YANG Department of Biology (Galton Laboratory), University College London, 4 Stephenson

More information

What is Phylogenetics

What is Phylogenetics What is Phylogenetics Phylogenetics is the area of research concerned with finding the genetic connections and relationships between species. The basic idea is to compare specific characters (features)

More information

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell

More information

Chapter 7: Models of discrete character evolution

Chapter 7: Models of discrete character evolution Chapter 7: Models of discrete character evolution pdf version R markdown to recreate analyses Biological motivation: Limblessness as a discrete trait Squamates, the clade that includes all living species

More information

AUTHOR COPY ONLY. Hetero: a program to simulate the evolution of DNA on a four-taxon tree

AUTHOR COPY ONLY. Hetero: a program to simulate the evolution of DNA on a four-taxon tree APPLICATION NOTE Hetero: a program to simulate the evolution of DNA on a four-taxon tree Lars S Jermiin, 1,2 Simon YW Ho, 1 Faisal Ababneh, 3 John Robinson, 3 Anthony WD Larkum 1,2 1 School of Biological

More information

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive. Additive distances Let T be a tree on leaf set S and let w : E R + be an edge-weighting of T, and assume T has no nodes of degree two. Let D ij = e P ij w(e), where P ij is the path in T from i to j. Then

More information

Phylogenetic inference: from sequences to trees

Phylogenetic inference: from sequences to trees W ESTFÄLISCHE W ESTFÄLISCHE W ILHELMS -U NIVERSITÄT NIVERSITÄT WILHELMS-U ÜNSTER MM ÜNSTER VOLUTIONARY FUNCTIONAL UNCTIONAL GENOMICS ENOMICS EVOLUTIONARY Bioinformatics 1 Phylogenetic inference: from sequences

More information

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees Erin Molloy and Tandy Warnow {emolloy2, warnow}@illinois.edu University of Illinois at Urbana

More information

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week (Jan 27 & 29):

Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week (Jan 27 & 29): Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week (Jan 27 & 29): Statistical estimation of models of sequence evolution Phylogenetic inference using maximum likelihood:

More information

Appendix from L. J. Revell, On the Analysis of Evolutionary Change along Single Branches in a Phylogeny

Appendix from L. J. Revell, On the Analysis of Evolutionary Change along Single Branches in a Phylogeny 008 by The University of Chicago. All rights reserved.doi: 10.1086/588078 Appendix from L. J. Revell, On the Analysis of Evolutionary Change along Single Branches in a Phylogeny (Am. Nat., vol. 17, no.

More information

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships Chapter 26: Phylogeny and the Tree of Life You Must Know The taxonomic categories and how they indicate relatedness. How systematics is used to develop phylogenetic trees. How to construct a phylogenetic

More information

Letter to the Editor. Temperature Hypotheses. David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons

Letter to the Editor. Temperature Hypotheses. David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons Letter to the Editor Slow Rates of Molecular Evolution Temperature Hypotheses in Birds and the Metabolic Rate and Body David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons *Department

More information

Phylogenetic Analysis and Intraspeci c Variation : Performance of Parsimony, Likelihood, and Distance Methods

Phylogenetic Analysis and Intraspeci c Variation : Performance of Parsimony, Likelihood, and Distance Methods Syst. Biol. 47(2): 228± 23, 1998 Phylogenetic Analysis and Intraspeci c Variation : Performance of Parsimony, Likelihood, and Distance Methods JOHN J. WIENS1 AND MARIA R. SERVEDIO2 1Section of Amphibians

More information

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles John Novembre and Montgomery Slatkin Supplementary Methods To

More information

To link to this article: DOI: / URL:

To link to this article: DOI: / URL: This article was downloaded by:[ohio State University Libraries] [Ohio State University Libraries] On: 22 February 2007 Access Details: [subscription number 731699053] Publisher: Taylor & Francis Informa

More information

Supplementary Materials for

Supplementary Materials for advances.sciencemag.org/cgi/content/full/1/8/e1500527/dc1 Supplementary Materials for A phylogenomic data-driven exploration of viral origins and evolution The PDF file includes: Arshan Nasir and Gustavo

More information

Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences

Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences Mathematical Statistics Stockholm University Improving divergence time estimation in phylogenetics: more taxa vs. longer sequences Bodil Svennblad Tom Britton Research Report 2007:2 ISSN 650-0377 Postal

More information

Name: Class: Date: ID: A

Name: Class: Date: ID: A Class: _ Date: _ Ch 17 Practice test 1. A segment of DNA that stores genetic information is called a(n) a. amino acid. b. gene. c. protein. d. intron. 2. In which of the following processes does change

More information

An Investigation of Phylogenetic Likelihood Methods

An Investigation of Phylogenetic Likelihood Methods An Investigation of Phylogenetic Likelihood Methods Tiffani L. Williams and Bernard M.E. Moret Department of Computer Science University of New Mexico Albuquerque, NM 87131-1386 Email: tlw,moret @cs.unm.edu

More information

Inferring Complex DNA Substitution Processes on Phylogenies Using Uniformization and Data Augmentation

Inferring Complex DNA Substitution Processes on Phylogenies Using Uniformization and Data Augmentation Syst Biol 55(2):259 269, 2006 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 101080/10635150500541599 Inferring Complex DNA Substitution Processes on Phylogenies

More information

An Introduction to Bayesian Inference of Phylogeny

An Introduction to Bayesian Inference of Phylogeny n Introduction to Bayesian Inference of Phylogeny John P. Huelsenbeck, Bruce Rannala 2, and John P. Masly Department of Biology, University of Rochester, Rochester, NY 427, U.S.. 2 Department of Medical

More information

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018 Maximum Likelihood Tree Estimation Carrie Tribble IB 200 9 Feb 2018 Outline 1. Tree building process under maximum likelihood 2. Key differences between maximum likelihood and parsimony 3. Some fancy extras

More information

Introduction to characters and parsimony analysis

Introduction to characters and parsimony analysis Introduction to characters and parsimony analysis Genetic Relationships Genetic relationships exist between individuals within populations These include ancestordescendent relationships and more indirect

More information

8/23/2014. Phylogeny and the Tree of Life

8/23/2014. Phylogeny and the Tree of Life Phylogeny and the Tree of Life Chapter 26 Objectives Explain the following characteristics of the Linnaean system of classification: a. binomial nomenclature b. hierarchical classification List the major

More information

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT Inferring phylogeny Constructing phylogenetic trees Tõnu Margus Contents What is phylogeny? How/why it is possible to infer it? Representing evolutionary relationships on trees What type questions questions

More information

Pinvar approach. Remarks: invariable sites (evolve at relative rate 0) variable sites (evolves at relative rate r)

Pinvar approach. Remarks: invariable sites (evolve at relative rate 0) variable sites (evolves at relative rate r) Pinvar approach Unlike the site-specific rates approach, this approach does not require you to assign sites to rate categories Assumes there are only two classes of sites: invariable sites (evolve at relative

More information

A Phylogenetic Network Construction due to Constrained Recombination

A Phylogenetic Network Construction due to Constrained Recombination A Phylogenetic Network Construction due to Constrained Recombination Mohd. Abdul Hai Zahid Research Scholar Research Supervisors: Dr. R.C. Joshi Dr. Ankush Mittal Department of Electronics and Computer

More information

BINF6201/8201. Molecular phylogenetic methods

BINF6201/8201. Molecular phylogenetic methods BINF60/80 Molecular phylogenetic methods 0-7-06 Phylogenetics Ø According to the evolutionary theory, all life forms on this planet are related to one another by descent. Ø Traditionally, phylogenetics

More information

Proceedings of the SMBE Tri-National Young Investigators Workshop 2005

Proceedings of the SMBE Tri-National Young Investigators Workshop 2005 Proceedings of the SMBE Tri-National Young Investigators Workshop 25 Control of the False Discovery Rate Applied to the Detection of Positively Selected Amino Acid Sites Stéphane Guindon,* Mik Black,*à

More information

C.DARWIN ( )

C.DARWIN ( ) C.DARWIN (1809-1882) LAMARCK Each evolutionary lineage has evolved, transforming itself, from a ancestor appeared by spontaneous generation DARWIN All organisms are historically interconnected. Their relationships

More information