Hexapods Resurrected

Similar documents
Hexapoda Origins: Monophyletic, Paraphyletic or Polyphyletic? Rob King and Matt Kretz

ELECTRONIC APPENDIX. This is the Electronic Appendix to the article

ALEXANDRE HASSANIN, 1 NELLY LÉGER, 2 AND JEAN DEUTSCH 3

A Site- and Time-Heterogeneous Model of Amino Acid Replacement

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition

The Relationship Between the Rate of Molecular Evolution and the Rate of Genome Rearrangement in Animal Mitochondrial Genomes

The Linnaeus Centre for Bioinformatics, Uppsala University, Sweden 3

Effects of Gap Open and Gap Extension Penalties

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

INCREASED RATES OF MOLECULAR EVOLUTION IN AN EQUATORIAL PLANT CLADE: AN EFFECT OF ENVIRONMENT OR PHYLOGENETIC NONINDEPENDENCE?

Efficiencies of maximum likelihood methods of phylogenetic inferences when different substitution models are used

Phylogenomics: the beginning of incongruence?

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Are Guinea Pigs Rodents? The Importance of Adequate Models in Molecular Phylogenetics

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

Distinctions between optimal and expected support. Ward C. Wheeler

Analytic Solutions for Three Taxon ML MC Trees with Variable Rates Across Sites

Homoplasy. Selection of models of molecular evolution. Evolutionary correction. Saturation

Letter to the Editor. Department of Biology, Arizona State University

Dr. Amira A. AL-Hosary

Anatomy of a species tree

The Importance of Data Partitioning and the Utility of Bayes Factors in Bayesian Phylogenetics

Large-scale gene family analysis of 76 Arthropods

Consensus Methods. * You are only responsible for the first two

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

The Importance of Proper Model Assumption in Bayesian Phylogenetics

Letter to the Editor. The Effect of Taxonomic Sampling on Accuracy of Phylogeny Estimation: Test Case of a Known Phylogeny Steven Poe 1

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Bayesian support is larger than bootstrap support in phylogenetic inference: a mathematical argument

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley. Parsimony & Likelihood [draft]

Hillis DM Inferring complex phylogenies. Nature 383:

Arthropods (Arthropoda)

A Fitness Distance Correlation Measure for Evolutionary Trees

Performance-Based Selection of Likelihood Models for Phylogeny Estimation

Molecular Clocks. The Holy Grail. Rate Constancy? Protein Variability. Evidence for Rate Constancy in Hemoglobin. Given

Intraspecific gene genealogies: trees grafting into networks

Molecular Phylogenetics and Evolution

C3020 Molecular Evolution. Exercises #3: Phylogenetics

A Guide to Phylogenetic Reconstruction Using Heterogeneous Models A Case Study from the Root of the Placental Mammal Tree

How Molecules Evolve. Advantages of Molecular Data for Tree Building. Advantages of Molecular Data for Tree Building

Phylogenomics. Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University. Tuesday, January 29, 13

Phylogenetics in the Age of Genomics: Prospects and Challenges

Supplementary Materials for

Parsimony via Consensus

Frequentist Properties of Bayesian Posterior Probabilities of Phylogenetic Trees Under Simple and Complex Substitution Models

Techniques for generating phylogenomic data matrices: transcriptomics vs genomics. Rosa Fernández & Marina Marcet-Houben

Bootstrapping and Tree reliability. Biol4230 Tues, March 13, 2018 Bill Pearson Pinn 6-057

Bayesian Models for Phylogenetic Trees

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Constructing Evolutionary/Phylogenetic Trees

Phylogenetic Inference and Parsimony Analysis

Lecture 4. Models of DNA and protein change. Likelihood methods

Ecdysozoan phylogeny and Bayesian inference: first use of nearly complete 28S and 18S rrna gene sequences to classify the arthropods and their kin q

Chapter 26 Phylogeny and the Tree of Life

Points of View JACK SULLIVAN 1 AND DAVID L. SWOFFORD 2

Phylogenetic Tree Reconstruction

Fundamental Differences Between the Methods of Maximum Likelihood and Maximum Posterior Probability in Phylogenetics

Estimating Divergence Dates from Molecular Sequences

An Investigation of Phylogenetic Likelihood Methods

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Impact of Missing Data on Phylogenies Inferred from Empirical Phylogenomic Data Sets

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

8/23/2014. Phylogeny and the Tree of Life

Evaluating phylogenetic hypotheses

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Estimating maximum likelihood phylogenies with PhyML

Constructing Evolutionary/Phylogenetic Trees

Reconstructing the history of lineages

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Lie Markov models. Jeremy Sumner. School of Physical Sciences University of Tasmania, Australia

Missing data and influential sites: choice of sites for phylogenetic analysis can be as important as taxon-sampling and model choice

Museum of Natural History, New York, New York, USA

Phylogenetic Inference using RevBayes

BINF6201/8201. Molecular phylogenetic methods

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Anatomy of a tree. clade is group of organisms with a shared ancestor. a monophyletic group shares a single common ancestor = tapirs-rhinos-horses

Genome-scale approaches to resolving incongruence in molecular phylogenies

Phylogenetics. BIOL 7711 Computational Bioscience

Inferring Molecular Phylogeny

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30

Consensus methods. Strict consensus methods

Many of the slides that I ll use have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Copyright. Jeremy Matthew Brown

The Effects of Nucleotide Substitution Model Assumptions on Estimates of Nonparametric Bootstrap Support

Temporal Trails of Natural Selection in Human Mitogenomes. Author. Published. Journal Title DOI. Copyright Statement.

Points of View. Congruence Versus Phylogenetic Accuracy: Revisiting the Incongruence Length Difference Test

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Chapter 7: Models of discrete character evolution

Concepts and Methods in Molecular Divergence Time Estimation

Systematics - Bio 615

Smith et al. American Journal of Botany 98(3): Data Supplement S2 page 1

Elongation Factor-2: A Useful Gene for Arthropod Phylogenetics

The number of times eyes originated during evolution is often

What is Phylogenetics

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

A (short) introduction to phylogenetics

AUTHOR COPY ONLY. Hetero: a program to simulate the evolution of DNA on a four-taxon tree

Transcription:

Hexapods Resurrected (Technical comment on: "Hexapod Origins: Monophyletic or Paraphyletic?") Frédéric Delsuc, Matthew J. Phillips and David Penny The Allan Wilson Centre for Molecular Ecology and Evolution Massey University, Palmerston North, New Zealand Correspondence: David Penny The Allan Wilson Center for Molecular Ecology and Evolution Massey University, Palmerston North, P O Box 11-222 New Zealand Tel: +64 6 350 5033 / Fax: +64 6 350 5626 Courier Address: Institute of Molecular BioSciences, Science Tower D E-mail: D.Penny@massey.ac.nz - 1 -

Abstract Nardi et al. (Science, 21 March 2003, 1887) suggested that extant hexapods might be diphyletic based on the analysis of amino acids sequences of mitochondrial genes. However, improved phylogenetic analyses of nucleotide sequences (RY-coding) from the same genes consistently retrieve the classical hypothesis of hexapod monophyly. Text (741 words) A recent paper (1) suggested, rather cautiously, that hexapods (insects plus collembolans in their dataset) might be diphyletic, rather than forming a monophyletic group. In their results, collembolans appeared well separated from other insects and emerged before crustaceans. Such an unexpected result has huge consequences for the interpretation of both morphological and developmental evolution in arthropods (2) and therefore deserves further scrutiny especially from the methodological point of view. The authors drew their conclusions from maximum likelihood and Bayesian analyses at the amino acid level of four of the 12 mitochondrial proteins for both the original data set of 35 taxa and a reduced data set of 15 taxa. However, phylogenetic analyses of amino acids currently suffer from potential caveats. First, the currently available models of mitochondrial amino acid substitution are based on empirically deduced matrices from mammalian dominated sequence databases. Second, the maximum likelihood implementation used by the authors does not model the variation of rate across sites, known to be one of the most important parameter of the likelihood model (3). And third, bias in - 2 -

nucleotide composition also affects the amino acid composition of the gene product causing potential problems for phylogenetic reconstruction (4). Some of these pitfalls might be avoided by analyzing nucleotide sequences for which more realistic models of sequence evolution and powerful reconstruction methods are available. In particular, we have recently shown in the case of mammalian complete mitochondrial genomes (5) that it is possible to deal with saturation and base composition heterogeneity by recoding the nucleotides into purines (R) and pyrimidines (Y). This provided a solution to longstanding controversies concerning the position of the root of the mammalian tree (5). Applying this strategy on nucleotides from the original Nardi et al. data set strongly suggests that by correcting for different artifacts it is possible to extract a useful historical signal. Unlike the authors (1), we were able to retrieve the bee (Apis) and the louse (Heterodoxus) within insects (Fig. 1). The artefactual position of these taxa as sister-groups of ticks was explained as being a consequence of their very high shared AT nucleotide composition reflected in their amino acid content (1). From our results, base composition heterogeneity seems to be more easily accommodated in phylogenetic reconstructions using nucleotides. More importantly, our analysis conforms with classical views of arthropod phylogeny: collembolans are sister-group of insects, and these monophyletic hexapods group with crustaceans into Pancrustacea (Fig. 1). One remaining problem with this tree concerns the paraphyly of myriapods induced by the nesting of the centipede (Lithobius) inside chelicerates. - 3 -

However, as noted by the authors (1), their 35 taxa dataset also suffers from extreme rate heterogeneity among taxa, rendering it difficult to draw firm conclusions from. Thus, to check the collembolan position further, they reduced their data set to 15 taxa with more homogeneous rates and amino acid compositions. Despite their conservative analysis, they still reported collembolans outside both insects and crustaceans, rendering hexapods diphyletic. However, such a reduced dataset is particularly prone to systematic biases from low taxon sampling (6). In fact, deleting taxa with very anomalous rates and nucleotide composition can be helpful, but care is required not to delete taxa that then leave long isolated branches potentially leading to long branches attraction phenomena (7). More specifically, the inclusion of a single outgroup can have a strong impact on the phylogenetic reconstruction even in the absence of rate heterogeneity (8). In the case of placental mammal mitogenomics, taxon sampling has been shown to be a major source of phylogenetic error (9) and we found that increasing the number and diversity of taxa gave excellent agreement between nuclear and mitochondrial data (10). To maximize taxon sampling, we constructed a well balanced 25 taxa data set designed to break isolated long branches (especially in the outgroup) without adding strong rate heterogeneity. Phylogenetic analyses of this nucleotide data set including RY-coded third codon positions produced a tree where Arthropoda, Pancrustacea, Hexapoda, Insecta and Pterygota all appear as monophyletic groups, though with variable support (Fig. 2). Moreover, this topology is much more compatible with the current views on arthropod phylogeny (11). The probability of randomly selecting a topology compatible with this prior hypothesis is so small (10) that it provides strong evidence in favor of its veracity. - 4 -

Obviously, additional complete mitochondrial genomes are needed to strengthen the tree further, but with the data and methods currently at hand, the hypothesis of a common ancestry for extant hexapods cannot be rejected. References and notes 1. F. Nardi et al., Science 299, 1887 (2003). 2. R.H Thomas, Science 299, 1854 (2003). 3. J. Sullivan, D.L. Swofford, Syst. Biol. 50, 723 (2001). 4. P.G. Foster, D.A. Hickey, J. Mol. Evol. 48, 284 (1999). 5. M.J. Phillips, D. Penny, Mol. Phylogenet. Evol. 28, 171 (2003). 6. D.J. Zwickl, D.M. Hillis, Syst. Biol. 51, 588 (2002). 7. M.D. Hendy, D. Penny, Syst. Zool. 38, 297 (1989). 8. B.R. Holland, D. Penny, M.D. Hendy, Syst. Biol. 52, 229 (2003). 9. H. Philippe, J. Mol. Evol. 45, 712 (1997). 10. Y.-H. Lin et al., Mol. Biol. Evol. 19, 2060 (2002). 11. G. Giribet, G.D. Hedgecombe, W.C. Wheeler, Nature 413, 157 (2001). 12. F. Ronquist, J.P. Huelsenbeck, Bioinformatics 19, 1572 (2003). 13. D.L. Swofford, PAUP Sinauer Associates, Sunderland Massachusetts (2002). 14. D. Posada, K.A. Crandall, Bioinformatics 14, 817 (1998). 15. Francesco Nardi and colleagues kindly sent us their amino acid data set. Emmanuel Douzery provided helpful comments. Our data sets are available at http://awcmee.massey.ac.nz/downloads.htm. This work was supported by a Lavoisier - 5 -

Postdoctoral Grant from the French Ministry of Foreign Affairs to FD and by the New Zealand Marsden Fund. Figure legends Fig. 1: Bayesian 50% majority rule consensus tree with associated branch lengths obtained using nucleotide sequences of COX1, COX2, COX3 and CYTB (3750 sites) corresponding to the 35 taxa data set of Nardi et al. (1). The first and third codon positions were RY-coded whereas second codon positions were kept as nucleotides. MrBayes version 3.0b4 (12) was used to perform a partitioned likelihood Bayesian search where three independent substitution models were attributed to each codon position: a two-state substitution model + I + Γ for RY coded first and third codon positions and a GTR + I + Γ model for second codon position nucleotides. Four incrementally heated MCMCMC were run for 500,000 generations sampling trees and parameters every 10 generations. The consensus tree was obtained from the 35,000 trees sampled after the initial burn in period. Values at nodes indicate Bayesian posterior probabilities ( = 1.00). Note that the terminal branch lengths leading to the bee (Apis) and the louse (Heterodoxus) have been reduced by a factor three. Underlined taxa are not included in the 25 taxa data set. - 6 -

Fig. 2: Maximum likelihood (ML) phylogram obtained using nucleotide sequences of COX1, COX2, COX3 and CYTB for a 25 taxa data set (3777 sites). The third codon positions were RY-coded whereas first and second codon positions were kept as nucleotides. PAUP version 4.0b10 (13) was used to perform a ML heuristic search under the best fitting GTR + I + Γ model and associated ML estimates of parameters as determined by Modeltest version 3.06 (14). A partitioned likelihood Bayesian search was carried out with MrBayes (12) using a GTR + I + Γ model for first and second codon position nucleotides and a two-state substitution model + I + Γ for the RY-coded third codon positions with the same parameter settings than in Fig. 1. Values at nodes indicate ML bootstrap proportions (100 replications) / Bayesian posterior probabilities. The two collembolans are figured in bold. - 7 -

Figure 1 0.46 0.1 Katharina Loligo Albinaria Lumbricus Platynereis 0.89 Narceus Thyropygus MYRIAPODA Lithobius 0.72 0.61 Limulus 0.84 Ixodes CHELICERATA Rhipicephalus Artemia 0.75 Daphnia ARTHROPODA CRUSTACEA 0.91 1.00 Pagurus Penaeus Gomphiocephalus COLLEMBOLA PANCRUSTACEA Tetrodontophora 0.96 Tricholepidion Zygentoma Locusta Orthoptera HEXAPODA 0.40 0.57 Triatoma Hemiptera Crioceris INSECTA Coleoptera 0.87 Tribolium 0.70 Apis Hymenoptera PTERYGOTA Heterodoxus Phthiraptera 0.86 Bombyx mandarina 0.68 Bombyx mori Lepidoptera Ostrinia furnacalis Ostrinia nubilalis 0.76 Anopheles quadrimaculatus Anopheles gambiae Chrysomya Cochliomyia Diptera Ceratitis Drosophila yakuba Drosophila melanogaster - 8 -

Figure 2 91 / 1.00 Katharina Loligo MOLLUSCA 99 / 1.00 Platynereis Lumbricus ANNELIDA Limulus Chelicerata 95 / 1.00 Thyropygus Narceus Myriapoda Pagurus ARTHROPODA 51 / 1.00 45 / 0.92 Penaeus Daphnia Artemia Crustacea PANCRUSTACEA 63 / 1.00 HEXAPODA Tricholepidion Triatoma Gomphiocephalus Tetrodontophora 56 / 0.99 Crioceris INSECTA Tribolium 77 / 0.98 PTERYGOTA 49 / 0.88 54 / 0.76 66 / 0.98 Locusta Ostrinia Bombyx 71 / 0.98 Anopheles 0.1 77 / 0.99 70 / 0.91 Drosophila Ceratitis Chrysomya Cochliomyia - 9 -