Supplementary material to Whitney, K. D., B. Boussau, E. J. Baack, and T. Garland Jr. in press. Drift and genome complexity revisited. PLoS Genetics.

Tree topologies

Two topologies were examined, one favoring the Coelomata hypothesis (Figs. S1a and S2a), in which protostomes are not monophyletic, and one favoring the Ecdysozoa hypothesis (Figs. S1b and S2b), in which protostomes are monophyletic (Adoutte et al. 2000).

Branch lengths

Fossil estimates -- Divergence dates (in billion years before present) were gathered from the literature (Brocks et al., 2005; Battistuzzi et al., 2004; Douzery et al., 2004; Benton and Donoghue, 2007) and from http://www.fossilrecord.net/, and were applied to the two tree topologies. Twenty-two of the 28 total nodes in the phylogeny were dated in this way. The remaining six nodes were placed evenly between neighboring dated nodes (see BLADJ procedure of Webb et al. 2008). The resulting branch lengths are shown in Fig. S1.

rRNA estimates -- Branch lengths based on expected numbers of substitutions in ribosomal RNA were determined and applied to the two topologies. Aligned 18S or 16S (small subunit) ribosomal RNA sequences were downloaded from the SILVA database (http://www.arb-silva.de/, Pruesse et al., 2007) for the 29 studied species. Sequences from Trypanosoma vivax and Tetraodon nigroviridis were used to represent Trypanosoma spp. and Fugu rubripes, respectively, for which sequences could not be found in the database. Based on this alignment, branch lengths were estimated in the maximum likelihood framework with bppml (Dutheil and Boussau, 2008) under a GTR + Gamma (4 categories) + Invariant model of sequence evolution. The resulting branch lengths are shown in Fig. S2.

Regression analyses (corresponding to Table 2 of main paper)

The relationship between N_e u and genome size for n=29 taxa was examined using REGRESSIONv2.m (Lavin et al. 2008) running in MATLAB v. 7.9.0.
Three types of models were examined: ordinary least squares (OLS), phylogenetic generalized least squares (PGLS), and phylogenetic regression in which the residual variation is modeled as an Ornstein-Uhlenbeck process (RegOU). OLS is traditional nonphylogenetic regression, which in effect assumes a star phylogeny in which all species are equally unrelated, and corresponds to the N_e u vs. genome size analysis reported in Lynch & Conery (2003). PGLS assumes that residual variation among species is correlated, with the correlation given by a Brownian-motion-like process along the specified phylogenetic tree (topology and branch lengths). The RegOU model estimates (via restricted maximum likelihood) the strength of phylogenetic signal in the residual variation simultaneously with the regression coefficients; the former is given by d, the Ornstein-Uhlenbeck transformation parameter. An OU evolutionary model is typically used to model the effects of stabilizing selection around an optimum. When d=0, there is no phylogenetic signal in the residuals from the regression model; when d is significantly greater than 0, significant phylogenetic signal exists (Lavin et al. 2008; Blomberg et al. 2003).

In addition to the single OLS analysis, twelve analyses were done. These corresponded to PGLS and RegOU models, each run on the six combinations of two topologies (above) and three sets of branch lengths (all=1, as in Whitney & Garland 2010; and Fossil and rRNA, as above). We compared the likelihoods of the PGLS and OLS models, with a higher likelihood taken as evidence of a better-fitting model. RegOU and OLS models were compared with ln maximum likelihood ratio tests, using P<0.05 as the cutoff.
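The model comparison described above can be illustrated with a minimal Python sketch. It is not REGRESSIONv2.m and makes several assumptions: the four-species tree, the trait values, and the function names gls_fit and lrt_pvalue_1df are hypothetical; likelihoods are plain maximum likelihood rather than restricted maximum likelihood; and the likelihood ratio test assumes one degree of freedom, for one extra parameter such as d. OLS is treated as generalized least squares with an identity covariance matrix (a star phylogeny), and PGLS uses the shared branch lengths implied by Brownian motion on the tree.

```python
import math
import numpy as np


def gls_fit(x, y, C):
    """Fit y = b0 + b1*x by generalized least squares, with residual
    covariance proportional to C. Returns (coefficients, ML ln likelihood)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    C = np.asarray(C, dtype=float)
    n = y.size
    X = np.column_stack([np.ones(n), x])          # intercept + slope design
    Cinv = np.linalg.inv(C)
    beta = np.linalg.solve(X.T @ Cinv @ X, X.T @ Cinv @ y)
    resid = y - X @ beta
    sigma2 = float(resid @ Cinv @ resid) / n      # ML estimate of the scale
    _, logdet = np.linalg.slogdet(C)
    lnL = -0.5 * (n * math.log(2.0 * math.pi * sigma2) + logdet + n)
    return beta, lnL


def lrt_pvalue_1df(lnL_full, lnL_reduced):
    """P-value of a likelihood ratio test, assuming the statistic follows a
    chi-square distribution with 1 degree of freedom (an assumption here)."""
    stat = max(2.0 * (lnL_full - lnL_reduced), 0.0)
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical data for a toy tree ((A,B),(C,D)) with all branch lengths = 1.
x = [-2.0, -1.8, -0.9, -0.7]   # made-up predictor values
y = [3.1, 3.4, 1.2, 0.9]       # made-up response values

C_star = np.eye(4)             # OLS: star phylogeny, no shared history
C_tree = np.array([            # PGLS: entries are shared path lengths
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])

beta_ols, lnL_ols = gls_fit(x, y, C_star)
beta_pgls, lnL_pgls = gls_fit(x, y, C_tree)
print("OLS  slope, lnL:", beta_ols[1], lnL_ols)
print("PGLS slope, lnL:", beta_pgls[1], lnL_pgls)
print("Higher likelihood:", "PGLS" if lnL_pgls > lnL_ols else "OLS")
```

Because OLS and PGLS estimate the same number of parameters, the sketch compares their ln likelihoods directly. The lrt_pvalue_1df helper applies only to nested comparisons, such as a model with an estimated transformation parameter against OLS.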

Regression analyses for bacteria

The relationship between N_e u and genome size for n=7 bacterial species from the dataset of Lynch and Conery (2003) was examined using OLS and PGLS in REGRESSIONv2.m (Lavin et al. 2008) as described above using all=1 branch lengths.

N_e u threshold values

Observations from the dataset of Lynch & Conery (2003) were scored for "low" vs. "high" N_e u according to the following thresholds identified in Lynch & Conery (2003): intron number N_e u = 0.015; total transposon number (and fraction) N_e u = 0.0128. While the former was presented in the text, the latter was determined from Fig. 4 (bottom right panel) of Lynch & Conery (2003) using the program g3data 1.5.1 (http://www.frantz.fi/software/g3data.php) to estimate genome size at the threshold as 11.89 Mb. N_e u was then back-calculated from genome size using the regression equation in the Fig. 1b legend of Lynch & Conery (2003).

References

Adoutte A, Balavoine G, Lartillot N, Lespinet O, Prud'homme B, de Rosa R. 2000. The new animal phylogeny: Reliability and implications. Proceedings of the National Academy of Sciences of the United States of America 97(9): 4453-4456.
Battistuzzi FU, Feijao A, Hedges SB. 2004. A genomic timescale of prokaryote evolution: insights into the origin of methanogenesis, phototrophy, and the colonization of land. BMC Evol Biol 4: 44.
Benton MJ, Donoghue PC. 2007. Paleontological evidence to date the tree of life. Mol Biol Evol 24(1): 26-53.
Blomberg SP, Garland T, Ives AR. 2003. Testing for phylogenetic signal in comparative data: Behavioral traits are more labile. Evolution 57: 717-745.
Brocks JJ, Love GD, Summons RE, Knoll AH, Logan GA, Bowden SA. 2005. Biomarker evidence for green and purple sulphur bacteria in a stratified Palaeoproterozoic sea. Nature 437(7060): 866-870.
Douzery EJ, Snell EA, Bapteste E, Delsuc F, Philippe H. 2004. The timing of eukaryotic evolution: does a relaxed molecular clock reconcile proteins and fossils? Proc Natl Acad Sci U S A 101(43): 15386-91.
Dutheil J, Boussau B. 2008. Non-homogeneous models of sequence evolution in the Bio++ suite of libraries and programs. BMC Evol Biol 8: 255.
Lavin SR, Karasov WH, Ives AR, Middleton KM, Garland T. 2008. Morphometrics of the avian small intestine compared with that of nonflying mammals: A phylogenetic approach. Physiological and Biochemical Zoology 81: 526-550.
Lynch M, Conery JS. 2003. The origins of genome complexity. Science 302: 1401-1404.
Pruesse E, Quast C, Knittel K, Fuchs B, Ludwig W, Peplies J, Glöckner FO. 2007. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 35(21): 7188-96.
Webb CO, Ackerly DD, Kembel SW. 2008. Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics 24: 2098-2100.
Whitney KD, Garland T Jr. 2010. Did genetic drift drive increases in genome complexity? PLoS Genetics 6: e1001080.

Figure S1: Branch lengths based on the fossil record. Dates are in billion years. a) Tree topology according to the Coelomata hypothesis. b) Tree topology according to the Ecdysozoa hypothesis.
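Six nodes in the Fig. S1 trees lacked fossil dates and were placed evenly between neighboring dated nodes. The Python sketch below illustrates that even spacing for the simplest case only: a single lineage with a run of undated nodes between one older and one younger dated node. It is not the BLADJ implementation from Phylocom, and the function name and example ages are hypothetical.

```python
def space_undated_nodes(older_age, younger_age, n_undated):
    """Return ages for n_undated nodes spaced evenly between two dated nodes.

    Ages are in billion years before present, so older_age > younger_age.
    The list is ordered from the oldest undated node to the youngest.
    """
    if older_age <= younger_age:
        raise ValueError("older_age must exceed younger_age")
    if n_undated < 0:
        raise ValueError("n_undated must be non-negative")
    step = (older_age - younger_age) / (n_undated + 1)
    return [older_age - step * (i + 1) for i in range(n_undated)]


# Hypothetical example: two undated nodes between nodes dated 2.0 and 0.5.
ages = space_undated_nodes(2.0, 0.5, 2)
print(ages)
```

Each resulting branch along the path has the same length, (older_age - younger_age) / (n_undated + 1).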

Figure S2: Branch lengths based on expected rRNA substitutions. a) Tree topology according to the Coelomata hypothesis. b) Tree topology according to the Ecdysozoa hypothesis.