Phylogenetics in the Age of Genomics: Prospects and Challenges

Similar documents
Dr. Amira A. AL-Hosary

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Phylogenetic inference

C3020 Molecular Evolution. Exercises #3: Phylogenetics

Anatomy of a species tree

Lecture 11 Friday, October 21, 2011

Constructing Evolutionary/Phylogenetic Trees

Chapter 16: Reconstructing and Using Phylogenies

8/23/2014. Phylogeny and the Tree of Life

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Phylogenomics. Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University. Tuesday, January 29, 13

Tree thinking pretest

C.DARWIN ( )

Constructing Evolutionary/Phylogenetic Trees

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

Algorithms in Bioinformatics

The course Phylogenetic data analysis (period IV, 2010) is an in-depth course on this topic

first (i.e., weaker) sense of the term, using a variety of algorithmic approaches. For example, some methods (e.g., *BEAST 20) co-estimate gene trees

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Phylogenetic Tree Reconstruction

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Classification and Phylogeny

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Phylogeny & Systematics: The Tree of Life

Classification and Phylogeny

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Phylogenomics: the beginning of incongruence?

Letter to the Editor. Department of Biology, Arizona State University

What is Phylogenetics

How should we organize the diversity of animal life?

SCIENTIFIC EVIDENCE TO SUPPORT THE THEORY OF EVOLUTION. Using Anatomy, Embryology, Biochemistry, and Paleontology

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics)

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Anatomy of a tree. clade is group of organisms with a shared ancestor. a monophyletic group shares a single common ancestor = tapirs-rhinos-horses

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Effects of Gap Open and Gap Extension Penalties

1 ATGGGTCTC 2 ATGAGTCTC

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Chapter 19 Organizing Information About Species: Taxonomy and Cladistics

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

How to read and make phylogenetic trees Zuzana Starostová

Microbial Diversity and Assessment (II) Spring, 2007 Guangyi Wang, Ph.D. POST103B

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony

Lecture 6 Phylogenetic Inference

(Stevens 1991) 1. morphological characters should be assumed to be quantitative unless demonstrated otherwise

Phylogeny and the Tree of Life

Biology 1B Evolution Lecture 2 (February 26, 2010) Natural Selection, Phylogenies

Classification, Phylogeny yand Evolutionary History

Phylogenetic Analysis

Phylogenetic Analysis

Biology 211 (2) Week 1 KEY!

Name: Class: Date: ID: A

BINF6201/8201. Molecular phylogenetic methods

Phylogeny is the evolutionary history of a group of organisms. Based on the idea that organisms are related by evolution

Reconstructing the history of lineages

PHYLOGENY AND SYSTEMATICS

Phylogenetic analysis. Characters

Phylogeny 9/8/2014. Evolutionary Relationships. Data Supporting Phylogeny. Chapter 26

Phylogenetic Analysis

Chapter 26 Phylogeny and the Tree of Life

The practice of naming and classifying organisms is called taxonomy.

Introduction to characters and parsimony analysis

Macroevolution Part I: Phylogenies

A (short) introduction to phylogenetics

Estimating Evolutionary Trees. Phylogenetic Methods

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

9/19/2012. Chapter 17 Organizing Life s Diversity. Early Systems of Classification

5/4/05 Biol 473 lecture

Phylogeny and the Tree of Life

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Quantifying sequence similarity

Fig. 26.7a. Biodiversity. 1. Course Outline Outcomes Instructors Text Grading. 2. Course Syllabus. Fig. 26.7b Table

Phylogeny: building the tree of life

Cladistics and Bioinformatics Questions 2013

Multiple Sequence Alignment. Sequences

BIOLOGY 432 Midterm I - 30 April PART I. Multiple choice questions (3 points each, 42 points total). Single best answer.

Coalescent Histories on Phylogenetic Networks and Detection of Hybridization Despite Incomplete Lineage Sorting

Bootstrapping and Tree reliability. Biol4230 Tues, March 13, 2018 Bill Pearson Pinn 6-057

Letter to the Editor. Temperature Hypotheses. David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:

Warm-Up- Review Natural Selection and Reproduction for quiz today!!!! Notes on Evidence of Evolution Work on Vocabulary and Lab

Phylogenetics. BIOL 7711 Computational Bioscience

Chapter 26 Phylogeny and the Tree of Life

EVOLUTIONARY DISTANCES

Gene regulation: From biophysics to evolutionary genetics

USING BLAST TO IDENTIFY PROTEINS THAT ARE EVOLUTIONARILY RELATED ACROSS SPECIES

Consensus methods. Strict consensus methods

Organizing Life s Diversity

Thanks to Paul Lewis, Jeff Thorne, and Joe Felsenstein for the use of slides

BIOL 1010 Introduction to Biology: The Evolution and Diversity of Life. Spring 2011 Sections A & B

Supplementary material to Whitney, K. D., B. Boussau, E. J. Baack, and T. Garland Jr. in press. Drift and genome complexity revisited. PLoS Genetics.

Transcription:

Phylogenetics in the Age of Genomics: Prospects and Challenges Antonis Rokas Department of Biological Sciences, Vanderbilt University http://as.vanderbilt.edu/rokaslab

http://pubmed2wordle.appspot.com/

What is a Phylogenetic Tree? A phylogenetic tree is the mathematical structure used to depict the evolutionary history of a group of organisms or genes Phylogenetic trees show historical relationships, not similarities

Why are Phylogenetic Trees Useful? Benefits to Science: The origin and history of life Evolution of molecules, phenotypes and developmental mechanisms Benefits to Society: Human health (identification of disease agents & reservoirs) Agriculture (crops wild relatives) Biodiversity (conservation strategies)

CSI: Phylogeny A gastroenterologist was accused of second-degree murder for injecting his former girlfriend with blood obtained from an HIV type 1 (HIV-1)-infected patient under his care Phylogenetic analyses of HIV-1 sequences were admitted and used as evidence in the court Patient and Victim HIV-1 sequences Sequences from local population sample of HIV-1 infected individuals Metzker et al. (2002) PNAS

Phylogenetic Concepts and Terminology tree / topology taxon - taxa branches, nodes & internodes polytomy & resolution

Rooted and Unrooted Trees

Phylogenetic Concepts and Terminology

The Meaning of Vertical and Horizontal Axes in Trees

Estimating Phylogenies 1. Select an optimality criterion 2. Select a search strategy 3. Use selected search strategy to generate a series of trees, and apply selected optimality criterion to each, always keeping track of the best tree(s) examined thus far

Optimality Criteria Criterion Minimum Evolution Least Squares Parsimony Maximum Likelihood Bayesian Inference Data pairwise distances pairwise distances discrete characters discrete characters discrete characters Distance methods first convert an alignment into a matrix of pairwise distances Discrete character methods use the alignment as is and consider directly each site/character

Search Strategies There is a need for a search strategy because the number of possible trees goes up very fast with increasing number of species T BT ( ) (2n 5) n 3 Search Strategy Exhaustive Heuristic

Exhaustive Search

Heuristic Search Heuristic searches don t guarantee discovery of optimal topology The lazy subtree rearrangement (LSR) tree search strategy used by the RAxML program

Assessing Robustness in Inference: Bootstrap Characters are sampled with replacement to create many bootstrap replicate data sets equal in size to the original one Each bootstrap replicate data set is analysed (e.g., with parsimony, likelihood) 1234567... seq1: AAAATCGTGGGG seq2: ATAATGGTGGCG seq3: AAAATGGTGGCG seq4: ATAATGGTGGCG Agreement among the resulting trees is summarized with a majority-rule consensus tree The frequency of occurrence of clades, also known as bootstrap values, is a measure of support for those groups

The Problem of Incongruence Gene X Gene Y Species tree? Incongruence is pervasive in the phylogenetics literature Rokas & Chatzimanolis (2008) in Phylogenomics (W. J. Murphy, Ed.)

A Systematic Evaluation of Single Gene Phylogenies S. cerevisiae S. paradoxus S. mikatae S. bayanus Dataset: 106 genes on all 16 chromosomes totaling 127kb corresponding roughly to 1% of the genomic sequence, 2% of genes Analyses: Maximum Likelihood (ML) & Maximum Parsimony (MP) on nt data sets and MP on amino acid data sets Kellis et al. (2003) Nature

Incongruence in Shallow Time ML / MP Rokas et al. (2003) Nature

Concatenation of 106 Genes Yields a Single Yeast Phylogeny ML / MP on nt MP on aa Rokas et al. (2003) Nature

Phylogenetic Accuracy is Positively Correlated with Gene Number Murphy et al. (2001) Science Zanis et al. (2002) PNAS Rokas & Carroll (2005) Mol. Biol. Evol.

Reconstructing the Tree of Life Ciccarelli et al. (2006) Science James et al. (2006) Nature

Methods of Assessing Robustness are Subject to Systematic Error Rokas & Carroll (2006) PLoS Biol.; Rokas et al. (2003) Nature

Gene Trees Can Differ from Species Trees Degnan & Rosenberg (2009) Trends Ecol. Evol.

Inferring the Species Tree from Individual Gene Histories Concordance Factor: The proportion of the genome for which a clade is true STEM: Maximum likelihood estimation of species trees Kubatko et al.(2009) Bioinformatics BEST: Bayesian estimation of species trees Liu (2008) Bioinformatics Ané et al. (2007) Mol. Biol. Evol.

Lack of Resolution Due to Lack of Data or to Radiation? Balavoine & Adoutte (1998) Science / Rokas et al. (2003) Evol. Dev.

Lack of Resolution Among Most Metazoan Phyla 50 genes, ML/MP Rokas et al. (2005) Science

Biological Explanations for the Lack of Resolution Hypothesis I: Mutational Saturation the phylogenetic signal originally contained in these protein sequences has been erased by multiple substitutions Hypothesis II: Evolutionary Radiation the close spacing of cladogenetic events early in the history of metazoans limits their resolution

Time of Origin of Metazoa & Fungi 1) Fossils Metazoa: Xiao et al. (1998) Nature Fungi: Yuan et al. (2005) Science 2) Relative molecular clocks Study Douzery et al. (2004) PNAS Hedges et al. (2004) BMC Evol. Biol. Heckman et al. (2001) Science Metazoa 849 Myr 976 Myr 1177 Myr Fungi 727 Myr 968 Myr 1208 Myr 3) The way this set of 50 genes has evolved across the two Kingdoms is remarkably similar

A Remarkable Contrast in Phylogenetic Resolution Given: lack of resolution in Metazoa (using lots of genes) contrast in resolution between Metazoa and Fungi (using identical genes) The early history of animals is best viewed as an evolutionary radiation 50 genes, ML/MP Rokas et al. (2005) Science

Rapid Tempo of Cladogenesis Early in Metazoan History Valentine (2002) Ann. Rev. Ecol. Syst.

Incongruence in Deep Time Rokas (2008) Ann. Rev. Genet.

What Makes Animals Hard to Resolve But Yeasts Easy? Internode length: influences amount of phylogenetic signal (I) Homoplasy: independent evolution of identical characters (*, ) Rokas & Carroll (2006) PLOS Biol.

the independent evolution of identical character states (e.g., amino acid residues) in different branches of a phylogenetic tree that are not directly inherited from a common ancestor Homoplasy

Quantifying Homoplasy Across the Tree of Life Rokas & Carroll (2008) Mol. Biol. Evol.

Measuring Homoplasy Across the Tree of Life Rokas & Carroll (2008) Mol. Biol. Evol.

Measuring Homoplasy Across the Tree of Life

An Overabundance of Homoplastic Substitutions Dataset Obs(H) Exp(H) Obs(H)/Exp(H) Saccharomyces yeasts 4.7% 2.0% 2.4 Aspergillus filamentous fungi 6.9% 2.5% 2.8 Fungal phyla 6.7% 3.5% 1.9 Eukaryote phyla 7.5% 3.5% 2.1 Plants 2.3% 0.9% 2.7 Drosophila fruit-flies 2.6% 1.3% 2.0 Cetartiodactyls mammals 12.3% 4.9% 2.5 Paenungulata mammals 10.4% 5.0% 2.1 Metazoan phyla 7.4% 3.3% 2.3 Vertebrates 10.2% 3.2% 3.2 Average: 7.1% 2.3% 2.4 Rokas & Carroll (2008) Mol. Biol. Evol.

Excess Homoplasy is Specific to Homoplastic Substitutions Parsimony-informative sites Clade Obs(H) Exp(H) Obs(H)/Exp(H) Saccharomyces yeasts 3.3% 3.3% 0.0 Aspergillus filamentous ascomycetes 10.3% 9.8% 1.1 Fungal phyla 2.5% 2.2% 1.1 Eukaryotic phyla 2.3% 2.1% 1.1 Land plants 31.6% 31.7% 1.0 Drosophila fruit-flies 10.2% 10.1% 1.0 Cetartiodactyl mammals 4.5% 3.6% 1.3 Metazoan phyla 3.1% 2.8% 1.1 *Paenungulata mammals 5.4% 2.4% 2.3 *Vertebrates 2.9% 2.4% 1.2 Average: 7.6% 7.0% 1.1 Rokas & Carroll (2008) Mol. Biol. Evol.

Homoplasy Stems From Frequently Exchanged Amino Acids 190 possible interchanges among 20 amino acids - 75 can be achieved via a single nucleotide substitution - The other 115 require two or three substitutions 65% of observed interchanges is between the top12 most frequently observed amino acid interchanges a single mutational step away Rokas & Carroll (2008) Mol. Biol. Evol.

Conservative AA Substitutions Are Very Common in Alignments GAC/T (D) Aspartic Acid GAA/G (E) Glutamic Acid >Cimi >Acla >Afis >Afla >Afum >Anid >Anig >Aory >Ater EGAAGPIRFNMRLPESRYHCAGGITTIVVY AATTSPEDVETRKDDERVEHGGGITTVLVY AATASAEEFELRKEDERVEHGDGITTILIY SPAAAADEFDLQRDEDQNDYADSVSSVFIF AATTSAEEFELRKEDERVEHGDGITTILIY APAATADVSDTRLKQEQAEQAGGITAILVY APAAAPDDLETRLEEEQADYAGSVSSILIY SPAAAADEFDLQRDEDQNDYADSVSSVFIF SPAAAPDDVDTQREEDQNDYAGSVSSIFVF

What Are the Limits to Resolution? For internodes in ~ 600 my animal phylogeny Time Amount of Data _ 40 my -> 700 variable sites 10 my -> 2,800 variable sites 1 my -> 28,000 variable sites 0.1 my ->?????? variable sites 1 y ->?????? variable sites Philippe et al. (1994) Development

What If Mammals Had Diversified in the Cambrian? Springer et al. (2003) PNAS

107 Myr tree 600 Myr tree

Phylogenetic Accuracy is Inversely Correlated with Elapsed Time

Failure to Resolve Internodes < 10 Million Years Rokas et al. (2005) Science

Why Are There So Few Bushes? Data in, fully-resolved phylogenetic tree out

The Human / Chimp / Gorilla Tree Inform. Sites 8561 / 11293 1302 / 11293 1430 / 11293 Carroll (2003) Nature Patterson et al. (2006) Nature

Bushes in the Mammalian Tree Inform. Sites 22 25 21 Nishihara et al. (2009) PNAS

The Tetrapod / Lungfish / Coelacanth Bush Inform. Sites The three major lineages first 96 / 294 appeared within 20 30 million years ago, approximately 390 92 / 294 million years ago 106 / 294 44 genes, ML/MP/NJ Takezaki et al. (2004) Mol. Biol. Evol.

Mind the Gap Between Real Data and Models One can use the most sophisticated audio equipment to listen, for an eternity, to a recording of white noise and still not glean a useful scrap of information Rodrigo et al. (1994) Chapter in: Sponge in Time and Space; Biology, Chemistry, Paleontology

Acknowledgements http://as.vanderbilt.edu/rokaslab