Phylogenomics. Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University. Tuesday, January 29, 13

Similar documents
8/23/2014. Phylogeny and the Tree of Life

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition

C3020 Molecular Evolution. Exercises #3: Phylogenetics

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Anatomy of a tree. clade is group of organisms with a shared ancestor. a monophyletic group shares a single common ancestor = tapirs-rhinos-horses

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

How Molecules Evolve. Advantages of Molecular Data for Tree Building. Advantages of Molecular Data for Tree Building

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics)

Phylogenetics in the Age of Genomics: Prospects and Challenges

Estimating Evolutionary Trees. Phylogenetic Methods

Letter to the Editor. Department of Biology, Arizona State University

Macroevolution Part I: Phylogenies

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Chapter 26 Phylogeny and the Tree of Life

Classification, Phylogeny yand Evolutionary History

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Anatomy of a species tree

PHYLOGENY AND SYSTEMATICS

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.

Cladistics and Bioinformatics Questions 2013

Phylogenetic inference

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

Dr. Amira A. AL-Hosary

Biology 211 (2) Week 1 KEY!

Reconstructing the history of lineages

Taming the Beast Workshop

PHYLOGENY & THE TREE OF LIFE

Lecture 11 Friday, October 21, 2011

To link to this article: DOI: / URL:

Phylogeny 9/8/2014. Evolutionary Relationships. Data Supporting Phylogeny. Chapter 26

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Phylogenetic Analysis and Intraspeci c Variation : Performance of Parsimony, Likelihood, and Distance Methods

Phylogenetic analyses. Kirsi Kostamo

Phylogeny and the Tree of Life

Phylogeny and the Tree of Life

Hillis DM Inferring complex phylogenies. Nature 383:

Letter to the Editor. The Effect of Taxonomic Sampling on Accuracy of Phylogeny Estimation: Test Case of a Known Phylogeny Steven Poe 1

Classification and Phylogeny

Chapter 19: Taxonomy, Systematics, and Phylogeny

Classification and Phylogeny

Chapter 26. Phylogeny and the Tree of Life. Lecture Presentations by Nicole Tunbridge and Kathleen Fitzpatrick Pearson Education, Inc.

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Introduction to characters and parsimony analysis

Chapter 7: Models of discrete character evolution

Principles of Phylogeny Reconstruction How do we reconstruct the tree of life? Basic Terminology. Looking at Trees. Basic Terminology.

Constructing Evolutionary/Phylogenetic Trees

Phylogeny and the Tree of Life

Systematics - Bio 615

How to read and make phylogenetic trees Zuzana Starostová

Lecture Notes: BIOL2007 Molecular Evolution

How should we organize the diversity of animal life?

Lecture 6 Phylogenetic Inference

SCIENTIFIC EVIDENCE TO SUPPORT THE THEORY OF EVOLUTION. Using Anatomy, Embryology, Biochemistry, and Paleontology

ESS 345 Ichthyology. Systematic Ichthyology Part II Not in Book

A Phylogenetic Network Construction due to Constrained Recombination

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Understanding How Stochasticity Impacts Reconstructions of Recent Species Divergent History. Huateng Huang

Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2008

Combining Data Sets with Different Phylogenetic Histories

Chapter 16: Reconstructing and Using Phylogenies

Consensus methods. Strict consensus methods

BIOL 1010 Introduction to Biology: The Evolution and Diversity of Life. Spring 2011 Sections A & B

Constructing Evolutionary/Phylogenetic Trees

An Evaluation of Different Partitioning Strategies for Bayesian Estimation of Species Divergence Times

Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2012 University of California, Berkeley

Molecular Clocks. The Holy Grail. Rate Constancy? Protein Variability. Evidence for Rate Constancy in Hemoglobin. Given

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

AP Biology. Cladistics

C.DARWIN ( )

Phylogenies & Classifying species (AKA Cladistics & Taxonomy) What are phylogenies & cladograms? How do we read them? How do we estimate them?

Intraspecific gene genealogies: trees grafting into networks

Processes of Evolution

Algorithms in Bioinformatics

Name. Ecology & Evolutionary Biology 2245/2245W Exam 2 1 March 2014

Consensus Methods. * You are only responsible for the first two

Letter to the Editor. Temperature Hypotheses. David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons

Correspondence should be addressed to Jeffrey P. Townsend;

SPECIATION. REPRODUCTIVE BARRIERS PREZYGOTIC: Barriers that prevent fertilization. Habitat isolation Populations can t get together

PHYLOGENY WHAT IS EVOLUTION? 1/22/2018. Change must occur in a population via allele

Lecture V Phylogeny and Systematics Dr. Kopeny

Theory of Evolution Charles Darwin

The practice of naming and classifying organisms is called taxonomy.

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:

Phylogenomics: the beginning of incongruence?

Biologists have used many approaches to estimating the evolutionary history of organisms and using that history to construct classifications.

Phylogenetic Analysis

Phylogenetic Analysis

Phylogenetic Analysis

Nomenclature and classification

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Phylogeny & Systematics: The Tree of Life

Quantifying sequence similarity

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley. Parsimony & Likelihood [draft]

Modern Evolutionary Classification. Section 18-2 pgs

The Tree of Life. Phylogeny

Supplementary Materials for

The impact of missing data on species tree estimation

Transcription:

Phylogenomics Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University

How may we improve our inferences?

How may we improve our inferences? Inferences Data

How may we improve our inferences? Inferences Data Data Experimental Design

How may we improve our inferences? Inferences Data Data Experimental Design Experimental Design Prior Knowledge

Phylogenetic experimental design has a diffuse history Which types of characters are most informative? (Collins et al., 2005; Dequeiroz and Wimberger, 1993; Graybeal, 1994; Naylor and Brown, 1997; Rokas and Holland, 2000; Wiens and Servedio, 1997; Yang, 1998; Zwickl and Hillis, 2002) Would increased taxonomic or character sampling be more informative? (Graybeal, 1998; Hillis, 1998; Kim, 1996, 1998; Poe, 1998; Pollock et al., 2002; Rannala et al., 1998; Rokas and Carroll, 2005; Rosenberg and Kumar, 2001, 2003; Sullivan et al., 1999) Which taxa should be sampled to resolve a given phylogenetic problem? (Goldman, 1998; Huelsenbeck, 1991b; Kim, 1996, 1998; Poe, 2003)

Phylogenetic experimental design has a diffuse history Which types of characters are most informative? (Collins et al., 2005; Dequeiroz and Wimberger, 1993; Graybeal, 1994; Naylor and Brown, 1997; Rokas and Holland, 2000; Wiens and Servedio, 1997; Yang, 1998; Zwickl and Hillis, 2002) Would increased taxonomic or character sampling be more informative? (Graybeal, 1998; Hillis, 1998; Kim, 1996, 1998; Poe, 1998; Pollock et al., 2002; Rannala et al., 1998; Rokas and Carroll, 2005; Rosenberg and Kumar, 2001, 2003; Sullivan et al., 1999) Which taxa should be sampled to resolve a given phylogenetic problem? (Goldman, 1998; Huelsenbeck, 1991b; Kim, 1996, 1998; Poe, 2003)

Which types of characters are most informative? Ancient utility Recent utility Reference Morphological Polymorphic Behavioral Fixed (Dequieroz & Wimberger, 1993) (Wiens & Servedio, 1997) Nonsynonymous Synonymous (Graybeal, 1994; Naylor & Brown, 1997) Amino acids Rarely changing Nucleotides characters are best (Russo et al. 1996, Gissi et al. 2006) (many; Rokas & Holland, 2000)

Can we visualize phylogenetic informativeness?

Can we visualize phylogenetic informativeness? Characters with no differences across taxa are useless

Can we visualize phylogenetic informativeness? Characters with no differences across taxa are useless Characters that are saturated with mutations across taxa are misleading

Can we visualize phylogenetic informativeness? Characters with no differences across taxa are useless Characters that are saturated with mutations across taxa are misleading There exists a happy medium

Can we visualize phylogenetic informativeness? Characters with no differences across taxa are useless Characters that are saturated with mutations across taxa are misleading There exists a happy medium..

Deriving phylogenetic informativeness Calculate the probability of a site being informative, given a rate of evolution, λ, and letting t1 + t2 -> 0. λ^ = 1 4 T Townsend, 2007, Systematic Biology 56:222-231

Simulations demonstrate a 1/4T optimal rate of character change % correct partitions Yang, Systematic Biology (1998) Rate of change

Deriving phylogenetic informativeness Calculate the probability of a site being informative, given a rate of evolution, λ, and letting t1 + t2 -> 0. λ^ = 1 4 T ρ T;λ 1,...,λ n n i=1 ( ) = 16λ i 2 Te 4λ it Townsend, 2007, Systematic Biology 56:222-231

Informativeness is not obvious Townsend, 2007, Systematic Biology 56:222-231

Informativeness is not obvious Townsend, 2007, Systematic Biology 56:222-231

Informativeness is not obvious Townsend, 2007, Systematic Biology 56:222-231

Phylogenetic informativeness yields predictions of relative performance 3 2 1 3.0 2.5 Phylogenetic informativeness 2.0 1.5 1.0 0.5 Generally, higher phylogenetic informativeness of a gene indicates a higher ability to resolve the corresponding node. 500 400 300 200 100 0 0 Millions of years in the past 0 100 200 300 400 500 0 Millions of years in the past Townsend et al., 2008, JME 67: 222-231

Phylogenetic informativeness must be considered with caveats Informativeness profiles provide no clear expectation of performance for specific nodes 500 400 300 200 100 0 0 Millions of years in the past 3.0 2.5 2.0 1.5 1.0 0.5 0 100 200 300 400 500 0 3 2 1 Millions of years in the past Phylogenetic informativeness No reason to expect an x-fold difference in informativeness would result in an X-fold difference in the number of nodes resolved Profiles of informativeness predict signal but not the misleading effect of noise (convergence or parallelism) Townsend et al., 2008, JME 67: 222-231

With profiles alone, the effect of noise can be depicted, but not quantified Informativeness profiles do not account for misleading effects of noise (convergence or parallelism) Phylogenetic informativeness No quantitative rule, but as a rule of thumb, selecting genes that peak deeper than the time interval of interest will minimize the influence of noise Time (ME units) Townsend et al., 2008, JME 67: 222-231

Large data sets lead to both signal and noise Gathering large data sets may enable signal to outweigh noise......but signal and noise are contributed by larger datasets. Time (ME units) Phylogenetic informativeness Thus, large datasets can lead to spurious resolutions of deep polytomies. Deep time and short internodes exacerbate this issue. Townsend et al., 2008, JME 67: 222-231

Matrix methods quantify signal and noise Townsend et al., 2012, Systematic Biology

Matrix methods quantify signal and noise Townsend et al., 2012, Systematic Biology

Matrix methods quantify signal and noise Begin with a four taxon tree, branches to time T, internode t0 Townsend et al., 2012, Systematic Biology

Matrix methods quantify signal and noise Begin with a four taxon tree, branches to time T, internode t0 Townsend et al., 2012, Systematic Biology

Matrix methods quantify signal and noise Begin with a four taxon tree, branches to time T, internode t0 Derive transition matrix for the site patterns for the four species: AAAA, AAAB, AABB, AABC, ABCD Townsend et al., 2012, Systematic Biology

Matrix methods quantify signal and noise Begin with a four taxon tree, branches to time T, internode t0 Derive transition matrix for the site patterns for the four species: AAAA, AAAB, AABB, AABC, ABCD Townsend et al., 2012, Systematic Biology

Matrix methods quantify signal and noise Begin with a four taxon tree, branches to time T, internode t0 Derive transition matrix for the site patterns for the four species: AAAA, AAAB, AABB, AABC, ABCD Power to resolve a node can then be reduced to a biased random walk problem, with each site representing a step. Townsend et al., 2012, Systematic Biology

Each site contributes to the probability of correct resolution, incorrect resolution, or polytomy Noise Polytomy Signal Probability signal 0.43 Probability polytomy 0.14 Probability noise 0.43

Nuclear DNA mtdna Net phylogenetic informativeness Net phylogenetic informativeness time time Dornburg et al., in revision

Frequency COIII COI CytB ND2 ENC1 MYH6 Noise/Polytomy/Signal Dornburg et al., in revision GLYT PTR

Phylogenetic experimental design has a diffuse history Which types of characters are most informative? (Collins et al., 2005; Dequeiroz and Wimberger, 1993; Graybeal, 1994; Naylor and Brown, 1997; Rokas and Holland, 2000; Wiens and Servedio, 1997; Yang, 1998; Zwickl and Hillis, 2002) Would increased taxonomic or character sampling be more informative? (Graybeal, 1998; Hillis, 1998; Kim, 1996, 1998; Poe, 1998; Pollock et al., 2002; Rannala et al., 1998; Rokas and Carroll, 2005; Rosenberg and Kumar, 2001, 2003; Sullivan et al., 1999) Which taxa should be sampled to resolve a given phylogenetic problem? (Goldman, 1998; Huelsenbeck, 1991b; Kim, 1996, 1998; Poe, 2003)

Phylogenetic informativeness can be modified to optimize taxon sampling (g) T-t T * = t T-t t t t = 0 T-t-t Pr{inf}= e 4T λ ( ( ) )λ ( 1 e t 0λ ) 1 e T t 1 ˆλ t (2.2)T Townsend & Lopez-Giraldez, Systematic Biology 59: 446-457

Increase taxon sampling with fastevolving characters, increase character sampling with slow-evolving characters Optimal choice of taxon and gene depends on rate of character evolution and ingroup divergence time Townsend & Lopez-Giraldez, Systematic Biology 59: 446-457

Phylogenetic experimental design has a diffuse history Which types of characters are most informative? (Collins et al., 2005; Dequeiroz and Wimberger, 1993; Graybeal, 1994; Naylor and Brown, 1997; Rokas and Holland, 2000; Wiens and Servedio, 1997; Yang, 1998; Zwickl and Hillis, 2002) Would increased taxonomic or character sampling be more informative? (Graybeal, 1998; Hillis, 1998; Kim, 1996, 1998; Poe, 1998; Pollock et al., 2002; Rannala et al., 1998; Rokas and Carroll, 2005; Rosenberg and Kumar, 2001, 2003; Sullivan et al., 1999) Which taxa should be sampled to resolve a given phylogenetic problem? (Goldman, 1998; Huelsenbeck, 1991b; Kim, 1996, 1998; Poe, 2003)

Phylogenetic informativeness can be modified to optimize taxon sampling (g) T-t T * = t T-t t t t = 0 T-t-t Pr{inf}= e 4T λ ( ( ) )λ ( 1 e t 0λ ) 1 e T t 1 ˆλ t (2.2)T Townsend & Lopez-Giraldez, Systematic Biology 59: 446-457

Four taxa and their ingroups were used to evaluate PITA Townsend & Lopez-Giraldez, Systematic Biology 59: 446-457

Four taxa and their ingroups were used to evaluate PITA Townsend & Lopez-Giraldez, Systematic Biology 59: 446-457

Four taxa and their ingroups were used to evaluate PITA Townsend & Lopez-Giraldez, Systematic Biology 59: 446-457

Four taxa and their ingroups were used to evaluate PITA PezizomycotaSaccharomycetes Townsend & Lopez-Giraldez, Systematic Biology 59: 446-457

The Saccharomycetes chronogram features four tip lineages in the genus Saccharomyces, closely related to S. cerevisiae, and a range of ingroup taxa at other depths

Sampling the deepest ingroups typically yields the greatest resolution Log Likelihood Depth of ingroup (Time) Townsend & Lopez-Giraldez, Systematic Biology 59: 446-457

Sampling using PITA is correlated with support Log Likelihood PITA Townsend & Lopez-Giraldez, Systematic Biology 59: 446-457

Saccharomycete clade: PITA performs best Cumulative Log Likelihood Rank ordered taxon-gene pairs Townsend & Lopez-Giraldez, Systematic Biology 59: 446-457

Deep Saccharomycete clade: PITA performs best Cumulative Log Likelihood Rank ordered taxon-gene pairs Townsend & Lopez-Giraldez, Systematic Biology 59: 446-457

Does the other ingroup clade show the same patterns?

Does the other ingroup clade show the same patterns?

Does the other ingroup clade show the same patterns?

Does the other ingroup clade show the same patterns? PezizomycotaSaccharomycetes

The Pezizomycetes chronogram features no lineages closely related to N. crassa or C. immitis a smaller range of depths of ingroup taxa than in the Saccharomycetes

Sampling the deepest ingroups yields the greatest resolution Log Likelihood Depth of ingroup (Time) Townsend & Lopez-Giraldez, Systematic Biology 59: 446-457

PITA is correlated with support Log Likelihood PITA Townsend & Lopez-Giraldez, Systematic Biology 59: 446-457

Pezizomycete clade: PITA performs best Cumulative Log Likelihood Rank ordered taxon-gene pairs Townsend & Lopez-Giraldez, Systematic Biology 59: 446-457

Conclusions Large data sets contribute both signal and noise. Phylogenetic informativeness profiles can help you prioritize loci - but remember the effect of noise. With fast-evolving characters, increase taxa. With slow-evolving characters, increase characters. The most informative taxa to sample to resolve a node are the deepest ingroups to the node.

Townsend Lab Web 2.0: Scientific Social Collaboration Transcriptomics 10101010101010101010101011010101010101010101010101 0101010101010101010101010 0101010101010101010101010 11011011011011011011011011101101101101101101101101 Computational and 00100100100100100100100100010010010010010010010010 11001100110011001100110011100110011001100110011001 statistical analysis of 0011001100110011001100110 0011001100110011001100110 00100100100100100100100100010010010010010010010010 large datasets 11011011011011011011011011101101101101101101101101 01010101010101010101010100101010101010101010101010 10101010101010101010101011010101010101010101010101 Phylogenomics Fungal Biology and Phylogeny