Evaluating phylogenetic hypotheses


Methods for evaluating topologies
  Topological comparisons: e.g., parametric bootstrapping, constrained searches

Methods for evaluating nodes
  Resampling techniques: bootstrapping, jackknifing, symmetric resampling
  Character-based methods: Bremer support (or decay index), relative Bremer support, etc.
  Model dependence (stability analysis)
  Bayesian phylogenetics

Nodal support: resampling techniques

A resampling technique calculates a tree from each of several pseudoreplicate matrices, each obtained by randomly selecting characters (sites) from the original matrix. The procedure is repeated many times (e.g., 1,000). A resampling frequency is then calculated for each group: the fraction of pseudoreplicate trees on which the group occurs.

  Bootstrapping (Felsenstein, 1985): characters are resampled with replacement.
  Jackknifing (Farris et al., 1996; Farris, 1997): characters are resampled by independent removal, each with probability e^{-1}.
  Symmetric resampling (Goloboff et al., 2003)

Resampling techniques as optimality criteria
Search strategies for resampling
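The resampling schemes described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the matrix is a dictionary of taxon name to list of character states, and the tree search that would be run on each pseudoreplicate is deliberately left out. Symmetric resampling is not shown.

```python
import math
import random

def bootstrap_columns(n_chars, rng):
    """Bootstrap: draw n_chars column indices with replacement."""
    return [rng.randrange(n_chars) for _ in range(n_chars)]

def jackknife_columns(n_chars, rng, p_delete=math.exp(-1)):
    """Jackknife: delete each column independently with probability p_delete."""
    return [i for i in range(n_chars) if rng.random() >= p_delete]

def resample_matrix(matrix, columns):
    """Build a pseudoreplicate matrix from the chosen column indices."""
    return {taxon: [states[i] for i in columns] for taxon, states in matrix.items()}

def group_frequency(replicate_groups, group):
    """Fraction of pseudoreplicate trees on which `group` occurs.

    replicate_groups: one set of groups (frozensets of taxa) per pseudoreplicate tree.
    """
    hits = sum(1 for groups in replicate_groups if group in groups)
    return hits / len(replicate_groups)

# Toy matrix, made up for illustration (taxon -> character states).
matrix = {
    "Outgroup": [0, 0, 0],
    "TaxonA": [1, 1, 0],
    "TaxonB": [1, 1, 0],
    "TaxonC": [0, 0, 1],
}
rng = random.Random(1)
pseudoreplicate = resample_matrix(matrix, bootstrap_columns(3, rng))
```

In a real analysis each pseudoreplicate would be passed to a tree search, and the groups found on the resulting trees would be fed to `group_frequency`.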

Bootstrapping

Jackknifing

Jackknifing simplifies the relationship between frequency and support. For data with no missing entries, the expected jackknife frequency of a group G set off by k uncontradicted characters is just 1 - e^{-k}. Bootstrapping has the same expectation, but only when the total number n of characters in the matrix is very large. Otherwise, the expected bootstrap frequency depends on both k and n, so the bootstrap frequency of G changes with seemingly irrelevant factors, such as the number of autapomorphies or the number of characters supporting groups entirely separate from G.

Example matrix (three characters):

  Outgroup  0 0 0
  TaxonA    1 1 0
  TaxonB    1 1 0
  TaxonC    0 0 1

  Bootstrap proportion for (A, B) = 96.3%
  Jackknife proportion for (A, B) = 86.7%
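The two expectations can be checked numerically. The sketch below assumes that a group is recovered whenever at least one of its k uncontradicted characters survives in the pseudoreplicate; under that assumption the expected bootstrap frequency is 1 - (1 - k/n)^n, which is a derived formula and not one stated on the slide. The function names are hypothetical.

```python
import math

def expected_jackknife(k):
    """Expected jackknife frequency for k uncontradicted characters,
    with each character deleted independently with probability e^-1."""
    return 1.0 - math.exp(-k)

def expected_bootstrap(k, n):
    """Expected bootstrap frequency: probability that at least one of the
    k supporting characters is drawn in n draws with replacement from n."""
    return 1.0 - (1.0 - k / n) ** n

# Example matrix above: k = 2 characters support (A, B), n = 3 characters in total.
print(f"{expected_bootstrap(2, 3):.3f}")   # 0.963
print(f"{expected_jackknife(2):.3f}")      # 0.865
```

For k = 2 and n = 3 the bootstrap expectation matches the 96.3% shown for (A, B), and the jackknife expectation of about 86.5% is close to the 86.7% shown. As n grows, `expected_bootstrap(2, n)` approaches `expected_jackknife(2)`.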

Nodal support: character-based techniques

Bremer support (branch support; decay index) (Bremer, 1988): the number of extra steps required before a clade is lost from the strict consensus tree of near-minimum-length cladograms. It can be calculated as the step (fit) difference between two trees. Given two trees, when character i fits the most parsimonious tree better, the fit difference for that character is favorable to the most parsimonious tree (f_i). When character i fits the less parsimonious tree better, the fit difference is contradictory (c_i). Define F = \sum_i f_i and C = \sum_i c_i. Bremer support measures the support of a group simply as the difference F - C.

Partitioned Bremer support (Baker & DeSalle, 1997): describes the relative support contributed by a given partition to a tree generated from multiple partitions.

Relative fit difference (RFD), or relative Bremer support (Goloboff & Farris, 2001): takes into account the evidence both in favor of and against a given node.
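A small worked example may make F and C concrete. The sketch below assumes that the per-character step counts are already known for the most parsimonious tree and for the shortest tree lacking the group; the step counts are made up, and the function names are hypothetical. The ratio (F - C)/F used for the relative fit difference is the form usually attributed to Goloboff & Farris (2001) and should be checked against that paper.

```python
def fit_differences(steps_mp, steps_alt):
    """Return (F, C) from per-character step counts on two trees.

    steps_mp:  steps of each character on the most parsimonious tree
    steps_alt: steps of each character on the best tree lacking the group
    """
    pairs = list(zip(steps_mp, steps_alt))
    favorable = sum(alt - mp for mp, alt in pairs if alt > mp)       # F = sum of f_i
    contradictory = sum(mp - alt for mp, alt in pairs if mp > alt)   # C = sum of c_i
    return favorable, contradictory

def bremer_support(favorable, contradictory):
    """Bremer support as the difference F - C."""
    return favorable - contradictory

def relative_fit_difference(favorable, contradictory):
    """Relative fit difference, assumed here to be (F - C) / F."""
    if favorable == 0:
        raise ValueError("F must be positive")
    return (favorable - contradictory) / favorable

# Made-up step counts for five characters.
F, C = fit_differences([1, 1, 2, 1, 2], [2, 2, 1, 1, 3])
print(F, C, bremer_support(F, C))  # 3 1 2
```

Here the two trees differ by two steps in total length (7 versus 9), which equals F - C, while the relative measure also reflects that one step of contradictory evidence exists.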

Model dependence (sensitivity analysis sensu Wheeler, 1995)

"Sensitivity analysis (SA) is the study of how the variation in the output of a model (numerical or otherwise) can be apportioned, qualitatively or quantitatively, to different sources of variation, and how that given model depends upon the information fed into it." (Andrea Saltelli, 2000)

Alignments are parameter-dependent (Fitch & Smith, 1983; Waterman et al., 1992):

  Different parameters -> different alignments (or homologies) -> different phylogenetic hypotheses

How can we make use of such variation in our phylogenetic analyses?

Sensitivity analysis in systematics

  Data exploration
    Evaluate sensitivity of hypotheses to parameter variation
    Evaluate stability of nodes to parameter variation
  Criteria for choosing optimal analytical parameters
    Character-based congruence methods
    Topology-based congruence methods

  Wheeler, W. C. 1995. Sequence alignment, parameter sensitivity, and the phylogenetic analysis of molecular data. Syst. Biol. 44:321-331.
  Wheeler, W. C. 1999. Measuring topological congruence by extending character techniques. Cladistics 15:131-135.

Rows are parameter sets; columns are the data partitions (16S, 18S, 28S, COI, H3, MOR), followed by MOL, TOT, and ILD.

Set    16S   18S    28S   COI    H3   MOR    MOL    TOT      ILD
110    620   307    604   703   135    80   2443   2540  0.03583
111   1113   622   1025  1627   311    80   4769   4861  0.01707
121   1775   939   1688  2365   453   160   7339   7529  0.01979
141   3022  1564   2940  3785   725   320  12319  12694  0.02663
181   5518  2786   5421  6606  1269   640  22192  22932  0.03018
210    751   419    933   703   135   160   3079   3276  0.05342
211   1285   749   1406  1627   311   160   5489   5679  0.02483
221   2084  1184   2398  2365   453   320   8695   9075  0.02986
241   3642  2024   4348  3785   725   640  14979  15729  0.03592
281   6735  3700   8222  6606  1269  1280  27519  29007  0.04120
410    959   612   1533   703   135   320   4198   4615  0.07649
411   1556   947   2061  1627   311   320   6710   7091  0.03794
421   2601  1577   3682  2365   453   640  11064  11860  0.04570
441   4659  2809   6893  3785   725  1280  19633  21266  0.05243
481   8778  5252  13304  6606  1269  2560  36693  40053  0.05702
3221  2245  1232   1862  3254   622   240   9640   9640  0.01919
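The ILD column can be reproduced from the other columns. The sketch below assumes that the six partition columns (16S through MOR) and TOT are tree lengths, and that ILD is the incongruence length difference: the combined length minus the sum of the partition lengths, divided by the combined length. That reading is an assumption, but it agrees with the tabulated values. The function name is hypothetical.

```python
def ild(partition_lengths, combined_length):
    """Incongruence length difference: extra steps in the combined analysis,
    as a fraction of the combined tree length."""
    extra_steps = combined_length - sum(partition_lengths)
    return extra_steps / combined_length

# Parameter set 110: 16S, 18S, 28S, COI, H3, MOR; TOT = 2540.
row_110 = [620, 307, 604, 703, 135, 80]
print(round(ild(row_110, 2540), 5))  # 0.03583

# Parameter set 111: TOT = 4861.
row_111 = [1113, 622, 1025, 1627, 311, 80]
print(round(ild(row_111, 4861), 5))  # 0.01707
```

Under a character-based congruence criterion, the parameter set with the lowest ILD would be the one that minimizes conflict among partitions.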

What is sensitivity analysis?

The same raw data are explored under different analytical conditions.

  Molecular data
    Parsimony: effect of indels, nucleotide transformations, etc.
    Maximum likelihood: effect of models and model corrections
  Morphological data
    Effect of implied weighting on cladogram topology (Prendini, 2003)

How can we represent results from a sensitivity analysis?

  Present all trees obtained under the different conditions examined
  Strict consensus of all trees obtained under the different conditions
  Navajo rugs (or sensitivity plots)
  Frequency with which a given node is obtained under the different analytical conditions
    This does not require choosing a hypothesis
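Two of these summaries, the strict consensus across conditions and the frequency of each node, are easy to compute once the groups found under each condition are listed. The sketch below assumes each analytical condition has already been reduced to a set of groups (frozensets of taxon names). The condition labels, taxa, and results are made up for illustration, and the function names are hypothetical.

```python
def strict_consensus(analyses):
    """Groups recovered under every analytical condition."""
    group_sets = [set(groups) for groups in analyses.values()]
    return set.intersection(*group_sets) if group_sets else set()

def node_frequencies(analyses):
    """Fraction of analytical conditions under which each group is recovered."""
    total = len(analyses)
    counts = {}
    for groups in analyses.values():
        for group in groups:
            counts[group] = counts.get(group, 0) + 1
    return {group: count / total for group, count in counts.items()}

# Made-up results for four hypothetical parameter sets.
abc = frozenset({"A", "B", "C"})
ab = frozenset({"A", "B"})
ac = frozenset({"A", "C"})
analyses = {
    "set 1": {abc, ab},
    "set 2": {abc, ac},
    "set 3": {abc, ab},
    "set 4": {abc, ab},
}

print(strict_consensus(analyses) == {abc})  # True
print(node_frequencies(analyses)[ab])       # 0.75
```

A sensitivity plot shows, for one group, which cells of the parameter grid recover it; `node_frequencies` collapses that grid to a single proportion per group.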

[Figure panels: strict consensus of all trees under one parameter set; strict consensus of all trees under all parameter sets]

[Figure: Navajo rugs (sensitivity plots) for node 1 (Pteriomorphia), nodes 2 to 8, and the alternative groupings Anomioidea + Pectinoidea, Anomioidea + Limoidea, and Anomioidea + node 5. Axes: gap:change ratio and transversion:transition ratio. Terminals: Mytiloidea, Arcoidea, Limopsoidea, Pteroidea, Pinnoidea, Ostreoidea, Anomioidea, Limoidea, Pectinoidea.]