Evaluating phylogenetic hypotheses

Methods for evaluating topologies
  Topological comparisons: e.g., parametric bootstrapping, constrained searches
Methods for evaluating nodes
  Resampling techniques: bootstrapping, jackknifing, symmetric resampling
  Character-based methods: Bremer support (or decay index), relative Bremer support, etc.
Model dependence (stability analysis)
Bayesian phylogenetics

Nodal support: resampling techniques

A resampling technique calculates a tree from each of several pseudoreplicate matrices, each obtained by randomly selecting characters (sites) from the original matrix. The procedure is repeated many times (e.g., 1,000). A resampling frequency is then calculated for each group: the fraction of pseudoreplicate trees on which the group occurs.

Bootstrapping (Felsenstein, 1985): characters are resampled with replacement.
Jackknifing (Farris et al., 1996; Farris, 1997): characters are resampled by independent removal, each with probability e^-1.
Symmetric resampling (Goloboff et al., 2003).

Resampling techniques as optimality criteria
Search strategies for resampling
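The resampling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the matrix is taken to be a dictionary mapping taxon names to lists of character states, and the tree search on each pseudoreplicate is left out entirely (each replicate tree is assumed to be summarized as a set of groups).

```python
import math
import random


def bootstrap_indices(n_chars, rng):
    """Bootstrap: draw n_chars column indices with replacement."""
    return [rng.randrange(n_chars) for _ in range(n_chars)]


def jackknife_indices(n_chars, rng, p_delete=math.exp(-1)):
    """Jackknife: delete each column independently with probability p_delete."""
    return [i for i in range(n_chars) if rng.random() >= p_delete]


def resample_matrix(matrix, indices):
    """Build a pseudoreplicate matrix from the chosen column indices."""
    return {taxon: [row[i] for i in indices] for taxon, row in matrix.items()}


def group_frequency(replicate_groups, group):
    """Fraction of pseudoreplicate trees (each a set of groups) containing `group`."""
    hits = sum(1 for groups in replicate_groups if group in groups)
    return hits / len(replicate_groups)
```

In practice each pseudoreplicate matrix would be passed to a tree-search program, and the groups found on the resulting trees would be fed to `group_frequency`.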

Bootstrapping

Jackknifing

Jackknifing simplifies the relationship between frequency and support. For data with no missing entries, the expected jackknife frequency of a group G set off by k uncontradicted characters is just 1 - e^-k. Bootstrapping has the same expectation, but only when the total number n of characters in the matrix is very large. Otherwise, the expected bootstrap frequency depends on both k and n, so the bootstrap frequency of G changes with seemingly irrelevant factors, such as the number of autapomorphies or the number of characters supporting groups entirely separate from G.

Example matrix:

Outgroup  0 0 0
TaxonA    1 1 0
TaxonB    1 1 0
TaxonC    0 0 1

Bootstrap proportion = 96.3% for (A, B)
Jackknife proportion = 86.7% for (A, B)
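The two expectations can be compared numerically. The jackknife expression 1 - e^-k is the one given above. The finite-n bootstrap expression used here, 1 - (1 - k/n)^n, is an assumption added for illustration: it follows from treating the group as recovered whenever at least one of its k uncontradicted characters is drawn in n draws with replacement. For the example matrix (k = 2, n = 3) it gives about 96.3%, matching the bootstrap proportion shown, while 1 - e^-2 is about 86.5%, close to the jackknife proportion shown.

```python
import math


def expected_jackknife(k):
    """Expected jackknife frequency for k uncontradicted characters (deletion prob. e^-1)."""
    return 1.0 - math.exp(-k)


def expected_bootstrap(k, n):
    """Assumed expected bootstrap frequency: at least one of k characters drawn in n draws."""
    return 1.0 - (1.0 - k / n) ** n


if __name__ == "__main__":
    # Group (A, B) in the example matrix: k = 2 supporting characters out of n = 3.
    print(expected_bootstrap(2, 3), expected_jackknife(2))
    # Adding characters unrelated to the group changes only the bootstrap value.
    for n in (3, 10, 100, 10000):
        print(n, expected_bootstrap(2, n))
```

As n grows, the bootstrap value approaches the jackknife value, which is the sense in which the two share an expectation only for large matrices.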

Nodal support: character-based techniques

Bremer support (branch support; decay index) (Bremer, 1988): the number of extra steps required before a clade is lost from the strict consensus tree of near-minimum-length cladograms. It can be calculated as the step (fit) difference between two trees. Given two trees, when character i fits the most parsimonious tree better, the fit difference for that character is favorable to the most parsimonious tree (f_i). When character i fits the less parsimonious tree better, the fit difference is contradictory (c_i). Define F = Σ f_i and C = Σ c_i. Bremer support measures the support of a group simply as the difference F - C.

Partitioned Bremer support (Baker & DeSalle, 1997): describes the relative support of a given partition over a tree generated under multiple partitions.

Relative Fit Difference (RFD) or relative Bremer support (Goloboff & Farris, 2001): takes into account evidence both in favor of and against a given node.
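The definitions of F and C translate directly into a small calculation. The sketch below assumes the per-character step counts on the two trees are already known (the function names and the example values are hypothetical). The ratio (F - C)/F used for RFD is not stated above; it is my reading of the Goloboff & Farris (2001) definition and should be checked against that paper.

```python
def fit_differences(steps_mpt, steps_alt):
    """Sum favorable (F) and contradictory (C) step differences over characters.

    steps_mpt: steps per character on the most parsimonious tree.
    steps_alt: steps per character on the best tree lacking the group.
    """
    favorable = 0
    contradictory = 0
    for s_mpt, s_alt in zip(steps_mpt, steps_alt):
        diff = s_alt - s_mpt
        if diff > 0:        # character fits the most parsimonious tree better
            favorable += diff
        elif diff < 0:      # character fits the alternative tree better
            contradictory += -diff
    return favorable, contradictory


def bremer_support(favorable, contradictory):
    """Bremer support as the difference F - C."""
    return favorable - contradictory


def relative_fit_difference(favorable, contradictory):
    """RFD taken here as (F - C) / F; assumed form, undefined when F is 0."""
    if favorable == 0:
        raise ValueError("RFD is undefined when F = 0")
    return (favorable - contradictory) / favorable
```

With hypothetical step counts `[1, 1, 1, 2, 1]` on the most parsimonious tree and `[2, 2, 1, 1, 1]` on the alternative, F = 2 and C = 1, so Bremer support is 1 and RFD is 0.5: the same absolute support can correspond to very different proportions of contradicting evidence.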

Model dependence (sensitivity analysis sensu Wheeler, 1995)

"Sensitivity analysis (SA) is the study of how the variation in the output of a model (numerical or otherwise) can be apportioned, qualitatively or quantitatively, to different sources of variation, and how that given model depends upon the information fed into it." (Andrea Saltelli, 2000)

Alignments are parameter-dependent (Fitch & Smith, 1983; Waterman et al., 1992)

Different parameters -> different alignments (or homologies) -> different phylogenetic hypotheses

How can we make use of such variation in our phylogenetic analyses?

Sensitivity analysis in systematics

Data exploration
  Evaluate sensitivity of hypotheses to parameter variation
  Evaluate stability of nodes to parameter variation
Criteria for choosing optimal analytical parameters
  Character-based congruence methods
  Topology-based congruence methods

Wheeler, W. C. 1995. Sequence alignment, parameter sensitivity, and the phylogenetic analysis of molecular data. Syst. Biol. 44:321-331.
Wheeler, W. C. 1999. Measuring topological congruence by extending character techniques. Cladistics 15:131-135.

Param    16S    18S    28S    COI     H3    MOR    MOL    TOT      ILD
  110    620    307    604    703    135     80   2443   2540  0.03583
  111   1113    622   1025   1627    311     80   4769   4861  0.01707
  121   1775    939   1688   2365    453    160   7339   7529  0.01979
  141   3022   1564   2940   3785    725    320  12319  12694  0.02663
  181   5518   2786   5421   6606   1269    640  22192  22932  0.03018
  210    751    419    933    703    135    160   3079   3276  0.05342
  211   1285    749   1406   1627    311    160   5489   5679  0.02483
  221   2084   1184   2398   2365    453    320   8695   9075  0.02986
  241   3642   2024   4348   3785    725    640  14979  15729  0.03592
  281   6735   3700   8222   6606   1269   1280  27519  29007  0.04120
  410    959    612   1533    703    135    320   4198   4615  0.07649
  411   1556    947   2061   1627    311    320   6710   7091  0.03794
  421   2601   1577   3682   2365    453    640  11064  11860  0.04570
  441   4659   2809   6893   3785    725   1280  19633  21266  0.05243
  481   8778   5252  13304   6606   1269   2560  36693  40053  0.05702
 3221   2245   1232   1862   3254    622    240   9640   9640  0.01919
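The ILD column appears to follow the usual incongruence length difference formula, ILD = (L_combined - Σ L_partition) / L_combined, with the six individual partitions (16S, 18S, 28S, COI, H3, MOR) summed and TOT as the combined length. That reading is an assumption, since the slide gives no formula, but it reproduces the tabulated values for the rows checked here. The sketch below recomputes ILD for a subset of parameter sets and picks the one that minimizes it, which is one character-based congruence criterion for choosing parameters. The function and variable names are hypothetical.

```python
def ild(partition_lengths, combined_length):
    """Incongruence length difference: extra steps in the combined analysis, rescaled."""
    return (combined_length - sum(partition_lengths)) / combined_length


# Subset of the table: parameter set -> ([16S, 18S, 28S, COI, H3, MOR], TOT)
TREE_LENGTHS = {
    "110": ([620, 307, 604, 703, 135, 80], 2540),
    "111": ([1113, 622, 1025, 1627, 311, 80], 4861),
    "121": ([1775, 939, 1688, 2365, 453, 160], 7529),
    "3221": ([2245, 1232, 1862, 3254, 622, 240], 9640),
}


def best_parameter_set(tree_lengths):
    """Return the parameter set with the lowest ILD (maximum congruence)."""
    return min(tree_lengths, key=lambda p: ild(*tree_lengths[p]))


if __name__ == "__main__":
    for param, (parts, total) in TREE_LENGTHS.items():
        print(param, round(ild(parts, total), 5))
    print("lowest ILD:", best_parameter_set(TREE_LENGTHS))
```

Among these four rows the lowest value belongs to parameter set 111, which is also the lowest ILD in the full table.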

What is sensitivity analysis?

The same raw data are explored under different analytical conditions.

Molecular data:
  Parsimony: effect of indels, nucleotide transformations, etc.
  Maximum likelihood: effect of models and model corrections
Morphological data:
  Effect of implied weighting on cladogram topology (Prendini, 2003)

How can we represent results from a sensitivity analysis?

Present all trees obtained under the different conditions examined
Strict consensus of all trees obtained under the different conditions
Navajo rugs (or sensitivity plots)
Frequency with which a given node is obtained under the different analytical conditions
This does not require choosing a hypothesis
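Two of these representations, the node frequency and the Navajo rug, are easy to sketch once each analysis has been reduced to the set of groups on its tree. The code below is a minimal illustration with hypothetical names. It assumes each parameter set is keyed by a (gap cost, transversion:transition ratio) pair and each clade is a frozenset of taxon names, and it draws the rug as text, with `#` where the clade is recovered and `.` where it is not.

```python
def node_frequency(groups_by_param, clade):
    """Fraction of parameter sets whose tree contains `clade`."""
    present = [clade in groups for groups in groups_by_param.values()]
    return sum(present) / len(present)


def navajo_rug(groups_by_param, clade, gap_values, tvts_values):
    """Text grid: one row per gap cost, one column per tv:ts ratio."""
    rows = []
    for gap in gap_values:
        cells = ("#" if clade in groups_by_param[(gap, tvts)] else "."
                 for tvts in tvts_values)
        rows.append("".join(cells))
    return "\n".join(rows)


if __name__ == "__main__":
    ab = frozenset({"A", "B"})
    # Hypothetical results: clade (A, B) found under three of four parameter sets.
    results = {(1, 1): {ab}, (1, 2): {ab}, (2, 1): {ab}, (2, 2): set()}
    print(node_frequency(results, ab))
    print(navajo_rug(results, ab, gap_values=[1, 2], tvts_values=[1, 2]))
```

Both summaries report how stable a node is across conditions without singling out one parameter set as the preferred hypothesis.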

[Figure panels: strict consensus of all trees under one parameter set; strict consensus of all trees under all parameter sets]

[Figure: tree of Mytiloidea, Arcoidea, Limopsoidea, Pteroidea, Pinnoidea, Ostreoidea, Anomioidea, Limoidea and Pectinoidea with nodes numbered 1 to 8 (node 1: Pteriomorphia), accompanied by Navajo rugs for each node and for the alternative groupings Anomioidea + Pectinoidea, Anomioidea + Limoidea and Anomioidea + node 5. Rug axes: gap:change ratio and transversion:transition ratio, with tick values inf., 1, 2, 4 and 1, 2, 4.]