Bot 421/521
PHYLOGENETIC ANALYSIS

I. Origins
   A. Hennig 1950 (German edition); Phylogenetic Systematics 1966
   B. Zimmermann (Germany, 1930s)
   C. Wagner (Michigan, 1920-2000)

II. Characters and character states
    e.g. character = flower color
         character states = red, white, blue
   A. qualitative
      1. multistate (red, white, blue)
      2. binary (presence/absence)
   B. quantitative
      1. continuous (measurements)
      2. discontinuous (counts)
   C. delimitation of character states (Stevens 1991)
      1. morphological characters should be assumed to be quantitative unless demonstrated otherwise
         e.g. compare two char. states:
              linear to lanceolate / ovate (used in a cladistic analysis)
              vs.
              linear-lanceolate / lanceolate / lanceolate to narrowly ovate / lanceolate-ovate / lanceolate to broadly ovate (used in species descriptions)
      2. character states should be delimited by carefully analyzed discontinuities (not necessarily absolute gaps)
         a. mean or range
         b. sample size
         c. scoring intermediates as polymorphic
            for more info: Kornet, D.J. and H. Turner. 1999. Coding polymorphisms for phylogeny reconstruction. Syst. Biol. 48: 365-379.
      3. explicit justification should be given

does Cladistics = Phylogenetics?
   clados = branch

III. The distribution of character states
     morph = character state
     apo = derived
     plesio = ancestral
     syn, sym = shared
     auto = unique
     e.g. synapomorphy (shared derived state)
          symplesiomorphy (shared ancestral state)
          autapomorphy (unique derived state)

IV. The description of groups
   A. monophyletic = an ancestor and all of its descendants, i.e. a clade; identified by synapomorphies
   B. polyphyletic = a group with two or more ancestral sources in which parallel similarities have evolved; mistakenly grouped by homoplasy
   C. paraphyletic = a group including a common ancestor and some, but not all, of its descendants
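
The monophyly definition in IV.A can be sketched as a simple test on a rooted tree: a group is monophyletic when it is exactly the full set of terminal taxa in the smallest clade that contains it. The sketch below is only an illustration under stated assumptions. The nested-tuple tree encoding, the function names, and the four-taxon example tree are all hypothetical and not part of the course material. Topology alone does not separate paraphyletic from polyphyletic groups (that depends on whether the common ancestor is included), so the check only reports monophyletic vs. not.

```python
# Assumption: a rooted tree is encoded as nested tuples, with strings as terminal taxa.

def leaves(node):
    """Return the set of terminal taxa at or below a node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def smallest_clade(node, group):
    """Return the taxa of the smallest clade containing all of `group`.

    Returns None if `group` is not entirely contained in `node`.
    """
    tips = leaves(node)
    if not group <= tips:
        return None
    if not isinstance(node, str):
        for child in node:
            found = smallest_clade(child, group)
            if found is not None:
                return found  # a child clade already holds the whole group
    return tips


def is_monophyletic(tree, group):
    """True if `group` equals the full membership of a single clade."""
    group = set(group)
    if not group:
        raise ValueError("group must contain at least one taxon")
    clade = smallest_clade(tree, group)
    if clade is None:
        raise ValueError("group contains taxa that are not in the tree")
    return clade == group


# Hypothetical, simplified example tree (illustration only).
EXAMPLE_TREE = ((("crocodile", "bird"), "lizard"), "mammal")

if __name__ == "__main__":
    print(is_monophyletic(EXAMPLE_TREE, {"crocodile", "bird"}))
    print(is_monophyletic(EXAMPLE_TREE, {"crocodile", "lizard"}))
```

In the example tree, {crocodile, lizard} fails the test because the smallest clade containing both also contains bird, which is the situation described for paraphyletic groups in IV.C.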

V. Polarization
   A. used to "root" a network
   B. outgroup(s) must have separated from the ingroup lineage before the ingroup diversified
   C. used to "infer" ancestral char. states
   D. current algorithms do not require a priori rooting

VI. Tree construction
    Parsimony minimizes the number of character state changes on a tree
    = Hennig's auxiliary principle: "never assume convergent or parallel evolution; always assume homology in the absence of evidence to the contrary"
    = Ockham's Razor (William of Ockham, d. 1347), or the principle of simplicity
    e.g. bird & bat example

VII. Summarizing trees
   A. Consensus tree = summarizes clades found in 2 or more trees
      1. strict = only clades common to all trees
      2. majority rule = clades that appear in 50% or more of the trees
      3. semi-strict = all clades not contradicted in other trees

VIII. Weighting
   A. Character states
      1. ordered or Wagner parsimony
      2. unordered or Fitch parsimony
      3. Dollo parsimony
         gains count for more than losses
         can be generalized, e.g. making losses more probable than gains
   B. Characters
      1. A priori weighting discouraged
      2. Successive weighting is a posteriori
         gives homoplastic characters less weight

IX. Do we believe the tree?
   A. Tree Statistics
      1. tree length
      2. consistency index CI = m / s
         m = minimum # of steps / character
         s = observed # of steps / character
         e.g. 8/10 = 0.80
         homoplasy index HI = 1 - CI
         a. inflated by autapomorphies (uninformative characters)
         b. trees with more taxa tend to have lower CI
      3. retention index RI = (g - s) / (g - m)
         g = maximum # of steps / character
         e.g. (14-10) / (14-8) = 0.67
         a. varies between 0 and 1
         b. rescaled CI = RI x CI
            e.g. (0.67)*(0.8) = 0.53
   B. Bootstrap (Felsenstein 1985)
      characters of the taxa x characters matrix resampled with replacement, creating a pseudosample
      100 [or more] pseudosamples re-analyzed
      resulting trees combined in a majority rule consensus tree
      results: % of trees in which a particular clade appears
      NOT a confidence interval; rather, an indication of the degree of support for a particular clade
      an empirical study suggests that a clade with 70% bootstrap support or above has a 95% probability of being accurate
         Hillis, D.M., J.J. Bull, M.E. White, M.R. Badgett, and I.J. Molineux. 1992. Experimental phylogenetics: generation of a known phylogeny. Science 255: 589-592.
         Hillis, D.M. and J.J. Bull. 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst. Biol. 42: 182-192.
   C. Decay index
      when does a clade collapse?
   D. Have we found the shortest tree(s)?
      1. Immense numbers of trees
          5 taxa = 15 unrooted trees
          7 taxa = 945 unrooted trees
         10 taxa = 2 x 10^6 unrooted trees
         20 taxa = 2 x 10^20 unrooted trees
         B(T) = \prod_{i=3}^{T} (2i - 5)
            T = number of terminal nodes (= taxa)
            i = index of the product: multiply one factor for each taxon, starting at i = 3
         adding a root increases # by a factor of 2T-3
      2. Efficiency of search algorithms
         Search options:
            branch and bound & exhaustive searches: guaranteed to find the shortest tree
            vs.
            heuristic searches: faster, but susceptible to "local optima"
         stepwise addition sequence options: random [10-1000 times], closest, simple
         swap options:
            tree bisection and reconnection (TBR)
            subtree pruning and regrafting (SPR)
            nearest neighbor interchange (NNI)

X. Applications
   A. Mapping characters on trees
   B. Biogeography
   C. Phylogenetic Classification
      1. named groups are monophyletic
      2. not all clades are named
      3. criteria
         a. strength of evidence supporting a particular clade
         b. obvious morphological synapomorphy
         c. group size
         d. nomenclatural stability
      4. Are ranks arbitrary?
         a. cladists think so: "ranks are simply more or less inclusive"
         b. not if "gaps" in the pattern of variation are recognized
         c. if nomenclatural stability is a goal, traditional ranks will tend to be preserved when they are monophyletic
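
Worked check of the arithmetic in IX.A and IX.D. The short script below is a sketch, not part of any phylogenetics package: the function names are made up for illustration. It assumes CI = m / s, RI = (g - s) / (g - m), rescaled CI = RI x CI, and that the number of unrooted trees for T taxa is the product of (2i - 5) for i = 3 to T, with rooting multiplying that count by 2T - 3. The inputs are the outline's own example values (m = 8, s = 10, g = 14).

```python
from math import prod


def consistency_index(m, s):
    """CI = m / s (minimum possible steps over observed steps)."""
    return m / s


def retention_index(g, s, m):
    """RI = (g - s) / (g - m), where g is the maximum possible number of steps."""
    return (g - s) / (g - m)


def rescaled_consistency_index(g, s, m):
    """Rescaled CI = RI x CI."""
    return retention_index(g, s, m) * consistency_index(m, s)


def unrooted_trees(t):
    """Number of unrooted, fully bifurcating trees for t taxa (t >= 3)."""
    if t < 3:
        raise ValueError("need at least 3 taxa")
    return prod(2 * i - 5 for i in range(3, t + 1))


def rooted_trees(t):
    """Adding a root multiplies the unrooted count by 2t - 3."""
    return unrooted_trees(t) * (2 * t - 3)


if __name__ == "__main__":
    print("CI =", consistency_index(8, 10))
    print("HI =", round(1 - consistency_index(8, 10), 2))
    print("RI =", round(retention_index(14, 10, 8), 2))
    print("RC =", round(rescaled_consistency_index(14, 10, 8), 2))
    for taxa in (5, 7, 10, 20):
        print(taxa, "taxa:", unrooted_trees(taxa), "unrooted trees")
```

The exact counts for 10 and 20 taxa (2,027,025 and about 2.2 x 10^20) match the rounded figures in IX.D and show why exhaustive searches become impractical as taxa are added.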