Dating Phylogenies with Fossils Fredrik Ronquist Dept. Bioinformatics & Genetics Swedish Museum of Natural History
Node dating A B C D E F G H Molecular analysis of extant taxa using clock model Calibration point Node dates Fixed Drawn from prior Unconstrained Calibration point Fossil record is used to derive priors on calibration points Calibration nodes are fixed; we can integrate out uncertainty concerning other nodes Prior on root age
Potential Shortcomings Phylogeny is not known with certainty but we have to fix calibration nodes Unclear how to derive appropriate calibration distributions Fossil placement is often uncertain; unclear if this can be accommodated in the calibration distributions Does not incorporate all the data in the analysis You have to summarize many fossils in a few calibration points
Total-evidence dating A B C D E F G H Simultaneous analysis of extant taxa (black) and fossils (red) I J L Node dates Branch length based on morphology Fixed Drawn from prior Unconstrained Molecular data provide backbone for extant taxa Morphological data help place fossils K Prior on root age Integrate out uncertainty in all parameters (topology, placement of fossils etc)
Early radiation of the Hymenoptera Documented by a number of incomplete impression fossils that are difficult to place phylogenetically 45 fossil and 68 extant taxa 343 morphological characters Fossil completeness 4 20 % 5 kb sequence data from 7 markers Phylogenetic model: Mk model of morphology Codon-site-partitioned GTR+I+G : SYM+I+G Non-clock, strict clock and relaxed clocks
Morphological models Variable state space Arbitrary state labels Sampling (ascertainment bias): only variable characters observed Different models for ordered and unordered characters
Relaxed clock models Thorne Kishino 2002 (TK02) model: continuous autocorrelated model Compound Poisson process (CPP) model: discrete autocorrelated model Independent gamma rates (IGR) model: uncorrelated continuous model
CPP Relaxed Clock Molecular clock tree m 1 m 2 m 3 Non-clock tree m 4 m 6 m 5 m 7 A B D A B C D time C time lengths effective branch lengths evolutionary branch lengths evolutionary change
Tree model for total-evidence dating Coalescent model: not relevant model for higher-level phylogenies Birth-death model: problem of modeling speciation, extinction, sampling and fossilization Uniform model: can be extended to serially sampled trees
Uniform prior on serially sampled trees A B C D A B C D (a) (b) A B C A B C D D (c) (d)
Two approaches to dating Node dating 68 extant taxa Seven Hymenoptera calibration points derived from 45 fossils (C-I) Two outgroup calibration points (A-B) Offset exponential priors, mean being min of the next more inclusive calibration point Calibrations set together with the leading paleontological and morphological experts on the Hymenoptera Total-evidence dating 68 extant + 45 fossil taxa in simultaneous analysis Position of fossil taxa determined by morphological characters (343 characters in total, 4 20% coded for fossils) Extant phylogeny mostly determined by molecular characters (5 kb sequence data from 7 markers) All calibration constraints removed except the two outgroup calibrations
A* B* HYMENOPTERA Xyelidae Blasticotomidae Tenthredinidae Diprionidae Cimbicidae Argidae Pergidae Pamphiliidae Megalodontidae Cephidae Anaxyelidae Siricidae Xiphydriidae Orussidae C* 0.1 substitutions /site Orthoptera D* UNICALCARIDA Paraneoptera Mecoptera Coleoptera Adephaga Xyela Macroxyela E* I* F* VESPINA G* H* Lepidoptera Coleoptera Polyphaga Neuroptera Raphidioptera Blasticotoma Runaria Paremphytus Tenthredo Aglaostigma Dolerus Taxonus Selandria Strongylogaster Monophadnoides Metallus Hoplocampa Nematinus Nematus Cladius Monoctenus Gilpinia Diprion Athalia Cimbicinae Abia Corynis Arge Sterictiphora Perga Phylacteophaga Lophyrotoma Decameria Acordulecera Neurotoma Onycholyda Pamphilius Cephalcia Acantholyda Outgroups Xyeloidea Megalodontes cephalotes Megalodontes skorniakowii Cephus Calameuta Hartigia Syntexis Sirex Xeris Urocerus Tremex Xiphydria Orussus Tenthredinoidea Pamphilioidea Cephoidea Siricoidea Xiphydrioidea Orussoidea Stephanidae A Stephanidae B Cynipoidea Megalyridae Trigonalidae Apoidea A Apoidea B Apoidea C Vespidae Chalcidoidea Evanioidea Ichneumonidae Non-clock tree retrieves expected relationships Rate deceleration in Xyelidae! A - I were used in clock analyses as calibration points (all clades well supported) Apocrita
Comparing strict clock to non-clock branch lengths sampled from the non-clock topology Correlation Squared deviation non-clock branch length 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 Xyelidae squared deviation of non-clock branch length (y) 0.000 0.005 0.010 0.015 0.020 0.025 0.030 Xyelidae y = 0.01867x 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 strict-clock branch length strict-clock branch length (x)
Finding suitable priors for the CPP model Poisson rate multiplier variance space rate of CPP process 2.5 2 10 2.25 2.0 1.75 1.5 1.25 1.0 0.75 0.5 < 0.01 0.010 0.016 0.017 0.019 0.171 > 1.00 Deviation from observed variance 0.25 0.22 0.28 0.37 0.47 0.61 0.78 1.00 1.28 1.65 2.12 0 variance of rate multiplier
Slight but significant rate autocorrelation Branch rate ratios frequency 0 0 150 200 random pairs parent-offspring pairs 0 1 2 3 4 log rate ratio (absolute value)
Morphology Outgroup Hemimetabola Outgroup Holometabola Xyeloidea Tenthredinoidea Pamphilioidea Cephoidea Syntexis Siricidae Xiphydriidae Orussidae Apocrita Non-clock Outgroup Hemimetabola Outgroup Holometabola Xyeloidea Tenthredinoidea Pamphilioidea Cephoidea Siricoidea Xiphydriidae Orussidae Apocrita Relaxed clock models may need rooting constraint Strict clock Outgroup Hemimetabola Outgroup Holometabola Xyeloidea Pamphilioidea Strict clock with rooting constraint Outgroup Hemimetabola Outgroup Holometabola Xyeloidea Pamphilioidea Tenthredinoidea Tenthredinoidea Cephoidea Siricoidea Xiphydriidae Orussidae Apocrita Cephoidea Siricoidea Xiphydriidae Orussidae Apocrita Relaxed clock Outgroup Hemimetabola Outgroup Holometabola Xyeloidea Relaxed clock with rooting constraint Outgroup Hemimetabola Outgroup Holometabola Xyeloidea Tenthredinoidea Tenthredinoidea Pamphilioidea Cephoidea Siricoidea Xiphydriidae Orussidae Apocrita Pamphilioidea Cephoidea Siricoidea Xiphydriidae Orussidae Apocrita
IGR model Rate variation across the tree CPP model c e d f Orthoptera Paraneoptera Lepidoptera Mecoptera Coleoptera Adephaga Coleo Polyphaga Neuroptera Raphidioptera Xyela Macroxyela Blasticotoma Runaria Paremphytus Tenthredo Aglaostigma Dolerus Taxonus Selandria Strongylogaster Monophadnoides Metallus Hoplocampa Nematinus Nematus Cladius Monoctenus Gilpinia Diprion Athalia Cimbicinae Abia Corynis Arge Sterictiphora Perga Phylacteophaga Lophyrotoma Acordulecera Decameria Neurotoma Onycholyda Pamphilius Cephalcia Acantholyda Megalodontes cephalotes Megalodontes skorniakowii Cephus Calameuta Hartigia Syntexis Sirex Xeris Urocerus Tremex Xiphydria Orussus Stephanidae A Stephanidae B Cynipoidea Megalyridae Trigonalidae Apoidea A Apoidea B Apoidea C Vespidae Chalcidoidea Evanioidea Ichneumonidae 0.1 0.3 0.6 1.6 3.7 9.0 c d e f Orthoptera Paraneoptera Lepidoptera Mecoptera Coleoptera Adephaga Coleo Polyphaga Neuroptera Raphidioptera Xyela Macroxyela Blasticotoma Runaria Paremphytus Tenthredo Aglaostigma Dolerus Taxonus Selandria Strongylogaster Monophadnoides Metallus Hoplocampa Nematinus Nematus Cladius Monoctenus Gilpinia Diprion Athalia Cimbicinae Abia Corynis Arge Sterictiphora Perga Phylacteophaga Lophyrotoma Decameria Acordulecera Neurotoma Onycholyda Pamphilius Cephalcia Acantholyda Megalodontes cephalotes Megalodontes skorniakowii Cephus Calameuta Hartigia Syntexis Tremex Sirex Urocerus Xeris Xiphydria Orussus Stephanidae A Stephanidae B Cynipoidea Megalyridae Trigonalidae Apoidea A Apoidea B Apoidea C Vespidae Evanioidea Chalcidoidea Ichneumonidae
91 97 84 68 85 96 61 98 71 Leioxyela Triassoxyela 64 57 75 93 83 60 77 58 60 55 52 97 64 70 61 Abrotoxyela Liadoxyela Eoxyela Anagaridyela Chaetoxyela Nigrimonticola Spathoxyela Mesoxyela mesozoica 62 Grimmaratavites Brigittepterus Xyelula Sogutia Rudisiricius Ferganolyda Prosyntexis Thoracotrema Trematothorax Ghilarella Sepulca Onokhoius Karatavites Aulisca Protosirex Aulidontes Mesolyda Brachysyntexis Kulbastavia Syntexyela Anaxyela Palaeathalia Pseudoxyelocerus Dahuratoma Undatoma 90 82 57 52 Xyelotoma Pamphiliidae undescribed 95 Praeoryssus Paroryssus 70 Turgidontes Symphytopterus Cleistogaster Leptephialtites Stephanogaster 94 97 98 80 77 99 Mesorussus 71 99 Orthoptera Paraneoptera Mecoptera Lepidoptera Coleoptera Adephaga Coleo Polyphaga Raphidioptera Neuroptera Xyela Macroxyela 0% 20% 40% 60% 80% % Blasticotoma Runaria Paremphytus Tenthredo Aglaostigma Dolerus Taxonus Selandria Strongylogaster Monophadnoides Metallus Hoplocampa Nematinus Nematus Cladius Monoctenus Gilpinia Diprion Athalia Cimbicinae Abia Corynis Arge Sterictiphora Perga Phylacteophaga Lophyrotoma Decameria Acordulecera Neurotoma Acantholyda Cephalcia Pamphilius Onycholyda Megalodontes skorniakowii Megalodontes cephalotes Cephus Calameuta Hartigia Syntexis Sirex Xeris Urocerus Tremex Xiphydria Orussus Stephanidae A Stephanidae B Cynipoidea Megalyridae Evanioidea Trigonalidae Vespidae Apoidea C Apoidea B Apoidea A Chalcidoidea Ichneumonidae Majority rule consensus with fossils Completeness of morphology scores 350 300 250 200 150 50 0 million years before present
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Mesoxyela mesozoica Sogutia liassica Outgroups Outgroups Xyelidae Xyelidae Tenthredinoidea Tenthredinoidea Pamphilioidea Pamphilioidea Cephoidea Siricoidea Xiphydriidae Orussidea Cephoidea Siricoidea Xiphydriidae Orussidea Apocrita Apocrita Aulisca odontura Leptephialtites caudatus Outgroups Outgroups Xyelidae Xyelidae Tenthredinoidea Tenthredinoidea Pamphilioidea Pamphilioidea Cephoidea Siricoidea Xiphydriidae Orussidea Cephoidea Siricoidea Xiphydriidae Orussidea Apocrita Apocrita Uncertainty in the phylogenetic position varies across fossils
total evidence dating node dating Orthoptera Paraneoptera Lepidoptera Mecoptera Coleoptera Adephaga Coleo Polyphaga Neuroptera Raphidioptera Xyela Macroxyela Blasticotoma Runaria Tenthredo Paremphytus Aglaostigma Dolerus Taxonus Selandria Strongylogaster Monophadnoides Metallus Hoplocampa Nematinus Nematus Cladius Monoctenus Gilpinia Diprion Athalia Cimbicinae Abia Corynis Arge Sterictiphora Perga Phylacteophaga Lophyrotoma Acordulecera Decameria Neurotoma Onycholyda Pamphilius Cephalcia Acantholyda Megalodontes cephalotes Megalodontes skorniakowii Cephus Calameuta Hartigia Syntexis Sirex Xeris Urocerus Tremex Xiphydria Orussus Stephanidae A Stephanidae B Cynipoidea Megalyridae Trigonalidae Apoidea A Apoidea B Apoidea C Vespidae Evanioidea Chalcidoidea Ichneumonidae Estimated divergence times 450 400 350 300 250 200 150 50 0 million years before present Silurian Devonian Carboniferous Permian Triassic Jurassic Cretaceous Paleogene Neogene Quaternary
density 0.00 0.02 0.04 0.06 0.08 Holometabola node dating total evidence dating tree age prior: least restrictive medium restrictive most restrictive density 0.00 0.01 0.02 0.03 0.04 0.05 0.06 Hymenoptera Posteriors on node ages 600 500 400 300 200 0 600 500 400 300 200 0 Ma Ma density 0.00 0.02 0.04 0.06 0.08 0.10 Xyelidae density 0.00 0.01 0.02 0.03 Tenthredinoidea 600 500 400 300 200 0 600 500 400 300 200 0 Ma Ma density 0.00 0.01 0.02 0.03 Pamphilioidea density 0.00 0.01 0.02 0.03 0.04 Siricoidea 600 500 400 300 200 0 600 500 400 300 200 0 Ma Ma density 0.00 0.01 0.02 0.03 0.04 Vespina density 0.00 0.01 0.02 0.03 Apocrita 600 500 400 300 200 0 600 500 400 300 200 0 Ma Ma
Table 2. Fossils used in the node-dating analyses and calibration point prior settings. The minimal and mean age for the offset-exponential prior are given, along with the corresponding fossils and references. Calibration point Prior on age (Ma) Fossil(s) Reference PP correct 1 A. Neoptera min: 315 Katerinka (oldest Neoptera) Prokop & Nel 2007 mean: 396 Rhyniognatha (oldest insect) Engel & Grimaldi 2004 B. Holometabola min: 302 insect gall (oldest Holometabola) Labandeira & Philips 1996 mean: 396 Rhyniognatha Engel & Grimaldi 2004 C. Hymenoptera min: 235 Triassoxyela, Asioxyela Rasnitsyn & Quicke 2002 96% mean: 302 insect gall Labandeira & Philips 1996 D. Xyelidae 2 min: 180 Eoxyela Rasnitsyn 1983 0% E. Pamphilioidea 2 min: 161 Aulidontes, Pamphilidae undescribed Rasnitsyn & Zhang 2004 48% F. Siricoidea 2 min: 161 Aulisca, Anaxyela, Syntexyela, Zhang & Rasnitsyn 2006 0% Kulbastavia, Brachysyntexis G. Vespina 2 min: 180 Brigittepteris Rasnitsyn et al. 2003 7% H. Apocrita 2 min: 176 Cleistogaster Rasnitsyn 1975 34% I. Tentrhedinoidea s.str. 2,3 min: 140 Palaeathalia Zhang 1985 % 1 Posterior probability (PP) from the total-evidence analysis that the fossil attaches at the position assumed in the node-dating analysis. Note that these posterior probabilities take both the morphological data and the ages of the fossils into account. 2 The mean age for all intra-hymenopteran calibration points was assumed to be the minimal age of Hymenoptera, i.e. 235 Ma (Triassoxyela, Asioxyela). 3 Tenthredinoidea excluding Blasticotomidae.
Error in divergence time estimation is not influenced to a large extent by molecular character data Branch length posteriors for different models on four example branches density 0 10 20 30 40 50 60 non clock strict clock IGR effective branch length IGR time branch length TK02 effective branch length TK02 time branch length CPP effective branch length CPP time branch length density 0 20 40 60 non-clock length Relaxed clock: effective length time length strict clock length 0.00 0.05 0.10 0.15 0.20 effective or time branch length 0.00 0.05 0.10 0.15 0.20 0.25 effective or time branch length density 0 50 150 200 density 0 50 150 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 effective or time branch length 0.00 0.02 0.04 0.06 0.08 0.10 effective or time branch length
Conclusions 1(2) Total-evidence dating is preferable because it: explicitly incorporates fossil evidence allows powerful analysis of the available data results in divergence times that are more precise less sensitive to prior assumptions probably more accurate provides better platform for future development, such as explicit modeling of fossilization, speciation, extinction, and sampling
Conclusions 2(2) There is a limit to how much molecular characters can help reduce the errors in divergence time estimates Most significant improvements will come from more intense study of the fossil record better understanding of morphological evolution better models of rate variation across sites and lineages better modeling of speciation, extinction, fossilization and sampling of fossil and extant taxa Challenges with total-evidence dating under birth-death prior with fossilization: Dealing with trees where fossils are ancestors (sit on branches) Sampling probabilities and biases, both for fossils and extant taxa Uniform fossilization or slice sampling Priors for speciation and extinction rates
Acknowledgements Alex Rasnitsyn Lars Vilhelmsen Susanne Schulmeister Debra Murray Seraina Klopfstein Chi Zhang Tracy Heath Tanja Stadler Funding: US National Science Foundation (HymAToL), Swedish Research Council, Swiss National Science Foundation