SUPPLEMENTARY INFORMATION

Similar documents
Comparing Genomes! Homologies and Families! Sequence Alignments!

Inferring Phylogenies from RAD Sequence Data

The Probability of a Gene Tree Topology within a Phylogenetic Network with Applications to Hybridization Detection

Quantitative and qualitative analyses. of in-paralogs

Sheet1. Page 1. protein

Camello, a novel family of Histone Acetyltransferases that acetylate histone H4 and is essential for zebrafish development

Species Tree Inference by Minimizing Deep Coalescences

Phylogenetics in the Age of Genomics: Prospects and Challenges

Phylogenomics: the beginning of incongruence?

New Universal Rules of Eukaryotic Translation Initiation Fidelity

Marine medaka ATP-binding cassette (ABC) superfamily and new insight into teleost Abch nomenclature

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Museum of Natural History, New York, New York, USA

Biased amino acid composition in warm-blooded animals

A novel laminin β gene BmLanB1-w regulates wing-specific cell adhesion in silkworm, Bombyx mori

Phylogenetic inference

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

Combination of X-ray crystallography, SAXS and DEER to obtain the structure of the FnIII-3,4 domains of integrin α6β4

Supplemental Figure 1.

Anatomy of a species tree

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

Reassessing Domain Architecture Evolution of Metazoan Proteins: Major Impact of Gene Prediction Errors

Supplementary Information for

Evolutionary Rate Covariation of Domain Families

GATA family of transcription factors of vertebrates: phylogenetics and chromosomal synteny

Fine-Scale Phylogenetic Discordance across the House Mouse Genome

Supporting Information

Master Biomedizin ) UCSC & UniProt 2) Homology 3) MSA 4) Phylogeny. Pablo Mier

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Coalescent Histories on Phylogenetic Networks and Detection of Hybridization Despite Incomplete Lineage Sorting

Pyrobayes: an improved base caller for SNP discovery in pyrosequences

SUPPLEMENTARY INFORMATION

Application of new distance matrix to phylogenetic tree construction

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Cubic Spline Interpolation Reveals Different Evolutionary Trends of Various Species

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Today's project. Test input data Six alignments (from six independent markers) of Curcuma species

Genome-scale approaches to resolving incongruence in molecular phylogenies

Species Level Deconvolution of Metagenome Assemblies with Hi C Based Contact Probability Maps

Constructing Evolutionary/Phylogenetic Trees

ASTRAL: Fast coalescent-based computation of the species tree topology, branch lengths, and local branch support

Evolutionary Bioinformatics

Bayesian Estimation of Concordance among Gene Trees

SUPPLEMENTARY INFORMATION

The nematode arthropod clade revisited: phylogenomic analyses from ribosomal protein genes misled by shared evolutionary biases

Evolution by duplication

Constructing Evolutionary/Phylogenetic Trees

Supplementary Figure 1 Histogram of the marginal probabilities of the ancestral sequence reconstruction without gaps (insertions and deletions).

PhyloNet. Yun Yu. Department of Computer Science Bioinformatics Group Rice University

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Evolutionary Genomics. Antonis Rokas Department of Biological Sciences, Vanderbilt University

Supplementary Materials for

BIOINFORMATICS LAB AP BIOLOGY

A Kolmogorov-Smirnov test for the molecular clock on Bayesian ensembles of phylogenies

Many of the slides that I ll use have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Bioinformatics Bioinf and E ormatics volu and E tionar volu y Genomics Genom 5/2/2011 1

Symmetric Tree, ClustalW. Divergence x 0.5 Divergence x 1 Divergence x 2. Alignment length

The MANTiS Manual. Contents. MANTiS Version 1.1

Presentation by Julie Hudson MAT5313

Hillis DM Inferring complex phylogenies. Nature 383:

A Functional Selection Model Explains Evolutionary Robustness Despite Plasticity in Regulatory Networks

Smith et al. American Journal of Botany 98(3): Data Supplement S2 page 1

Comparative genome analysis across a kingdom of eukaryotic organisms: Specialization and diversification in the Fungi

1 ATGGGTCTC 2 ATGAGTCTC

Phylogenomics. Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University. Tuesday, January 29, 13

The Lophotrochozoan TGF-b signalling cassette - diversification and conservation in a key signalling pathway

Letter to the Editor. Temperature Hypotheses. David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons

RNA polymerase II clustering through carboxyterminal domain phase separation

Cryo-EM data collection, refinement and validation statistics

Synonymous Codon Substitution Matrices

Multiple Sequence Alignments

Emergence of Xin Demarcates a Key Innovation in Heart Evolution

Consensus Methods. * You are only responsible for the first two

A (short) introduction to phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics

Bioinformatics Report Branchiostoma lanceolatum dopamine D 1 / receptor protein phylogenetic analysis. Alanna Lewis

Thanks to Paul Lewis, Jeff Thorne, and Joe Felsenstein for the use of slides

Supplementary Figure 1

Supporting Online Material for

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

SUPPLEMENTARY INFORMATION

Contentious relationships in phylogenomic studies can be driven by a handful of genes

Exploring metazoan evolution through dynamic and holistic changes in protein families and domains

MegAlign Pro Pairwise Alignment Tutorials

Procedure to Create NCBI KOGS

Estimating Evolutionary Trees. Phylogenetic Methods

A bioinformatics approach to the structural and functional analysis of the glycogen phosphorylase protein family

Fast coalescent-based branch support using local quartet frequencies

Quantifying sequence similarity

In Search of the Biological Significance of Modular Structures in Protein Networks

Molecular Coevolution of the Vertebrate Cytochrome c 1 and Rieske Iron Sulfur Protein in the Cytochrome bc 1 Complex

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

The impact of sequence parameter values on phylogenetic accuracy

Paulo Jorge Dias 1,2 and Isabel Sá-Correia 1,2*

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Evolution takes place in an ecological context that shapes the

Hands-On Nine The PAX6 Gene and Protein

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

I. Short Answer Questions DO ALL QUESTIONS

Transcription:

doi:1.138/nature1213 Supplementary Table 1. The Taxonomy of the Organisms Used in this Study Organism (acronym) Taxonomy Yeasts Zygosacharomyces rouxii (Zrou) Verterbrates Xenopus tropicalis (Xtro) Gallus gallus (Ggal) Monodelphis domestica (Mdom) Bos taurus (Btau) Equus caballus (Ecab) Canis familiaris (Cfam) Macaca mulatta (Mmul) Pongo pygmaeus (Ppyg) Homo sapiens (Hsap) Pan troglodytes (Ptro) Rattus norvegicus (Rnor) Mus musculus (Mmus) Cavia porcellus (Cpor) Danio rerio (Drer) ; Animalia;Chordata;Aphibia;Anura;Pipidae Animalia;Chordata;Aves;Galliformes;Phasianidae Animalia;Chordata;Monodelphis;Mammalia;Didelphimorphia Animalia;Chordata;Mammalia;Artiodactyla;Bovidae Animalia;Chordata;Mammalia;Perissodactyla;Equidae Animalia;Chordata;Mammalia;Carnivora;Canidae Animalia;Chordata;Mammalia;Primates;Cercopithecidae Animalia;Chordata;Mammalia;Primates;Hominidae Animalia;Chordata;Mammalia;Primates;Hominidae Animalia;Chordata;Mammalia;Primates;Hominidae Animalia;Chordata;Mammalia;Rodentia;Muridae Animalia;Chordata;Mammalia;Rodentia;Muridae Animalia;Chordata;Mammalia;Rodentia;Caviidae Animalia;Chordata;Actinopterygii;Cypriniformes;Cyprinidae WWW.NATURE.COM/NATURE 1

RESEARCH SUPPLEMENTARY INFORMATION Oryzias latipes (Olat) Tetraodon nigroviridis (Tnig) Takifugu rubripes (Trub) Gasterosteus aculeatus (Gacu) Metazoa Strongylocentrotus purpuratus (Spur) Branchiostoma floridae (Bflo) Ciona intestinalis (Cint) Mus musculus (Mmus) Gallus gallus (Ggal) Homo sapiens (Hsap) Xenopus tropicalis (Xtro) Danio rerio (Drer) Helobdella robusta (Hrob) Lottia gigantea (Lgig) Caenorhabditis elegans (Cele) Schistosoma mansoni (Sman) Ixodes scapularis (Isca) Daphnia pulex (Dpul) Apis mellifera (Amel) Tribolium castaneum (Tcas) Drosophila melanogaster (Dmel) Bombyx mori (Bmor) Monosiga brevicollis (Mbre) Nematostella vectensis (Nvec) Trichoplax adhaerens (Tadh) Animalia;Chordata;Actinopterygii;Beloniformes;Adrianichthyidae Animalia;Chordata;Actinopterygii;Tetraodontiformes;Tetraodontidae Animalia;Chordata;Actinopterygii;Tetraodontiformes;Tetraodontidae Animalia;Chordata;Actinopterygii;Gasterosteiformes;Gasterosteidae Animalia;Echinodermata;Echinoidea;Echinoida;Strongylocentrotidae Animalia;Chordata;Leptocardii;Amphioxiformes;Branchiostomidae Animalia;Chordata;Ascidiaceae;Enterogona;Cionidae Animalia;Chordata;Mammalia;Rodentia;Muridae Animalia;Chordata;Aves;Galliformes;Phasianidae Animalia;Chordata;Mammalia;Primates;Hominidae Animalia;Chordata;Aphibia;Anura;Pipidae Animalia;Chordata;Actinopterygii;Cypriniformes;Cyprinidae Animalia;Annelida;Clitellata;Rhynchobdellida;Glossiphoniidae Animalia;Mollusca;Gastropoda;Patellogastropoda;Lottiidae Animalia;Nematoda;Secernentea;Rhabditida;Rhabditidae Animalia;Platyhelminthes;Digenea;Strigeidida Animalia;Arthropoda;Arachnida;Ixodida;Ixodidae Animalia;Arthropoda;Branchiopoda;Cladocera;Daphniidae Animalia;Arthropoda;Insecta;Hymenoptera;Apidae Animalia;Arthropoda;Insecta;Coleoptera;Tenebrionidae Animalia;Arthropoda;Insecta;Diptera;Drosophilidae Animalia;Arthropoda;Insecta;Lepidopteroa;Bombycidae hoanoflagellida;codonosigidae Animalia;Cnidaria;Anthozoa;Actiniaria;Edwardsiidae Animalia;Placozoa;Tricoplacia;Tricoplaciformes;Trichoplacidae 2 WWW.NATURE.COM/NATURE

RESEARCH Supplementary Table 2.Bipartitions that Significantly Conflict with the Bipartitions Recovered in the Phylogeny of 23 Yeast Genomes Primary Tree Bipartition GSF IC [Conflicting Bipartition]:GSF value Kthe, Kwal 99.96 None Calb, Cdub 98.95 None Sbay, Scer, Skud, Smik, Spar 99.97 None Calb, Cdub, Cgui, Clus, Cpar, Ctro, Dhan, Lelo, Psti 95.9 None Calb, Cdub, Ctro 9.78 None Cpar, Lelo 89.77 None Calb, Cdub, Cpar, Ctro, Lelo 86.76 None Scer, Spar 77.54 [Sbay,Skud,Smik,Spar]:8; [Smik,Spar]:5 Cgla, Kpol, Sbay, Scas, Scer, Skud, Smik, Spar, Zrou 62.59 [Calb,Cdub,Cgla,Cgui,Clus,Cpar,Ctro,Dhan,Lelo,Psti]:6; [Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Lelo,Psti,Zrou]:5 Scer, Smik, Spar 6 Scer, Skud, Smik,Spar 52.6 Calb, Cdub, Cpar, Ctro, Lelo, Psti 45.11 Kthe, Kwal, Sklu 41 2 Agos, Klac 36.9 Agos, Klac, Kthe, Kwal, Sklu 31.4 [Sbay,Skud,Smik]:14; [Sbay,Scer,Skud,Spar]:11; [Sbay,Skud,Smik,Spar]:8; [Skud,Smik]:7 ; [Scer,Skud,Spar]:6 [Sbay,Skud]:29; [Sbay,Skud,Smik]:14; [Sbay,Scer,Smik,Spar]:11; [Sbay,Scer,Skud,Spar]:11; [Sbay,Skud,Smik,Spar]:8 [Cgui,Clus,Dhan,Psti]:2; [Dhan,Psti]:11; [Cgui,Dhan,Psti]:1; [Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Lelo]:8; [Calb,Cdub,Clus,Cpar,Ctro,Lelo]:5; [Clus,Dhan,Psti]:5 [Agos,Kthe,Kwal]:9; [Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Kthe, Kwal,Lelo,Psti]:9; [Klac,Sklu]:8; [Agos,Klac,Kthe,Kwal]:7; [Agos, Klac,Sklu]:7; [Agos,Sklu]:7; [Cgla,Kpol,Kthe,Kwal,Sbay,Scas, Scer,Skud,Smik,Spar,Zrou]:6; [Klac,Kthe,Kwal]:5; [Cgla,Kpol, Sbay,Scas,Scer,Sklu,Skud,Smik,Spar,Zrou]:5 [Agos,Cgla,Kpol,Kthe,Kwal,Sbay,Scas,Scer,Sklu,Skud,Smik,Spar, Zrou]:17; [Cgla,Klac,Kpol,Kthe,Kwal,Sbay,Scas,Scer,Sklu,Skud, Smik,Spar,Zrou]:13; [Agos,Kthe,Kwal,Sklu]:13; [Klac,Kthe,Kwal, Sklu]:1; [Agos,Kthe,Kwal]:9; [Klac,Sklu]:8; [Agos,Sklu]:7; [Cgla, Klac,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou]:7; [Klac,Kthe, Kwal]:5 [Agos,Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Klac,Lelo,Psti]:19; [Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Klac,Lelo,Psti]:17; [Agos, Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Lelo,Psti]:13; [Calb,Cdub, Cgui,Clus,Cpar,Ctro,Dhan,Kthe,Kwal,Lelo,Psti]:9; [Agos,Cgla, Klac,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou]:7; [Cgla,Klac, Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou]:7; [Cgla,Kpol,Kthe, Kwal,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou]:6; [Cgla,Kpol,Sbay, Scas,Scer,Sklu,Skud,Smik,Spar,Zrou]:5 Cgla, Sbay, Scas, Scer, Skud, Smik, Spar 29.12 [Cgla,Kpol]:12; [Kpol,Scas]:1; [Kpol,Sbay,Scas,Scer,Skud,Smik, Spar,Zrou]:9; [Kpol,Sbay,Scas,Scer,Skud,Smik,Spar]:8; [Cgla, WWW.NATURE.COM/NATURE 3

RESEARCH SUPPLEMENTARY INFORMATION Zrou]:8; [Kpol,Sbay,Scer,Skud,Smik,Spar]:8; [Sbay,Scer,Skud, Smik,Spar,Zrou]:7; [Sbay,Scas,Scer,Skud,Smik,Spar,Zrou]:7; [Cgla,Kpol,Scas]:6; [Agos,Klac,Kpol,Kthe,Kwal,Sbay,Scas,Scer, Sklu,Skud,Smik,Spar,Zrou]:6; [Scas,Zrou]:5 Sbay, Scas, Scer, Skud, Smik, Spar 29.2 [Cgla,Sbay,Scer,Skud,Smik,Spar]:2; [Cgla,Scas]:17; [Kpol, Scas]:1; [Kpol,Sbay,Scer,Skud,Smik,Spar]:8; [Sbay,Scer,Skud, Smik,Spar,Zrou]:7; [Cgla,Kpol,Scas]:6; [Scas,Zrou]:5 Calb, Cdub, Cgui, Cpar, Ctro, Dhan, Lelo, Psti 29.1 [Cgui,Clus,Dhan]:24; [Cgui,Clus,Dhan,Psti]:2; [Cgui,Clus]:2; [Calb,Cdub,Clus,Cpar,Ctro,Dhan,Lelo,Psti]:16; [Clus,Dhan]:12; [Calb,Cdub,Clus,Cpar,Ctro,Lelo,Psti]:9; [Calb,Cdub,Cgui,Clus, Cpar,Ctro,Dhan,Lelo]:8; [Calb,Cdub,Cgui,Clus,Cpar,Ctro,Lelo, Psti]:6; [Clus,Dhan,Psti]:5; [Calb,Cdub,Clus,Cpar,Ctro,Lelo]:5 Cgui, Dhan 29.2 [Cgui,Clus]:2; [Calb,Cdub,Cpar,Ctro,Dhan,Lelo,Psti]:18; [Calb,Cdub,Clus,Cpar,Ctro,Dhan,Lelo,Psti]:16; [Clus,Dhan]:12; [Dhan,Psti]:11; [Calb,Cdub,Cgui,Cpar,Ctro,Lelo,Psti]:6; [Calb, Cdub,Cgui,Clus,Cpar,Ctro,Lelo,Psti]:6; [Clus,Dhan,Psti]:5 Cgla, Kpol, Sbay, Scas, Scer, Skud, Smik, Spar 24.2 [Kpol,Zrou]:17; [Cgla,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou]:15; [Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou]:9; [Cgla,Zrou]:8; [Sbay,Scer,Skud,Smik,Spar,Zrou]:7; [Sbay,Scas,Scer,Skud,Smik, Spar,Zrou]:7; [Calb,Cdub,Cgla,Cgui,Clus,Cpar,Ctro,Dhan,Lelo, Psti]:6; [Scas,Zrou]:5 4 WWW.NATURE.COM/NATURE

RESEARCH a b Maximum likelihood species inferred using the GARLI software Bayesian inference species inferred using the MrBayes software.1 Supplementary Figure 1 The topology of the yeast recovered from concatenation analyses using one other maximum likelihood software (GARLI) and one other Bayesian inference (MrBayes) software was identical to the topology recovered by maximum likelihood analysis using the RAxML software. a, The yeast species recovered from concatenation analysis of 1,7 genes using maximum likelihood as implemented in the GARLI software. All internodes received 1% bootstrap support. b, The yeast species recovered from concatenation analysis of 1,7 genes using Bayesian inference as implemented in the MrBayes software. All internodes had 1% posterior probability. WWW.NATURE.COM/NATURE 5

RESEARCH SUPPLEMENTARY INFORMATION.8.7.6 Internode Certainty.5.4.1 1 2 3 4 5 6 7 8 9 1 Support Value X 2 conflicting with support values X and 1-X 3 conflicting with values X, 95-X, and 5 of which only the two highest are used to calculate IC 3 conflicting with values X, 85-X, and 15 of which only the two highest are used to calculate IC 3 conflicting with values X, 75-X, and 25 of which only the two highest are used to calculate IC Supplementary Figure 2 Representative values of the new measure Internode Certainty (IC) for a range of representative support values of two most prevalent and conflicting for a given internode. Each plot on the graph depicts how IC (Y-axis) varies in response to the relative support of conflicting on a given internode. IC can be measured on any given set of trees. For example, if the entire set of gene trees (GTs) is used, the IC value of a given internode will reflect the amount of information available for that internode in the set GTs by considering the internode s gene support frequency jointly with the frequency of the most prevalent bipartition that conflicts with the internode. If the set of bootstrap replicate trees for a given gene is used, then IC will be calculated based on bootstrap support values. From the right to the left of the graph, the first of the four plots shown with triangle symbols corresponds to the case of only two conflicting for one internode with support values X and 1 X. For example, given 1 total GTs, if 6 of them support bipartition 1, the remaining 4 will support the conflicting bipartition. The second, third and fourth of the four plots (shown with diamond, circle, and square symbols, respectively) correspond to case where there are three conflicting for one internode, but only the two most prevalent ones are considered. For example, in the plot with the diamond symbols, given 1 total GTs, if 6 of them support bipartition 1, 35 will support the conflicting bipartition 2, because conflicting bipartition 3 has been set to be supported by 5 GTs. Thus, when the two most prevalent ones are considered, the percentage of GTs supporting the first bipartition will be equal to 6/(6+35), whereas the percentage of GTs supporting the second bipartition will be 35/(6+35). The reason that the number of GTs that support the third conflicting bipartition is not included is because we want IC to measure the magnitude of certainty conveyed by the two most prevalent. This way, IC will equal zero when the two most prevalent are equally prevalent (in this example that would be the case if 1 and 2 were each supported by 42.5 GTs each). 6 WWW.NATURE.COM/NATURE

RESEARCH a b Removing all sites containing gaps 98/.95 37/6 3/.4 33/.9 59/.53 22/.1 28/.1 29/.2 95/.9 98/.97 48/.5 54/4 emrc 7/.48 89/.76 98/.95 85/.73 44/.7 87/.73 28/. 3/.4 Removing all sites with 5% gaps 32/.5 61/.58 23/.2 3/.16 29/.2 95/.88 emrc 29/.1 99/.97 52/.6 45/.9 99/.96 42/3 3/.3 36/.1 59/7 75/.46 9/.78 98/.94 87/.77 9/.75 c Removing genes whose alignment length following gap removal is 7% of the length of the original alignment (374 genes retained) 98/.95 37/6 31/.5 34/.8 63/.52 25/.4 33/.16 29/.3 1/1. 99/.97 55/.1 59/ emrc 76/.55 88/.74 99/.97 86/.82 43/.1 89/.77 29/.1 3/.2 Supplementary Figure 3 Removal of sites containing gaps or of poorly aligned genes does not significantly improve the yeast inferred by concatenation and emrc approaches. Each panel shows the yeast species inferred from concatenation analysis (left panel) and from extended majority rule consensus (emrc) analysis (right panel). All internodes of phylogenies inferred by concatenation received 1% bootstrap support unless otherwise indicated. Values near internodes of phylogenies inferred by emrc analysis correspond to gene support frequency and internode certainty, respectively. a, (left) and emrc (right) phylogenies of all 1,7 genes following removal of all sites containing gaps. b, (left) and emrc (right) phylogenies of all 1,7 genes following removal of all sites where 5% of the character states are gaps. c, (left) and emrc (right) phylogenies of the 374 genes whose alignment length following removal of all gaps is 7% of the length of the original alignment. WWW.NATURE.COM/NATURE 7

RESEARCH SUPPLEMENTARY INFORMATION a b c Excluding 31/.5 98/.94 42/1 37/.13 61/.57 23/.1 29/.12 3/.3 99/.97 95/.88 52/.5 58/8 75/.54 99/.96 9/.8 84/.68 43/.2 89/.76 emrc 48/.6 Excluding 99/.95 39/1 31/.4 36/.1 66/.55 42/ 35/.4 99/.96 95/.89 53/.6 61/ 77/.54 98/.95 emrc 91/.79 87/.78 46/.1 91/.77 28/. 29/.2 Excluding 98/.96 4/6 3/.3 36/.1 66/.6 3/.2 52/6 99/.98 96/.88 51/.5 6/8 76/.51 emrc 98/.95 9/.79 86/.74 46/.9 89/.77 27/. 29/.2 Supplementary Figure 4 Removal of one or more unstable or fast-evolving species has, if any, a minor and local effect on the yeast inferred by concatenation and emrc approaches. Each panel shows the yeast species inferred from concatenation analysis (left panel) and from extended majority rule consensus (emrc) analysis (right panel) following removal of one or more unstable or fast-evolving species from the analysis. All internodes of phylogenies inferred by concatenation received 1% bootstrap support unless otherwise indicated. Values near internodes of phylogenies inferred by emrc analysis correspond to gene support frequency and internode certainty, respectively. a, (left) and emrc (right) phylogenies of all 1,7 genes following the removal of the unstable taxon Candida lusitaniae. b, (left) and emrc (right) phylogenies of all 1,7 genes following the removal of the fast-evolving and unstable taxon Kluyveromyces polysporus. c, (left) and emrc (right) phylogenies of all 1,7 genes following the removal of the fast-evolving and unstable taxon Candida glabrata. 8 WWW.NATURE.COM/NATURE

RESEARCH d e f Excluding 99/.96 41/1 3/.4 34/.8 62/.54 25/.1 4/.16 99/.97 52/.6 95/.9 6/8 77/.54 emrc 98/.94 9/.78 86/.76 46/.1 9/.77 28/. 29/.2 Excluding 99/.95 5/3 37/.1 61/.53 23/.1 29/.12 29/.2 99/.97 95/.9 52/.6 6/1 78/.55 98/.96 emrc 9/.78 85/.74 45/.9 89/.74 29/. 29/.2 Excluding 98/.96 44/.16 4/.4 59/.51 23/.1 29/.12 29/.2 99/.97 95/.86 53/.6 6/ 78/.57 98/.96 9/.79 emrc 85/.75 45/.11 9/.78 29/. 29/.2 Supplementary Figure 4...continued. d, (left) and emrc (right) phylogenies of all 1,7 genes following the removal of the unstable taxon Saccharomyces castellii. e, (left) and emrc (right) phylogenies of all 1,7 genes following the removal of the fast-evolving and unstable taxon Eremothecium gossypii. f, (left) and emrc (right) phylogenies of all 1,7 genes following the removal of the fast-evolving and unstable taxon Kluyveromyces lactis. WWW.NATURE.COM/NATURE 9

RESEARCH SUPPLEMENTARY INFORMATION g Excluding and emrc 66/9 99/.96 57/.42 21/.1 29/.12 28/.2 96/.87 98/.97 52/.6 6/ 76/.53 29/. 46/.1 86/.74 9/.78 98/.96 29/.2 89/.74 h Excluding, and 99/.95 65/5 6/.43 27/.2 49/1 98/.98 5/.4 96/.85 6/8 75/.52 98/.94 emrc 89/.77 86/.75 46/.1 89/.77 28/. 28/.1 Supplementary Figure 4...continued. g, (left) and emrc (right) phylogenies of all 1,7 genes following the removal of both Eremothecium gossypii and Kluyveromyces lactis. h, (left) and emrc (right) phylogenies of all 1,7 genes following the removal of Eremothecium gossypii, Kluyveromyces lactis and Candida glabrata. 1 WWW.NATURE.COM/NATURE

RESEARCH a b Change in GSF compared to default analysis Change in GSF compared to default analysis Change in GSF when removing fast-evolving and unstable species 5 4 3 2 1-1 5 4 3 2 1-1 excluding Egos excluding Cgla excluding Klac excluding Egos, Klac excluding Cgla, Klac excluding Egos, Cgla excluding Egos, Cgla, Klac Calb,Cdub Calb,Cdub,Ctro Cpar,Lelo Calb,Cdub,Cpar,Ctro,Lelo Calb,Cdub,Cpar,Ctro,Lelo,Psti Cgui,Dhan Calb,Cdub,Cgui,Cpar,Ctro,Dhan,Lelo,Psti Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Lelo,Psti Kthe,Kwal Egos,Klac Kthe,Kwal,Sklu Egos,Klac,Kthe,Kwal,Sklu Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar Candida Sbay,Scer,Skud,Smik,Spar Scer,Skud,Smik,Spar Scer,Smik,Spar Cgla,Sbay,Scas,Scer,Skud,Smik,Spar Sbay,Scas,Scer,Skud,Smik,Spar Saccharomyces Change in GSF when using only genes that support specific, likely correct, selecting [Cgla,Sbay,Scer,Skud,Smik,Spar] selecting [Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar, Zrou] selecting [Calb,Cdub,Ctro] Calb,Cdub Calb,Cdub,Ctro Cpar,Lelo Calb,Cdub,Cpar,Ctro,Lelo Calb,Cdub,Cpar,Ctro,Lelo,Psti Cgui,Dhan Calb,Cdub,Cgui,Cpar,Ctro,Dhan,Lelo,Psti Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Lelo,Psti Kthe,Kwal Egos,Klac Kthe,Kwal,Sklu Egos,Klac,Kthe,Kwal,Sklu Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar Candida Cgla,Sbay,Scas,Scer,Skud,Smik,Spar Sbay,Scas,Scer,Skud,Smik,Spar Saccharomyces Scer,Spar Sbay,Scer,Skud,Smik,Spar Scer,Skud,Smik,Spar Scer,Smik,Spar Scer,Spar Supplementary Figure 5 Removal of fast-evolving and unstable species or the exclusive use of genes that recover specific has a minor and typically local effect on GSF values of internodes of the yeast. The X-axis shows the 2 present in the yeast suggested by concatenation analysis and the Y-axis the percent change in gene support frequency (GSF) observed for each bipartition between the treatment and the default analysis. Only GSF changes 3% are shown. a, The individual or combined removal of E. gossypii (Egos), K. lactis (Klac), and C. glabrata (Cgla), three of the fastest evolving species as well as of those whose phylogenetic position is most unstable from the dataset has a minor and local effect on the GSF of neighboring internodes. Note that removal of species in the Saccharomyces lineage does not change the GSF of in the Candida lineage. b, The selection of genes whose individual topologies recover well-established of the yeast has a minor effect on the GSF of internodes of the yeast. Note that the [C. albicans, C. dubliniensis, C. tropicalis] (abbreviated [Calb, Cdub, Ctro]) bipartition has 9% GSF in the emrc reconstructed from the 1,7 individual gene trees, the [C. glabrata, K. polysporus, S. bayanus, S. castellii, S. cerevisiae, S. kudriavzevii, S. mikatae, S. paradoxus, Z. rouxii] (abbreviated [Zrou, Kpol, Cgla, Sbay, Skud, Smik, Scer, Spar]) bipartition has 62% GSF, and the [C. glabrata, S. bayanus, S. cerevisiae, S. kudriavzevii, S. mikatae, S. paradoxus] (abbreviated [Cgla, Sbay, Skud, Smik, Scer, Spar]) bipartition has 2% GSF. This last bipartition does not appear in the emrc but several independent rare genomic changes strongly suggest that it is the correct one. WWW.NATURE.COM/NATURE 11

RESEARCH SUPPLEMENTARY INFORMATION a b Change in IC when removing fast-evolving and unstable species Change in IC compared to default analysis Change in IC compared to default analysis.5.4.1 -.1 -.5.4.1 -.1 excluding Egos excluding Cgla excluding Klac excluding Egos, Klac excluding Cgla, Klac excluding Egos, Cgla excluding Egos, Cgla, Klac Calb,Cdub Calb,Cdub,Ctro Cpar,Lelo Calb,Cdub,Cpar,Ctro,Lelo Calb,Cdub,Cpar,Ctro,Lelo,Psti Cgui,Dhan Calb,Cdub,Cgui,Cpar,Ctro,Dhan,Lelo,Psti Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Lelo,Psti Kthe,Kwal Egos,Klac Kthe,Kwal,Sklu Egos,Klac,Kthe,Kwal,Sklu Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar Candida Cgla,Sbay,Scas,Scer,Skud,Smik,Spar Sbay,Scas,Scer,Skud,Smik,Spar Saccharomyces Change in IC when using only genes that support specific, likely correct, selecting {Cgla,Sbay,Scer,Skud,Smik,Spar} selecting {Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar, Zrou} selecting {Calb,Cdub,Ctro} Calb,Cdub Calb,Cdub,Ctro Cpar,Lelo Calb,Cdub,Cpar,Ctro,Lelo Calb,Cdub,Cpar,Ctro,Lelo,Psti Cgui,Dhan Calb,Cdub,Cgui,Cpar,Ctro,Dhan,Lelo,Psti Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Lelo,Psti Kthe,Kwal Egos,Klac Kthe,Kwal,Sklu Egos,Klac,Kthe,Kwal,Sklu Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar Candida Cgla,Sbay,Scas,Scer,Skud,Smik,Spar Sbay,Scas,Scer,Skud,Smik,Spar Saccharomyces Sbay,Scer,Skud,Smik,Spar Scer,Skud,Smik,Spar Scer,Smik,Spar Scer,Spar Sbay,Scer,Skud,Smik,Spar Scer,Skud,Smik,Spar Scer,Smik,Spar Scer,Spar Supplementary Figure 6 Removal of fast-evolving and unstable species or the exclusive use of genes that recover specific has a minor and typically local effect on IC values of internodes of the yeast. The X-axis shows the 2 present in the yeast suggested by concatenation analysis and the Y-axis the percent change in Internode Certainty (IC) observed for each bipartition between the treatment (removal of fast-evolving and unstable species or of genes that fail to recover specific clades) and the default analysis (all species and genes included). Only GSF changes 3% are shown. a, The individual or combined removal of E. gossypii (Egos), K. lactis (Klac), and C. glabrata (Cgla), three of the fastest evolving species as well as of those whose phylogenetic position is most unstable from the dataset has a minor and local effect on the IC of neighboring internodes. b, The selection of genes whose individual topologies recover well-established of the yeast has a minor effect on the IC of internodes of the yeast. Note that the [C. albicans, C. dubliniensis, C. tropicalis] (abbreviated [Calb, Cdub, Ctro]) bipartition has 9% GSF in the extended majority rule consensus (emrc) reconstructed from the 1,7 individual gene trees, the [C. glabrata, K. polysporus, S. bayanus, S. castellii, S. cerevisiae, S. kudriavzevii, S. mikatae, S. paradoxus, Z. rouxii] (abbreviated [Zrou, Kpol, Cgla, Sbay, Skud, Smik, Scer, Spar]) bipartition has 62% GSF, and the [C. glabrata, S. bayanus, S. cerevisiae, S. kudriavzevii, S. mikatae, S. paradoxus] (abbreviated [Cgla, Sbay, Skud, Smik, Scer, Spar]) bipartition has 2% GSF. This last bipartition does not appear in the emrc but, as discussed in the main text, several independent rare genomic changes strongly suggest that it is the correct one. 12 WWW.NATURE.COM/NATURE

RESEARCH a Using only genes that recover the [Calb,Cdub,Ctro] bipartition 99/.96 42/3 31/.4 36/.9 63/.59 24/.2 29/.11 29/.2 99/.99 96/.9 53/.6 61/ 78/.57 99/.95 emrc 1/1. 89/.77 47/.13 9/.76 3/.1 3/.3 b Using only genes that recover the [Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou] bipartition 99/.96 41/4 28/.1 1/1. 31/.3 36/.15 29/.1 99/.98 54/.7 62/1 79/.57 37/.3 98/.92 emrc 91/.8 99/.96 86/.69 47/.1 9/.76 32/.1 28/.1 c Using only genes that recover the [Cgla,Sbay,Scas,Scer,Skud,Smik,Spar] bipartition 99/.96 41/1 3/.1 35/.6 71/.62 28/. 51/1 1/1. 97/.92 1/.96 55/.9 6/6 79/.63 emrc 93/.81 1/1. 83/.66 44/.8 89/.75 32/.2 31/.3 Supplementary Figure 7 Selection of genes that support specific has, if any, a minor and local effect on the yeast inferred by concatenation and emrc approaches. Each panel shows the yeast species inferred from concatenation analysis (left panel) and from extended majority rule consensus (emrc) analysis (right panel) following the selection and use of genes that recover specific. All internodes of phylogenies inferred by concatenation received 1% bootstrap support unless otherwise indicated. Values near internodes of phylogenies inferred by emrc analysis correspond to gene support frequency and internode certainty, respectively. a, (left) and emrc (right) phylogenies using only the genes that recover the [C. albicans, C. dubliniensis, C. tropicalis] (abbreviated [Calb, Cdub, Ctro]) bipartition. b, (left) and emrc (right) phylogenies using only the genes that recover the [C. glabrata, K. polysporus, S. bayanus, S. castellii, S. cerevisiae, S. kudriavzevii, S. mikatae, S. paradoxus, Z. rouxii] (abbreviated [Zrou, Kpol, Cgla, Sbay, Skud, Smik, Scer, Spar]) bipartition. c, (left) and emrc (right) phylogenies using only the genes that recover the [C. glabrata, S. bayanus, S. cerevisiae, S. kudriavzevii, S. mikatae, S. paradoxus] (abbreviated [Cgla, Sbay, Skud, Smik, Scer, Spar]) bipartition. This last bipartition does not appear in the emrc but, as discussed in the main text, several independent rare genomic changes strongly suggest that it is the correct one. WWW.NATURE.COM/NATURE 13

RESEARCH SUPPLEMENTARY INFORMATION Selecting the 1 most slowly-evolving genes 98 82 99.6 79 63 95 emrc 97/.92 29/.7 18/.4 11/. 6/.41 22/.1 26/.1 1/1. 27/.3 97/.92 45/.5 4/.8 47/.16 29/.1 95/.86 75/.54 74/.66 43/2 86/.69 23/. Supplementary Figure 8 Selection of the 1 slowest-evolving genes has a large, negative effect on GSF and IC values of internodes of the yeast inferred by concatenation and emrc approaches. Each panel shows the yeast species inferred from concatenation analysis (left panel) and from extended majority rule consensus (emrc) analysis (right panel) following the selection and use of the 1 slowest-evolving genes. All internodes of phylogenies inferred by concatenation received 1% bootstrap support unless otherwise indicated. Values near internodes of phylogenies inferred by emrc analysis correspond to gene support frequency and internode certainty, respectively. 14 WWW.NATURE.COM/NATURE

RESEARCH Using only sites that contain a single rare but conserved amino acid substitution or indels Internode Certainty 1.9.8.7.6.5.4.1 Using only sites that contain single radical substitutions Using only sites that contain a single substitution between physicochemically different amino acids Using only indels Calb,Cdub Calb,Cdub,Ctro Cpar,Lelo Calb,Cdub,Cpar,Ctro,Lelo Calb,Cdub,Cpar,Ctro,Lelo,Psti Cgui,Dhan Calb,Cdub,Cgui,Cpar,Ctro,Dhan,Lelo,Psti Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Lelo,Psti Kthe,Kwal Egos,Klac Kthe,Kwal,Sklu Egos,Klac,Kthe,Kwal,Sklu Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar Candida Cgla,Sbay,Scas,Scer,Skud,Smik,Spar Sbay,Scas,Scer,Skud,Smik,Spar Saccharomyces Sbay,Scer,Skud,Smik,Spar Scer,Skud,Smik,Spar Scer,Smik,Spar Scer,Spar Supplementary Figure 9 Selection of sites that contain a single rare but conserved amino acid substitution or indels has a large, negative effect on GSF and IC values of internodes of the yeast. The X-axis shows the 2 present in the yeast suggested by concatenation analysis and the Y-axis the percent change in Internode Certainty (IC) observed for each bipartition between the treatment (selection of specific sites or indels) and the default analysis (all sites included). Only GSF changes 3% are shown. The red bars correspond to changes in IC when using only the 2,289 sites that contain single radical substitutions (defined as a substitution with a blosum62 matrix score 3), the blue bars correspond to changes in IC when using only the 4,75 sites that contain a single substitution between amino acids that differ radically in their physicochemical properties, and the yellow bars correspond to changes in IC when using only the 2,474 characters which mark the presence / absence of a single indel that spans 7 or more aa among the 23 yeast species. WWW.NATURE.COM/NATURE 15

RESEARCH SUPPLEMENTARY INFORMATION a Number of Gene Trees 1 8 6 4 2 Distribution of vertebrate gene tree distances from the species 1,86 observed gene trees 1, random gene trees ; Average distance between 1,86 gene trees and the concatenation.42; Average distance between the 1,86 gene trees.4.6.8 1 Normalized Robinson-Foulds Tree Distance b c */98 Internode Certainty 1.9.8.7.6.5.4.1 Vertebrate X. tropicalis G. gallus */8 M. domestica B. taurus */46 */87 E. caballus */22 C. familiaris */89 M. mulatta */86 P. pygmaeus */91 */46 */95 */34 */52 */61 */71 */94 Internode Certainty values for all internodes of the Vertebrate species H. sapiens P. troglodytes R. norvegicus M. musculus C. porcellus D. rerio O. latipes T. nigroviridis T. rubripes G. aculeatus Supplementary Figure 1 High levels of incongruence in Vertebrate and Metazoan phylogenomic datasets despite the inference of highly supported phylogenies by concatenation analysis. a, The distribution of the agreement between the present in the 1,86 individual gene trees (GTs) and the vertebrate concatenation, as well as the distribution of the agreement between the present in 1, randomly generated trees of equal taxon number and the concatenation, measured using the normalized Robinson-Foulds tree distance. The average tree distances between the 1,86 GTs and the concatenation as well as between the 1,86 GTs with each other are also shown. b, The vertebrate species recovered from concatenation analysis of 1,86 genes using maximum likelihood. The extended majority rule consensus (emrc) is topologically identical to the concatenation. Values near internodes correspond to bootstrap support and gene support frequency (GSF), respectively. Asterisks (*) denote internodes that received 1% bootstrap support by the concatenation analysis. c, The distribution of Internode Certainty (IC) values for all internodes of the vertebrate species. 16 WWW.NATURE.COM/NATURE

RESEARCH d Number of Gene Trees 2 16 12 8 4 Distribution of metazoan gene tree distances from the species 225 observed gene trees 1, random gene trees Average distance between.6 225 gene trees and the concatenation Average distance between.72 the 225 gene trees e f Internode Certainty */5.4.6.8 1 Normalized Robinson-Foulds Tree Distance 1.9.8.7.6.5.4.1 */19 Metazoan */1 95/25 */1 92/5 59/12 */23 */23 */97 */39 */59 S. purpuratus B. floridae C. intestinalis */76 G. gallus M. musculus */67 */94 H. sapiens X. tropicalis D. rerio H. robusta L. gigantea */72 98/29 */35 N. vectensis T. adhaerens I. scapularis D. pulex A. mellifera T. castaneum D. melanogaster B. mori M. brevicollis Internode Certainty values for all internodes of the Metazoan species C. elegans S. mansoni Supplementary Figure 1...continued. d, The distribution of the agreement between the present in the 225 individual GTs and the metazoan concatenation, as well as the distribution of the agreement between the present in 1, randomly generated trees of equal taxon number and the concatenation, measured using the normalized Robinson- Foulds tree distance. The average tree distances between the 225 GTs and the concatenation as well as between the 225 GTs with each other are also shown. e, The metazoan species recovered from concatenation analysis of 225 genes using maximum likelihood. The emrc is topologically identical to the concatenation. Values near internodes correspond to bootstrap support and gene support frequency, respectively. Asterisks (*) denote internodes that received 1% bootstrap support by the concatenation analysis. f, The distribution of IC values for all internodes of the metazoan species. Note that GSF and IC values indicate the existence of numerous internodes in the vertebrate and especially in the metazoan that are supported by a small percentage of gene trees and have very small or zero IC values. WWW.NATURE.COM/NATURE 17

RESEARCH SUPPLEMENTARY INFORMATION a b c Using only the 94 genes whose bootstrap consensus trees have average Bootstrap Support 6% 1/.99 43/2 32/.3 38/.9 67/.62 25/.2 32/.13 3/.2 99/.95 98/.93 55/.7 63/5 81/.6 emrc 93/.8 99/.97 9/.77 48/.11 92/.8 31/.1 3/.2 Using only the 545 genes whose bootstrap consensus trees have average Bootstrap Support 7% 1/1. 48/7 34/.2 41/.1 75/.63 27/.2 38/1 73 31/.2 99/.96 1/.98 6/.11 69/.42 88/.68 99/.97 emrc 96/.83 93/.84 55/.15 96/.84 33/.2 31/.2 Using only the 131 genes whose bootstrap consensus trees have average Bootstrap Support 8% 1/1. 51/6 34/. 52/.13 89/.79 98 31/.1 49/1 95 29/. 1/1. 1/1. 73/ 85/.68 98/.84 emrc 98/.89 1/1. 96/.88 62/.17 96/.84 44/.3 37/.1 Supplementary Figure 11 Selection of genes whose bootstrap consensus trees have high average Bootstrap Support (avbs) or Tree Certainty (TC) has a large, positive effect on GSF and IC values of internodes of the yeast inferred by concatenation and emrc approaches. Each panel shows the yeast species inferred from concatenation analysis (left panel) and from extended majority rule consensus (emrc) analysis (right panel) following the selection of genes whose trees have high average bootstrap support (BS) or Tree Certainty (TC). All internodes of phylogenies inferred by concatenation received 1% bootstrap support unless otherwise indicated. Values near internodes of phylogenies inferred by emrc analysis correspond to gene support frequency and internode certainty, respectively. a, (left) and emrc (right) phylogenies of the 94 genes whose gene trees have average BS 6%. b, (left) and emrc (right) phylogenies of the 545 genes whose gene trees have average BS 7%. c, (left) and emrc (right) phylogenies of the 131 genes whose gene trees have average BS 8%. 18 WWW.NATURE.COM/NATURE

RESEARCH d e f 96 64 Using only the 94 genes whose bootstrap consensus trees have highest Tree Certainty 1/.97 43/4 32/.3 38/.9 66/.6 25/.2 31/.13 3/.2 1/.99 97/.92 55/.8 63/5 82/.62 emrc 93/.82 99/.98 9/.78 48/.11 92/.82 32/.1 3/.2 Using only the 545 genes whose bootstrap consensus trees have highest Tree Certainty 1/.98 49/.41 33/.1 42/.1 74/.65 26/.2 37/.18 33/.3 99/.98 98/.94 61/.12 71/.44 88/.7 emrc 98/.89 1/.98 95/.87 55/.17 96/.85 34/.2 31/.2 Using only the 131 genes whose bootstrap consensus trees have highest Tree Certainty 1/1. 51/.41 34/. 5/.9 89/.79 3/.2 47/.14 32/.1 1/1. 1/1. 75/2 84/.67 97/.84 1/1. emrc 98/.93 98/.93 64/1 97/.88 4/.3 37/.1 Supplementary Figure 11...continued. d, (left) and emrc (right) phylogenies of the 94 genes with the highest TC. e, (left) and emrc (right) phylogenies of the 545 genes with the highest TC. f, (left) and emrc (right) phylogenies of the 131 genes with the highest TC. WWW.NATURE.COM/NATURE 19

RESEARCH SUPPLEMENTARY INFORMATION a b Change in Gene Support Frequency for highly supported genes or slow evolving genes Change in GSF compared to default analysis Change in IC compared to default analysis 45 35 25 15 5-5 -15-25 -35-45 Change in Internode Certainty for highly supported genes, or slow evolving genes.5.4.1.1.4.5 Using only the 131 genes with average BS 8% Using only the 131 genes with the highest TC Using only the 1 slowest evolving genes Calb,Cdub Calb,Cdub,Ctro Cpar,Lelo Calb,Cdub,Cpar,Ctro,Lelo Calb,Cdub,Cpar,Ctro,Lelo,Psti Cgui,Dhan Calb,Cdub,Cgui,Cpar,Ctro,Dhan,Lelo,Psti Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Lelo,Psti Kthe,Kwal Egos,Klac Kthe,Kwal,Sklu Egos,Klac,Kthe,Kwal,Sklu Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar Candida Calb,Cdub Calb,Cdub,Ctro Cpar,Lelo Calb,Cdub,Cpar,Ctro,Lelo Calb,Cdub,Cpar,Ctro,Lelo,Psti Cgui,Dhan Calb,Cdub,Cgui,Cpar,Ctro,Dhan,Lelo,Psti Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Lelo,Psti Kthe,Kwal Egos,Klac Kthe,Kwal,Sklu Egos,Klac,Kthe,Kwal,Sklu Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar Candida Cgla,Sbay,Scas,Scer,Skud,Smik,Spar Sbay,Scas,Scer,Skud,Smik,Spar Saccharomyces Using only the 131 genes with average BS 8% Using only the 131 genes with the highest TC Using only with BS 8% Using only the 1 slowest evolving genes Cgla,Sbay,Scas,Scer,Skud,Smik,Spar Sbay,Scas,Scer,Skud,Smik,Spar Saccharomyces Sbay,Scer,Skud,Smik,Spar Scer,Skud,Smik,Spar Scer,Smik,Spar Scer,Spar Sbay,Scer,Skud,Smik,Spar Scer,Skud,Smik,Spar Scer,Smik,Spar Scer,Spar Supplementary Figure 12 Selecting highly supported genes or has a large, positive effect on GSF and IC values of internodes of the yeast. The X-axis shows the 2 present in the yeast suggested by concatenation analysis and the Y-axis the percent change in Gene Support Frequency (GSF) and Internode Certainty (IC) observed for each bipartition between the treatment (selection of highly supported genes or internodes) and the default analysis. Only GSF changes 3% and IC changes.3 are shown. The red bars correspond to changes in IC when using only the 131 genes with average bootstrap support 8%, the yellow bars correspond to changes in IC when using only the 131 genes with the highest Tree Certainty, the black bars correspond to changes in IC when using only those found in the bootstrap consensus trees of individual genes that had bootstrap support 8%, and the blue bars correspond to changes in IC when using only the 1 slowestevolving genes. a, Change in GSF for highly supported genes or slow evolving genes. b, Change in IC for highly supported genes, or slow evolving genes. 2 WWW.NATURE.COM/NATURE

RESEARCH a c Using only that received 6 bootstrap support in individual gene tree analyses emrc Using only that received 8 bootstrap support in individual gene tree analyses emrc 18/.9 98/.99 25/.49 22/.15 48/.74 1/.2 18/ 16/.3 98/.99 94/.94 36/.8 37/.47 67/.74 1/.13 35/.8 94/.96 31/. 85/.89 97/.97 81/.88 15/.13 83/.87 12/.58 13/1 2/.8 96/1. 4/.1 1/9 6/.1 96/.99 2/.19 22/.69 49/.85 7/. 94/.99 75/.96 7/.94 2/ 71/.93 1/.12 b Using only that received 7 bootstrap support in individual gene tree analyses 98/1. 18/.51 14/.11 17/3 41/.77 7/.2 14/3 11/.1 97/.99 94/.95 3/.14 31/.54 59/.8 96/.99 emrc 82/.93 75/.93 27/.17 96/.89 12/.1 15/.1 d Internode Certainty 1.8.6.4 Default analysis Kthe,Kwal Calb,Cdub Sbay,Scer,Skud,Smik,Spar Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Lelo,Psti Calb,Cdub,Ctro Cpar,Lelo Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou Using 6 BS in individual gene tree analyses Scer,Spar Calb,Cdub,Cpar,Ctro,Lelo Scer,Smik,Spar Scer,Skud,Smik,Spar Calb,Cdub,Cpar,Ctro,Lelo,Psti Kthe,Kwal,Sklu Egos,Klac Change in Internode Certainty when using that were highly supported in individual gene analyses Using 7 BS in individual gene tree analyses Using 8 BS in individual gene tree analyses Egos,Klac,Kthe,Kwal,Sklu Cgla,Sbay,Scas,Scer,Skud,Smik,Spar Sbay,Scas,Scer,Skud,Smik,Spar Calb,Cdub,Cgui,Cpar,Ctro,Dhan,Lelo,Psti Cgui,Dhan Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar Supplementary Figure 13 Selection of highly supported from the bootstrap consensus trees of individual genes has a large, positive effect on the IC values of internodes of the yeast inferred by the emrc approach. The first three panels show the yeast species inferred from extended majority rule consensus (emrc) analysis following the selection of that had high bootstrap support (BS) in the bootstrap consensus trees of individual genes. Values near internodes correspond to the percentage of bootstrap consensus trees of individual genes in which this specific bipartition received high BS and to internode certainty (IC), respectively. a, The emrc inferred from selecting that had BS 6% in individual gene analyses. b, The emrc inferred from selecting that had BS 7% in individual gene analyses. c, The emrc inferred from selecting that had BS 8% in individual gene analyses. d, Plot that illustrates the change in IC of internodes relative to the values obtained in the default analysis associated with the use of that had high bootstrap support (BS) in the bootstrap consensus trees of individual genes. Each line of different color depicts the IC value obtained for a given internode in the default analysis (Fig. 1a), when using only that had BS 6%, BS 7%, and BS 8%. WWW.NATURE.COM/NATURE 21

RESEARCH SUPPLEMENTARY INFORMATION a Internode Certainty 1.8.6.4 Change in Internode Certainty of the Vertebrate when using highly-supported genes and Default analysis 6 BS 7 BS 8 BS Using only those genes whose bootstrap consensus trees had XX average bootstrap support (BS) 6 BS 7 BS 8 BS Using only that received XX bootstrap support (BS) in individual gene tree analyses Drer,Gacu,Olat,Tnig,Trub Tnig,Trub Mmus,Rnor Gacu,Olat,Tnig,Trub Drer,Gacu,Ggal,Mdom,Olat,Tnig,Trub,Xtro Drer,Gacu,Ggal,Olat,Tnig,Trub,Xtro Hsap,Mmul,Ppyg,Ptro Drer,Gacu,Olat,Tnig,Trub,Xtro Hsap,Ptro Hsap,Ppyg,Ptro Cpor,Mmus,Rnor Gacu,Tnig,Trub Btau,Cfam,Ecab Cpor,Hsap,Mmul,Mmus,Ppyg,Ptro,Rnor b Change in Internode Certainty of the Metazoan when using highly-supported genes and Internode Certainty 1 Drer,Ggal,Hsap,Mmus,Xtro Hsap,Mmus Ggal,Hsap,Mmus.8 Amel,Bmor,Dmel,Tcas Ggal,Hsap,Mmus,Xtro Amel,Bmor,Dmel,Dpul,Tcas.6 Amel,Bmor,Dmel,Dpul,Isca,Tcas Bmor,Dmel Nvec,Tadh.4 Bmor,Dmel,Tcas Hrob,Lgig Cele,Sman Cint,Drer,Ggal,Hsap,Mmus,Xtro Mbre,Nvec,Tadh Bflo,Cint,Drer,Ggal,Hsap,Mmus,Xtro Amel,Bmor,Cele,Dmel,Dpul,Hrob,Isca,Lgig,Sman,Tcas Default analysis 4 BS 5 BS 6 BS Using only those genes whose bootstrap consensus trees had XX average bootstrap support (BS) 6 BS 7 BS 8 BS Using only that received XX bootstrap support (BS) in individual gene tree analyses Supplementary Figure 14 Selection of highly supported genes and has a large, positive effect on IC values of internodes of the vertebrate and metazoan phylogenies. a, Plot that illustrates the change in IC of internodes of the vertebrate relative to the values obtained in the default analysis associated with the use of genes whose bootstrap consensus trees have high average bootstrap support (BS) or with the use of that had high BS in the bootstrap consensus trees of individual genes. Each line of different color depicts the IC value obtained for a given internode in the default analysis (Supplementary Fig. S1b), when using only genes with average BS 6%, BS 7%, and BS 8%, as well as when using only that had BS 6%, BS 7%, and BS 8%. b, Plot that illustrates the change in IC of internodes of the metazoan relative to the values obtained in the default analysis associated with the use of genes whose bootstrap consensus trees have high average bootstrap support (BS) or with the use of that had high BS in the bootstrap consensus trees of individual genes. Each line of different color depicts the IC value obtained for a given internode in the default analysis (Supplementary Fig. S1e), when using only genes with average BS 4%, BS 5%, and BS 6%, as well as when using only that had BS 6%, BS 7%, and BS 8%. The slight differences in the pattern of change of IC in the metazoan dataset relative to the other two datasets is likely to be due to the much smaller number of metazoan genes in the dataset and their lower average bootstrap support, which will result in smaller sets of genes and with strong phylogenetic signal. 22 WWW.NATURE.COM/NATURE

RESEARCH Kluyveromyces thermotolerans (Kthe) Candida parapsilosis (Cpar) Lodderomyces elongisporus (Lelo) Candida dubliniensis (Cdub).1 Zygosacharomyces rouxii (Zrou) Saccharomyces castellii (Scas) Kluyveromyces polysporus (Kpol) Saccharomyces sensu stricto clade Candida glabrata Cgla) Supplementary Figure 15 The phylogenetic consensus network that describes the 1,7 yeast gene histories. The consensus network inferred using the 1,7 maximum likelihood gene trees under the median network construction algorithm in the SplitsTree4 software. Boxes in the network denote internodes that harbor significant conflict, with the length of each branch in each box being proportional to the number of GTs that support it. Only branches that are present in at least 1% of the GTs are shown in the network..1 S. mikatae (Smik) S.cerevisiae (Scer) S. paradoxus (Spar) S. kudriavzevii (Skud) S. bayanus (Sbay) WWW.NATURE.COM/NATURE 23

RESEARCH SUPPLEMENTARY INFORMATION 95/.9 99/.96 41/2 31/.4 36/.9 62/.59 24/.2 29/.12 29/.2 45/.11 29/.1 29/.2 Zygosacharomyces rouxii (Zrou) 99/.97 86/.76 52/.6 6/ 77/.54 9/.78 98/.95 89/.77 Supplementary Figure 16 Supported, weakly supported, and unresolved internodes in the yeast. Values near internodes correspond to gene support frequency and internode certainty, respectively calculated from the 1,7 yeast gene histories. Note that the validity of certain internodes marked as unresolved is supported by independent data (e.g., rare genomic changes). Supported Weakly supported Unresolved 24 WWW.NATURE.COM/NATURE

RESEARCH a The yeast species inferred using the STAR species tree method b STAR internode length 2.5 2 1.5 1.5 1/2.68 1/1.47 99/.53 89/.45 1/1.36 67/2 83/.4 89/.42 77/5 1/2.13 58/9 96/.53 99/.84 1/1.3 1/.98 1/1.25 84/.58 1/.94 68/4 77/4 4. STAR internode length versus GSF r =.83 r =.88 STAR internode length 2.5 2 1.5 1.5 STAR internode length versus IC 2 4 6 8 1 Gene Support Frequency.4.6.8 1 Internode Certainty Supplementary Figure 17 The yeast inferred using a species tree method that accounts for variation between the 1,7 gene histories is highly supported and has extremely short internodes whose coalescent unit lengths are highly correlated with gene support frequency and internode certainty values. Using the 1,7 gene dataset, we inferred a yeast species under the coalescent model and average ranks of gene coalescence times, as implemented in the STAR species tree method. a, The yeast species under the coalescent. Values near internodes correspond to bootstrap support and internode length in coalescence units, respectively. The inferred topology is identical to the shown in Figure 1a, except with respect to the placement of Candida lusitaniae. b, The lengths of internodes in the inferred using the STAR species tree method, measured in average coalescent units, is highly correlated with internodes Gene Support Frequency (left panel) and Internode Certainty (right panel) values. The strength of each correlation is indicated by r, Pearson s correlation coefficient. WWW.NATURE.COM/NATURE 25