SUPPLEMENTARY INFORMATION


doi:10.1038/nature12130

Supplementary Table 1. The Taxonomy of the Organisms Used in this Study

Organism (acronym) | Taxonomy

Yeasts
Zygosaccharomyces rouxii (Zrou) |

Vertebrates
Xenopus tropicalis (Xtro) | Animalia;Chordata;Amphibia;Anura;Pipidae
Gallus gallus (Ggal) | Animalia;Chordata;Aves;Galliformes;Phasianidae
Monodelphis domestica (Mdom) | Animalia;Chordata;Mammalia;Didelphimorphia;Didelphidae
Bos taurus (Btau) | Animalia;Chordata;Mammalia;Artiodactyla;Bovidae
Equus caballus (Ecab) | Animalia;Chordata;Mammalia;Perissodactyla;Equidae
Canis familiaris (Cfam) | Animalia;Chordata;Mammalia;Carnivora;Canidae
Macaca mulatta (Mmul) | Animalia;Chordata;Mammalia;Primates;Cercopithecidae
Pongo pygmaeus (Ppyg) | Animalia;Chordata;Mammalia;Primates;Hominidae
Homo sapiens (Hsap) | Animalia;Chordata;Mammalia;Primates;Hominidae
Pan troglodytes (Ptro) | Animalia;Chordata;Mammalia;Primates;Hominidae
Rattus norvegicus (Rnor) | Animalia;Chordata;Mammalia;Rodentia;Muridae
Mus musculus (Mmus) | Animalia;Chordata;Mammalia;Rodentia;Muridae
Cavia porcellus (Cpor) | Animalia;Chordata;Mammalia;Rodentia;Caviidae
Danio rerio (Drer) | Animalia;Chordata;Actinopterygii;Cypriniformes;Cyprinidae
Oryzias latipes (Olat) | Animalia;Chordata;Actinopterygii;Beloniformes;Adrianichthyidae
Tetraodon nigroviridis (Tnig) | Animalia;Chordata;Actinopterygii;Tetraodontiformes;Tetraodontidae
Takifugu rubripes (Trub) | Animalia;Chordata;Actinopterygii;Tetraodontiformes;Tetraodontidae
Gasterosteus aculeatus (Gacu) | Animalia;Chordata;Actinopterygii;Gasterosteiformes;Gasterosteidae

Metazoa
Strongylocentrotus purpuratus (Spur) | Animalia;Echinodermata;Echinoidea;Echinoida;Strongylocentrotidae
Branchiostoma floridae (Bflo) | Animalia;Chordata;Leptocardii;Amphioxiformes;Branchiostomidae
Ciona intestinalis (Cint) | Animalia;Chordata;Ascidiacea;Enterogona;Cionidae
Mus musculus (Mmus) | Animalia;Chordata;Mammalia;Rodentia;Muridae
Gallus gallus (Ggal) | Animalia;Chordata;Aves;Galliformes;Phasianidae
Homo sapiens (Hsap) | Animalia;Chordata;Mammalia;Primates;Hominidae
Xenopus tropicalis (Xtro) | Animalia;Chordata;Amphibia;Anura;Pipidae
Danio rerio (Drer) | Animalia;Chordata;Actinopterygii;Cypriniformes;Cyprinidae
Helobdella robusta (Hrob) | Animalia;Annelida;Clitellata;Rhynchobdellida;Glossiphoniidae
Lottia gigantea (Lgig) | Animalia;Mollusca;Gastropoda;Patellogastropoda;Lottiidae
Caenorhabditis elegans (Cele) | Animalia;Nematoda;Secernentea;Rhabditida;Rhabditidae
Schistosoma mansoni (Sman) | Animalia;Platyhelminthes;Digenea;Strigeidida
Ixodes scapularis (Isca) | Animalia;Arthropoda;Arachnida;Ixodida;Ixodidae
Daphnia pulex (Dpul) | Animalia;Arthropoda;Branchiopoda;Cladocera;Daphniidae
Apis mellifera (Amel) | Animalia;Arthropoda;Insecta;Hymenoptera;Apidae
Tribolium castaneum (Tcas) | Animalia;Arthropoda;Insecta;Coleoptera;Tenebrionidae
Drosophila melanogaster (Dmel) | Animalia;Arthropoda;Insecta;Diptera;Drosophilidae
Bombyx mori (Bmor) | Animalia;Arthropoda;Insecta;Lepidoptera;Bombycidae
Monosiga brevicollis (Mbre) | Choanoflagellida;Codonosigidae
Nematostella vectensis (Nvec) | Animalia;Cnidaria;Anthozoa;Actiniaria;Edwardsiidae
Trichoplax adhaerens (Tadh) | Animalia;Placozoa;Tricoplacia;Tricoplaciformes;Trichoplacidae

Supplementary Table 2. Bipartitions that Significantly Conflict with the Bipartitions Recovered in the Phylogeny of 23 Yeast Genomes

Primary Tree Bipartition | GSF IC | [Conflicting Bipartition]:GSF value

Kthe, Kwal | | None
Calb, Cdub | | None
Sbay, Scer, Skud, Smik, Spar | | None
Calb, Cdub, Cgui, Clus, Cpar, Ctro, Dhan, Lelo, Psti | 95.9 | None
Calb, Cdub, Ctro | 9.78 | None
Cpar, Lelo | | None
Calb, Cdub, Cpar, Ctro, Lelo | | None
Scer, Spar | | [Sbay,Skud,Smik,Spar]:8; [Smik,Spar]:5
Cgla, Kpol, Sbay, Scas, Scer, Skud, Smik, Spar, Zrou | | [Calb,Cdub,Cgla,Cgui,Clus,Cpar,Ctro,Dhan,Lelo,Psti]:6; [Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Lelo,Psti,Zrou]:5
Scer, Smik, Spar | 6 | [Sbay,Skud,Smik]:14; [Sbay,Scer,Skud,Spar]:11; [Sbay,Skud,Smik,Spar]:8; [Skud,Smik]:7; [Scer,Skud,Spar]:6
Scer, Skud, Smik, Spar | 52.6 | [Sbay,Skud]:29; [Sbay,Skud,Smik]:14; [Sbay,Scer,Smik,Spar]:11; [Sbay,Scer,Skud,Spar]:11; [Sbay,Skud,Smik,Spar]:8
Calb, Cdub, Cpar, Ctro, Lelo, Psti | | [Cgui,Clus,Dhan,Psti]:20; [Dhan,Psti]:11; [Cgui,Dhan,Psti]:10; [Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Lelo]:8; [Calb,Cdub,Clus,Cpar,Ctro,Lelo]:5; [Clus,Dhan,Psti]:5
Kthe, Kwal, Sklu | 41 2 | [Agos,Kthe,Kwal]:9; [Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Kthe,Kwal,Lelo,Psti]:9; [Klac,Sklu]:8; [Agos,Klac,Kthe,Kwal]:7; [Agos,Klac,Sklu]:7; [Agos,Sklu]:7; [Cgla,Kpol,Kthe,Kwal,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou]:6; [Klac,Kthe,Kwal]:5; [Cgla,Kpol,Sbay,Scas,Scer,Sklu,Skud,Smik,Spar,Zrou]:5
Agos, Klac | 36.9 | [Agos,Cgla,Kpol,Kthe,Kwal,Sbay,Scas,Scer,Sklu,Skud,Smik,Spar,Zrou]:17; [Cgla,Klac,Kpol,Kthe,Kwal,Sbay,Scas,Scer,Sklu,Skud,Smik,Spar,Zrou]:13; [Agos,Kthe,Kwal,Sklu]:13; [Klac,Kthe,Kwal,Sklu]:10; [Agos,Kthe,Kwal]:9; [Klac,Sklu]:8; [Agos,Sklu]:7; [Cgla,Klac,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou]:7; [Klac,Kthe,Kwal]:5
Agos, Klac, Kthe, Kwal, Sklu | 31.4 | [Agos,Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Klac,Lelo,Psti]:19; [Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Klac,Lelo,Psti]:17; [Agos,Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Lelo,Psti]:13; [Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Kthe,Kwal,Lelo,Psti]:9; [Agos,Cgla,Klac,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou]:7; [Cgla,Klac,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou]:7; [Cgla,Kpol,Kthe,Kwal,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou]:6; [Cgla,Kpol,Sbay,Scas,Scer,Sklu,Skud,Smik,Spar,Zrou]:5
Cgla, Sbay, Scas, Scer, Skud, Smik, Spar | | [Cgla,Kpol]:12; [Kpol,Scas]:10; [Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou]:9; [Kpol,Sbay,Scas,Scer,Skud,Smik,Spar]:8; [Cgla,Zrou]:8; [Kpol,Sbay,Scer,Skud,Smik,Spar]:8; [Sbay,Scer,Skud,Smik,Spar,Zrou]:7; [Sbay,Scas,Scer,Skud,Smik,Spar,Zrou]:7; [Cgla,Kpol,Scas]:6; [Agos,Klac,Kpol,Kthe,Kwal,Sbay,Scas,Scer,Sklu,Skud,Smik,Spar,Zrou]:6; [Scas,Zrou]:5
Sbay, Scas, Scer, Skud, Smik, Spar | 29.2 | [Cgla,Sbay,Scer,Skud,Smik,Spar]:20; [Cgla,Scas]:17; [Kpol,Scas]:10; [Kpol,Sbay,Scer,Skud,Smik,Spar]:8; [Sbay,Scer,Skud,Smik,Spar,Zrou]:7; [Cgla,Kpol,Scas]:6; [Scas,Zrou]:5
Calb, Cdub, Cgui, Cpar, Ctro, Dhan, Lelo, Psti | 29.1 | [Cgui,Clus,Dhan]:24; [Cgui,Clus,Dhan,Psti]:20; [Cgui,Clus]:20; [Calb,Cdub,Clus,Cpar,Ctro,Dhan,Lelo,Psti]:16; [Clus,Dhan]:12; [Calb,Cdub,Clus,Cpar,Ctro,Lelo,Psti]:9; [Calb,Cdub,Cgui,Clus,Cpar,Ctro,Dhan,Lelo]:8; [Calb,Cdub,Cgui,Clus,Cpar,Ctro,Lelo,Psti]:6; [Clus,Dhan,Psti]:5; [Calb,Cdub,Clus,Cpar,Ctro,Lelo]:5
Cgui, Dhan | 29.2 | [Cgui,Clus]:20; [Calb,Cdub,Cpar,Ctro,Dhan,Lelo,Psti]:18; [Calb,Cdub,Clus,Cpar,Ctro,Dhan,Lelo,Psti]:16; [Clus,Dhan]:12; [Dhan,Psti]:11; [Calb,Cdub,Cgui,Cpar,Ctro,Lelo,Psti]:6; [Calb,Cdub,Cgui,Clus,Cpar,Ctro,Lelo,Psti]:6; [Clus,Dhan,Psti]:5
Cgla, Kpol, Sbay, Scas, Scer, Skud, Smik, Spar | 24.2 | [Kpol,Zrou]:17; [Cgla,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou]:15; [Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou]:9; [Cgla,Zrou]:8; [Sbay,Scer,Skud,Smik,Spar,Zrou]:7; [Sbay,Scas,Scer,Skud,Smik,Spar,Zrou]:7; [Calb,Cdub,Cgla,Cgui,Clus,Cpar,Ctro,Dhan,Lelo,Psti]:6; [Scas,Zrou]:5

Figure panels: a, Maximum likelihood species phylogeny inferred using the GARLI software. b, Bayesian inference species phylogeny inferred using the MrBayes software.

Supplementary Figure 1 The topology of the yeast species phylogeny recovered from concatenation analyses using one other maximum likelihood software (GARLI) and one Bayesian inference software (MrBayes) was identical to the topology recovered by maximum likelihood analysis using the RAxML software. a, The yeast species phylogeny recovered from concatenation analysis of 1,070 genes using maximum likelihood as implemented in the GARLI software. All internodes received 100% bootstrap support. b, The yeast species phylogeny recovered from concatenation analysis of 1,070 genes using Bayesian inference as implemented in the MrBayes software. All internodes had 100% posterior probability.

Figure axes: Internode Certainty (Y-axis) against Support Value X (X-axis). Plotted series: 2 conflicting bipartitions with support values X and 100-X; 3 conflicting bipartitions with values X, 95-X, and 5, of which only the two highest are used to calculate IC; 3 conflicting bipartitions with values X, 85-X, and 15, of which only the two highest are used to calculate IC; 3 conflicting bipartitions with values X, 75-X, and 25, of which only the two highest are used to calculate IC.

Supplementary Figure 2 Representative values of the new measure Internode Certainty (IC) for a range of representative support values of the two most prevalent conflicting bipartitions for a given internode. Each plot on the graph depicts how IC (Y-axis) varies in response to the relative support of conflicting bipartitions on a given internode. IC can be measured on any given set of trees. For example, if the entire set of gene trees (GTs) is used, the IC value of a given internode will reflect the amount of information available for that internode in the set of GTs, by considering the internode's gene support frequency jointly with the frequency of the most prevalent bipartition that conflicts with the internode. If the set of bootstrap replicate trees for a given gene is used, then IC will be calculated based on bootstrap support values.

From the right to the left of the graph, the first of the four plots (shown with triangle symbols) corresponds to the case of only two conflicting bipartitions for one internode, with support values X and 100-X. For example, given 100 total GTs, if 60 of them support bipartition 1, the remaining 40 will support the conflicting bipartition. The second, third and fourth of the four plots (shown with diamond, circle, and square symbols, respectively) correspond to the case where there are three conflicting bipartitions for one internode, but only the two most prevalent ones are considered. For example, in the plot with the diamond symbols, given 100 total GTs, if 60 of them support bipartition 1, 35 will support the conflicting bipartition 2, because conflicting bipartition 3 has been set to be supported by 5 GTs. Thus, when only the two most prevalent bipartitions are considered, the proportion of GTs supporting the first bipartition will be equal to 60/(60+35), whereas the proportion of GTs supporting the second bipartition will be 35/(60+35). The number of GTs that support the third conflicting bipartition is not included because we want IC to measure the magnitude of certainty conveyed by the two most prevalent bipartitions. This way, IC will equal zero when the two most prevalent bipartitions are equally prevalent (in this example, that would be the case if bipartitions 1 and 2 were each supported by 47.5 GTs).
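The worked example in this caption can be checked numerically. The caption does not state the IC formula, so the sketch below assumes the entropy-based form IC = 1 + p1 log2(p1) + p2 log2(p2), where p1 and p2 are the supports of the two most prevalent conflicting bipartitions after normalizing them to sum to 1. That form gives zero when the two are equally prevalent, which matches the behaviour described above. The function name is hypothetical.

```python
import math


def internode_certainty(support_1, support_2):
    """IC from the supports of the two most prevalent conflicting bipartitions.

    Assumed form (not stated in this caption): 1 + sum(p * log2(p)) over the
    two supports, each normalized by their sum.
    """
    total = support_1 + support_2
    if total <= 0:
        raise ValueError("at least one bipartition must have positive support")
    ic = 1.0
    for support in (support_1, support_2):
        p = support / total
        if p > 0:  # the limit of p * log2(p) as p -> 0 is 0
            ic += p * math.log2(p)
    return ic


# Triangle plot: two conflicting bipartitions supported by 60 and 40 GTs.
ic_two = internode_certainty(60, 40)
# Diamond plot: 60 and 35 GTs; the third bipartition (5 GTs) is ignored.
ic_three = internode_certainty(60, 35)
# Equal prevalence gives no certainty either way.
ic_tie = internode_certainty(47.5, 47.5)
```

Under this assumption the 60 versus 35 case yields a slightly higher IC than the 60 versus 40 case, because dropping the third bipartition raises the relative share of bipartition 1 from 60/100 to 60/95.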

Figure panels: a, Removing all sites containing gaps. b, Removing all sites with 5% gaps. c, Removing genes whose alignment length following gap removal is 7% of the length of the original alignment (374 genes retained). Each panel pairs a concatenation phylogeny with an emrc phylogeny whose internodes are labelled with GSF/IC values.

Supplementary Figure 3 Removal of sites containing gaps or of poorly aligned genes does not significantly improve the yeast phylogeny inferred by concatenation and emrc approaches. Each panel shows the yeast species phylogeny inferred from concatenation analysis (left panel) and from extended majority rule consensus (emrc) analysis (right panel). All internodes of phylogenies inferred by concatenation received 100% bootstrap support unless otherwise indicated. Values near internodes of phylogenies inferred by emrc analysis correspond to gene support frequency and internode certainty, respectively. a, Concatenation (left) and emrc (right) phylogenies of all 1,070 genes following removal of all sites containing gaps. b, Concatenation (left) and emrc (right) phylogenies of all 1,070 genes following removal of all sites where 5% of the character states are gaps. c, Concatenation (left) and emrc (right) phylogenies of the 374 genes whose alignment length following removal of all gaps is 7% of the length of the original alignment.
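The site-filtering treatments in this caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline used in the study: the alignment is a plain dictionary of equal-length strings, `-` marks a gap, and the function names, the `max_gap_fraction` parameter and the toy sequences are all hypothetical. Because the comparison operator for the percentage thresholds did not survive in the caption, the cut-off is left as a parameter.

```python
def drop_gappy_sites(alignment, max_gap_fraction=0.0, gap_chars="-"):
    """Keep only columns whose fraction of gap characters is at most
    max_gap_fraction; the default 0.0 removes every site containing a gap."""
    names = list(alignment)
    if not names:
        return {}
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for i in range(length):
        gaps = sum(alignment[name][i] in gap_chars for name in names)
        if gaps / len(names) <= max_gap_fraction:
            keep.append(i)
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


def retained_fraction(original, filtered):
    """Alignment length after filtering, relative to the original length."""
    first = next(iter(original))
    return len(filtered[first]) / len(original[first])


# Toy alignment (made-up sequences, species acronyms borrowed from the tables).
toy = {"Scer": "MK-LV", "Spar": "MKALV", "Smik": "M--LV"}
no_gaps = drop_gappy_sites(toy)                        # panel a style
few_gaps = drop_gappy_sites(toy, max_gap_fraction=0.5)  # panel b style
kept = retained_fraction(toy, no_gaps)                 # panel c criterion
```

A gene-level filter in the style of panel c would then keep a gene only when `retained_fraction` reaches the chosen cut-off.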

Figure panels a to h: each panel pairs a concatenation phylogeny with an emrc phylogeny, inferred after excluding the species named in the caption; emrc internodes are labelled with GSF/IC values.

Supplementary Figure 4 Removal of one or more unstable or fast-evolving species has, if any, a minor and local effect on the yeast phylogeny inferred by concatenation and emrc approaches. Each panel shows the yeast species phylogeny inferred from concatenation analysis (left panel) and from extended majority rule consensus (emrc) analysis (right panel) following removal of one or more unstable or fast-evolving species from the analysis. All internodes of phylogenies inferred by concatenation received 100% bootstrap support unless otherwise indicated. Values near internodes of phylogenies inferred by emrc analysis correspond to gene support frequency and internode certainty, respectively. a, Concatenation (left) and emrc (right) phylogenies of all 1,070 genes following the removal of the unstable taxon Candida lusitaniae. b, Concatenation (left) and emrc (right) phylogenies of all 1,070 genes following the removal of the fast-evolving and unstable taxon Kluyveromyces polysporus. c, Concatenation (left) and emrc (right) phylogenies of all 1,070 genes following the removal of the fast-evolving and unstable taxon Candida glabrata. d, Concatenation (left) and emrc (right) phylogenies of all 1,070 genes following the removal of the unstable taxon Saccharomyces castellii. e, Concatenation (left) and emrc (right) phylogenies of all 1,070 genes following the removal of the fast-evolving and unstable taxon Eremothecium gossypii. f, Concatenation (left) and emrc (right) phylogenies of all 1,070 genes following the removal of the fast-evolving and unstable taxon Kluyveromyces lactis. g, Concatenation (left) and emrc (right) phylogenies of all 1,070 genes following the removal of both Eremothecium gossypii and Kluyveromyces lactis. h, Concatenation (left) and emrc (right) phylogenies of all 1,070 genes following the removal of Eremothecium gossypii, Kluyveromyces lactis and Candida glabrata.

Figure panels: a, Change in GSF when removing fast-evolving and unstable species (series: excluding Egos; excluding Cgla; excluding Klac; excluding Egos, Klac; excluding Cgla, Klac; excluding Egos, Cgla; excluding Egos, Cgla, Klac). b, Change in GSF when using only genes that support specific, likely correct, bipartitions (series: selecting [Cgla,Sbay,Scer,Skud,Smik,Spar]; selecting [Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou]; selecting [Calb,Cdub,Ctro]). Y-axis: Change in GSF compared to default analysis. X-axis: the bipartitions of the yeast phylogeny, with the Candida and Saccharomyces lineages labelled.

Supplementary Figure 5 Removal of fast-evolving and unstable species or the exclusive use of genes that recover specific bipartitions has a minor and typically local effect on GSF values of internodes of the yeast phylogeny. The X-axis shows the 20 bipartitions present in the yeast phylogeny suggested by concatenation analysis and the Y-axis the percent change in gene support frequency (GSF) observed for each bipartition between the treatment and the default analysis. Only GSF changes 3% are shown. a, The individual or combined removal of E. gossypii (Egos), K. lactis (Klac), and C. glabrata (Cgla), three of the fastest-evolving species and among those whose phylogenetic position is most unstable, has a minor and local effect on the GSF of neighboring internodes. Note that removal of species in the Saccharomyces lineage does not change the GSF of bipartitions in the Candida lineage. b, The selection of genes whose individual topologies recover well-established bipartitions of the yeast phylogeny has a minor effect on the GSF of internodes of the yeast phylogeny. Note that the [C. albicans, C. dubliniensis, C. tropicalis] (abbreviated [Calb, Cdub, Ctro]) bipartition has 90% GSF in the emrc phylogeny reconstructed from the 1,070 individual gene trees, the [C. glabrata, K. polysporus, S. bayanus, S. castellii, S. cerevisiae, S. kudriavzevii, S. mikatae, S. paradoxus, Z. rouxii] (abbreviated [Cgla, Kpol, Sbay, Scas, Scer, Skud, Smik, Spar, Zrou]) bipartition has 62% GSF, and the [C. glabrata, S. bayanus, S. cerevisiae, S. kudriavzevii, S. mikatae, S. paradoxus] (abbreviated [Cgla, Sbay, Scer, Skud, Smik, Spar]) bipartition has 20% GSF. This last bipartition does not appear in the emrc phylogeny, but several independent rare genomic changes strongly suggest that it is the correct one.

Figure panels: a, Change in IC when removing fast-evolving and unstable species (series: excluding Egos; excluding Cgla; excluding Klac; excluding Egos, Klac; excluding Cgla, Klac; excluding Egos, Cgla; excluding Egos, Cgla, Klac). b, Change in IC when using only genes that support specific, likely correct, bipartitions (series: selecting {Cgla,Sbay,Scer,Skud,Smik,Spar}; selecting {Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou}; selecting {Calb,Cdub,Ctro}). Y-axis: Change in IC compared to default analysis. X-axis: the bipartitions of the yeast phylogeny, with the Candida and Saccharomyces lineages labelled.

Supplementary Figure 6 Removal of fast-evolving and unstable species or the exclusive use of genes that recover specific bipartitions has a minor and typically local effect on IC values of internodes of the yeast phylogeny. The X-axis shows the 20 bipartitions present in the yeast phylogeny suggested by concatenation analysis and the Y-axis the percent change in Internode Certainty (IC) observed for each bipartition between the treatment (removal of fast-evolving and unstable species or of genes that fail to recover specific clades) and the default analysis (all species and genes included). Only GSF changes 3% are shown. a, The individual or combined removal of E. gossypii (Egos), K. lactis (Klac), and C. glabrata (Cgla), three of the fastest-evolving species and among those whose phylogenetic position is most unstable, has a minor and local effect on the IC of neighboring internodes. b, The selection of genes whose individual topologies recover well-established bipartitions of the yeast phylogeny has a minor effect on the IC of internodes of the yeast phylogeny. Note that the [C. albicans, C. dubliniensis, C. tropicalis] (abbreviated [Calb, Cdub, Ctro]) bipartition has 90% GSF in the extended majority rule consensus (emrc) phylogeny reconstructed from the 1,070 individual gene trees, the [C. glabrata, K. polysporus, S. bayanus, S. castellii, S. cerevisiae, S. kudriavzevii, S. mikatae, S. paradoxus, Z. rouxii] (abbreviated [Cgla, Kpol, Sbay, Scas, Scer, Skud, Smik, Spar, Zrou]) bipartition has 62% GSF, and the [C. glabrata, S. bayanus, S. cerevisiae, S. kudriavzevii, S. mikatae, S. paradoxus] (abbreviated [Cgla, Sbay, Scer, Skud, Smik, Spar]) bipartition has 20% GSF. This last bipartition does not appear in the emrc phylogeny but, as discussed in the main text, several independent rare genomic changes strongly suggest that it is the correct one.

Figure panels: a, Using only genes that recover the [Calb,Cdub,Ctro] bipartition. b, Using only genes that recover the [Cgla,Kpol,Sbay,Scas,Scer,Skud,Smik,Spar,Zrou] bipartition. c, Using only genes that recover the [Cgla,Sbay,Scas,Scer,Skud,Smik,Spar] bipartition. Each panel pairs a concatenation phylogeny with an emrc phylogeny whose internodes are labelled with GSF/IC values.

Supplementary Figure 7 Selection of genes that support specific bipartitions has, if any, a minor and local effect on the yeast phylogeny inferred by concatenation and emrc approaches. Each panel shows the yeast species phylogeny inferred from concatenation analysis (left panel) and from extended majority rule consensus (emrc) analysis (right panel) following the selection and use of genes that recover specific bipartitions. All internodes of phylogenies inferred by concatenation received 100% bootstrap support unless otherwise indicated. Values near internodes of phylogenies inferred by emrc analysis correspond to gene support frequency and internode certainty, respectively. a, Concatenation (left) and emrc (right) phylogenies using only the genes that recover the [C. albicans, C. dubliniensis, C. tropicalis] (abbreviated [Calb, Cdub, Ctro]) bipartition. b, Concatenation (left) and emrc (right) phylogenies using only the genes that recover the [C. glabrata, K. polysporus, S. bayanus, S. castellii, S. cerevisiae, S. kudriavzevii, S. mikatae, S. paradoxus, Z. rouxii] (abbreviated [Cgla, Kpol, Sbay, Scas, Scer, Skud, Smik, Spar, Zrou]) bipartition. c, Concatenation (left) and emrc (right) phylogenies using only the genes that recover the [C. glabrata, S. bayanus, S. cerevisiae, S. kudriavzevii, S. mikatae, S. paradoxus] (abbreviated [Cgla, Sbay, Scer, Skud, Smik, Spar]) bipartition. This last bipartition does not appear in the emrc phylogeny but, as discussed in the main text, several independent rare genomic changes strongly suggest that it is the correct one.

Figure panel: Selecting the 100 most slowly-evolving genes. The panel pairs a concatenation phylogeny with an emrc phylogeny whose internodes are labelled with GSF/IC values.

Supplementary Figure 8 Selection of the 100 slowest-evolving genes has a large, negative effect on GSF and IC values of internodes of the yeast phylogeny inferred by concatenation and emrc approaches. The panel shows the yeast species phylogeny inferred from concatenation analysis (left) and from extended majority rule consensus (emrc) analysis (right) following the selection and use of the 100 slowest-evolving genes. All internodes of phylogenies inferred by concatenation received 100% bootstrap support unless otherwise indicated. Values near internodes of phylogenies inferred by emrc analysis correspond to gene support frequency and internode certainty, respectively.

Figure panel: Using only sites that contain a single rare but conserved amino acid substitution or indels. Series: Using only sites that contain single radical substitutions; Using only sites that contain a single substitution between physicochemically different amino acids; Using only indels. Y-axis: Internode Certainty. X-axis: the bipartitions of the yeast phylogeny, with the Candida and Saccharomyces lineages labelled.

Supplementary Figure 9 Selection of sites that contain a single rare but conserved amino acid substitution or indels has a large, negative effect on GSF and IC values of internodes of the yeast phylogeny. The X-axis shows the 20 bipartitions present in the yeast phylogeny suggested by concatenation analysis and the Y-axis the percent change in Internode Certainty (IC) observed for each bipartition between the treatment (selection of specific sites or indels) and the default analysis (all sites included). Only GSF changes 3% are shown. The red bars correspond to changes in IC when using only the 2,289 sites that contain single radical substitutions (defined as a substitution with a BLOSUM62 matrix score 3), the blue bars correspond to changes in IC when using only the 4,75 sites that contain a single substitution between amino acids that differ radically in their physicochemical properties, and the yellow bars correspond to changes in IC when using only the 2,474 characters that mark the presence or absence of a single indel spanning 7 or more amino acids among the 23 yeast species.

Figure panels: a, Distribution of vertebrate gene tree distances from the species phylogeny (series: 1,086 observed gene trees; 1,000 random gene trees; Y-axis: Number of Gene Trees; X-axis: Normalized Robinson-Foulds Tree Distance). b, Vertebrate phylogeny, with internodes labelled bootstrap support/GSF. c, Internode Certainty values for all internodes of the vertebrate species phylogeny. d, Distribution of metazoan gene tree distances from the species phylogeny (series: 225 observed gene trees; 1,000 random gene trees; same axes as panel a). e, Metazoan phylogeny, with internodes labelled bootstrap support/GSF. f, Internode Certainty values for all internodes of the metazoan species phylogeny.

Supplementary Figure 10 High levels of incongruence in vertebrate and metazoan phylogenomic datasets despite the inference of highly supported phylogenies by concatenation analysis. a, The distribution of the agreement between the bipartitions present in the 1,086 individual gene trees (GTs) and the vertebrate concatenation phylogeny, as well as the distribution of the agreement between the bipartitions present in 1,000 randomly generated trees of equal taxon number and the concatenation phylogeny, measured using the normalized Robinson-Foulds tree distance. The average tree distances between the 1,086 GTs and the concatenation phylogeny, as well as among the 1,086 GTs themselves, are also shown. b, The vertebrate species phylogeny recovered from concatenation analysis of 1,086 genes using maximum likelihood. The extended majority rule consensus (emrc) phylogeny is topologically identical to the concatenation phylogeny. Values near internodes correspond to bootstrap support and gene support frequency (GSF), respectively. Asterisks (*) denote internodes that received 100% bootstrap support by the concatenation analysis. c, The distribution of Internode Certainty (IC) values for all internodes of the vertebrate species phylogeny. d, The distribution of the agreement between the bipartitions present in the 225 individual GTs and the metazoan concatenation phylogeny, as well as the distribution of the agreement between the bipartitions present in 1,000 randomly generated trees of equal taxon number and the concatenation phylogeny, measured using the normalized Robinson-Foulds tree distance. The average tree distances between the 225 GTs and the concatenation phylogeny, as well as among the 225 GTs themselves, are also shown. e, The metazoan species phylogeny recovered from concatenation analysis of 225 genes using maximum likelihood. The emrc phylogeny is topologically identical to the concatenation phylogeny. Values near internodes correspond to bootstrap support and gene support frequency, respectively. Asterisks (*) denote internodes that received 100% bootstrap support by the concatenation analysis. f, The distribution of IC values for all internodes of the metazoan species phylogeny. Note that GSF and IC values indicate the existence of numerous internodes in the vertebrate and especially in the metazoan phylogeny that are supported by a small percentage of gene trees and have very small or zero IC values.
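The normalized Robinson-Foulds distance named in this caption can be sketched as follows. The sketch assumes fully resolved unrooted trees on the same taxon set, each already reduced to its set of non-trivial bipartitions (stored as the `frozenset` of taxa on the side that excludes one fixed reference taxon), and it normalizes by 2(n - 3), the largest possible distance between two such trees on n taxa. Function names and the five-taxon toy trees are hypothetical, and the software actually used for the figure is not named in this section.

```python
def normalized_rf(bipartitions_a, bipartitions_b, n_taxa):
    """Bipartitions found in exactly one of the two trees, divided by the
    maximum possible count 2 * (n_taxa - 3) for resolved unrooted trees."""
    if n_taxa < 4:
        raise ValueError("need at least four taxa for a non-trivial bipartition")
    a, b = set(bipartitions_a), set(bipartitions_b)
    return len(a ^ b) / (2 * (n_taxa - 3))


def mean_distance_to_reference(gene_trees, reference, n_taxa):
    """Average normalized RF distance between each gene tree and a reference
    tree (for example, a concatenation phylogeny)."""
    distances = [normalized_rf(tree, reference, n_taxa) for tree in gene_trees]
    return sum(distances) / len(distances)


# Toy trees (hypothetical): five taxa A to E, reference taxon "E".
species_tree = {frozenset("AB"), frozenset("ABC")}
gene_tree_1 = {frozenset("AB"), frozenset("ABC")}  # identical topology
gene_tree_2 = {frozenset("AB"), frozenset("ABD")}  # one internode differs
gene_tree_3 = {frozenset("AC"), frozenset("BD")}   # no internode shared

d1 = normalized_rf(gene_tree_1, species_tree, 5)
d2 = normalized_rf(gene_tree_2, species_tree, 5)
d3 = normalized_rf(gene_tree_3, species_tree, 5)
mean_d = mean_distance_to_reference(
    [gene_tree_1, gene_tree_2, gene_tree_3], species_tree, 5
)
```

The same function applied to every pair of gene trees would give the second average mentioned in the caption, the mean distance of the gene trees from each other.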

Figure panels: a, Using only the 94 genes whose bootstrap consensus trees have average Bootstrap Support 6%. b, Using only the 545 genes whose bootstrap consensus trees have average Bootstrap Support 7%. c, Using only the 131 genes whose bootstrap consensus trees have average Bootstrap Support 8%. d, Using only the 94 genes whose bootstrap consensus trees have highest Tree Certainty. e, Using only the 545 genes whose bootstrap consensus trees have highest Tree Certainty. f, Using only the 131 genes whose bootstrap consensus trees have highest Tree Certainty. Each panel pairs a concatenation phylogeny with an emrc phylogeny whose internodes are labelled with GSF/IC values.

Supplementary Figure 11 Selection of genes whose bootstrap consensus trees have high average Bootstrap Support (avbs) or Tree Certainty (TC) has a large, positive effect on GSF and IC values of internodes of the yeast phylogeny inferred by concatenation and emrc approaches. Each panel shows the yeast species phylogeny inferred from concatenation analysis (left panel) and from extended majority rule consensus (emrc) analysis (right panel) following the selection of genes whose trees have high average bootstrap support (BS) or Tree Certainty (TC). All internodes of phylogenies inferred by concatenation received 100% bootstrap support unless otherwise indicated. Values near internodes of phylogenies inferred by emrc analysis correspond to gene support frequency and internode certainty, respectively. a, Concatenation (left) and emrc (right) phylogenies of the 94 genes whose gene trees have average BS 6%. b, Concatenation (left) and emrc (right) phylogenies of the 545 genes whose gene trees have average BS 7%. c, Concatenation (left) and emrc (right) phylogenies of the 131 genes whose gene trees have average BS 8%. d, Concatenation (left) and emrc (right) phylogenies of the 94 genes with the highest TC. e, Concatenation (left) and emrc (right) phylogenies of the 545 genes with the highest TC. f, Concatenation (left) and emrc (right) phylogenies of the 131 genes with the highest TC.
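The two gene-selection rules in this caption, a cut-off on average bootstrap support and a fixed number of genes ranked by Tree Certainty, can be sketched as below. The data layout, function names, gene identifiers and numbers are hypothetical. TC values are taken as precomputed inputs because this section does not define how TC is calculated, and the comparison used for the BS cut-off (at least the threshold) is an assumption.

```python
def average_bootstrap(internode_supports):
    """Mean bootstrap support over the internodes of one gene tree."""
    return sum(internode_supports) / len(internode_supports)


def select_by_average_bs(bootstrap_by_gene, threshold):
    """Genes whose average bootstrap support is at least `threshold`
    (the 'at least' comparison is an assumption of this sketch)."""
    return [
        gene
        for gene, supports in bootstrap_by_gene.items()
        if average_bootstrap(supports) >= threshold
    ]


def select_top_k(score_by_gene, k):
    """The k genes with the highest score, for example precomputed TC."""
    ranked = sorted(score_by_gene, key=score_by_gene.get, reverse=True)
    return ranked[:k]


# Toy inputs (hypothetical gene names and values).
bootstrap_by_gene = {
    "gene1": [90, 80, 70],
    "gene2": [50, 60, 40],
    "gene3": [100, 60, 80],
}
tc_by_gene = {"gene1": 5.0, "gene2": 9.5, "gene3": 7.25}

high_bs_genes = select_by_average_bs(bootstrap_by_gene, threshold=80)
high_tc_genes = select_top_k(tc_by_gene, k=2)
```

Matching the number of TC-selected genes to the number that pass a given BS cut-off, as panels d to f do, keeps the two selections comparable in size.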

[Figure panels, image not reproduced. The x-axis of each panel lists the bipartitions by species acronym, from "Calb,Cdub" to "Scer,Spar". Legend: using only the 131 genes with average BS ≥80%; using only the 131 genes with the highest TC; using only bipartitions with BS ≥80%; using only the 1 slowest-evolving genes.]
a, Change in Gene Support Frequency for highly supported genes or slow-evolving genes (y-axis: change in GSF compared to default analysis)
b, Change in Internode Certainty for highly supported genes or slow-evolving genes (y-axis: change in IC compared to default analysis)

Supplementary Figure 12. Selecting highly supported genes or bipartitions has a large, positive effect on GSF and IC values of internodes of the yeast phylogeny. The x-axis shows the 20 bipartitions present in the yeast phylogeny suggested by concatenation analysis, and the y-axis shows the percent change in Gene Support Frequency (GSF) and Internode Certainty (IC) observed for each bipartition between the treatment (selection of highly supported genes or internodes) and the default analysis. Only GSF changes ≥3% and IC changes ≥.3 are shown. The red bars correspond to changes in IC when using only the 131 genes with average bootstrap support ≥80%, the yellow bars correspond to changes in IC when using only the 131 genes with the highest Tree Certainty, the black bars correspond to changes in IC when using only those bipartitions found in the bootstrap consensus trees of individual genes that had bootstrap support ≥80%, and the blue bars correspond to changes in IC when using only the 1 slowest-evolving genes. a, Change in GSF for highly supported genes or slow-evolving genes. b, Change in IC for highly supported genes or slow-evolving genes.
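The comparison plotted in Supplementary Figure 12 (the change in a per-bipartition value between a treatment and the default analysis, with small changes hidden) can be sketched as below. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the cutoff is applied to the absolute change, and the numbers are made up rather than read from the figure.

```python
from typing import Dict


def changes_vs_default(default: Dict[str, float],
                       treatment: Dict[str, float]) -> Dict[str, float]:
    """Treatment value minus default value for every shared bipartition."""
    return {bp: treatment[bp] - default[bp] for bp in default if bp in treatment}


def keep_large_changes(changes: Dict[str, float], min_abs: float) -> Dict[str, float]:
    """Drop bipartitions whose absolute change is below the cutoff."""
    return {bp: delta for bp, delta in changes.items() if abs(delta) >= min_abs}


# Made-up GSF values (in percent), keyed by bipartition label.
toy_default = {"Calb,Cdub": 90, "Cpar,Lelo": 60, "Scer,Spar": 95}
toy_treatment = {"Calb,Cdub": 91, "Cpar,Lelo": 75, "Scer,Spar": 95}

toy_changes = changes_vs_default(toy_default, toy_treatment)
toy_shown = keep_large_changes(toy_changes, min_abs=3)
```

The same two functions apply to IC values; only the cutoff passed as `min_abs` would differ.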

[Figure panels, image not reproduced. Panels a to c show eMRC phylogenies; panel d is a line plot with Internode Certainty on the y-axis and four categories on the x-axis: default analysis, and using bipartitions with ≥60%, ≥70%, and ≥80% BS in individual gene tree analyses.]
a, Using only bipartitions that received ≥60% bootstrap support in individual gene tree analyses
b, Using only bipartitions that received ≥70% bootstrap support in individual gene tree analyses
c, Using only bipartitions that received ≥80% bootstrap support in individual gene tree analyses
d, Change in Internode Certainty when using bipartitions that were highly supported in individual gene analyses

Supplementary Figure 13. Selection of highly supported bipartitions from the bootstrap consensus trees of individual genes has a large, positive effect on the IC values of internodes of the yeast phylogeny inferred by the eMRC approach. The first three panels show the yeast species phylogeny inferred from extended majority rule consensus (eMRC) analysis following the selection of bipartitions that had high bootstrap support (BS) in the bootstrap consensus trees of individual genes. Values near internodes correspond to the percentage of bootstrap consensus trees of individual genes in which this specific bipartition received high BS and to internode certainty (IC), respectively. a, The eMRC phylogeny inferred from selecting bipartitions that had BS ≥60% in individual gene analyses. b, The eMRC phylogeny inferred from selecting bipartitions that had BS ≥70% in individual gene analyses. c, The eMRC phylogeny inferred from selecting bipartitions that had BS ≥80% in individual gene analyses. d, Plot that illustrates the change in IC of internodes, relative to the values obtained in the default analysis, associated with the use of bipartitions that had high BS in the bootstrap consensus trees of individual genes. Each colored line depicts the IC value obtained for a given internode in the default analysis (Fig. 1a) and when using only bipartitions that had BS ≥60%, BS ≥70%, and BS ≥80%.

[Figure panels, image not reproduced. Both panels are line plots with Internode Certainty on the y-axis, one line per internode (labelled by species acronyms). The x-axis groups are: default analysis; using only those genes whose bootstrap consensus trees had at least XX% average bootstrap support (BS); and using only bipartitions that received at least XX% BS in individual gene tree analyses.]
a, Change in Internode Certainty of the vertebrate phylogeny when using highly supported genes and bipartitions
b, Change in Internode Certainty of the metazoan phylogeny when using highly supported genes and bipartitions

Supplementary Figure 14. Selection of highly supported genes and bipartitions has a large, positive effect on IC values of internodes of the vertebrate and metazoan phylogenies. a, Plot that illustrates the change in IC of internodes of the vertebrate phylogeny, relative to the values obtained in the default analysis, associated with the use of genes whose bootstrap consensus trees have high average bootstrap support (BS) or with the use of bipartitions that had high BS in the bootstrap consensus trees of individual genes. Each colored line depicts the IC value obtained for a given internode in the default analysis (Supplementary Fig. S1b), when using only genes with average BS ≥60%, BS ≥70%, and BS ≥80%, and when using only bipartitions that had BS ≥60%, BS ≥70%, and BS ≥80%. b, Plot that illustrates the change in IC of internodes of the metazoan phylogeny, relative to the values obtained in the default analysis, associated with the use of genes whose bootstrap consensus trees have high average BS or with the use of bipartitions that had high BS in the bootstrap consensus trees of individual genes. Each colored line depicts the IC value obtained for a given internode in the default analysis (Supplementary Fig. S1e), when using only genes with average BS ≥40%, BS ≥50%, and BS ≥60%, and when using only bipartitions that had BS ≥60%, BS ≥70%, and BS ≥80%. The slight differences in the pattern of change of IC in the metazoan dataset relative to the other two datasets are likely due to the much smaller number of metazoan genes in the dataset and their lower average bootstrap support, which result in smaller sets of genes and bipartitions with strong phylogenetic signal.

[Figure: consensus network, image not reproduced. Tip labels recovered from the extraction: Kluyveromyces thermotolerans (Kthe), Candida parapsilosis (Cpar), Lodderomyces elongisporus (Lelo), Candida dubliniensis (Cdub), Zygosaccharomyces rouxii (Zrou), Saccharomyces castellii (Scas), Kluyveromyces polysporus (Kpol), Candida glabrata (Cgla), and the Saccharomyces sensu stricto clade, shown in an inset: S. mikatae (Smik), S. cerevisiae (Scer), S. paradoxus (Spar), S. kudriavzevii (Skud), S. bayanus (Sbay).]

Supplementary Figure 15. The phylogenetic consensus network that describes the 1,070 yeast gene histories. The consensus network was inferred from the 1,070 maximum likelihood gene trees (GTs) under the median network construction algorithm in the SplitsTree4 software. Boxes in the network denote internodes that harbor significant conflict, with the length of each branch in each box being proportional to the number of GTs that support it. Only branches that are present in at least 1% of the GTs are shown in the network.

[Figure: yeast phylogeny with internodes labelled GSF/IC and classed as Supported, Weakly supported, or Unresolved; image not reproduced.]

Supplementary Figure 16. Supported, weakly supported, and unresolved internodes in the yeast phylogeny. Values near internodes correspond to gene support frequency and internode certainty, respectively, calculated from the 1,070 yeast gene histories. Note that the validity of certain internodes marked as unresolved is supported by independent data (e.g., rare genomic changes).
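The two values shown at each internode, gene support frequency (GSF) and internode certainty (IC), can be illustrated with a short calculation. The supplement itself does not spell out the formulas, so the sketch below rests on assumptions: GSF is taken as the percentage of gene trees that contain the bipartition, and IC is taken as 1 + p1·log2(p1) + p2·log2(p2), where p1 and p2 are the relative frequencies of the bipartition and of its most prevalent conflicting bipartition. Function names are hypothetical.

```python
import math


def gene_support_frequency(n_supporting: int, n_gene_trees: int) -> float:
    """Percentage of gene trees that contain a given bipartition."""
    return 100.0 * n_supporting / n_gene_trees


def internode_certainty(n_first: int, n_second: int) -> float:
    """Assumed IC definition over the two most prevalent conflicting bipartitions.

    n_first:  number of gene trees supporting the bipartition of interest
    n_second: number of gene trees supporting its most prevalent conflict
    Returns a value near 1 when there is no conflict and near 0 when the
    two bipartitions are equally frequent.
    """
    total = n_first + n_second
    ic = 1.0
    for count in (n_first, n_second):
        p = count / total
        if p > 0:  # treat 0 * log2(0) as 0
            ic += p * math.log2(p)
    return ic
```

Under this definition an internode found in 300 gene trees whose main conflict is also found in 300 gene trees has IC 0, regardless of how many gene trees were analysed in total, which is why GSF and IC are reported side by side.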

[Figure panels, image not reproduced. The two scatter plots in panel b report r = .83 and r = .88.]
a, The yeast species phylogeny inferred using the STAR species tree method
b, STAR internode length versus GSF (left) and STAR internode length versus IC (right)

Supplementary Figure 17. The yeast phylogeny inferred using a species tree method that accounts for variation between the 1,070 gene histories is highly supported and has extremely short internodes whose coalescent unit lengths are highly correlated with gene support frequency and internode certainty values. Using the 1,070-gene dataset, we inferred a yeast species phylogeny under the coalescent model and average ranks of gene coalescence times, as implemented in the STAR species tree method. a, The yeast species phylogeny under the coalescent. Values near internodes correspond to bootstrap support and internode length in coalescent units, respectively. The inferred topology is identical to the phylogeny shown in Figure 1a, except with respect to the placement of Candida lusitaniae. b, The lengths of internodes in the phylogeny inferred using the STAR species tree method, measured in average coalescent units, are highly correlated with the internodes' Gene Support Frequency (left panel) and Internode Certainty (right panel) values. The strength of each correlation is indicated by r, Pearson's correlation coefficient.
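The correlation reported in panel b is Pearson's r between two per-internode quantities (internode length against GSF or IC). A minimal sketch of that statistic is given below; the function name is hypothetical, and the sketch says nothing about how the authors computed their values.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the centred values.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cross / math.sqrt(ss_x * ss_y)
```

In use, `xs` would hold the internode lengths and `ys` the matching GSF or IC values, one pair per internode of the species tree.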


More information

Comparative genome analysis across a kingdom of eukaryotic organisms: Specialization and diversification in the Fungi

Comparative genome analysis across a kingdom of eukaryotic organisms: Specialization and diversification in the Fungi Resource Comparative genome analysis across a kingdom of eukaryotic organisms: Specialization and diversification in the Fungi Michael J. Cornell, 1,2 Intikhab Alam, 1 Darren M. Soanes, 3 Han Min Wong,

More information

1 ATGGGTCTC 2 ATGAGTCTC

1 ATGGGTCTC 2 ATGAGTCTC We need an optimality criterion to choose a best estimate (tree) Other optimality criteria used to choose a best estimate (tree) Parsimony: begins with the assumption that the simplest hypothesis that

More information

Phylogenomics. Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University. Tuesday, January 29, 13

Phylogenomics. Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University. Tuesday, January 29, 13 Phylogenomics Jeffrey P. Townsend Department of Ecology and Evolutionary Biology Yale University How may we improve our inferences? How may we improve our inferences? Inferences Data How may we improve

More information

The Lophotrochozoan TGF-b signalling cassette - diversification and conservation in a key signalling pathway

The Lophotrochozoan TGF-b signalling cassette - diversification and conservation in a key signalling pathway Int. J. Dev. Biol. 58: 533-549 (204) doi: 0.387/ijdb.40080nk www.intjdevbiol.com The Lophotrochozoan TGF-b signalling cassette - diversification and conservation in a key signalling pathway NATHAN J. KENNY,3,

More information

Letter to the Editor. Temperature Hypotheses. David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons

Letter to the Editor. Temperature Hypotheses. David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons Letter to the Editor Slow Rates of Molecular Evolution Temperature Hypotheses in Birds and the Metabolic Rate and Body David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons *Department

More information

RNA polymerase II clustering through carboxyterminal domain phase separation

RNA polymerase II clustering through carboxyterminal domain phase separation SUPPLEMENTARY INFORMATION Articles https://doi.org/10.1038/s41594-018-0112-y In the format provided by the authors and unedited. RNA polymerase II clustering through carboxyterminal domain phase separation

More information

Cryo-EM data collection, refinement and validation statistics

Cryo-EM data collection, refinement and validation statistics 1 Table S1 Cryo-EM data collection, refinement and validation statistics Data collection and processing CPSF-160 WDR33 (EMDB-7114) (PDB 6BM0) CPSF-160 WDR33 (EMDB-7113) (PDB 6BLY) CPSF-160 WDR33 CPSF-30

More information

Synonymous Codon Substitution Matrices

Synonymous Codon Substitution Matrices Synonymous Codon Substitution Matrices Adrian Schneider, Gaston H. Gonnet, and Gina M. Cannarozzi Computational Biology Research Group, Institute for Computational Science, ETH Zürich, Universitätstrasse

More information

Multiple Sequence Alignments

Multiple Sequence Alignments Multiple Sequence Alignments...... Elements of Bioinformatics Spring, 2003 Tom Carter http://astarte.csustan.edu/ tom/ March, 2003 1 Sequence Alignments Often, we would like to make direct comparisons

More information

Emergence of Xin Demarcates a Key Innovation in Heart Evolution

Emergence of Xin Demarcates a Key Innovation in Heart Evolution Emergence of Xin Demarcates a Key Innovation in Heart Evolution Shaun E. Grosskurth, Debashish Bhattacharya, Qinchuan Wang, Jim Jung-Ching Lin* Department of Biology, University of Iowa, Iowa City, Iowa,

More information

Consensus Methods. * You are only responsible for the first two

Consensus Methods. * You are only responsible for the first two Consensus Trees * consensus trees reconcile clades from different trees * consensus is a conservative estimate of phylogeny that emphasizes points of agreement * philosophy: agreement among data sets is

More information

A (short) introduction to phylogenetics

A (short) introduction to phylogenetics A (short) introduction to phylogenetics Thibaut Jombart, Marie-Pauline Beugin MRC Centre for Outbreak Analysis and Modelling Imperial College London Genetic data analysis with PR Statistics, Millport Field

More information

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

Bioinformatics Report Branchiostoma lanceolatum dopamine D 1 / receptor protein phylogenetic analysis. Alanna Lewis

Bioinformatics Report Branchiostoma lanceolatum dopamine D 1 / receptor protein phylogenetic analysis. Alanna Lewis Bioinformatics Report Branchiostoma lanceolatum dopamine D 1 / receptor protein phylogenetic analysis Alanna Lewis 0 Abstract: Dopamine is an essential neurotransmitter for many species of chordates. The

More information

Thanks to Paul Lewis, Jeff Thorne, and Joe Felsenstein for the use of slides

Thanks to Paul Lewis, Jeff Thorne, and Joe Felsenstein for the use of slides hanks to Paul Lewis, Jeff horne, and Joe Felsenstein for the use of slides Hennigian logic reconstructs the tree if we know polarity of characters and there is no homoplasy UPM infers a tree from a distance

More information

Supplementary Figure 1

Supplementary Figure 1 Supplementary Figure Input Species Tree A Genomic Sequences X Y B Gene Order (optional) A A2 A3 C Pre-Processing A B A2 B2 A3 B3 Identify Orthogroups SYNERGY algorithm Output Y Y8 Y2 Y4 X X8 Bn C B A B8

More information

Supporting Online Material for

Supporting Online Material for www.sciencemag.org/cgi/content/full/312/5780/1653/dc1 Supporting Online Material for The Xist RNA Gene Evolved in Eutherians by Pseudogenization of a Protein-Coding Gene Laurent Duret,* Corinne Chureau,

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION doi:10.1038/nature11541 Supplementary Figure 1. Genome-wide analysis of the phylogenetic relationships Phylogenetic tree of NDH-2s from Prokaryota, Protista, Plantae, Fungi and Metazoa. The WWW.NATURE.COM/NATURE

More information

Contentious relationships in phylogenomic studies can be driven by a handful of genes

Contentious relationships in phylogenomic studies can be driven by a handful of genes PUBLISHED: 1 APRIL 217 VOLUME: 1 ARTICLE NUMBER: 126 Contentious relationships in phylogenomic studies can be driven by a handful of genes Xing-Xing Shen 1, Chris Todd Hittinger 2 and Antonis Rokas 1 *

More information

Exploring metazoan evolution through dynamic and holistic changes in protein families and domains

Exploring metazoan evolution through dynamic and holistic changes in protein families and domains Wang et al. BMC Evolutionary Biology 2012, 12:138 RESEARCH ARTICLE Open Access Exploring metazoan evolution through dynamic and holistic changes in protein families and domains Zhengyuan Wang 1, Dante

More information

MegAlign Pro Pairwise Alignment Tutorials

MegAlign Pro Pairwise Alignment Tutorials MegAlign Pro Pairwise Alignment Tutorials All demo data for the following tutorials can be found in the MegAlignProAlignments.zip archive here. Tutorial 1: Multiple versus pairwise alignments 1. Extract

More information

Procedure to Create NCBI KOGS

Procedure to Create NCBI KOGS Procedure to Create NCBI KOGS full details in: Tatusov et al (2003) BMC Bioinformatics 4:41. 1. Detect and mask typical repetitive domains Reason: masking prevents spurious lumping of non-orthologs based

More information

Estimating Evolutionary Trees. Phylogenetic Methods

Estimating Evolutionary Trees. Phylogenetic Methods Estimating Evolutionary Trees v if the data are consistent with infinite sites then all methods should yield the same tree v it gets more complicated when there is homoplasy, i.e., parallel or convergent

More information

A bioinformatics approach to the structural and functional analysis of the glycogen phosphorylase protein family

A bioinformatics approach to the structural and functional analysis of the glycogen phosphorylase protein family A bioinformatics approach to the structural and functional analysis of the glycogen phosphorylase protein family Jieming Shen 1,2 and Hugh B. Nicholas, Jr. 3 1 Bioengineering and Bioinformatics Summer

More information

Fast coalescent-based branch support using local quartet frequencies

Fast coalescent-based branch support using local quartet frequencies Fast coalescent-based branch support using local quartet frequencies Molecular Biology and Evolution (2016) 33 (7): 1654 68 Erfan Sayyari, Siavash Mirarab University of California, San Diego (ECE) anzee

More information

Quantifying sequence similarity

Quantifying sequence similarity Quantifying sequence similarity Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, February 16 th 2016 After this lecture, you can define homology, similarity, and identity

More information

In Search of the Biological Significance of Modular Structures in Protein Networks

In Search of the Biological Significance of Modular Structures in Protein Networks In Search of the Biological Significance of Modular Structures in Protein Networks Zhi Wang, Jianzhi Zhang * Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan,

More information

Molecular Coevolution of the Vertebrate Cytochrome c 1 and Rieske Iron Sulfur Protein in the Cytochrome bc 1 Complex

Molecular Coevolution of the Vertebrate Cytochrome c 1 and Rieske Iron Sulfur Protein in the Cytochrome bc 1 Complex Molecular Coevolution of the Vertebrate Cytochrome c 1 and Rieske Iron Sulfur Protein in the Cytochrome bc 1 Complex Kimberly Baer *, David McClellan Department of Integrative Biology, Brigham Young University,

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

The impact of sequence parameter values on phylogenetic accuracy

The impact of sequence parameter values on phylogenetic accuracy eissn: 09748369, www.biolmedonline.com The impact of sequence parameter values on phylogenetic accuracy Bhakti Dwivedi 1, *Sudhindra R Gadagkar 1,2 1 Department of Biology, University of Dayton, Dayton,

More information

Paulo Jorge Dias 1,2 and Isabel Sá-Correia 1,2*

Paulo Jorge Dias 1,2 and Isabel Sá-Correia 1,2* Dias and Sá-Correia BMC Genomics 2013, 14:901 RESEARCH ARTICLE Open Access The drug:h + antiporters of family 2 (DHA2), siderophore transporters (ARN) and glutathione: H + antiporters (GEX) have a common

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Evolution takes place in an ecological context that shapes the

Evolution takes place in an ecological context that shapes the Parallel inactivation of multiple GAL pathway genes and ecological diversification in yeasts Chris Todd Hittinger, Antonis Rokas, and Sean B. Carroll Howard Hughes Medical Institute and Laboratories of

More information

Hands-On Nine The PAX6 Gene and Protein

Hands-On Nine The PAX6 Gene and Protein Hands-On Nine The PAX6 Gene and Protein Main Purpose of Hands-On Activity: Using bioinformatics tools to examine the sequences, homology, and disease relevance of the Pax6: a master gene of eye formation.

More information

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1 Supplementary Figure 1 Zn 2+ -binding sites in USP18. (a) The two molecules of USP18 present in the asymmetric unit are shown. Chain A is shown in blue, chain B in green. Bound Zn 2+ ions are shown as

More information

I. Short Answer Questions DO ALL QUESTIONS

I. Short Answer Questions DO ALL QUESTIONS EVOLUTION 313 FINAL EXAM Part 1 Saturday, 7 May 2005 page 1 I. Short Answer Questions DO ALL QUESTIONS SAQ #1. Please state and BRIEFLY explain the major objectives of this course in evolution. Recall

More information