Phylogenomics Resolves The Timing And Pattern Of Insect Evolution
Supplementary File Archives

This README was written in June 2014.
For any questions regarding the nature of our data, please contact Bernhard Misof (b.misof.zfmk AT uni-bonn.de).

Misof et al. 2014: Phylogenomics resolves the timing and pattern of insect evolution.

NOTE: In several files of these archives, some species names differ from those in the Supplementary Online Material and the original publication because identifications were updated. The names in the publication are the valid ones. The dictionary below maps the outdated names to the updated ones.

Name in analyses files (outdated) | Name in Suppl. Material Online/Manuscript (up to date) | Explanation
Sminthurus_vir_nig | Sminthurus viridis | Sample contains S. viridis and S. nigromaculatus. It is not clear whether S. nigromaculatus is a valid species or a synonym. Therefore, we consider this as S. viridis, submitted to NCBI as S. viridis.
Pogonognathellus_lon_fla | Pogonognathellus spp. | Species mixture of P. longicornis and P. flavescens, listed as Pogonognathellus spp., submitted to NCBI as Pogonognathellus sp.
Cheumatopsyche_sp | Annulipalpia chimera | Accidental mixture of 2 species, both belonging to the monophyletic Annulipalpia; listed as Annulipalpia chimera, submitted to NCBI as Annulipalpia sp.
Hydroptilidae_sp | Hydroptila spp. | Species mixture of H. actia and H. argosa, listed as Hydroptila spp., submitted to NCBI as Hydroptila sp. AD-2013.
Eriocrania_subpurpurella | Dyseriocrania subpurpurella | Changed to valid name Dyseriocrania subpurpurella.
Parides_arcas | Parides eurimedes | Changed to valid name Parides eurimedes.
Trichocera_fuscata | Trichocera saltator | Changed to valid name Trichocera saltator.
Cryptocercus_sp | Cryptocercus wrighti | Identified as Cryptocercus wrighti during the analysis process.
Nannochorista_sp | Nannochorista philpotti | Identified as Nannochorista philpotti during the analysis process.
Baetis pumilus | Baetis sp. | Identification not possible, changed to Baetis sp.
Dichochrysa prasina | Pseudomallada prasinus | Changed to valid name Pseudomallada prasinus.

MATERIALS AND METHODS

Supplementary Archive 1.
Directory including the ortholog set comprising 1,478 ortholog groups of the 12 reference species: alignments (FASTA format) serving as input for the profile Hidden Markov Models (phmms), generated phmms, and BLAST databases generated from the official gene sets for the reciprocal BLAST search (= ready to use for HaMStRad).
Supplementary_Archive_1.tar.gz [236 MB]

Supplementary Archive 2.
Directory including (refined) multiple sequence alignments (MSAs) (not masked) of 1,478 ortholog groups (OGs) on amino acid level and corresponding nucleotide MSAs after removal of outliers.
Supplementary_Archive_2.tar.gz [110 MB]

Supplementary Archive 3.
Directories including coordinates of the annotated Pfam-A and Pfam-B domains on amino acid level (Pfam_coordinates_aa/) and on nucleotide level (Pfam_coordinates_nuc/) for each gene separately (amino acid level: *.aa_coords.txt; nucleotide level: *.nuc_coords.txt).
Supplementary_Archive_3.tar.gz [399 KB]

Supplementary Archive 4.
Directory including MSAs of 85 meta-partitions (PHYLIP format) extracted from supermatrix C and used for estimating divergence times.
File S4. Models for the analyses with BEAST v1.8 (supermatrix C).
Statistics.
Supplementary_Archive_4.tar [8.5 MB]

SUPPLEMENTARY TEXT

Supplementary Archive 5.
File S5: List of identified outliers after refinement of the multiple sequence alignments of all 1,478 OGs. IDs of the OGs correspond to OrthoDB 5.0. Abbreviations of reference species correspond to abbreviations used in the official gene set releases.
a) List of 472 multiple sequence alignments of the OGs containing outlier amino acid transcripts after the alignment refinement.
b) List of multiple sequence alignments of OGs containing outlier amino acid transcripts of the reference species after alignment refinement.
Log files of identified outliers prior to and after alignment refinement.
Supplementary_Archive_5.tar.gz [2.6 MB]

Supplementary Archive 6.
Directory including the annotation of protein domains using the Pfam database.
Input files for analyses with PartitionFinder using different starting schemes (OGs versus clans/domains/voids) for supermatrix A.
Supplementary_Archive_6.tar.gz [10 MB]

Supplementary Archive 7.
Directory including the Aliscore output list for each gene with positions suggested for exclusion from further analyses; 2 subdirectories with 1,478 files each: amino acid level (aa_aliscore_lists/): *aa.fas_list_random.txt; nucleotide level (nt_aliscore_lists/): *nt.fas_list_random.txt.
Directories including MSAs of the 1,478 OGs on amino acid level (aa_masked/) and nucleotide level (nt_masked/): ambiguously aligned regions have been removed (i.e., masked alignments); gappy ends were filled with 'X' or 'N', respectively (see Supplementary Text, Chapter 2.3).
Supplementary_Archive_7.tar [72 MB]

Supplementary Archive 8.
Directory (supermatrices_partitions/) including:
Supermatrix A, B, C and D (amino acid level, FASTA format).
Partition schemes for supermatrix A (OGs versus protein clans, domains, voids: *.partitions).
Partition schemes for supermatrix B and C prior to PartitionFinder (*.partitions).
Supplementary_Archive_8.tar.gz [101.6 MB]

Supplementary Archive 9.
Directory (supermatrices_fclm/) including:
Original supermatrices generated for testing the 12 selected hypotheses using Four-cluster Likelihood Mapping (FASTA format), and respective partition file(s).
Subdirectory supermatrix_c_fclm: 12 supermatrices, 1 partition file (similar for all matrices).
Subdirectory supermatrix_d_fclm: 12 supermatrices, 12 partition files.
Supplementary_Archive_9.tar.gz [167 MB]

Supplementary Archive 10.
File S6. Summary of symmetry tests of pairwise sequence comparisons for the "SRH" sub-alignment (supermatrix D) derived from supermatrix C.
File S7. Summary of symmetry tests of pairwise sequence comparisons for the "non-srh" sub-alignment derived from supermatrix C.
File S8. Comparison of p-values based on the pairwise sequence comparisons (Bowker's test) from the "SRH" (supermatrix D) and "non-srh" sub-alignments derived from supermatrix C.
Supplementary_Archive_10.tar.gz [3.2 MB]

Supplementary Archive 11.
File S9, S10. Starting scheme (1,478 data blocks) and best partition scheme (AICc, 727 meta-partitions) for supermatrix A based on orthologous genes.
File S11, S12. Starting scheme (2,673 data blocks) and best partition scheme (AICc, 821 meta-partitions) for supermatrix A based on clans, protein domains.
File S13, S14. Starting scheme (2,263 data blocks) and best partition scheme (AICc, 770 meta-partitions) for supermatrix B based on clans, protein domains.
File S15, S16. Starting scheme (1,240 data blocks) and best partition scheme (AICc, 479 meta-partitions) for supermatrix C based on clans, protein domains.
Supplementary_Archive_11.tar.gz [768 KB]

Supplementary Archive 12.
Directory (supermatrix_c_nt/) including:
Supermatrix C on nucleotide level including second codon positions (PHYLIP format) plus partition file.
Supermatrix C on nucleotide level, including all codon positions (PHYLIP format) plus partition file.
Supplementary_Archive_12.tar.gz [46 MB]

Supplementary Archive 13.
Pruned consensus bootstrap trees (threshold of 75%, t75) for supermatrix B and D on amino acid level and for supermatrix C on nucleotide level (in NEWICK [*.tre] and PDF format).
Supplementary_Archive_13.tar.gz [74 KB]

Supplementary Archive 14.
File S17. Estimated divergence dates based on 37 calibration points for each of the 105 (sub-)meta-partitions.
Supplementary_Archive_14.tar.gz [390 KB]