Phylogenomics Resolves The Timing And Pattern Of Insect Evolution. - Supplementary File Archives -

This README was written in June 2014.
For any questions regarding the nature of our data, please contact Bernhard Misof (b.misof.zfmk AT uni-bonn.de).

Misof et al. 2014: Phylogenomics resolves the timing and pattern of insect evolution.

NOTE: In several files of these archives, some species names differ from those in the Supplementary Online Material and the original publication because identifications were updated. The names in the publication are the valid ones. The dictionary below maps the outdated names to the current ones.

Name in analyses files (outdated) | Name in Suppl. Material Online/Manuscript (up to date) | Explanation
Sminthurus_vir_nig | Sminthurus viridis | Sample contains S. viridis and S. nigromaculatus. It is not clear whether S. nigromaculatus is a valid species or a synonym. Therefore, we consider this as S. viridis, submitted to NCBI as S. viridis.
Pogonognathellus_lon_fla | Pogonognathellus spp. | Species mixture of P. longicornis and P. flavescens, listed as Pogonognathellus spp., submitted to NCBI as Pogonognathellus sp.
Cheumatopsyche_sp | Annulipalpia chimera | Accidental mixture of 2 species, both belonging to the monophyletic Annulipalpia; listed as Annulipalpia chimera, submitted to NCBI as Annulipalpia sp.
Hydroptilidae_sp | Hydroptila spp. | Species mixture of H. actia and H. argosa, listed as Hydroptila spp., submitted to NCBI as Hydroptila sp. AD-2013.
Eriocrania_subpurpurella | Dyseriocrania subpurpurella | Changed to valid name Dyseriocrania subpurpurella.
Parides_arcas | Parides eurimedes | Changed to valid name Parides eurimedes.
Trichocera_fuscata | Trichocera saltator | Changed to valid name Trichocera saltator.
Cryptocercus_sp | Cryptocercus wrighti | Identified as Cryptocercus wrighti during the analysis process.
Nannochorista_sp | Nannochorista philpotti | Identified as Nannochorista philpotti during the analysis process.
Baetis pumilus | Baetis sp. | Identification not possible, changed to Baetis sp.
Dichochrysa prasina | Pseudomallada prasinus | Changed to valid name Pseudomallada prasinus.

MATERIALS AND METHODS

Supplementary Archive 1.
Directory containing the ortholog set of 1,478 ortholog groups of the 12 reference species: alignments (FASTA format) serving as input for the profile Hidden Markov Models (pHMMs), the generated pHMMs, and BLAST databases generated from the official gene sets for the reciprocal BLAST search (= ready to use for HaMStRad).
Supplementary_Archive_1.tar.gz [236 MB]

Supplementary Archive 2.
Directory including (refined) multiple sequence alignments (MSAs) (not masked) of 1,478 ortholog groups (OGs) on amino acid level and corresponding nucleotide MSAs after removal of outliers.
Supplementary_Archive_2.tar.gz [110 MB]

Supplementary Archive 3.
Directories including coordinates of the annotated Pfam-A and Pfam-B domains on amino acid level (Pfam_coordinates_aa/) and on nucleotide level (Pfam_coordinates_nuc/) for each gene separately (amino acid level: *.aa_coords.txt; nucleotide level: *.nuc_coords.txt).
Supplementary_Archive_3.tar.gz [399 KB]

Supplementary Archive 4.
Directory including MSAs of 85 meta-partitions (PHYLIP format) extracted from supermatrix C and used for estimating divergence times.
File S4. Models for the analyses with BEAST v1.8 (supermatrix C).
Statistics.
Supplementary_Archive_4.tar [8.5 MB]

SUPPLEMENTARY TEXT

Supplementary Archive 5.
File S5: List of identified outliers after refinement of the multiple sequence alignments of all 1,478 OGs. IDs of the OGs correspond to OrthoDB 5.0. Abbreviations of reference species correspond to the abbreviations used in the official gene set releases.
a) List of 472 multiple sequence alignments of the OGs containing outlier amino acid transcripts after the alignment refinement.
b) List of multiple sequence alignments of OGs containing outlier amino acid transcripts of the reference species after alignment refinement.
Log files of identified outliers prior to and after alignment refinement.
Supplementary_Archive_5.tar.gz [2.6 MB]

Supplementary Archive 6.
Directory including the annotation of protein domains using the Pfam database. Input files for analyses with PartitionFinder using different starting schemes (OGs versus clans/domains/voids) for supermatrix A.
Supplementary_Archive_6.tar.gz [10 MB]

Supplementary Archive 7.
Directory including the Aliscore output list for each gene, with the positions suggested for exclusion from further analyses; 2 subdirectories with 1,478 files each: amino acid level (aa_aliscore_lists/): *aa.fas_list_random.txt; nucleotide level (nt_aliscore_lists/): *nt.fas_list_random.txt.
Directories including MSAs of the 1,478 OGs on amino acid level (aa_masked/) and nucleotide level (nt_masked/): ambiguously aligned regions have been removed (i.e., masked alignments); gappy ends were filled with 'X' or 'N', respectively (see Supplementary Text, Chapter 2.3).
Supplementary_Archive_7.tar [72 MB]

Supplementary Archive 8.
Directory (supermatrices_partitions/) including:
Supermatrices A, B, C and D (amino acid level, FASTA format).
Partition schemes for supermatrix A (OGs versus protein clans, domains, voids: *.partitions).
Partition schemes for supermatrices B and C prior to PartitionFinder (*.partitions).
Supplementary_Archive_8.tar.gz [101.6 MB]

Supplementary Archive 9.
Directory (supermatrices_fclm/) including:
Original supermatrices generated for testing the 12 selected hypotheses using Four-cluster Likelihood Mapping (FASTA format), and respective partition file(s).
Subdirectory supermatrix_c_fclm: 12 supermatrices, 1 partition file (similar for all matrices).
Subdirectory supermatrix_d_fclm: 12 supermatrices, 12 partition files.
Supplementary_Archive_9.tar.gz [167 MB]

Supplementary Archive 10.
File S6. Summary of symmetry tests of pairwise sequence comparisons for the "SRH" sub-alignment (supermatrix D) derived from supermatrix C.
File S7. Summary of symmetry tests of pairwise sequence comparisons for the "non-srh" sub-alignment derived from supermatrix C.
File S8. Comparison of p-values based on the pairwise sequence comparisons (Bowker's test) from the "SRH" (supermatrix D) and "non-srh" sub-alignments derived from supermatrix C.
Supplementary_Archive_10.tar.gz [3.2 MB]

Supplementary Archive 11.
Files S9, S10. Starting scheme (1,478 data blocks) and best partition scheme (AICc, 727 meta-partitions) for supermatrix A based on orthologous genes.
Files S11, S12. Starting scheme (2,673 data blocks) and best partition scheme (AICc, 821 meta-partitions) for supermatrix A based on clans, protein domains.
Files S13, S14. Starting scheme (2,263 data blocks) and best partition scheme (AICc, 770 meta-partitions) for supermatrix B based on clans, protein domains.
Files S15, S16. Starting scheme (1,240 data blocks) and best partition scheme (AICc, 479 meta-partitions) for supermatrix C based on clans, protein domains.
Supplementary_Archive_11.tar.gz [768 KB]

Supplementary Archive 12.
Directory (supermatrix_c_nt/) including:
Supermatrix C on nucleotide level, including second codon positions (PHYLIP format), plus partition file.
Supermatrix C on nucleotide level, including all codon positions (PHYLIP format), plus partition file.
Supplementary_Archive_12.tar.gz [46 MB]

Supplementary Archive 13.
Pruned consensus bootstrap trees (threshold of 75%, t75) for supermatrices B and D on amino acid level and for supermatrix C on nucleotide level (in NEWICK [*.tre] and PDF format).
Supplementary_Archive_13.tar.gz [74 KB]

Supplementary Archive 14.
File S17. Estimated divergence dates based on 37 calibration points for each of the 105 (sub-)meta-partitions.
Supplementary_Archive_14.tar.gz [390 KB]
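
Because several analysis files still carry the outdated species names listed in the dictionary in the NOTE of this README, users may want to relabel taxa before comparing trees or alignments with the publication. The following Python sketch is one possible way to do this and is not part of the archives. It rests on assumptions: that labels in the files use underscores instead of spaces (so "Baetis pumilus" is assumed to appear as Baetis_pumilus), that the updated names may be written with underscores and without the trailing period of "spp." or "sp.", and that whole-label matching is sufficient. The function name update_names and the example tree with its branch lengths are invented for illustration.

```python
import re

# Outdated label -> current name, transcribed from the dictionary in this README.
# Assumption: file labels use underscores; updated names are written the same way.
NAME_UPDATES = {
    "Sminthurus_vir_nig": "Sminthurus_viridis",
    "Pogonognathellus_lon_fla": "Pogonognathellus_spp",
    "Cheumatopsyche_sp": "Annulipalpia_chimera",
    "Hydroptilidae_sp": "Hydroptila_spp",
    "Eriocrania_subpurpurella": "Dyseriocrania_subpurpurella",
    "Parides_arcas": "Parides_eurimedes",
    "Trichocera_fuscata": "Trichocera_saltator",
    "Cryptocercus_sp": "Cryptocercus_wrighti",
    "Nannochorista_sp": "Nannochorista_philpotti",
    "Baetis_pumilus": "Baetis_sp",
    "Dichochrysa_prasina": "Pseudomallada_prasinus",
}

# Match only complete labels: an outdated name must not be preceded or
# followed by a word character, dot, or hyphen (so "Cryptocercus_sp2" is left alone).
_LABEL = re.compile(
    r"(?<![\w.-])("
    + "|".join(re.escape(name) for name in sorted(NAME_UPDATES, key=len, reverse=True))
    + r")(?![\w.-])"
)


def update_names(text: str) -> str:
    """Return text (e.g. a NEWICK string or FASTA header) with outdated labels replaced."""
    return _LABEL.sub(lambda match: NAME_UPDATES[match.group(1)], text)


if __name__ == "__main__":
    # Toy tree with invented branch lengths, not taken from the archives.
    tree = "((Parides_arcas:0.12,Trichocera_fuscata:0.30):0.05,Baetis_pumilus:0.41);"
    print(update_names(tree))
```

The same function can be applied line by line to FASTA headers. Labels that only begin with an outdated name are left unchanged, which avoids accidental partial renames.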