Supplementary Figure 1 Histogram of the marginal probabilities of the ancestral sequence reconstruction without gaps (insertions and deletions).
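
A histogram of this kind can be tabulated from per-site marginal probabilities with equal-width bins. The sketch below is illustrative only: the function name, the number of bins, and the example values are assumptions, not data or code from this study, and reading the probabilities from the reconstruction output is left out.

```python
def histogram(probabilities, n_bins=10):
    """Count probabilities in [0, 1] into n_bins equal-width bins."""
    counts = [0] * n_bins
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        # A value of exactly 1.0 is placed in the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical marginal probabilities, one per reconstructed site.
example = [0.99, 0.95, 0.42, 1.0, 0.87, 0.63, 0.99]
counts = histogram(example, n_bins=5)
print(counts)
```

Each entry of `counts` is the number of sites whose marginal probability falls in the corresponding bin, from lowest to highest.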

Supplementary Figure 2 Marginal probabilities of the ancestral sequence reconstruction via FastML (Ashkenazy et al., 2012) with respect to amino acid position and secondary structure prediction by JPred3 (Cole et al., 2008).

Supplementary Figure 3 Structural assessment via SWISS-MODEL (Arnold et al. 2006) for the best model built for the ancestral sequence between vertebrate thymidine kinase 2 and the arthropod multisubstrate deoxyribonucleoside kinases.

Supplementary Figure 4 Structural assessment via SWISS-MODEL (Arnold et al. 2006) for the best model built for human thymidine kinase 2.

Supplementary Figure 5 Partial alignment of the TK2/dNK sequences (a) arranged by the topology of the corresponding tree (b). Note the insertion unique to Monodelphis domestica and Sarcophilus harrisii (top middle), which is likely the cause of the topological discrepancy between the gene and species trees.

Supplementary Figure 6 DSSP (Kabsch and Sander, 1983) inference of secondary structure for D. melanogaster dnk, both truncated (1J90; Johansson et al. 2001) and non-truncated (2VP0; Mikkelsen et al. 2008), H. sapiens dck (2A2Z; Godsey et al. 2006), and H. sapiens dgk (2OCP; Johansson et al. 2001), and for the ROSETTA and DMD modeled structures of human TK2 and the ancestral sequence.

Supplementary Figure 7 Tree reconstruction of a subset of sequences from the original dataset. Although the grouping differs from that in Figure 2, the tree still implies that crustacean and some insect species have multiple dnks that group paraphyletically.

Supplementary Figure 8 Gu99 results (Gu and Vander Velden, 2002) are shown for a posterior probability cutoff of 0.5 on the alignment (a) and on the structures of Homo sapiens TK2, as modeled by Rosetta (b), and Drosophila melanogaster dnk, 1J90 (c). Coloring of the alignment corresponds to similarity in amino acid properties. Posterior probability profiles are given in Supplementary Figure 10.

Supplementary Figure 9 Type 2 divergence results (Gu and Vander Velden, 2002) are shown for a posterior probability cutoff of 0.5 on the alignment (a) and on the structures of Homo sapiens TK2, as modeled by Rosetta (b), and Drosophila melanogaster dnk, 1J90 (c). Coloring of the alignment corresponds to similarity in amino acid properties. Posterior probability profiles are given in Supplementary Figure 10.

Supplementary Figure 10 Site-specific profiles for a three-cluster analysis for type 1 divergence (a) (Gu99 algorithm), type 2 divergence (b), and type 1 divergence (c) (2013 implementation) (Gu and Vander Velden, 2002; Gu et al., 2013). The three clusters contain the TK2, the crustacean and arachnid dnks, and the insect dnks. (a) The Gu99 analysis identified only residue 237 as significant between the crustacean/arachnid and the insect dnks (yellow). No residues were identified as significant in type 1 divergence between TK2 and the insect dnks (orange), while a number were found to have potential type 2 divergence patterns between TK2 and crustacean/arachnid dnk (blue). The residues of interest with posterior probabilities above 0.5 are highlighted in Supplementary Figure 8. (b) Posterior probabilities for type 2 divergence indicate that multiple residues may have contributed to functional divergence between TK2 and the crustacean/arachnid dnks (blue), as well as between the crustacean/arachnid and the insect dnks (yellow). The residues of interest with posterior probabilities above 0.5 are highlighted in Supplementary Figure 9. (c) Type 1 functional divergence analysis (2013 algorithm) implies with high posterior probability that none of the three clusters experienced type 1 functional divergence (red bars). The x-axis refers to residue position, while the y-axis indicates posterior probabilities. Blue bars represent the probability that only cluster 1 experienced type 1 divergence. Clusters 2 and 3 had posterior probabilities of zero for type 1 divergence (not shown).
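
The 0.5 cutoff used to pick residues of interest from a site-specific profile amounts to a simple filter over per-site posterior probabilities. The following is a minimal sketch under stated assumptions: the function name and the example profile are hypothetical, positions are taken to be 1-based alignment columns, and no particular DIVERGE output format is assumed.

```python
def sites_above_cutoff(posteriors, cutoff=0.5):
    """Return 1-based positions whose posterior probability exceeds cutoff."""
    return [
        position
        for position, probability in enumerate(posteriors, start=1)
        if probability > cutoff
    ]


# Hypothetical site-specific profile (one posterior probability per site).
profile = [0.05, 0.62, 0.48, 0.91, 0.50]
selected = sites_above_cutoff(profile)
print(selected)
```

The comparison is strict, so a site with a posterior probability of exactly 0.5 is not selected; whether the original analysis treated the boundary this way is not stated in the captions.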

Supplementary Figure 11 Gu99 results (Gu and Vander Velden, 2002) are shown for a posterior probability cutoff of 0.5 on the alignment (a) and on the structures of Homo sapiens dck (b) and Homo sapiens dgk (c). Coloring of the alignment corresponds to similarity in amino acid properties. Posterior probability profiles are given in Supplementary Figure 13.

Supplementary Figure 12 Type 2 divergence results (Gu and Vander Velden, 2002) are shown for a posterior probability cutoff of 0.5 on the alignment (a) and on the structures of Homo sapiens dck (b) and Homo sapiens dgk (c). Coloring of the alignment corresponds to similarity in amino acid properties. Posterior probability profiles are given in Supplementary Figure 13.

Supplementary Figure 13 Site-specific profiles for a three-cluster analysis for type 1 divergence (a) (Gu99 algorithm), type 2 divergence (b), and type 1 divergence (c) (2013 implementation) (Gu and Vander Velden, 2002; Gu et al., 2013). The three clusters contain the dgks and the dcks/dck2s. (a) The Gu99 analysis did not identify any type 1 residues between dck and dck2; however, a cluster of residues seems to have experienced type 1 divergence between dgk and dck2 (blue) and possibly between dgk and dck (orange). The residues of interest with posterior probabilities above 0.5 are highlighted in Supplementary Figure 11. (b) Posterior probabilities for type 2 divergence indicate that multiple residues may have contributed to functional divergence between dgk and dck2 (blue), as well as between dck and dck2 (yellow). The residues of interest with posterior probabilities above 0.5 are highlighted in Supplementary Figure 12. (c) Type 1 functional divergence analysis (2013 algorithm): the x-axis refers to residue position, while the y-axis indicates posterior probabilities. Red bars represent the posterior probability that none of the three clusters experienced type 1 functional divergence. Blue bars represent the probability that only cluster 1 experienced type 1 divergence. Purple bars represent type 1 functional divergence in cluster 2. Cluster 3 had posterior probabilities of zero for type 1 divergence (not shown).

Supplementary Figure 14 Site-specific profiles for the Gu99 two-cluster analyses (Gu and Vander Velden, 2002) of dnk and TK2 (a) and of dck and dgk (b).

References:
Arnold K, Bordoli L, Kopp J, Schwede T (2006) The SWISS-MODEL Workspace: a web-based environment for protein structure homology modelling. Bioinformatics 22:195-201.
Ashkenazy H, Penn O, Doron-Faigenboim A, Cohen O, Cannarozzi G, Zomer O, Pupko T (2012) FastML: a web server for probabilistic reconstruction of ancestral sequences. Nucl Acids Res 40(Web Server issue):W580-W584.
Cole C, Barber JD, Barton GJ (2008) The Jpred 3 secondary structure prediction server. Nucl Acids Res 36(suppl 2):W197-W201.
Godsey MH, Ort S, Sabini E, Konrad M, Lavie A (2006) Structural basis for the preference of UTP over ATP in human deoxycytidine kinase: illuminating the role of main-chain reorganization. Biochemistry 45:452-461.
Gu X, Vander Velden K (2002) DIVERGE: phylogeny-based analysis for functional-structural divergence of a protein family. Bioinformatics 18(3):500-501.
Gu X, Zou Y, Su Z, Huang W, Zhou Z, Arendsee Z, Zeng Y (2013) An update of DIVERGE software for functional divergence analysis of protein family. Mol Biol Evol 30(7):1713-1719.
Johansson K, Ramaswamy S, Ljungcrantz C, Knecht W, Piškur J, Munch-Petersen B, Eriksson S, Eklund H (2001) Structural basis for substrate specificities of cellular deoxyribonucleoside kinases. Nat Struct Biol 8(7):616-620.
Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22:2577-2637.
Mikkelsen NE, Munch-Petersen B, Eklund H (2008) Structural studies of nucleoside analog and feedback inhibitor binding to Drosophila melanogaster multisubstrate deoxyribonucleoside kinase. FEBS J 275(9):2151-2160.