Supplementary Figure 1 Histogram of the marginal probabilities of the ancestral sequence reconstruction without gaps (insertions and deletions).
Supplementary Figure 2 Marginal probabilities of the ancestral sequence reconstruction (via FastML; Ashkenazy et al., 2012) with respect to amino acid position and secondary structure prediction (JPred3; Cole et al., 2008).
Supplementary Figure 3 Structural assessment via SWISS-MODEL (Arnold et al., 2006) for the best model built for the ancestral sequence between vertebrate thymidine kinase 2 and the arthropod multisubstrate deoxyribonucleoside kinases.
Supplementary Figure 4 Structural assessment via SWISS-MODEL (Arnold et al., 2006) for the best model built for human thymidine kinase 2.
Supplementary Figure 5 Partial alignment of the TK2/dNK sequences (a) arranged by the topology of the corresponding tree (b). Note the insertion unique to Monodelphis domestica and Sarcophilus harrisii (top middle), which is likely the cause of the topological discrepancy between the gene and species trees.
Supplementary Figure 6 DSSP (Kabsch and Sander, 1983) inference of secondary structure for D. melanogaster dnk, both truncated (1J90; Johansson et al., 2001) and non-truncated (2VP0; Mikkelsen et al., 2008); H. sapiens dck (2A2Z; Godsey et al., 2006); H. sapiens dgk (2OCP; Johansson et al., 2001); and the ROSETTA- and DMD-modeled structures of human TK2 and the ancestral sequence.
Supplementary Figure 7 Tree reconstruction of a subset of sequences from the original dataset. While the grouping differs from that in Figure 2, the tree still implies that crustaceans and some insect species have multiple dnks that group paraphyletically.
Supplementary Figure 8 Gu99 results (Gu and Vander Velden, 2002) are shown for a posterior probability cutoff of 0.5 on the alignment (a) and on the structures of Homo sapiens TK2 (Rosetta model) (b) and Drosophila melanogaster dnk (1J90) (c). Coloring of the alignment corresponds to similarity in amino acid properties. Posterior probability profiles are given in Supplementary Figure 10.
Supplementary Figure 9 Type 2 divergence results (Gu and Vander Velden, 2002) are shown for a posterior probability cutoff of 0.5 on the alignment (a) and on the structures of Homo sapiens TK2 (Rosetta model) (b) and Drosophila melanogaster dnk (1J90) (c). Coloring of the alignment corresponds to similarity in amino acid properties. Posterior probability profiles are given in Supplementary Figure 10.
Supplementary Figure 10 Site-specific profiles for a three-cluster analysis of type 1 divergence (a) (Gu99 algorithm), type 2 divergence (b), and type 1 divergence (c) (2013 implementation) (Gu and Vander Velden, 2002; Gu et al., 2013). The three clusters contain the TK2s, the crustacean and arachnid dnks, and the insect dnks. (a) The Gu99 analysis identified only residue 237 as significant between the crustacean/arachnid and the insect dnks (yellow). No residues were identified as significant in type 1 divergence between TK2 and the insect dnks (orange), while a number were found to have potential type 2 divergence patterns between TK2 and the crustacean/arachnid dnks (blue). The residues of interest with posterior probabilities above 0.5 are highlighted in Supplementary Figure 8. (b) Posterior probabilities for type 2 divergence indicate that multiple residues may have contributed to functional divergence between TK2 and the crustacean/arachnid dnks (blue), as well as between the crustacean/arachnid and the insect dnks (yellow). The residues of interest with posterior probabilities above 0.5 are highlighted in Supplementary Figure 9. (c) Type 1 functional divergence analysis (2013 algorithm): the x-axis refers to residue position, while the y-axis indicates posterior probabilities. Red bars represent the posterior probability that none of the three clusters experienced type 1 functional divergence, which is high across the alignment. Blue bars represent the probability that only cluster 1 experienced type 1 divergence. Clusters 2 and 3 had posterior probabilities of zero for type 1 divergence (not shown).
Supplementary Figure 11 Gu99 results (Gu and Vander Velden, 2002) are shown for a posterior probability cutoff of 0.5 on the alignment (a) and on the structures of Homo sapiens dck (b) and Homo sapiens dgk (c). Coloring of the alignment corresponds to similarity in amino acid properties. Posterior probability profiles are given in Supplementary Figure 13.
Supplementary Figure 12 Type 2 divergence results (Gu and Vander Velden, 2002) are shown for a posterior probability cutoff of 0.5 on the alignment (a) and on the structures of Homo sapiens dck (b) and Homo sapiens dgk (c). Coloring of the alignment corresponds to similarity in amino acid properties. Posterior probability profiles are given in Supplementary Figure 13.
Supplementary Figure 13 Site-specific profiles for a three-cluster analysis of type 1 divergence (a) (Gu99 algorithm), type 2 divergence (b), and type 1 divergence (c) (2013 implementation) (Gu and Vander Velden, 2002; Gu et al., 2013). The three clusters contain the dgks, the dcks, and the dck2s. (a) The Gu99 analysis did not identify any type 1 residues between dck and dck2; however, a cluster of residues seems to have experienced type 1 divergence between dgk and dck2 (blue) and possibly between dgk and dck (orange). The residues of interest with posterior probabilities above 0.5 are highlighted in Supplementary Figure 11. (b) Posterior probabilities for type 2 divergence indicate that multiple residues may have contributed to functional divergence between dgk and dck2 (blue), as well as between dck and dck2 (yellow). The residues of interest with posterior probabilities above 0.5 are highlighted in Supplementary Figure 12. (c) Type 1 functional divergence analysis (2013 algorithm): the x-axis refers to residue position, while the y-axis indicates posterior probabilities. Red bars represent the posterior probability that none of the three clusters experienced type 1 functional divergence. Blue bars represent the probability that only cluster 1 experienced type 1 divergence. Purple bars represent type 1 functional divergence in cluster 2. Cluster 3 had posterior probabilities of zero for type 1 divergence (not shown).
Supplementary Figure 14 Site-specific profiles for the Gu99 two-cluster analyses (Gu and Vander Velden, 2002) of dnk and TK2 (a) and of dck and dgk (b).
References:
Arnold K, Bordoli L, Kopp J, Schwede T (2006) The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics 22:195-201.
Ashkenazy H, Penn O, Doron-Faigenboim A, Cohen O, Cannarozzi G, Zomer O, Pupko T (2012) FastML: a web server for probabilistic reconstruction of ancestral sequences. Nucl Acids Res 40(Web Server issue):W580-W584.
Cole C, Barber JD, Barton GJ (2008) The Jpred 3 secondary structure prediction server. Nucl Acids Res 36(suppl 2):W197-W201.
Godsey MH, Ort S, Sabini E, Konrad M, Lavie A (2006) Structural basis for the preference of UTP over ATP in human deoxycytidine kinase: illuminating the role of main-chain reorganization. Biochemistry 45:452-461.
Gu X, Vander Velden K (2002) DIVERGE: phylogeny-based analysis for functional-structural divergence of a protein family. Bioinformatics 18(3):500-501.
Gu X, Zou Y, Su Z, Huang W, Zhou Z, Arendsee Z, Zeng Y (2013) An update of DIVERGE software for functional divergence analysis of protein family. Mol Biol Evol 30(7):1713-1719.
Johansson K, Ramaswamy S, Ljungcrantz C, Knecht W, Piškur J, Munch-Petersen B, Eriksson S, Eklund H (2001) Structural basis for substrate specificities of cellular deoxyribonucleoside kinases. Nat Struct Biol 8(7):616-620.
Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22:2577-2637.
Mikkelsen NE, Munch-Petersen B, Eklund H (2008) Structural studies of nucleoside analog and feedback inhibitor binding to Drosophila melanogaster multisubstrate deoxyribonucleoside kinase. FEBS J 275(9):2151-2160.