Supporting Information
Manning et al. 10.1073/pnas.0801314105

SI Text

RM1 Motif. This Monosiga-specific motif of 22 aa is repeated 8–13 times in the extracellular regions of two RTKA kinases, one RTKG, and 4 other Monosiga predicted proteins (see kinase.com for sequences and domain analysis). Fourteen of those proteins have predicted N-terminal signal peptides, but none has a likely transmembrane region, and only one has additional known domains (EF hands). Although these gene predictions are preliminary, this suggests that some of these proteins might be secreted and might interact with the RTKs via homophilic adhesion. No clear examples of the domain repeat are found outside of Monosiga, although there are some scattered, weakly similar sequences, particularly in bacterial surface proteins, and profile-profile analysis with prc shows a weak overlap between RM1 and part of the eukaryotic Recep_L_domain. The logo view of the alignment of all Monosiga RM1 motifs (Fig. S4) shows a partially conserved, repeated LxxL pattern within the motif, which appears to be the main feature shared with these weak hits.

RM2. This 8-aa domain is found in the cytoplasmic tail of four of the nine RTKB kinases and is repeated six times in RTKB2. It has not been found elsewhere in Monosiga or in any published sequence. There is some substructure within the domain, including four conserved tyrosines followed by acidic residues (Fig. S5). These score highly by Scansite prediction (http://scansite.mit.edu) both as Src phosphorylation sites and as SH2 binding sites. This domain overlaps the MR motif seen in RTKB2, but because of the substructure within the domain, the phase of MR differs from that of RM2.

RM1-LRR. The RM1 motif emerged from a MEME search and was found to partially overlap with the Pfam LRR (leucine-rich repeat) domain, so it appears to be a Monosiga-specific extension of that domain. LRR-RM1 annotations refer to the merged domain.

a. As with RM1, we found a variant LDL receptor type A repeat using SMART and Pfam models and extended it with a Monosiga-specific sequence. Unlike in many other proteins, this domain is found only once per gene, and it is specific both to Monosiga and to RTKs.

HYR-Related Domains. A number of weakly scoring HYR (Hyalin Repeat) domain hits resolved into three major subclasses of this domain (HYR2, HYR3, HYR4), with distinct patterns of conservation within the domain but also considerable sequence variation, indels, and partial hits within each domain, so this classification should be used with caution. HYR2 domains are most common in kinases, whereas HYR3 is found predominantly in SH2 proteins.

1. Obenauer JC, Cantley LC, Yaffe MB (2003) Scansite 2.0: Proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res 31:3635–3641.
2. King N, Carroll SB (2001) A receptor tyrosine kinase from choanoflagellates: Molecular insights into early animal evolution. Proc Natl Acad Sci USA 98:15032–15037.

Manning et al. www.pnas.org/cgi/content/short/0801314105 1 of 13
Fig S1. Domain architecture of all Monosiga TKs.
Fig S1. Continued. Receptor Tyrosine Kinases: RTKC, RTKD, RTKE, RTKF, FGTK, and LRTK families. Labeled domains include RM1-LRR, PA26, FN3, and ANF receptor.
Fig S1. Continued. Receptor Tyrosine Kinases: RTKG, RTKH, RTKJ, RTKK, RTKL, and RTKM families. Unclassified Tyrosine Kinases: UTK1–UTK26. Labeled domains include RM1, RM1-LRR, receptor L, FN3, SH2, ANF receptor, SAM, pbh1, and MFS.
Fig S2. Domain architecture for all Monosiga PTP, SH2, and PTB domain-containing proteins. SH2 domains in kinases and PTPs are listed under those headings.
Fig S2. Continued.
Fig S2. Continued.
Fig S2. Continued.
Fig S3. HMM logo comparison of Monosiga TKs with those of human, Drosophila, and C. elegans.
Fig S4. Logo view of RM1 motif.
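As a rough illustration of what a logo view of this kind typically encodes: assuming a conventional Shannon-information sequence logo (the tool and settings used for the figure are not stated here), the height of each column reflects the information content of that alignment column. The sketch below computes that quantity without any small-sample correction. The function name `column_information` and the toy alignment are hypothetical and are not RM1 data.

```python
import math
from collections import Counter


def column_information(alignment, alphabet_size=20):
    """Information content (bits) of each alignment column.

    Assumes equal-length aligned sequences; '-' gap characters are ignored.
    No small-sample correction is applied (an illustrative simplification).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    max_bits = math.log2(alphabet_size)  # about 4.32 bits for 20 amino acids
    info = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            info.append(0.0)  # an all-gap column carries no information
            continue
        total = len(residues)
        counts = Counter(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(max_bits - entropy)
    return info


# Hypothetical toy alignment with an LxxL-like pattern (not real RM1 sequences).
toy = ["LAGLT", "LSNLS", "LTGLA", "LDKLT"]
bits = column_information(toy)
for position, value in enumerate(bits, start=1):
    print(f"column {position}: {value:.2f} bits")
```

In this toy case the two invariant leucine columns reach the maximum value, while the variable columns score lower, which is the kind of contrast a partially conserved LxxL pattern would produce in a logo.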
Fig S5. Logo view of RM2 motif.
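The SI Text describes conserved tyrosines followed by acidic residues within RM2. A minimal sketch of how such positions could be flagged in a protein sequence is given here, assuming that "followed by" means at least one D or E within the next three residues. That window, the function name `tyrosines_with_acidic_followers`, and the example sequence are assumptions made for illustration; this is neither the authors' procedure nor Scansite's scoring.

```python
def tyrosines_with_acidic_followers(seq, window=3):
    """Return 0-based positions of Y that have at least one D or E
    among the next `window` residues (illustrative criterion only)."""
    seq = seq.upper()
    hits = []
    for i, residue in enumerate(seq):
        following = seq[i + 1 : i + 1 + window]
        if residue == "Y" and any(aa in "DE" for aa in following):
            hits.append(i)
    return hits


# Hypothetical sequence, not taken from any Monosiga protein.
example = "GSYEDLAAYGGGKYAAE"
positions = tyrosines_with_acidic_followers(example)
print(positions)
```

Positions flagged this way are only candidates; whether they behave as phosphorylation or SH2 binding sites would still need a dedicated predictor or experimental evidence.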
Table S1. Accessory domains and motifs in Monosiga TKs

Name | No. genes (families) | Copies/gene | Related to/description | Human TKs with domain

Extracellular motifs and domains
RM1 | 3 (RTKA, G) | 8–13 | Unique to choanoflagellates | -
 | 11 (RTKC, E) | 3 | Family of domains, related to Ig, FN3 | (Ig): FGFR, Trk, VEGFR, Tie, Axl, PDGFR, CCK4
L | 25 | 1 | Similar to part of LDL receptor A motif |
Recep_L_domain | 5 (RTKA, G, J, UTK) | 1–2 | Fragment of domain found in EGF, Insulin receptors | EGFR, InsR
 | 9 (FGTK) | 3 2 | Alpha-Integrin repeat motif | -
/CA- | 1 (RTKB-D, H–J) | 1–9 | Epidermal Growth Factor repeats | Tie, Eph, ALK
LRR | 11 (FGTK, LRTK, RTKE, L) | 1–4 | Leucine Rich Repeat | Trk
 | 21 (RTKB-E, J, M) | | Rich in C and CxxC. Weakly similar to TNFR, furin, GCC2 repeats | (Furin) EGFR, InsR
ANF_receptor | 2 (UTK, RTKC) | 1 | Ligand binding domain of RGCs, which contain an inactive kinase domain | RGC
FN3 | 5 (RTKC, RTKH, UTK) | 1–2 | Fibronectin Type 3 domain | Axl, Eph, InsR, Sev, Tie

Intracellular motifs and domains
SH2 | 14 (SFK, FVTK, 1 CTKA, 3 UTK) | 1 | pTyr binding | Src, Tec, Abl, Csk, Fer, Syk
SH3 | 8 (SFK) | 1 | Binds PxxP motifs | Src, Tec, Abl, Csk
PTB | 9 (HMTK) | 1–4 | Peptide and pTyr binding | -
FYVE | 2 (FVTK) | 1 | Zinc finger implicated in lipid binding | -
CAP_GLY | 9 (RTKC) | 1 | Cytoskeleton-associated (19 copies in genome, including one PTP) | -
PH | 2 (CTKA, Tec) | 1 | Binds to lipids and signaling proteins | Tec
CH | 1 (CTKB) | 1 | Calponin Homology. Actin-binding and signaling roles; also seen in many SH2-containing proteins | -
C2 | 1 (Src) | 1 | Ca-dependent lipid association; may substitute for the missing myristoylation site | -
SAM | 1 (UTK) | 1 | Sterile Alpha Motif; also seen in many SH2-containing adaptors | ACK
RM2 (MR (3)) | 4 (RTKB) | 1–6 | Novel motif, C-terminal of kinase domain. 3 conserved tyrosine residues include conserved Src-like phosphorylation/SH2 binding motif. | -
Other Supporting Information Files
Dataset S1 (PDF)
Dataset S2 (XLS)