In silico analysis of the NBS protein family in Ectocarpus siliculosus

Indian Journal of Biotechnology
Vol 12, January 2013, pp 98-102

Niaz Mahmood 1* and Mahdi Muhammad Moosa 1,2

1 Molecular Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Dhaka, Ramna, Dhaka 1000, Bangladesh
2 Graduate Program in Biological Sciences, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, USA

Nucleotide-binding site (NBS) domain-containing proteins form one of the best-characterized protein families; they are found in almost all higher eukaryotes. Extensive studies have been done on the plant NBS proteins, but similar studies on the brown algal NBS proteins are not available. In the present study, the authors examined the diversity of NBS proteins in the model brown alga Ectocarpus siliculosus. A total of twenty-six NBS proteins were identified and classified into different subfamilies based on their distinct domain organizations. Although many characteristics of the protein family are similar to those of plant species, several features are quite distinct. One such characteristic is the presence of tetratricopeptide repeat (TPR) motifs at the C-terminal ends of these proteins. Another interesting finding is the presence of two E. siliculosus-specific conserved motifs, resulting in a novel NBS domain composition. The remarkable structural diversity found among these proteins further strengthens the idea that diversifying selection may have played an important role in their evolution.

Keywords: Brown algae, Ectocarpus siliculosus, NBS, TPR

Introduction

Nucleotide-binding site (NBS)-containing proteins play a vital role in defense mechanisms in a wide range of species, from humans to plants 1.
In metazoans, proteins containing this domain, like the human APAF-1 (apoptotic protease activating factor 1) and Caenorhabditis elegans CED-4 (cell death 4), are involved in regulating programmed cell death 2. In plants, this domain is found in the disease resistance NBS-LRRs (nucleotide-binding site leucine-rich repeats), proteins that play a role in mediating a hypersensitive response at the site of pathogen infection 3. Typically, the NBS region is well conserved across different species and is marked by the presence of several conserved motifs. These include a P-loop (kinase 1), kinase 2, kinase 3 and other short motifs of unknown function 2. Kinase 1 and kinase 2 bind the phosphates of ATP, whereas kinase 3 interacts with the purine or ribose moiety 4. The C-termini of the proteins, on the other hand, are highly variable and are marked by the presence of different types of protein-protein interaction domains, the most common of which are the leucine-rich repeats (LRRs) found in the plant NBS-LRR 5 and human NLR (nucleotide-binding domain, leucine-rich repeat containing) families of proteins. However, other domains may also be present; for example, human APAF-1 proteins contain WD-40 repeats at their C-termini 1.

A plethora of information on the structural and functional aspects of the NBS proteins in other lineages is available, but little is known about their counterparts in brown algae. Ectocarpus siliculosus is the only member of this lineage with a full genome sequence available. Thus, a genome-wide investigation of the NBS-containing sequences in E. siliculosus was performed in the present study.

*Author for correspondence: E-mail: niazmahmood.ami@gmail.com

Note: After presentation of this paper in the conference, Zambounis et al [Mol Biol Evol, 29 (2012) 1263-1276] published an article containing a detailed analysis of the same family. However, the authors would like to declare no conflict of interest with anyone.
Materials and Methods

Identification and Classification of NBS Containing Proteins

The E. siliculosus protein sequences were obtained from the database of Bioinformatics Gent Online Genome Annotation System (BOGAS; https://bioinformatics.psb.ugent.be/gdb/ectocarpus/). To identify the NBS domain-containing sequences, InterProScan 6 searches were performed using the models for the NB-ARC (IPR002182) and disease resistance (IPR000767) domains. The identified domains were curated manually. The coiled-coil (CC) motifs were identified by the COILS program 7 using a threshold of 0.9. Conserved motifs within the NBS domains were identified using MEME (Multiple Expectation Maximization for Motif Elicitation) 8,9. These detailed data on protein motifs and domains were used to classify the proteins into different subgroups.

Sequence Alignment and Phylogenetic Analysis

For the alignment, the complete predicted protein sequences were trimmed so that only the NBS domain-containing regions remained. The sequences were then aligned by Clustal X version 2.0 10 with default options, and a phylogenetic tree was constructed by the bootstrap neighbor-joining method 11 using Molecular Evolutionary Genetics Analysis (MEGA) software version 4.1 12. The stability of internal nodes was assessed by bootstrap analysis with 10,000 replicates.

Substitution Pattern of NBS Region in E. siliculosus

The synonymous and non-synonymous substitution pattern was determined using the modified Nei-Gojobori 13 method with the Jukes-Cantor correction as implemented in MEGA 4.1 12.

Results and Discussion

E. siliculosus NBS Proteins are Few in Number but Diverse in Class

Thirty-one prospective proteins with highly conserved NBS regions were initially identified in E. siliculosus. Five of these proteins were subsequently found to be the products of pseudogenes and were excluded from further analysis. The percentage of pseudogenes (~16%) is quite high considering the total number of NBS proteins identified and is reminiscent of the fast evolution patterns described for plant disease resistance genes 14. Unlike their counterparts in Viridiplantae, the NBS proteins in E. siliculosus do not harbour C-terminal leucine-rich repeats (LRRs), which play a role in pathogen effector binding and the regulation of signal transduction.
Instead of LRRs, they have tetratricopeptide repeats (TPRs) at their C-termini. Such TPR motifs are found in proteins from a wide range of organisms, from bacteria to humans, and are involved in mediating protein-protein interactions and the assembly of multiprotein complexes 15. With the exception of four proteins (GenBank ID CBN79515, CBN74455, CBJ27327 & CBJ33350), all the E. siliculosus NBS proteins have TPRs at their C-termini. The number of TPRs among members of the NBS-TPR subfamily varied greatly, from as many as 18 repeats in CBJ25655 to as few as one repeat in CBN79956 and CBJ26776, with an overall average of seven repeats. However, the reason why these NBS proteins carry TPRs instead of LRRs is still not clear. TPRs are known to act as scaffolds that help signaling proteins recognize their targets, thereby defining a novel mechanism for protein recognition 16. It is possible that these proteins activate different signaling pathways upon attack by brown algae-specific pathogens. Cock et al 17 found that the E. siliculosus NBS proteins do not have any Toll/interleukin receptor (TIR) domain at their N-termini; the only known motif found in that region is the CC motif. COILS 7 detected CC motifs in nine of the twenty-six NBS proteins. Previous studies suggested the involvement of the CC motif in protein-protein interaction and signaling 18. Based on the differences in the N-terminal and C-terminal regions, the sequences were classified into four classes: NBS, NBS-TPR, CC-NBS-TPR and CC-NBS (Table 1). Clearly, the E. siliculosus NBS proteins differ considerably in structure from their counterparts in other species and are also significantly fewer in number. NBS proteins account for 0.16% of the total number of proteins in E. siliculosus, whereas the corresponding figures are 1.38% in rice and 0.68% in Arabidopsis, both photosynthetic organisms.
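
The four-class scheme summarized in Table 1 follows from two yes/no observations per protein: whether an N-terminal CC motif was detected and whether C-terminal TPRs are present. The following is a minimal Python sketch of that bookkeeping, not the authors' actual procedure. The function and variable names are hypothetical, and only the class sizes reported in Table 1 are used, since the per-protein domain calls are not reproduced in the text.

```python
from collections import Counter

def classify_nbs(has_cc: bool, has_tpr: bool) -> str:
    """Name the subfamily of an NBS protein from two domain observations."""
    parts = []
    if has_cc:
        parts.append("CC")   # N-terminal coiled-coil motif
    parts.append("NBS")      # every protein in the set has an NBS domain
    if has_tpr:
        parts.append("TPR")  # C-terminal tetratricopeptide repeats
    return "-".join(parts)

# Class sizes as reported in Table 1, keyed by (has_cc, has_tpr).
# These are aggregate counts, not individual protein annotations.
TABLE1_COUNTS = {
    (True, True): 8,
    (False, True): 14,
    (True, False): 1,
    (False, False): 3,
}

def tally(counts):
    """Sum the counts per subfamily name."""
    totals = Counter()
    for (has_cc, has_tpr), n in counts.items():
        totals[classify_nbs(has_cc, has_tpr)] += n
    return totals

totals = tally(TABLE1_COUNTS)
n_total = sum(totals.values())
n_with_cc = sum(n for (cc, _), n in TABLE1_COUNTS.items() if cc)
n_with_tpr = sum(n for (_, tpr), n in TABLE1_COUNTS.items() if tpr)
```

Summing the classes this way gives a quick consistency check against the text: the CC-containing classes should add up to the nine proteins flagged by COILS, and the classes without TPRs should add up to the four exceptions listed above.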

Ancient duplication of the entire genome and subsequent chromosome rearrangements played a key role in the amplification of resistance genes in Arabidopsis 19,20, and gene duplication accompanied by diversification played a role in the amplification of NBS proteins in rice 21. Brown algae diverged from other multicellular organisms more than a billion years ago and have been subjected to different selective pressures 22. The possibility that they developed alternative means of pathogen resistance therefore cannot be ruled out.

Table 1 Classification of E. siliculosus NBS proteins based on the predicted domains

Predicted protein    Letter code    E. siliculosus
With TPRs
  CC-NBS-TPR         CNT            08
  NBS-TPR            NT             14
Without TPRs
  CC-NBS             CN             01
  NBS                N              03
Total NBS                           26

Conserved Motifs in NBS Region

Irrespective of their classification, the identified proteins showed conservation in the NBS region. MEME identified several conserved motifs, including the P-loop, RNBS-A, kinase 2, kinase 3 and RNBS-C motifs, within the NBS regions of E. siliculosus; most of these were previously described in other species (Table 2). The hydrophobic GLPL motif found in other species was slightly modified into GHLPL (for example, in CBN75740 and CBJ26207) in E. siliculosus. In addition, we found two E. siliculosus-specific conserved motifs within the NBS region. To our knowledge, these two motifs have not been described in other organisms. We named the motifs ENBS 1 and ENBS 2. The ENBS 1 motif was found between the kinase 2 and kinase 3 motifs, whereas ENBS 2 was mostly found after the GHLPL motif. However, the functions of these two novel motifs remain to be elucidated experimentally. The positions of the motifs within the NBS domain are shown in Fig. 1. The length and order of the motifs represent the actual organization in each protein.

Table 2 Sequence of the major conserved motifs within E. siliculosus NBS domains

Motif    Motif name             Sequence in E. siliculosus
1        P-loop or Kinase 1     GPSGAGKSTIAS
2        RNBS-A                 VRRHFRDGIFWL
3        Kinase 2               KCLVVADNVWE
4        ENBS 1                 GKGAKDRLPALM
5        Kinase 3 or RNBS-B     TGFHVLVTTRQR
6        RNBS-C                 EEEALELLRKTS
7        GHLPL                  CGHLPLVLAIAG
8        ENBS 2                 RWSTVRGRSDRT

Fig. 1 Distribution of conserved motifs within the NBS domains as identified by MEME. The name of each member and combined P value are shown on the left side of the figure. Different motifs are indicated with different colour boxes.
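
To illustrate how the motif order described above could be checked for a given sequence, here is a minimal Python sketch that slides each Table 2 consensus along a protein and reports the best ungapped match. This is not how MEME works (MEME fits a probabilistic motif model by expectation maximization); it is only a simple consensus scan. The function names, the mismatch threshold and the synthetic test sequence are all assumptions introduced for illustration.

```python
# Consensus sequences copied from Table 2, in table order.
MOTIFS = {
    "P-loop": "GPSGAGKSTIAS",
    "RNBS-A": "VRRHFRDGIFWL",
    "Kinase 2": "KCLVVADNVWE",
    "ENBS 1": "GKGAKDRLPALM",
    "Kinase 3": "TGFHVLVTTRQR",
    "RNBS-C": "EEEALELLRKTS",
    "GHLPL": "CGHLPLVLAIAG",
    "ENBS 2": "RWSTVRGRSDRT",
}

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def best_hit(sequence: str, motif: str):
    """Return (start, mismatches) of the best ungapped window, or None
    if the sequence is shorter than the motif. Ties go to the earliest start."""
    k = len(motif)
    if len(sequence) < k:
        return None
    windows = ((i, hamming(sequence[i:i + k], motif))
               for i in range(len(sequence) - k + 1))
    return min(windows, key=lambda hit: (hit[1], hit[0]))

def motif_order(sequence: str, max_mismatches: int = 3):
    """List (name, start, mismatches) for motifs found within the
    hypothetical mismatch threshold, sorted by position in the sequence."""
    hits = []
    for name, motif in MOTIFS.items():
        hit = best_hit(sequence, motif)
        if hit is not None and hit[1] <= max_mismatches:
            hits.append((hit[0], name, hit[1]))
    return [(name, start, mm) for start, name, mm in sorted(hits)]

# Synthetic sequence (NOT a real E. siliculosus protein): the eight
# consensus motifs joined by an arbitrary poly-alanine linker.
toy = "AAAA".join(MOTIFS.values())
order = [name for name, _, _ in motif_order(toy)]
```

On a real NBS region, the returned order could be compared with the arrangement reported here, with ENBS 1 expected between kinase 2 and kinase 3 and ENBS 2 mostly after GHLPL. The threshold of three mismatches is arbitrary and would need tuning against known members of the family.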

Evolutionary Analyses

To clarify the phylogenetic relationships among the NBS proteins and infer the evolutionary history of this family, a phylogenetic tree was constructed using the protein sequences of the NBS region. The tree divided the sequences into two major clusters, with two sequences (CBN79515, CBJ26776) placed outside these clusters owing to significant variation in their sequences (Fig. 2a). Most members of the same subfamily clustered together. A second tree was constructed using the NBS regions of model organisms like rice, human and C. elegans as reference along with their E. siliculosus counterparts (Fig. 2b). This tree was generated from Hidden Markov Model (HMM)-based multiple sequence alignments produced by SATCHMO-JS 23. It showed a distribution of the E. siliculosus NBS sequences nearly identical to that of the first tree (Fig. 2a), and the sequences from other species were placed separately as outgroups (Fig. 2b).

To analyze the selection pressure, the nucleotide sequences within the NBS region were retrieved and the synonymous and non-synonymous substitution pattern was determined; the ratio of non-synonymous to synonymous substitutions was found to be 1.33. This indicated an overall positive selection within the NBS regions of the genes.

Fig. 2 (a & b) (a) Phylogenetic tree of the E. siliculosus NBS sequences based on the amino acid sequences in the NBS domain; & (b) Comparative phylogenetic tree using NBS regions of model organisms like rice (acc. no. NP_001044526), human (NP_863651) and C. elegans (CCD66782) as reference along with their E. siliculosus counterparts. This tree does not show the bootstrap values as it was generated by a different algorithm. (Black diamond, NBS-TPR; white diamond, NBS; black triangle, CC-NBS-TPR; white triangle, CC-NBS family of proteins; & black circle, reference sequences)

Conclusion

NBS domains are conserved throughout different eukaryotic lineages. Proteins having this domain are known to function in a number of cellular processes, such as regulating programmed cell death in metazoans and disease resistance in plants. Our analyses of NBS proteins from the model brown alga E. siliculosus revealed that, despite similarities with other lineages, brown algae have a distinct NBS domain organization. Further experimental studies can elucidate the functional and evolutionary significance of this novel domain organization.

Acknowledgement

The authors would like to thank Professor Haseena Khan, Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka for her constant support.

References

1 Ting J P, Willingham S B & Bergstralh D T, NLRs at the intersection of cell death and immunity, Nat Rev Immunol, 8 (2008) 372-379.

2 van der Biezen E A & Jones J D, The NB-ARC domain: A novel signalling motif shared by plant resistance gene products and regulators of cell death in animals, Curr Biol, 8 (1998) 226-227.
3 Dhaliwal H & Uchimiya H, Genetic engineering for disease and pest resistance in plants, Plant Biotechnol (Tokyo), 16 (1999) 255-262.
4 Traut T W, The functions and consensus motifs of nine types of peptide segments that form different types of nucleotide-binding sites, Eur J Biochem, 222 (1994) 9-19.
5 Dangl J L & Jones J D G, Plant pathogens and integrated defence responses to infection, Nature (Lond), 411 (2001) 826-833.
6 Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N et al, InterProScan: Protein domains identifier, Nucleic Acids Res, 33 (2005) 116-120.
7 Lupas A, Van Dyke M & Stock J, Predicting coiled coils from protein sequences, Science, 252 (1991) 1162-1164.
8 Bailey T L & Elkan C, Fitting a mixture model by expectation maximization to discover motifs in biopolymers, Proc Int Conf Intell Syst Mol Biol, 2 (1994) 28-36.
9 Bailey T L, Williams N, Misleh C & Li W W, MEME: Discovering and analyzing DNA and protein sequence motifs, Nucleic Acids Res, 34 (2006) 369-373.
10 Larkin M A, Blackshields G, Brown N P, Chenna R, McGettigan P A et al, Clustal W and Clustal X version 2.0, Bioinformatics, 23 (2007) 2947-2948.
11 Saitou N & Nei M, The neighbor-joining method: A new method for reconstructing phylogenetic trees, Mol Biol Evol, 4 (1987) 406-425.
12 Tamura K, Dudley J, Nei M & Kumar S, MEGA4: Molecular evolutionary genetics analysis (MEGA) software version 4.0, Mol Biol Evol, 24 (2007) 1596-1599.
13 Zhang J, Rosenberg H F & Nei M, Positive Darwinian selection after gene duplication in primate ribonuclease genes, Proc Natl Acad Sci USA, 95 (1998) 3708-3713.
14 Meyers B C, Kaushik S & Nandety R S, Evolving disease resistance genes, Curr Opin Plant Biol, 8 (2005) 129-134.
15 D'Andrea L D & Regan L, TPR proteins: The versatile helix, Trends Biochem Sci, 28 (2003) 655-662.
16 Das A K, Cohen P T W & Barford D, The structure of the tetratricopeptide repeats of protein phosphatase 5: Implications for TPR-mediated protein-protein interactions, EMBO J, 17 (1998) 1192-1199.
17 Cock J M, Sterck L, Rouzé P, Scornet D, Allen A E et al, The Ectocarpus genome and the independent evolution of multicellularity in brown algae, Nature (Lond), 465 (2010) 617-621.
18 Martin G B, Bogdanove A J & Sessa G, Understanding the functions of plant disease resistance proteins, Annu Rev Plant Biol, 54 (2003) 23-61.
19 Meyers B C, Kozik A, Griego A, Kuang H & Michelmore R W, Genome-wide analysis of NBS-LRR-encoding genes in Arabidopsis, Plant Cell, 15 (2003) 809-834.
20 Richly E, Kurth J & Leister D, Mode of amplification and reorganization of resistance genes during recent Arabidopsis thaliana evolution, Mol Biol Evol, 19 (2002) 76-84.
21 Zhou T, Wang Y, Chen J-Q, Araki H, Jing Z et al, Genome-wide identification of NBS genes in japonica rice reveals significant expansion of divergent non-TIR NBS-LRR genes, Mol Genet Genomics, 271 (2004) 402-415.
22 Yoon H S, Hackett J D, Ciniglia C, Pinto G & Bhattacharya D, A molecular timeline for the origin of photosynthetic eukaryotes, Mol Biol Evol, 21 (2004) 809-818.
23 Hagopian R, Davidson J R, Datta R S, Samad B, Jarvis G R et al, SATCHMO-JS: A webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction, Nucleic Acids Res, 38 (2010) 29-34.