In silico analysis of the NBS protein family in Ectocarpus siliculosus

Indian Journal of Biotechnology, Vol 12, January 2013, pp 98-102

Niaz Mahmood 1* and Mahdi Muhammad Moosa 1,2

1 Molecular Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Dhaka, Ramna, Dhaka 1000, Bangladesh
2 Graduate Program in Biological Sciences, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, USA

Nucleotide-binding site (NBS) domain-containing proteins belong to one of the best-characterized protein families; they are found in almost all higher eukaryotes. Extensive studies have been done on plant NBS proteins, but similar studies on brown algal NBS proteins are not available. In the present study, the authors examined the diversity of NBS proteins in the model brown alga Ectocarpus siliculosus. A total of twenty-six NBS proteins were identified and classified into different subfamilies based on their distinct domain organizations. Although many characteristics of the protein family are similar to those of plant species, several features are quite distinct. One such characteristic is the presence of tetratricopeptide repeat (TPR) motifs at the C-terminal ends of these proteins. Another interesting finding is the presence of two E. siliculosus-specific conserved motifs, leading to a novel organization of the NBS domain. The remarkable structural diversity found among these proteins further strengthens the idea that diversifying selection may have played an important role in their evolution.

Keywords: Brown algae, Ectocarpus siliculosus, NBS, TPR

Introduction

Nucleotide-binding site (NBS) containing proteins have been found to play a vital role in defense mechanisms in a wide range of species, from humans to plants [1].
In metazoans, proteins containing this domain, like human APAF-1 (apoptotic protease activating factor 1) and Caenorhabditis elegans CED-4 (cell death 4), are involved in regulating programmed cell death [2]. In plants, this domain is found in disease resistance NBS-LRRs (nucleotide-binding site leucine-rich repeats), which are proteins that play a role in mediating a hypersensitive response at the site of pathogen infection [3]. Typically, the NBS region is well conserved across species and is marked by the presence of several conserved motifs. These include a P-loop (kinase 1), kinase 2, kinase 3 and other short motifs of unknown function [2]. Kinase 1 and kinase 2 bind the phosphates of ATP, whereas kinase 3 interacts with the purine or ribose moiety [4]. On the other hand, the C-termini of the proteins are highly variable and are marked by the presence of different types of protein-protein interaction domains, the most common of which is the leucine-rich repeat (LRR) domain found in plant NBS-LRR [5] and human NLR (nucleotide-binding domain, leucine-rich repeat containing) proteins. However, other domains may also be present; for example, human APAF-1 contains WD-40 repeats at its C-terminus [1].

A plethora of information on the structural and functional aspects of NBS proteins in other lineages is available, but little is known about their counterparts in brown algae. Ectocarpus siliculosus is the only member of this lineage with a full genome sequence available. Thus, a genome-wide investigation of the NBS-containing sequences in E. siliculosus was performed in the present study.

*Author for correspondence. E-mail: niazmahmood.ami@gmail.com
Note: After presentation of this paper at the conference, Zambounis et al [Mol Biol Evol, 29 (2012) 1263-1276] published an article containing a detailed analysis of the same family. However, the authors would like to declare no conflict of interest with anyone.
Materials and Methods

Identification and Classification of NBS Containing Proteins

The E. siliculosus protein sequences were obtained from the database of the Bioinformatics Gent Online Genome Annotation System (BOGAS; https://bioinformatics.psb.ugent.be/gdb/ectocarpus/). To identify the NBS domain-containing sequences, InterProScan [6] searches using models for the NB-ARC (IPR002182) and disease resistance (IPR000767) domains were used. The identified domains were curated manually. The coiled-coil (CC) motifs were identified by the COILS program [7] using a threshold of 0.9. Conserved motifs within the NBS domains were identified using MEME (Multiple Expectation Maximization for Motif Elicitation) [8,9]. These detailed data on protein motifs and domains were used to classify the proteins into different subgroups.

Sequence Alignment and Phylogenetic Analysis

For the alignment, the complete predicted protein sequences were trimmed so that only the NBS domain-containing regions remained. The sequences were then aligned with Clustal X version 2.0 [10] using default options, and a phylogenetic tree was constructed by the bootstrap neighbor-joining method [11] using Molecular Evolutionary Genetics Analysis (MEGA) software version 4.1 [12]. The stability of internal nodes was assessed by bootstrap analysis with 10,000 replicates.

Substitution Pattern of NBS Region in E. siliculosus

The synonymous and non-synonymous substitution pattern was determined using the modified Nei-Gojobori method [13] with the Jukes-Cantor correction as implemented in MEGA 4.1 [12].

Results and Discussion

E. siliculosus NBS Proteins are Few in Number but Diverse in Class

Thirty-one prospective proteins with highly conserved NBS regions were initially identified in E. siliculosus. Five of these were subsequently found to be the products of pseudogenes and were excluded from further analysis, leaving twenty-six proteins. The percentage of pseudogenes (~16%) is quite high considering the total number of NBS proteins identified and is reminiscent of the fast evolution patterns described for plant disease resistance genes [14]. Unlike their counterparts in Viridiplantae, the NBS proteins in E. siliculosus do not harbour C-terminal leucine-rich repeats (LRRs), which play a role in pathogen effector binding and signal transduction regulation.
Instead of LRRs, they have tetratricopeptide repeats (TPRs) at their C-termini. Such TPR motifs are found in proteins from a wide range of organisms, from bacteria to humans, and are involved in mediating protein-protein interactions and the assembly of multiprotein complexes [15]. With the exception of four proteins (GenBank IDs CBN79515, CBN74455, CBJ27327 and CBJ33350), all the E. siliculosus NBS proteins have TPRs at their C-termini. The number of TPRs among members of the NBS-TPR subfamily varied greatly, from as many as 18 repeats in CBJ25655 to as few as one repeat in CBN79956 and CBJ26776, with an overall average of seven repeats. However, why TPRs rather than LRRs were selected in these NBS proteins is still not clear. It is known that TPRs can act as scaffolds that help signaling proteins recognize their targets, thereby defining a novel mechanism for protein recognition [16]. It is possible that different signaling pathways are activated by these proteins upon attack by brown algae-specific pathogens.

Cock et al [17] found that the E. siliculosus NBS proteins do not have a Toll/interleukin-1 receptor (TIR) domain at their N-termini, and the only known motif found in that region is the CC motif. COILS [7] detected CC motifs in nine of the twenty-six NBS proteins. Previous studies suggested the involvement of the CC motif in protein-protein interaction and signaling [18]. Based on the differences in the N-terminal and C-terminal regions, the sequences were classified into four classes: NBS, NBS-TPR, CC-NBS-TPR and CC-NBS (Table 1).

Clearly, the E. siliculosus NBS proteins are quite different in structure from their counterparts in other species and are also significantly fewer in number. NBS proteins account for 0.16% of the total number of proteins in E. siliculosus, whereas the percentages are 1.38% and 0.68%, respectively, in photosynthetic organisms like rice and Arabidopsis.
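The four-class scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name classify and the boolean flags are hypothetical, and the list of proteins is reconstructed from the class counts reported in Table 1 rather than from per-protein annotations.

```python
from collections import Counter

def classify(has_cc, has_tpr):
    """Return the Table 1 letter code (N, NT, CN or CNT) for one NBS protein.

    has_cc  -- True if a coiled-coil motif was predicted at the N-terminus
    has_tpr -- True if tetratricopeptide repeats were predicted at the C-terminus
    """
    return ("C" if has_cc else "") + "N" + ("T" if has_tpr else "")

# Assumption: (has_cc, has_tpr) flags rebuilt from the Table 1 class counts,
# not taken from real per-protein domain predictions.
proteins = (
    [(True, True)] * 8       # CC-NBS-TPR
    + [(False, True)] * 14   # NBS-TPR
    + [(True, False)] * 1    # CC-NBS
    + [(False, False)] * 3   # NBS
)

counts = Counter(classify(cc, tpr) for cc, tpr in proteins)
for code in ("CNT", "NT", "CN", "N"):
    print(code, counts[code])
print("Total", sum(counts.values()))
```

The tallies are consistent with the text: nine proteins carry a CC motif (eight CNT plus one CN), and four lack TPRs (one CN plus three N).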
Ancient duplication of the entire genome and subsequent chromosome rearrangements played a key role in the amplification of resistance genes in Arabidopsis [19,20], and gene duplication caused by expansion of diversity played a role in the amplification of NBS proteins in rice [21]. Brown algae diverged from other multicellular organisms more than a billion years ago and were subjected to different selective pressures [22]. So the possibility that they developed alternative means of pathogen resistance cannot be ruled out.

Table 1. Classification of E. siliculosus NBS proteins based on the predicted domains

Predicted protein domains    Letter code    E. siliculosus
With TPRs
  CC-NBS-TPR                 CNT            8
  NBS-TPR                    NT             14
Without TPRs
  CC-NBS                     CN             1
  NBS                        N              3
Total NBS                                   26

Conserved Motifs in NBS Region

Irrespective of their classification, the identified proteins showed conservation in the NBS region. MEME identified several conserved motifs, such as the P-loop, RNBS-A, kinase 2, kinase 3 and RNBS-C motifs, within the NBS regions of E. siliculosus, most of which were previously described in other species (Table 2). The hydrophobic GLPL motif found in other species was slightly modified into GHLPL (for example, in CBN75740 and CBJ26207) in E. siliculosus. In addition, we found two E. siliculosus-specific conserved motifs within the NBS region. To our knowledge, these two motifs have not been described in other organisms. We named them ENBS 1 and ENBS 2. While the ENBS 1 motif was found between the kinase 2 and kinase 3 motifs, ENBS 2 was mostly found after the GHLPL motif. However, the functions of these two novel motifs remain to be elucidated experimentally. The position of the motifs within the NBS domain is shown in Fig. 1.

Table 2. Sequence of the major conserved motifs within E. siliculosus NBS domains

Motif    Motif name              Sequence in E. siliculosus
1        P-loop or Kinase 1      GPSGAGKSTIAS
2        RNBS-A                  VRRHFRDGIFWL
3        Kinase 2                KCLVVADNVWE
4        ENBS 1                  GKGAKDRLPALM
5        Kinase 3 or RNBS-B      TGFHVLVTTRQR
6        RNBS-C                  EEEALELLRKTS
7        GHLPL                   CGHLPLVLAIAG
8        ENBS 2                  RWSTVRGRSDRT

Fig. 1. Distribution of conserved motifs within the NBS domains as identified by MEME. The name of each member and the combined P value are shown on the left side of the figure. Different motifs are indicated with different coloured boxes. The length and order of motifs represent the actual organization in each protein.
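As a rough illustration of how the motif order reported above could be checked in a given sequence, the sketch below searches for the Table 2 consensus strings by exact matching. This is a deliberate simplification and not the MEME procedure used in the study, which works with probabilistic motif models. The function name locate_motifs and the toy sequence are hypothetical.

```python
# Consensus sequences copied from Table 2. Exact string matching is an
# assumption made for illustration; real family members vary at many positions.
MOTIFS = {
    "P-loop": "GPSGAGKSTIAS",
    "RNBS-A": "VRRHFRDGIFWL",
    "Kinase 2": "KCLVVADNVWE",
    "ENBS 1": "GKGAKDRLPALM",
    "Kinase 3": "TGFHVLVTTRQR",
    "RNBS-C": "EEEALELLRKTS",
    "GHLPL": "CGHLPLVLAIAG",
    "ENBS 2": "RWSTVRGRSDRT",
}

def locate_motifs(sequence):
    """Return (motif name, 0-based start) pairs for the first exact match
    of each Table 2 consensus, sorted by position in the sequence."""
    hits = []
    for name, consensus in MOTIFS.items():
        start = sequence.find(consensus)
        if start != -1:
            hits.append((name, start))
    return sorted(hits, key=lambda hit: hit[1])

# Hypothetical toy sequence: three consensus motifs joined by alanine linkers.
toy_sequence = (
    "MA" + MOTIFS["P-loop"] + "AAAA" + MOTIFS["Kinase 2"] + "AAA" + MOTIFS["ENBS 1"]
)

for name, start in locate_motifs(toy_sequence):
    print(name, start)
```

For real sequences, a position-specific scoring approach would be needed to tolerate substitutions; the exact-match version only shows the bookkeeping of motif order along the domain.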

Evolutionary Analyses

To clarify the phylogenetic relationships among the NBS proteins and infer the evolutionary history of this family, a phylogenetic tree was constructed using the protein sequences of the NBS region. The tree divided the sequences into two major clusters, with two sequences (CBN79515 and CBJ26776) placed outside these clusters due to significant variation in their sequences (Fig. 2a). Most members of the same subfamily clustered together. Another tree was constructed using the NBS regions of model organisms like rice, human and C. elegans as references along with their E. siliculosus counterparts (Fig. 2b). This tree was generated from Hidden Markov Model (HMM) based multiple sequence alignments produced by SATCHMO-JS [23]. It showed a distribution of the E. siliculosus NBS sequences largely consistent with the previous tree (Fig. 2a), and the sequences from other species were placed separately as outgroups (Fig. 2b).

To analyze the selection pressure, the nucleotide sequences within the NBS region were retrieved and the synonymous and non-synonymous substitution pattern was determined. The ratio of non-synonymous to synonymous substitutions was found to be 1.33. Since a ratio above 1 is generally taken as a sign of positive selection, this indicated overall positive selection within the NBS regions of the genes.

Fig. 2 (a & b). (a) Phylogenetic tree of the E. siliculosus NBS sequences based on the amino acid sequences in the NBS domain; and (b) comparative phylogenetic tree using NBS regions of model organisms like rice (acc. no. NP_001044526), human (NP_863651) and C. elegans (CCD66782) as references along with their E. siliculosus counterparts. This tree does not show bootstrap values as it was generated by a different algorithm. (Black diamond, NBS-TPR; white diamond, NBS; black triangle, CC-NBS-TPR; white triangle, CC-NBS family of proteins; and black circle, reference sequences.)

Conclusion

NBS domains are conserved across different eukaryotic lineages. Proteins having this domain are known to function in a number of cellular processes, such as regulating programmed cell death in metazoans and disease resistance in plants. Our analyses of NBS proteins from the model brown alga E. siliculosus revealed that, despite similarities with other lineages, brown algae have a distinct NBS domain organization. Further experimental studies can elucidate the functional and evolutionary significance of this novel domain organization.

Acknowledgement

The authors would like to thank Professor Haseena Khan, Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, for her constant support.

References

1 Ting J P, Willingham S B & Bergstralh D T, NLRs at the intersection of cell death and immunity, Nat Rev Immunol, 8 (2008) 372-379.

2 van der Biezen E A & Jones J D, The NB-ARC domain: A novel signalling motif shared by plant resistance gene products and regulators of cell death in animals, Curr Biol, 8 (1998) 226-227.
3 Dhaliwal H & Uchimiya H, Genetic engineering for disease and pest resistance in plants, Plant Biotechnol (Tokyo), 16 (1999) 255-262.
4 Traut T W, The functions and consensus motifs of nine types of peptide segments that form different types of nucleotide-binding sites, Eur J Biochem, 222 (1994) 9-19.
5 Dangl J L & Jones J D G, Plant pathogens and integrated defence responses to infection, Nature (Lond), 411 (2001) 826-833.
6 Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N et al, InterProScan: Protein domains identifier, Nucleic Acids Res, 33 (2005) 116-120.
7 Lupas A, Van Dyke M & Stock J, Predicting coiled coils from protein sequences, Science, 252 (1991) 1162-1164.
8 Bailey T L & Elkan C, Fitting a mixture model by expectation maximization to discover motifs in biopolymers, Proc Int Conf Intell Syst Mol Biol, 2 (1994) 28-36.
9 Bailey T L, Williams N, Misleh C & Li W W, MEME: Discovering and analyzing DNA and protein sequence motifs, Nucleic Acids Res, 34 (2006) 369-373.
10 Larkin M A, Blackshields G, Brown N P, Chenna R, McGettigan P A et al, Clustal W and Clustal X version 2.0, Bioinformatics, 23 (2007) 2947-2948.
11 Saitou N & Nei M, The neighbor-joining method: A new method for reconstructing phylogenetic trees, Mol Biol Evol, 4 (1987) 406-425.
12 Tamura K, Dudley J, Nei M & Kumar S, MEGA4: Molecular evolutionary genetics analysis (MEGA) software version 4.0, Mol Biol Evol, 24 (2007) 1596-1599.
13 Zhang J, Rosenberg H F & Nei M, Positive Darwinian selection after gene duplication in primate ribonuclease genes, Proc Natl Acad Sci USA, 95 (1998) 3708-3713.
14 Meyers B C, Kaushik S & Nandety R S, Evolving disease resistance genes, Curr Opin Plant Biol, 8 (2005) 129-134.
15 D'Andrea L D & Regan L, TPR proteins: The versatile helix, Trends Biochem Sci, 28 (2003) 655-662.
16 Das A K, Cohen P T W & Barford D, The structure of the tetratricopeptide repeats of protein phosphatase 5: Implications for TPR-mediated protein-protein interactions, EMBO J, 17 (1998) 1192-1199.
17 Cock J M, Sterck L, Rouzé P, Scornet D, Allen A E et al, The Ectocarpus genome and the independent evolution of multicellularity in brown algae, Nature (Lond), 465 (2010) 617-621.
18 Martin G B, Bogdanove A J & Sessa G, Understanding the functions of plant disease resistance proteins, Annu Rev Plant Biol, 54 (2003) 23-61.
19 Meyers B C, Kozik A, Griego A, Kuang H & Michelmore R W, Genome-wide analysis of NBS-LRR-encoding genes in Arabidopsis, Plant Cell, 15 (2003) 809-834.
20 Richly E, Kurth J & Leister D, Mode of amplification and reorganization of resistance genes during recent Arabidopsis thaliana evolution, Mol Biol Evol, 19 (2002) 76-84.
21 Zhou T, Wang Y, Chen J-Q, Araki H, Jing Z et al, Genome-wide identification of NBS genes in japonica rice reveals significant expansion of divergent non-TIR NBS-LRR genes, Mol Genet Genomics, 271 (2004) 402-415.
22 Yoon H S, Hackett J D, Ciniglia C, Pinto G & Bhattacharya D, A molecular timeline for the origin of photosynthetic eukaryotes, Mol Biol Evol, 21 (2004) 809-818.
23 Hagopian R, Davidson J R, Datta R S, Samad B, Jarvis G R et al, SATCHMO-JS: A webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction, Nucleic Acids Res, 38 (2010) 29-34.