We used the PSI-BLAST program (http://www.ncbi.nlm.nih.gov/blast/) to search the

Similar documents
Building a Homology Model of the Transmembrane Domain of the Human Glycine α-1 Receptor

From Amino Acids to Proteins - in 4 Easy Steps

Patrick: An Introduction to Medicinal Chemistry 5e Chapter 04

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha

Protein Secondary Structure Prediction

PROTEIN SECONDARY STRUCTURE PREDICTION: AN APPLICATION OF CHOU-FASMAN ALGORITHM IN A HYPOTHETICAL PROTEIN OF SARS VIRUS

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

CAP 5510 Lecture 3 Protein Structures

SUPPLEMENTARY INFORMATION

Bioinformatics: Secondary Structure Prediction

1-D Predictions. Prediction of local features: Secondary structure & surface exposure

Homology models of the tetramerization domain of six eukaryotic voltage-gated potassium channels Kv1.1-Kv1.6

Bioinformatics: Secondary Structure Prediction

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Week 10: Homology Modelling (II) - HHpred

Modeling for 3D structure prediction

Today. Last time. Secondary structure Transmembrane proteins. Domains Hidden Markov Models. Structure prediction. Secondary structure

Intro Secondary structure Transmembrane proteins Function End. Last time. Domains Hidden Markov Models

CHAPTER 29 HW: AMINO ACIDS + PROTEINS

IT og Sundhed 2010/11

Analysis and Prediction of Protein Structure (I)

ALL LECTURES IN SB Introduction

1. Amino Acids and Peptides Structures and Properties

Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007

Protein Structure. Hierarchy of Protein Structure. Tertiary structure. independently stable structural unit. includes disulfide bonds

Protein structure alignments

Protein Structure Basics

PROTEIN STRUCTURE AMINO ACIDS H R. Zwitterion (dipolar ion) CO 2 H. PEPTIDES Formal reactions showing formation of peptide bond by dehydration:

Supplementary Note on In Silico Protein Analysis

Protein Structures. Sequences of amino acid residues 20 different amino acids. Quaternary. Primary. Tertiary. Secondary. 10/8/2002 Lecture 12 1

Goals. Structural Analysis of the EGR Family of Transcription Factors: Templates for Predicting Protein DNA Interactions

Protein structure. Protein structure. Amino acid residue. Cell communication channel. Bioinformatics Methods

Getting To Know Your Protein

CS612 - Algorithms in Bioinformatics

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

SCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like

Protein Struktur (optional, flexible)

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Sequence analysis and comparison

Molecular modeling. A fragment sequence of 24 residues encompassing the region of interest of WT-

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.

Protein Dynamics. The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron.

Homology and Information Gathering and Domain Annotation for Proteins

C h a p t e r 2 A n a l y s i s o f s o m e S e q u e n c e... methods use different attributes related to mis sense mutations such as

Protein Structure. W. M. Grogan, Ph.D. OBJECTIVES

Heteropolymer. Mostly in regular secondary structure

Structural analysis of the EGR family of transcription factors: Templates for predicitng protein - DNA internations

Amino Acid Structures from Klug & Cummings. 10/7/2003 CAP/CGS 5991: Lecture 7 1

THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.

PROTEIN FUNCTION PREDICTION WITH AMINO ACID SEQUENCE AND SECONDARY STRUCTURE ALIGNMENT SCORES

Protein Structure Prediction, Engineering & Design CHEM 430

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Packing of Secondary Structures

Properties of amino acids in proteins

Orientational degeneracy in the presence of one alignment tensor.

Genomics and bioinformatics summary. Finding genes -- computer searches

CSCE555 Bioinformatics. Protein Function Annotation

Bioinformatics III Structural Bioinformatics and Genome Analysis Part Protein Secondary Structure Prediction. Sepp Hochreiter

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)

Lecture 7. Protein Secondary Structure Prediction. Secondary Structure DSSP. Master Course DNA/Protein Structurefunction.

LS1a Fall 2014 Problem Set #2 Due Monday 10/6 at 6 pm in the drop boxes on the Science Center 2 nd Floor

Molecular Modeling Lecture 7. Homology modeling insertions/deletions manual realignment

Secondary Structure. Bioch/BIMS 503 Lecture 2. Structure and Function of Proteins. Further Reading. Φ, Ψ angles alone determine protein structure

NGF - twenty years a-growing

Structure to Function. Molecular Bioinformatics, X3, 2006

FlexPepDock In a nutshell

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

SUPPLEMENTARY MATERIALS

Measuring quaternary structure similarity using global versus local measures.

Protein Structures: Experiments and Modeling. Patrice Koehl

DATE A DAtabase of TIM Barrel Enzymes

HOMOLOGY MODELING. The sequence alignment and template structure are then used to produce a structural model of the target.

Protein Secondary Structure Prediction

Useful background reading

Protein Structure Marianne Øksnes Dalheim, PhD candidate Biopolymers, TBT4135, Autumn 2013

Presentation Outline. Prediction of Protein Secondary Structure using Neural Networks at Better than 70% Accuracy

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University

Protein Structure. Role of (bio)informatics in drug discovery. Bioinformatics

Pymol Practial Guide

D Dobbs ISU - BCB 444/544X 1

Structure of the SPRY domain of human DDX1 helicase, a putative interaction platform within a DEAD-box protein

Protein Secondary Structure Assignment and Prediction

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

Transmembrane Domains (TMDs) of ABC transporters

DATA MINING OF ELECTROSTATIC INTERACTIONS BETWEEN AMINO ACIDS IN COILED-COIL PROTEINS USING THE STABLE COIL ALGORITHM ANKUR S.

Basics of protein structure

Template Free Protein Structure Modeling Jianlin Cheng, PhD

SUPPLEMENTARY INFORMATION

7 Protein secondary structure

Chemistry Chapter 22

8 Protein secondary structure

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

Lipid Regulated Intramolecular Conformational Dynamics of SNARE-Protein Ykt6

Can protein model accuracy be. identified? NO! CBS, BioCentrum, Morten Nielsen, DTU

Structural characterization of NiV N 0 P in solution and in crystal.

3D Structure. Prediction & Assessment Pt. 2. David Wishart 3-41 Athabasca Hall

Supplementary Figure 1. Aligned sequences of yeast IDH1 (top) and IDH2 (bottom) with isocitrate

Transcription:

SUPPLEMENTARY METHODS - in silico protein analysis We used the PSI-BLAST program (http://www.ncbi.nlm.nih.gov/blast/) to search the Protein Data Bank (PDB, http://www.rcsb.org/pdb/) and the NCBI non-redundant (NR) database for NELL1 homologs. Protein domain architectures were retrieved from the databases Pfam (http://www.sanger.ac.uk/software/pfam) and SMART (http://smart.embl.de/) and examined through the NCBI conserved domain search website (http://www.ncbi.nlm.nih.gov:80/structure/cdd/wrpsb.cgi). Using MUSCLE (http://phylogenomics.berkeley.edu/cgi-bin/muscle/input_muscle.py), we created a multiple sequence alignment of the following protein sequences from the UniProt database: NELL1: human, Q92832; mouse, Q2VWQ2; rat, Q62919; NELL2: human, Q99435; mouse, Q61220; rat, Q62918; frog, Q7ZXL5; NEL chicken, Q90827; TSP1 human, P07996 (PDB identifier 1z78, chain A). The secondary structure assignment of the PDB structure 1z78 was obtained from the DSSP database (http://www.cmbi.kun.nl/gv/dssp/) and added to the alignment. To predict the secondary structure of NELL1, we contacted the protein structure prediction server PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/). The cleavage site of the N-terminal signal peptide in NELL1 was predicted by the SignalP server (http://www.cbs.dtu.dk/services/signalp/). The sequence-structure alignment figure was prepared using GeneDoc (http://www.psc.edu/biomed/genedoc/). To predict the 3D structure of the NELL1 gene product, we used the fold recognition servers GenTHREADER (http://bioinf.cs.ucl.ac.uk/psipred/) and FFAS03 (http://ffas.ljcrf.edu). In agreement with the consistent results returned by PSI-BLAST, GenTHREADER, and FFAS03, we selected the thrombospondin-1 (TSP1) N-terminal domain (TSPN, PDB code 1z78, chain A) to model the 3D structure of NELL1. The structure of TSPN comprises the residues 28-233 (UniProt sequence P07996).

Based upon the 3D prediction results, a pairwise sequence-structure alignment of human NELL1 and TSPN was constructed and formed the input into the 3D-modeling server WHAT IF (http://swift.cmbi.kun.nl/wiwwwi/), which returned a full-atom structure model of the N-terminal domain of NELL1. The protein structure image of the model was illustrated using Yasara (http://www.yasara.org) and POV-Ray (http://www.povray.org). SUPPLEMENTARY RESULTS AND DISCUSSION in silico protein analysis Multiple sequence alignment of NELL homologs Based on the consistent and very reliable results returned by the alignment methods PSI-BLAST, Pfam, SMART, GenTHREADER and FFAS03, we assembled a structure-based multiple sequence alignment (Figure S4) of the N-terminal domain of NELL homologs and the thrombospondin-1 (TSP1) N-terminal domain (TSPN). The secondary structure prediction by PSIPRED agreed well with the predicted 3D structure of NELL1 (Figure S6). TSP1 is a secreted protein that binds to a variety of extracellular matrix components and proteins involved in the regulation of cell growth and inflammatory response[2,3]. The domain composition of TSP1 bears striking resemblance to NELL proteins, which suggests functional similarities between TSP1 and NELL1 and NELL2 (Kuroda BBRC_1999, PMID 10548494). Domains in common include a well-conserved N-terminal TSPN domain connected via a coiled coil region to von Willebrand factor, type C domains (VWC), which are followed by epidermal growth factor (EGF) domains (Figure 4A). The N-terminal domains of NELL1 and TSPN share 20% identical amino acids. The equivalent TSPN cysteines of the two NELL1 cysteines C167 and C220 form a disulfide bond in TSPN and stabilize a C-terminal helix[4]. Interestingly, the resulting arginine of the NELL1 variant Q82R is identical in all NELL homologs and TSP-1 except human NELL1, which has the glutamine. The original arginine

and alanine of the two variants R136S and A153T, respectively, are strictly conserved in all NELL1 orthologs, but not in NELL2 orthologs and TSPN (Figure S4A and S4B). R136 is replaced by a histidine in human, mouse, rat NELL2, TSPN, and by a glutamine in frog NELL2. The amino acid change R136S could not be observed in any homologous sequence. While A153 is replaced by a serine in NELL2 orthologs, it is a threonine in TSPN. A threonine is also found in other remotely related proteins at equivalent positions, for instance, in the laminin G-like domains of Gas6[5] and in LAMA2[6]. The variant R354W is located outside the TSPN domain within a VWC domain predicted by Pfam (identifier PF00093) and SMART (identifier SM00214). As shown in Figure S4B, the arginine R354 is identical in all NELL1 homologs, but changed to a serine in NELL2 homologs. Since VWC domains occur in numerous proteins of diverse functions and are generally assumed to be involved in protein oligomerization[7], the variant R354W may interfere with NELL1 trimerization[8]. 3D structure model We investigated the effects of the variants Q82R, R136S, and A153T on the structure of the predicted NELL1 N-terminal TSP domain in silico. To this end, we extracted a pairwise sequence-structure alignment of human NELL1 to the crystal structure of the human thrombospondin-1 N-terminal domain (TSPN) from the multiple sequence alignment (Figure S4A) and applied the WHAT IF server to construct a 3D model of the N-terminal domain of NELL1 using the PDB structure of TSPN (PDB code 1z78, chain A) as template (Figure 2B). The TSPN structure is a β-sandwich with 13 anti-parallel β-strands, adopting the concanavalin A-like lectins/glucanases fold[4]. Many amino acids of the hydrophobic core are conserved between TSPN and NELL1, including two cysteines forming a disulfide bond in the TSPN structure. The variant Q82R changing a neutral glutamine to a positively charged arginine is located outside the β-sandwich in an α-helix. The respective arginine in the TSPN domain is part of a

small patch of positive charge on the surface of the protein in a hypothetical heparin-binding site[4]. The variant R136S changes a positively charged arginine to a polar serine on the surface of the protein. The corresponding position 139 in the TSPN domain is also a polar histidine. As in case of Q82R, charge changes on the surface of a domain may lead to the loss of molecular interactions. Variant A153T is located close to the two C-terminal cysteines forming a disulfide bond in the TSPN structure. The structural role of this bond is the stabilization of an α-helix located at the C-terminus of the domain[4]. NELL homologs lack four amino acids in this helical region, and the secondary structure prediction of NELL1 indicates a loop. Based on the conserved cysteines and our close inspection of the atomic distances in this region, we propose a substitution of the TSPN helix with a loop structure, which buries roughly the same amino acids as observed in TSPN. Presumably, the variant A153T is located opposite the proposed loop and thus may be buried.

SUPPLEMENTARY REFERENCES 1. Barrett JC, Fry B, Maller J, Daly MJ (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21: 263-265. 2. Bornstein P (1995) Diversity of function is inherent in matricellular proteins: an appraisal of thrombospondin 1. J Cell Biol 130: 503-506. 3. Adams JC, Lawler J (2004) The thrombospondins. Int J Biochem Cell Biol 36: 961-968. 4. Tan K, Duquette M, Liu JH, Zhang R, Joachimiak A, et al. (2006) The structures of the thrombospondin-1 N-terminal domain and its complex with a synthetic pentameric heparin. Structure 14: 33-42. 5. Sasaki T, Knyazev PG, Cheburkin Y, Gohring W, Tisi D, et al. (2002) Crystal structure of a C-terminal fragment of growth arrest-specific protein Gas6. Receptor tyrosine kinase activation by laminin G-like domains. J Biol Chem 277: 44164-44170. 6. Hohenester E, Tisi D, Talts JF, Timpl R (1999) The crystal structure of a laminin G-like module reveals the molecular basis of alpha-dystroglycan binding to laminins, perlecan, and agrin. Mol Cell 4: 783-792. 7. Voorberg J, Fontijn R, Calafat J, Janssen H, van Mourik JA, et al. (1991) Assembly and routing of von Willebrand factor variants: the requirements for disulfide-linked dimerization reside within the carboxy-terminal 151 amino acids. J Cell Biol 113: 195-205. 8. Kuroda S, Oyasu M, Kawakami M, Kanayama N, Tanizawa K, et al. (1999) Biochemical characterization and expression analysis of neural thrombospondin-1-like proteins NELL1 and NELL2. Biochem Biophys Res Commun 265: 79-86.