Heteropolymer. Mostly in regular secondary structure


Heteropolymer. Mostly in regular secondary structure.

C → N: trace how you go around the helix. C2 → N6, C1 → N5. What's the pattern? Ci → Ni+?

Move around not quite 120°.

Figure: The side chains in an α-helix stick out to the side, as viewed in a ribbon diagram shown on the left. On the right is a helical wheel view of an α-helix with the 18 residues shown above. This example shows an amphipathic helix, with polar and non-polar residues on opposite sides. The helical wheel is taken from a Java Applet written by Edward K. O'Neil and Charles M. Grisham (University of Virginia in Charlottesville, Virginia): http://cti.itc.Virginia.EDU/~cmg/Demo/wheel/wheelApp.html

Each side can have different properties.

All of the amino acids are on the outside. Gennis 1f3c 31-50
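The C1/N5 and C2/N6 pairing and the "not quite 120" step noted above can be tried out numerically. The sketch below is illustrative only and is not part of the slides: it assumes ideal α-helix geometry (3.6 residues per turn, so 100° of rotation per residue, and a backbone hydrogen bond from the C=O of residue i to the N-H of residue i+4). The function names are made up for this example.

```python
# Assumptions (not stated on the slides): ideal alpha-helix geometry,
# 3.6 residues per turn = 100 degrees per residue, and backbone
# hydrogen bonds from C=O of residue i to N-H of residue i+4.
DEGREES_PER_RESIDUE = 100.0
HBOND_OFFSET = 4


def hbond_pairs(n_residues):
    """Return (i, i+4) pairs for a helix whose residues are numbered from 1."""
    return [(i, i + HBOND_OFFSET)
            for i in range(1, n_residues - HBOND_OFFSET + 1)]


def wheel_angles(n_residues):
    """Angle in degrees (0 to 360) of each residue on a helical wheel."""
    return [(i * DEGREES_PER_RESIDUE) % 360.0 for i in range(n_residues)]


if __name__ == "__main__":
    print(hbond_pairs(6))    # C1 pairs with N5, C2 pairs with N6
    print(wheel_angles(18))  # 18 distinct positions, as on an 18-residue wheel
```

With a 100° step, the 19th residue lands back on the position of the first, which is why a helical wheel is usually drawn with 18 residues.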

Notice up-down-up-down. The boxes show amino acids.

Motifs

SCOP: Murzin et al., JMB 1995 (Cyrus Chothia)
CATH: Orengo et al., Structure 1997 (Janet Thornton)
Each starts with domains.

Proteins are made of domains. A domain is a structural and an evolutionary unit. Domains have 50-200 residues. Domains in the same family or superfamily come from a common ancestor.
Similar sequence: family.
Diverged sequence but similar fold and function: superfamily.
Chothia and Gough, Biochem J (2009) 419, 15-28

www.sciencemag.org SCIENCE VOL 300 13 JUNE 2003

Figure 3. Glycosyl Hydrolases (A-C). (A) (1/3)-β-glucanase (Varghese et al., 1994) represents the basic (Trans)glycosidases superfamily (c.1.8). Homologous catalytic domains are found in (B) β-glucuronidase and (C) β-galactosidase. (B) In β-glucuronidase (Jain et al., 1996), the catalytic domain is 3 (in red) and is joined by two other domains: 1 restricts the binding site, and 2 links 1 to 3. (C) β-galactosidase. The first three domains have the same structure as β-glucuronidase (Jacobson et al., 1994). Domain 4 links domain 3 to 5, which contributes to the active site.
Bashton and Chothia, Structure 15: 85-99 (2007)

Dominant mechanisms that produce new proteins are: duplication of the genes of old proteins; divergence of these sequences to produce modified functions; combination of genes to further modify properties.
Chothia and Gough, Biochem J (2009) 419, 15-28

Some superfamilies have many protein domains (9 take up 20% of the human genome) and others have few. There are 800-1000 superfamilies in animals; bacteria have 250-700. Many superfamilies are found in all kingdoms of life.
Chothia and Gough, Biochem J (2009) 419, 15-28

Classification: based on structure and sequence

Class (C-level): secondary structure composition and contacts.
The first, most general level of the classification, class, describes the relative content of α helices and β sheets in a similar way to that described by Levitt and Chothia [29], except that we only define three major classes: mainly α, mainly β and α-β. Although the latter class can be subdivided into alternating α/β and α+β, in CATH this information is considered at a lower level describing topology.

Architecture (A-level): description of the gross arrangement of secondary structures, independent of connectivity.
This level distinguishes structures in the same class with different architectures, but does not distinguish between different topologies (connectivities). The architectural groupings can sometimes be rather broad as they describe general features of protein-fold shape, for example, the number of layers in an α-β sandwich. A given architecture will contain structures with diverse connectivities (see Figure 2), which will be distinguished at the next level down (topology). For example, in the α-β class (C = 3), there are two common architectures, each containing a large number of different fold families. One is the barrel-like architecture (A = 20), adopted by, e.g., TIM-barrel folds. These have an inner β barrel and an outer layer of α helices (Figure 2). Alternatively, the three-layer α-β sandwich architecture (A = 40) consists of a central β sheet which is covered by a layer of α helices on both sides of the sheet (Figure 2).

Topology (T-level): fold families.
Structures which are grouped at the T-level have the same overall fold, which means that they have a similar number and arrangement of secondary structures and that the connectivity linking their secondary structure elements is the same. In this paper, the words fold and topology have the same meaning. Proteins with the same CAT numbers have the same class, architecture and topology but do not necessarily belong to the same homologous superfamily. Within a given topology level, the structures are similar, but may have diverse functions.

Homologous superfamily (H-level): highly similar structures and functional similarity.
At the H-level, structures are grouped by their high structural similarity and similar functions, which suggest that they may have evolved from a common ancestor, particularly where there are resemblances in core packing or putative active sites. Using the example of the mainly α.non-bundle.globin-like folds, the erythrocruorins, colicins, phycocyanins and domain 1 of diphtheria toxin all have the same CAT number (1.10.340), but are differentiated by their H numbers 10, 20, 30 and 40, respectively (see Figure 3).

Sequence family (S-level): significant sequence similarity and thus a high probability of having similar structure/function.
Members which are clustered at this level (having the same CATHS number) have sequence identities >35% and as such are presumed to have extremely similar structures and functions; they may be slightly different examples of the same protein from different species belonging to the same sequence superfamily.
CATH
Class: α, β, αβ
Architecture: gross arrangement of 2° structure, independent of connectivity
Topology: fold family; linking of 2° structure (fold = topology)
Homologous superfamily: structure similar, function similar
Sequence family: >35% identity

SCOP
Class: α, β, α/β, α+β
Fold: same 2° structure elements, same topology, not necessarily related
Superfamily: common evolutionary origin, low sequence identity
Family: >30% identical, or >15% with same function
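The identity cut-offs quoted for the CATH sequence family (>35%) and the SCOP family (>30%, or >15% with the same function) can be expressed as a small check. This is a minimal sketch under stated assumptions, not how either database actually assigns families: the two sequences are assumed to be already aligned, identity is computed over columns where neither sequence has a gap (one of several conventions in use), and all function names are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: seq_a and seq_b are rows of an existing pairwise alignment.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def cath_same_sequence_family(identity):
    """Slide threshold for the CATH S-level: more than 35% identity."""
    return identity > 35.0


def scop_same_family(identity, same_function=False):
    """Slide threshold for a SCOP family: >30%, or >15% with the same function."""
    return identity > 30.0 or (identity > 15.0 and same_function)


if __name__ == "__main__":
    identity = percent_identity("ACDEFGHIKL", "ACDQFGHLKL")
    print(identity, cath_same_sequence_family(identity), scop_same_family(identity))
```

In practice both databases combine sequence identity with structural and functional evidence, so a threshold test like this is only a first filter.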

Illustration of motif overlaps in the mainly β sandwich architecture. Each structure shown can be related to the central tenascin structure by a motif containing at least four β strands (although these are not sequential in the transthyretin structure) up to seven β strands in plastocyanin and the immunoglobulin variable domain structures. It can be seen that this results in the possible merging of the immunoglobulin fold family (2rhe) and the jelly-roll fold family (1tnfA) through overlap of a large motif containing five β strands. This is not currently done in CATH, as both families are commonly referred to as separate folds in the literature.
Structure 1997, Vol 5 No 8

Strand residue ranges labelled on the figure: 9-14, 17-21, 31-36, 46-51, 56-60, 66-76, 88-94

1TTF.pdb

Record  Strand  Sheet  Strands  Start     End       Sense  Registration
SHEET   1       1      3        GLU A 9   THR A 14   0
SHEET   2       1      3        SER A 17  ASP A 23  -1     O SER A 21, N GLU A 9
SHEET   3       1      3        THR A 56  SER A 60  -1     N ALA A 57, O ILE A 20
SHEET   1       2      4        GLN A 46  PRO A 51   0
SHEET   2       2      4        TYR A 31  GLU A 38  -1     N TYR A 36, O GLN A 46
SHEET   3       2      4        VAL A 66  THR A 76  -1     N VAL A 75, O TYR A 31
SHEET   4       2      4        ILE A 88  THR A 94  -1     N ILE A 88, O VAL A 72

Branden and Tooze
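The SHEET records listed for 1TTF can be read programmatically. The sketch below is a simplified illustration, not a full PDB parser: it assumes the fields are separated by whitespace, as they are when the records are written out by hand. Real PDB files use fixed columns and fields can run together (for example a residue number followed directly by a sense of -1), so a production parser should slice by column instead. The `Strand` type and function names are made up for this example.

```python
from typing import NamedTuple, Tuple


class Strand(NamedTuple):
    strand: int                  # strand number within the sheet
    sheet_id: str                # sheet identifier
    num_strands: int             # number of strands in the sheet
    start: Tuple[str, str, int]  # (residue name, chain, residue number)
    end: Tuple[str, str, int]
    sense: int                   # 0 = first strand, 1 = parallel, -1 = antiparallel


def parse_sheet_line(line):
    """Parse one whitespace-separated SHEET record (simplified assumption)."""
    fields = line.split()
    if len(fields) < 11 or fields[0] != "SHEET":
        raise ValueError("not a SHEET record: %r" % line)
    return Strand(
        strand=int(fields[1]),
        sheet_id=fields[2],
        num_strands=int(fields[3]),
        start=(fields[4], fields[5], int(fields[6])),
        end=(fields[7], fields[8], int(fields[9])),
        sense=int(fields[10]),
    )


def strand_length(strand):
    """Number of residues in the strand, assuming contiguous numbering."""
    return strand.end[2] - strand.start[2] + 1


# Strand ranges from the 1TTF listing, with fields spaced out by hand.
RECORDS = [
    "SHEET 1 1 3 GLU A 9 THR A 14 0",
    "SHEET 2 1 3 SER A 17 ASP A 23 -1",
    "SHEET 3 1 3 THR A 56 SER A 60 -1",
    "SHEET 1 2 4 GLN A 46 PRO A 51 0",
    "SHEET 2 2 4 TYR A 31 GLU A 38 -1",
    "SHEET 3 2 4 VAL A 66 THR A 76 -1",
    "SHEET 4 2 4 ILE A 88 THR A 94 -1",
]

if __name__ == "__main__":
    for record in RECORDS:
        s = parse_sheet_line(record)
        print(s.sheet_id, s.strand, s.start[2], s.end[2], strand_length(s), s.sense)
```

Every strand after the first in each sheet has sense -1, so both sheets in this listing are purely antiparallel.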

SCOP: Structural Classification of Proteins. 1.75 release
38221 PDB Entries (23 Feb 2009). 110800 Domains. 1 Literature Reference (excluding nucleic acids and theoretical models)

Class                               Number of folds  Number of superfamilies  Number of families
All alpha proteins                              284                      507                 871
All beta proteins                               174                      354                 742
Alpha and beta proteins (a/b)                   147                      244                 803
Alpha and beta proteins (a+b)                   376                      552                1055
Multi-domain proteins                            66                       66                  89
Membrane and cell surface proteins               58                      110                 123
Small proteins                                   90                      129                 219
Total                                          1195                     1962                3902

SCOP: Structural Classification of Proteins. 1.37 release
6497 PDB Entries (20 Oct 1997). 13073 Domains. 101 Literature References (including nucleic acids and theoretical models)

Class                               Number of folds  Number of superfamilies  Number of families
All alpha proteins                               97                      126                 178
All beta proteins                                61                      112                 163
Alpha and beta proteins (a/b)                    75                      110                 188
Alpha and beta proteins (a+b)                   101                      146                 201
Multi-domain proteins                            20                       20                  25
Membrane and cell surface proteins               10                       16                  17
Small proteins                                   41                       58                  78
Total                                           405                      558                 850

http://scop.mrc-lmb.cam.ac.uk/scop/count.html#scop-1.75

CATH numbering scheme for representative structures from the globin-like fold family in the mainly α class. Four of the seven levels within the CATH database are shown, associated with Class, Architecture, Topology, and Homology. Each level is associated with a unique number. The (A), (T) and (H) levels are numbered in bins of ten to allow expansion of the database.

Class: 1 Mainly α; 2 Mainly β; 3 α β; 4 Few SS
Architecture: 10 Non-bundle; 20 Bundle; 30 Few SS
Topology: 460 ...; 470 Variant surface glycoprotein; 480 Glucoamylase, domain 2; 490 Globin-like; 500 β-lactamase, domain 2; 510 Casein kinase #; 520 ...
Homology: 10 1hlm; 20 1cpc chain A; 30 1col chain A; 40 1ddt domain 2

1.10.490.20 = Mainly α.Non-bundle.Globin-like.1cpc chain A

Table 1. The numbers of families identified at different levels in the CATH hierarchy are shown for the mainly α, mainly β and α-β classes.

              A            T            H            S            N            I          Domains
Class      No.     %    No.     %    No.     %    No.     %    No.     %    No.     %    No.     %
Mainly α     3   9.7    145  28.7    157  24.3    232  21.7    380  20.9    837  26.4   1793  22.2
Mainly β    17  54.8    102  20.2    137  21.2    266  24.9    585  32.1    891  28.1   2625  32.5
α-β         10  32.3    244  48.3    337  52.2    556  52.1    829  45.5   1411  44.5   3562  44.1
Few SS*      1   3.2     14   2.8     14   2.2     14   1.3     27   1.5     32   1.0     98   1.2
Total       31 100.0    505 100.0    645 100.0   1068 100.0   1821 100.0   3171 100.0   8078 100.0

*The number of families for proteins having few secondary structure (SS) elements is also shown at each level in the hierarchy.

2BMF - Bovine ATPase F1
Chain A:
24-94: all beta
358-475: left-handed superhelix
95-379: P-loop containing nucleoside triphosphate hydrolase
http://scop.mrc-lmb.cam.ac.uk/scop/search.cgi?ver=1.75&key=1bmf

Vogel et al., Current Opinion in Structural Biology 2004, 14: 208-216
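The dotted CATH number described above (for example 1.10.490.20) splits naturally into its Class, Architecture, Topology and Homology parts. The following is a small illustrative sketch, not an official CATH tool: the function names are hypothetical, and only the four levels shown in the numbering scheme are handled.

```python
LEVELS = ("class", "architecture", "topology", "homology")

# Class names as listed in the numbering scheme.
CLASS_NAMES = {1: "Mainly alpha", 2: "Mainly beta", 3: "Alpha beta", 4: "Few SS"}


def parse_cath(code):
    """Split a dotted CATH number into named integer levels (C, A, T, H)."""
    parts = code.split(".")
    if len(parts) > len(LEVELS) or not all(p.isdigit() for p in parts):
        raise ValueError("not a C.A.T.H number: %r" % code)
    return dict(zip(LEVELS, (int(p) for p in parts)))


def deepest_shared_level(code_a, code_b):
    """Name of the deepest level at which two CATH numbers agree, or None."""
    shared = None
    for level, a, b in zip(LEVELS, code_a.split("."), code_b.split(".")):
        if a != b:
            break
        shared = level
    return shared


if __name__ == "__main__":
    parsed = parse_cath("1.10.490.20")
    print(parsed, CLASS_NAMES[parsed["class"]])
    # 1hlm (H = 10) and 1cpc chain A (H = 20) share the globin-like topology.
    print(deepest_shared_level("1.10.490.10", "1.10.490.20"))
```

Two domains that agree down to the topology level share a fold but, as the H numbers show, are not necessarily placed in the same homologous superfamily.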