The CATH Database provides insights into protein structure/function relationships
1999 Oxford University Press. Nucleic Acids Research, 1999, Vol. 27, No. 1

C. A. Orengo, F. M. G. Pearl, J. E. Bray, A. E. Todd, A. C. Martin, L. Lo Conte and J. M. Thornton

Department of Biochemistry and Molecular Biology, Darwin Building, University College London, Gower Street, London WC1E 6BT, UK

Received October 2, 1998; Revised October 13, 1998; Accepted October 28, 1998

ABSTRACT

We report the latest release (version 1.4) of the CATH protein domains database ( ac.uk/bsm/cath ). This is a hierarchical classification of protein domain structures into evolutionary families and structural groupings. We currently identify 827 homologous families in which the proteins have both structural similarity and sequence and/or functional similarity. These can be further clustered into 593 fold groups and 32 distinct architectures. Using our structural classification and associated data on protein functions stored in the database (EC identifiers, SWISS-PROT keywords and information from the Enzyme database and literature), we have been able to analyse the correlation between 3D structure and function. More than 96% of folds in the PDB are associated with a single homologous family. However, within the superfolds, three or more different functions are observed. Considering enzyme functions, more than 95% of clearly homologous families exhibit either single or closely related functions, as demonstrated by the EC identifiers of their relatives. Our analysis supports the view that determining structures, for example as part of a structural genomics initiative, will make a major contribution to interpreting genome data.

INTRODUCTION

The CATH classification of protein domain structures was established in 1993 (1) as a hierarchical clustering of protein domain structures into evolutionary families and structural groupings, depending on sequence and structure similarity.
There are four major levels, corresponding to protein class, architecture, topology or fold, and homologous family (Fig. 1). Since 1995, information about these structural groups and protein families has been accessible over the Web ( bsm/cath ), together with summary information about each individual protein structure (PDBsum) (2).

Figure 1. Schematic representation of the (C)lass, (A)rchitecture and (T)opology/fold levels in the CATH database.

CATH consists of both phylogenetic and phenetic descriptors for protein domain relationships. At the lowest levels in the hierarchy, proteins are grouped into evolutionary families (homologous families) if they have either significant sequence similarity (≥35% identity) or high structural similarity and some sequence similarity (≥20% identity). Structural similarity is assessed using an automatic method (SSAP) (3,4), which scores 100 for identical proteins and generally returns scores above 80 for homologous proteins. More distantly related folds generally give scores above 70 (Topology or fold level), though in the absence of any sequence or functional similarity this may simply represent examples of convergent evolution, reinforcing the hypothesis that there exists a limited number of folds in nature (5,6).

*To whom correspondence should be addressed. Email: orengo@biochem.ucl.ac.uk

Figure 2. Snapshot of a web page showing data available in the CATH dictionary of homologous superfamilies, for the subtilisin family (CATH id: ). Tables display the PDB codes for non-identical relatives in the family, together with EC identifier codes and information about the enzyme reactions. The multiple structural alignment shown has been coloured according to secondary structure assignments (red for helix, blue for strands).

The Architecture level in CATH groups proteins whose folds have similar 3D arrangements of secondary structures (e.g., barrel, sandwich or propeller), regardless of their connectivity, whilst the top level, Class, simply reflects the proportion of α-helix or β-strand secondary structures. Three major classes are recognised, mainly-α, mainly-β and α-β, since analysis revealed considerable overlap between the α+β and alternating α/β classes originally described by Levitt and Chothia (7).

Before classification, multidomain proteins are first separated into their constituent folds using a consensus method which seeks agreement between three independent algorithms (8). Whilst the protocol for updating CATH is largely automatic (9), several stages require manual validation, in particular establishing domain boundaries in proteins for which no consensus could be reached and checking the relationships of very distant homologues and proteins having borderline fold similarity.
Although there are plans to assign the more regular architectures automatically, all architecture groupings are currently assigned manually.

A homologous family Dictionary is now available within CATH, which contains functional data, where available, for each protein within a homologous family. This includes EC identifiers, SWISS-PROT keywords and information from the Enzyme database or the literature (Fig. 2). Multiple structure-based alignments are also available, coloured according to secondary structure assignments or residue properties, and there are schematic plots showing domain representations annotated by protein-ligand interactions (DOMPLOTS) (A.E.Todd, C.A.Orengo and J.M.Thornton, submitted to Protein Engng.).
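
The grouping thresholds quoted earlier in this section (sequence identity and SSAP score) can be illustrated with a minimal Python sketch. It is not the CATH update protocol: the function name, the use of hard cut-offs and the reading of the thresholds as "greater than or equal to" are assumptions made for illustration, and the functional-similarity evidence and manual validation described in the text are left out.

```python
# Thresholds taken from the text; treating them as hard ">=" cut-offs is an assumption.
SEQ_ID_CLEAR = 35.0        # % identity that is sufficient on its own
SEQ_ID_WITH_STRUCT = 20.0  # % identity required alongside high structural similarity
SSAP_HOMOLOGUE = 80.0      # SSAP score typical of homologous proteins
SSAP_FOLD = 70.0           # SSAP score typical of proteins sharing a fold


def classify_pair(ssap_score: float, seq_identity: float) -> str:
    """Hypothetical helper: label the relationship between two domains."""
    if seq_identity >= SEQ_ID_CLEAR:
        return "homologous family"
    if ssap_score >= SSAP_HOMOLOGUE and seq_identity >= SEQ_ID_WITH_STRUCT:
        return "homologous family"
    if ssap_score >= SSAP_FOLD:
        # Same fold, but no sequence evidence for common ancestry.
        return "same fold"
    return "unrelated"


if __name__ == "__main__":
    for ssap, seq_id in [(85.0, 25.0), (85.0, 10.0), (75.0, 10.0), (50.0, 10.0)]:
        print(ssap, seq_id, classify_pair(ssap, seq_id))
```

A pair with a high SSAP score but less than 20% identity falls through to "same fold" here; in the text such pairs may still be assigned as homologues when functional similarity is present.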
The topology of each domain is illustrated by schematic TOPS diagrams ( ; 10). We have also recently set up a Web Server (11), which enables the user to scan the CATH database with a newly determined protein structure and identify possible fold similarities or evolutionary relationships. There are also plans to incorporate sequence searches (using BLAST or PSI-BLAST) (12) to identify a probable fold for a new sequence.

Figure 3. CATH wheel plot showing the population of homologous families in different fold groups, architectures and classes. The wheel is coloured according to protein class (red, mainly-α; green, mainly-β; yellow, α-β; blue, few secondary structures). The size of the outer wheel represents the number of homologous families in CATH, whilst each band in the outer wheel corresponds to a single fold family. The size of each fold band therefore reflects the number of homologous families having that fold. It can be seen that most fold families contain a single homologous family. The superfold families are shown as paler bands, containing many homologous families. The inner wheel shows the population of homologous families in the different architectures.

The latest release of CATH (version 1.4, April 1998) contains 9342 protein chains from the PDB (13), which divide into domain folds. Currently 32 different architectures are recognised. Since the last release, three new architectures have been described, including the five-bladed α-β propeller. Grouping proteins on the basis of sequence, structure and functional similarity gives 827 evolutionary homologous families (H-level), whilst recognising more distant structural similarity with no accompanying sequence or function similarity gives rise to 593 different fold groups (T-level). The population of the different levels in the CATH hierarchy is illustrated by the CATH wheel shown in Figure 3.
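
The quantity plotted in the outer wheel, the number of homologous families per fold, can be computed from a list of domain classifications. The sketch below assumes each domain carries a dot-separated identifier of the form class.architecture.topology.homologous-family; the function names and the example identifiers are hypothetical and are not taken from the database.

```python
from collections import defaultdict


def families_per_fold(cath_codes):
    """Map each fold (C.A.T) to the set of homologous families (C.A.T.H) seen in it."""
    folds = defaultdict(set)
    for code in cath_codes:
        c, a, t, h = code.split(".")[:4]
        folds[f"{c}.{a}.{t}"].add(f"{c}.{a}.{t}.{h}")
    return folds


def single_family_fraction(folds):
    """Fraction of folds that contain exactly one homologous family."""
    single = sum(1 for families in folds.values() if len(families) == 1)
    return single / len(folds)


# Hypothetical identifiers, for illustration only.
codes = ["1.10.8.10", "1.10.8.20", "1.10.8.30", "2.40.50.10", "3.30.70.5", "3.30.70.5"]
folds = families_per_fold(codes)

if __name__ == "__main__":
    for fold, families in sorted(folds.items()):
        print(fold, len(families))
    print(f"single-family folds: {single_family_fraction(folds):.2f}")
```

In this toy input the fold with three families plays the role of a heavily populated band in the wheel, while the other two folds each hold a single family.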
It can be seen that several highly populated fold families, which we describe as superfolds (6) because they support a diverse range of sequences and more than three different functions, still account for nearly 30% of non-homologous structures.

IMPLICATIONS FOR STRUCTURAL GENOMICS

As the sequence databases grow rapidly, the need to interpret these sequences and assign functions to specific genes becomes increasingly important. Many techniques exist for matching protein sequences and thereby inheriting functional information. However, for very distant homologues there is often no detectable sequence similarity, despite conservation of 3D structure and function. For these cases, evolutionary relationships, and thereby functions, can only be assigned by comparing the structures. Therefore, a number of structural genomics initiatives are being proposed (14) which aim to identify all the folds in nature, with the ultimate goal of being able to predict the function of a new protein from its known or probable structure. The important questions to ask are: how many more folds do we need to determine before we have the complete set, and how confident can we be in assigning function between proteins having similar structures?

In the current genomes, on average only 30–46% of sequences can be assigned to a structural family by recognising sequence similarity to a protein of known structure (15,16). With only 600 unique structures currently in the PDB, compared with sequence families, it is clear that we still need to determine many more structures if we are to understand biology at the molecular level. However, analysis of recently deposited structural data is very revealing. Figure 4a illustrates the distribution of 2159 new structural domains classified in the 10 months from June 1997 to March 1998. A large proportion of these (79%) were clearly homologous (≥30% identity) to proteins of known structure. Of the remaining 443 structures (Fig. 4b), corresponding to new sequences, we found only 8% were novel folds, the remainder resembling a previously determined structure. Many of these, 199 (45%), could be identified as clear homologues by having significant structure and sequence similarity (SSAP ≥80 and ≥20% sequence identity). A further 169 (38%) were probable homologues as, although the sequence identity was below 20%, they had functional similarity and/or gave significant scores using sequence search methods designed to detect very distant homologues (PSI-BLAST) (12). There remained a further 40 (9%) proteins which were analogous, i.e., they had the same fold as a previous entry, but neither the sequence nor the function gave definite evidence of a common ancestor.

Figure 4. Pie charts showing the proportion of 2159 recently deposited structures which match structures in CATH. (a) Proportion of new structures matching by sequence alignment (21) or structure alignment (SSAP) (3). (b) Proportion of new non-homologous structures (<30% sequence identity to any previous CATH entry) which match previous CATH entries by structure. Those which have more than 20% sequence identity, measured after structural alignment, or functional similarity, are assigned as homologues. The remaining structures are analogues, having no clear evolutionary relationship.

RELATIONSHIP BETWEEN PROTEIN STRUCTURE AND FUNCTION

We now need to consider at what levels of structural similarity or evolutionary distance it is reasonable to inherit functional information within a protein family. Data on the CATH evolutionary families and structural groupings are stored in a Postgres relational database (11), with links to a ligand database containing information about protein-ligand interactions (2). This allows us to analyse the relationship between 3D structure and function, using stored data on EC identifiers, SWISS-PROT keywords and protein-ligand interactions (11). Considering the degree of functional similarity observed in structures with similar folds, the vast majority (>96%) of fold groups in the PDB derive from a single homologous family, with similar or closely related functions within the family.
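
As a consistency check on the Figure 4 breakdown reported in the preceding section, the quoted percentages can be recomputed from the quoted counts. The counts below come from the text; the two derived values (the number of sequence-matched domains and the number of novel folds) are not stated in the text and are obtained here by subtraction, which is an assumption about how the categories partition the data.

```python
# Counts quoted in the text for domains classified from June 1997 to March 1998.
TOTAL_NEW_DOMAINS = 2159
NEW_SEQUENCES = 443          # domains with no clear sequence match to a known structure
CLEAR_HOMOLOGUES = 199
PROBABLE_HOMOLOGUES = 169
ANALOGUES = 40


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Derived by subtraction (not stated explicitly in the text).
sequence_matches = TOTAL_NEW_DOMAINS - NEW_SEQUENCES
novel_folds = NEW_SEQUENCES - (CLEAR_HOMOLOGUES + PROBABLE_HOMOLOGUES + ANALOGUES)

if __name__ == "__main__":
    print(f"sequence matches: {sequence_matches} "
          f"({percent(sequence_matches, TOTAL_NEW_DOMAINS):.0f}% of all new domains)")
    rows = [
        ("clear homologues", CLEAR_HOMOLOGUES),
        ("probable homologues", PROBABLE_HOMOLOGUES),
        ("analogues", ANALOGUES),
        ("novel folds", novel_folds),
    ]
    for label, count in rows:
        print(f"{label}: {count} ({percent(count, NEW_SEQUENCES):.0f}% of new sequences)")
```

Under this reading the rounded values agree with the 79%, 45%, 38%, 9% and 8% quoted in the text, and clear plus probable homologues together make up about 83% of the structures with new sequences.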
However, for the very common folds (superfolds, see above), which derive from three or more apparently unrelated homologous families, the proteins can perform quite unrelated functions even though they have the same fold. We have described these as analogous folds, which may or may not have a common ancestor.

At the homologous superfamily level in CATH, a more detailed analysis of enzyme functions showed that the majority of homologous enzyme families in CATH (>90%) contained proteins for which the first three EC identifiers were the same. Considering those families where homologues have significant sequence identity (≥20%) after structural alignment, 95% were found to have a single EC identifier, whilst for families where proteins have more than 30% sequence similarity, we observed that 98% had a single EC code.

Although assigning function on the basis of homology is common practice, it is clear that some caution should be exercised, particularly where there is little or no sequence similarity. There are also some clear examples where homologues with significant sequence similarity perform different functions. The role of gene recruitment is especially clear in the eye lens proteins, which function as enzymes in other cellular environments but are used as structural proteins in this context (17). The extent of such gene recruitment and context-sensitive function is not known at this time.

For enzymes, it is clear that catalytic function can change and evolve, usually to act on a different but related substrate. Similarly, within the lipocalin family (CATH id #: ), several proteins are found with very similar structures, which bind different fatty acids in the same region at the base of the β-barrel (e.g., retinol, bilin, biotin). Nearly half of the homologous families where two or more different EC numbers were observed belong to the superfolds.
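
The EC comparison used in the analysis above (whether relatives in a family share the first three EC identifiers) can be sketched as follows. The function names, the family labels and the EC strings are illustrative inputs chosen for this sketch, not data from the CATH dictionary.

```python
def ec_prefix(ec_number: str, levels: int = 3) -> tuple:
    """Return the first `levels` fields of a dot-separated EC number."""
    return tuple(ec_number.split(".")[:levels])


def family_shares_ec(ec_numbers, levels: int = 3) -> bool:
    """True if every EC number in the family agrees at the first `levels` fields."""
    return len({ec_prefix(ec, levels) for ec in ec_numbers}) == 1


def fraction_conserved(families: dict, levels: int = 3) -> float:
    """Fraction of families whose members all share the same EC prefix."""
    conserved = sum(1 for ecs in families.values() if family_shares_ec(ecs, levels))
    return conserved / len(families)


# Hypothetical families with illustrative EC strings.
example_families = {
    "family_A": ["3.4.21.62", "3.4.21.64"],  # differ only in the fourth field
    "family_B": ["1.1.1.1", "2.7.1.1"],      # differ from the first field onwards
}

if __name__ == "__main__":
    print(fraction_conserved(example_families, levels=3))
    print(fraction_conserved(example_families, levels=4))
```

Setting `levels=4` corresponds to the stricter "single EC identifier" criterion in the text, while `levels=3` corresponds to the looser "first three EC identifiers" criterion.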
This suggests that if a new protein is assigned to a superfold family, more caution should be used when inheriting functional information, as there appears to be greater tolerance to changes in sequence, and ultimately function, for these families. However, it is interesting to note that many of these were TIM barrel or Rossmann folds. These are superfolds in which the substrate or ligand commonly binds in the same place. This is in the base of the β-barrel for the TIMs and at the crossover of the polypeptide chain for the doubly wound Rossmann structures.

ASSIGNMENT OF FUNCTION THROUGH STRUCTURE

One of the reasons for determining structures is to derive more information to facilitate the assignment of function. From our analysis of proteins in CATH, we suggest that structural data can help to assign function in several ways:

(i) The structural data allow recognition of more distant homologues compared with sequence data; in our analysis, 83% of structures with novel sequences could be assigned as homologues in this way (note that such assignment of function is again subject to the caveats imposed by gene recruitment discussed above).

(ii) The structural data allow detailed inspection of the functional site to suggest if and how the function may have evolved. For example, if an enzyme has evolved to act on a different substrate, the binding site may reveal, or at least suggest, possible changes in the substrate.

(iii) For the superfolds, similarity of structure does not necessarily mean similarity of function. However, the active site/binding sites are often conserved, e.g., in the TIM barrel or Rossmann fold structures, the ligand always binds at the same end of the barrel or sheet.

(iv) Some methods have already been developed, and will increasingly be the focus of attention over the next few years, which aim to predict function ab initio from structure. For example, enzymes can often be identified by the presence of a major cleft, which also locates the active site (18). Similarly, critical surface patches, which are used for molecular recognition in binding other proteins or ligands, may be identified using knowledge-based approaches (19,20).

In summary, extrapolating the data from Figure 4 to a new genome, we can expect that, of the 54–70% of sequences which currently have no obvious sequence matches in the PDB, we will find nearly 80–90% to be homologous to a known family using the structural data alone. For the singlet folds, this will almost certainly reveal some clues to the function. For the superfolds, some folds will reveal information on the functional class (e.g., enzyme for TIM barrels) or the location of the active site, if not the specific function. Only 10–20% will be expected to be novel folds. For these, the ab initio methods referred to above may provide some clues to guide experiments. Therefore, it is clear that determining structures, as part of a structural genomics initiative, for example, will make a major contribution to interpreting genome data.

REFERENCES

1 Orengo,C.A., Flores,T.P., Taylor,W.R. and Thornton,J.M. (1993) Protein Engng., 6.
2 Laskowski,R.A., Hutchinson,E.G., Michie,A.D., Wallace,A.C., Jones,M.L. and Thornton,J.M. (1997) Trends Biochem. Sci., 22.
3 Taylor,W.R. and Orengo,C.A. (1989) J. Mol. Biol., 208.
4 Orengo,C.A., Brown,N.P. and Taylor,W.R. (1992) Proteins, 14.
5 Chothia,C. (1993) Nature, 357.
6 Orengo,C.A., Jones,D.T. and Thornton,J.M. (1994) Nature, 372.
7 Levitt,M. and Chothia,C. (1976) Nature, 261.
8 Jones,S., Swindells,M.B., Stewart,M., Michie,A.D., Orengo,C.A. and Thornton,J.M. (1998) Protein Sci., 7.
9 Orengo,C.A., Michie,A.D., Jones,S., Jones,D.T., Swindells,M.B. and Thornton,J.M. (1997) Structure, 5.
10 Westhead,D.R., Hatton,D.C. and Thornton,J.M. (1998) Trends Biochem. Sci., 23.
11 Martin,A.C.R., Orengo,C.A., Hutchinson,E.G., Jones,S., Karmirantzou,M., Laskowski,R.A., Mitchell,J.B.O., Taroni,C. and Thornton,J.M. (1998) Structure, 6.
12 Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25.
13 Abola,E.E., Bernstein,F.C., Bryant,S.H., Koetzle,T.F. and Weng,J. (1987) In Allen,F.H., Bergerhoff,G. and Sievers,R. (eds), Crystallographic Databases-Information Content, Software Systems, Scientific Applications. Data Commission of the International Union of Crystallography, Bonn/Cambridge/Chester.
14 Pennisi,L. (1998) Science, 279.
15 Huynen,M., Doerks,T., Eisenhaber,F., Orengo,C.A., Sunyaev,S., Yuan,Y. and Bork,P. (1998) J. Mol. Biol., 280.
16 Jones,D.T. (1998) J. Mol. Biol., in press.
17 Piatigorsky,J. and Wistow,G. (1991) Science, 252.
18 Laskowski,R.A., Luscombe,N.M., Swindells,M.B. and Thornton,J.M. (1996) Protein Sci., 5.
19 Jones,S. and Thornton,J.M. (1997) J. Mol. Biol., 272.
20 Jones,S. and Thornton,J.M. (1997) J. Mol. Biol., 272.
21 Needleman,S.B. and Wunsch,C.D. (1970) J. Mol. Biol., 48.
More informationProtein Structure & Motifs
& Motifs Biochemistry 201 Molecular Biology January 12, 2000 Doug Brutlag Introduction Proteins are more flexible than nucleic acids in structure because of both the larger number of types of residues
More informationFrom Sequence to Function (I): - Protein Profiling - Case Studies in Structural & Functional Genomics
BCHS 6229 Protein Structure and Function Lecture 6 (Oct 27, 2011) From Sequence to Function (I): - Protein Profiling - Case Studies in Structural & Functional Genomics 1 From Sequence to Function in the
More informationCAP 5510 Lecture 3 Protein Structures
CAP 5510 Lecture 3 Protein Structures Su-Shing Chen Bioinformatics CISE 8/19/2005 Su-Shing Chen, CISE 1 Protein Conformation 8/19/2005 Su-Shing Chen, CISE 2 Protein Conformational Structures Hydrophobicity
More informationProtein Structure Analysis and Verification. Course S Basics for Biosystems of the Cell exercise work. Maija Nevala, BIO, 67485U 16.1.
Protein Structure Analysis and Verification Course S-114.2500 Basics for Biosystems of the Cell exercise work Maija Nevala, BIO, 67485U 16.1.2008 1. Preface When faced with an unknown protein, scientists
More informationNeural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha
Neural Networks for Protein Structure Prediction Brown, JMB 1999 CS 466 Saurabh Sinha Outline Goal is to predict secondary structure of a protein from its sequence Artificial Neural Network used for this
More informationPROMALS3D: a tool for multiple protein sequence and structure alignments
Nucleic Acids Research Advance Access published February 20, 2008 Nucleic Acids Research, 2008, 1 6 doi:10.1093/nar/gkn072 PROMALS3D: a tool for multiple protein sequence and structure alignments Jimin