The CATH Database provides insights into protein structure/function relationships

© 1999 Oxford University Press. Nucleic Acids Research, 1999, Vol. 27, No. 1

C. A. Orengo, F. M. G. Pearl, J. E. Bray, A. E. Todd, A. C. Martin, L. Lo Conte and J. M. Thornton

Department of Biochemistry and Molecular Biology, Darwin Building, University College London, Gower Street, London WC1E 6BT, UK

Received October 2, 1998; Revised October 13, 1998; Accepted October 28, 1998

ABSTRACT

We report the latest release (version 1.4) of the CATH protein domains database ( ac.uk/bsm/cath ). This is a hierarchical classification of protein domain structures into evolutionary families and structural groupings. We currently identify 827 homologous families in which the proteins have both structural similarity and sequence and/or functional similarity. These can be further clustered into 593 fold groups and 32 distinct architectures. Using our structural classification and the associated data on protein function stored in the database (EC identifiers, SWISS-PROT keywords and information from the Enzyme database and the literature), we have been able to analyse the correlation between 3D structure and function. More than 96% of folds in the PDB are associated with a single homologous family. However, within the superfolds, three or more different functions are observed. Considering enzyme functions, more than 95% of clearly homologous families exhibit either single or closely related functions, as demonstrated by the EC identifiers of their relatives. Our analysis supports the view that determining structures, for example as part of a structural genomics initiative, will make a major contribution to interpreting genome data.

INTRODUCTION

The CATH classification of protein domain structures was established in 1993 (1) as a hierarchical clustering of protein domain structures into evolutionary families and structural groupings, depending on sequence and structure similarity.
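As a minimal sketch of how such a hierarchy can be navigated, the Python below assumes that each domain is labelled with a dotted four-level code (Class.Architecture.Topology.Homologous family) and counts the distinct groups at each level by taking code prefixes. The domain names and codes are hypothetical placeholders, not real CATH assignments.

```python
from __future__ import annotations

# Names of the four CATH levels, from most general to most specific.
LEVELS = ("Class", "Architecture", "Topology", "Homologous family")


def parse_cath_code(code: str) -> tuple[int, ...]:
    """Split a dotted C.A.T.H code (e.g. '3.20.20.1') into four integers."""
    parts = code.split(".")
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} levels, got {code!r}")
    return tuple(int(p) for p in parts)


def count_groups(codes) -> dict[str, int]:
    """Count distinct groups per level: a group at depth k is a unique k-prefix."""
    parsed = [parse_cath_code(c) for c in codes]
    return {
        name: len({p[:depth] for p in parsed})
        for depth, name in enumerate(LEVELS, start=1)
    }


# Hypothetical domain-to-code assignments, for illustration only.
domains = {
    "dom_a": "3.20.20.1",
    "dom_b": "3.20.20.2",
    "dom_c": "3.40.50.1",
    "dom_d": "2.60.40.1",
    "dom_e": "2.60.40.1",  # a second domain in the same homologous family
}

print(count_groups(domains.values()))
# {'Class': 2, 'Architecture': 3, 'Topology': 3, 'Homologous family': 4}
```

Treating groups as code prefixes makes the nesting explicit: every homologous family sits inside exactly one fold, which in turn sits inside one architecture and one class.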
There are four major levels, corresponding to protein class, architecture, topology or fold, and homologous family (Fig. 1). Since 1995, information about these structural groups and protein families has been accessible over the Web ( bsm/cath ), together with summary information about each individual protein structure (PDBsum) (2). CATH incorporates both phylogenetic and phenetic descriptors of protein domain relationships. At the lowest levels in the hierarchy, proteins are grouped into evolutionary families (homologous families) if they have either significant sequence similarity (≥35% identity) or high structural similarity together with some sequence similarity (≥20% identity). Structural similarity is assessed using an automatic method (SSAP) (3,4), which scores 100 for identical proteins and generally returns scores above 80 for homologous proteins. More distantly related folds generally give scores above 70 (Topology or fold level), though in the absence of any sequence or functional similarity this may simply represent convergent evolution, reinforcing the hypothesis that there is a limited number of folds in nature (5,6). The Architecture level in CATH groups proteins whose folds have similar 3D arrangements of secondary structures (e.g., barrel, sandwich or propeller), regardless of their connectivity, whilst the top level, Class, simply reflects the proportion of α-helix and β-strand secondary structure. Three major classes are recognised: mainly-α, mainly-β and α–β, since analysis revealed considerable overlap between the α+β and alternating α/β classes originally described by Levitt and Chothia (7).

Figure 1. Schematic representation of the (C)lass, (A)rchitecture and (T)opology/fold levels in the CATH database.

Figure 2. Snapshot of a web page showing data available in the CATH dictionary of homologous superfamilies, for the subtilisin family (CATH id: ). Tables display the PDB codes for non-identical relatives in the family, together with EC identifier codes and information about the enzyme reactions. The multiple structural alignment shown has been coloured according to secondary structure assignments (red for helices, blue for strands).

*To whom correspondence should be addressed. Email: orengo@biochem.ucl.ac.uk

Before classification, multidomain proteins are first separated into their constituent domains using a consensus method that seeks agreement between three independent algorithms (8). Whilst the protocol for updating CATH is largely automatic (9), several stages require manual validation, in particular establishing domain boundaries in proteins for which no consensus could be reached, and checking the relationships of very distant homologues and of proteins having borderline fold similarity.
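The similarity thresholds quoted above can be read as a simple decision rule. The sketch below is illustrative only: it takes an SSAP score and a percentage sequence identity as given inputs (it does not compute either), treats "above" as "greater than or equal to" (an assumption), and ignores the functional evidence and manual validation that CATH also relies on.

```python
from enum import Enum


class Relationship(Enum):
    HOMOLOGUE = "same homologous family (H level)"
    SAME_FOLD = "same topology/fold (T level), possibly analogous"
    UNRELATED = "no fold-level similarity"


# Thresholds as quoted in the text; reading "above" as ">=" is an assumption.
SEQ_ID_ALONE = 35.0        # % identity sufficient on its own
SEQ_ID_WITH_STRUCT = 20.0  # % identity needed alongside high structural similarity
SSAP_HOMOLOGUE = 80.0      # SSAP score typical of homologous proteins
SSAP_FOLD = 70.0           # SSAP score typical of a shared fold


def classify_pair(ssap_score: float, seq_identity: float) -> Relationship:
    """Place a pair of domains in the hierarchy from an SSAP score (0-100) and % identity."""
    if seq_identity >= SEQ_ID_ALONE:
        return Relationship.HOMOLOGUE
    if ssap_score >= SSAP_HOMOLOGUE and seq_identity >= SEQ_ID_WITH_STRUCT:
        return Relationship.HOMOLOGUE
    if ssap_score >= SSAP_FOLD:
        return Relationship.SAME_FOLD
    return Relationship.UNRELATED


for ssap, ident in [(100, 100), (85, 25), (85, 12), (60, 10)]:
    print(ssap, ident, classify_pair(ssap, ident).name)
```

A pair that reaches only the fold-level score is deliberately not called homologous, mirroring the caveat above that such matches may reflect convergent evolution.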
Although there are plans to assign the more regular architectures automatically, all architecture groupings are currently assigned manually. A homologous family Dictionary is now available within CATH; it contains functional data, where available, for each protein within a homologous family. These data include EC identifiers, SWISS-PROT keywords and information from the Enzyme database or the literature (Fig. 2). Multiple structure-based alignments are also available, coloured according to secondary structure assignments or residue properties, and there are schematic plots showing domain representations annotated with protein–ligand interactions (DOMPLOTS) (A.E.Todd, C.A.Orengo and J.M.Thornton, submitted to Protein Engng.).

Figure 3. CATH wheel plot showing the population of homologous families in different fold groups, architectures and classes. The wheel is coloured according to protein class (red, mainly-α; green, mainly-β; yellow, αβ; blue, few secondary structures). The size of the outer wheel represents the number of homologous families in CATH, whilst each band in the outer wheel corresponds to a single fold family. The size of each fold band therefore reflects the number of homologous families having that fold. It can be seen that most fold families contain a single homologous family. The superfold families, which contain many homologous families, are shown as paler bands. The inner wheel shows the population of homologous families in the different architectures.

The topology of each domain is illustrated by schematic TOPS diagrams ( ; 10). We have also recently set up a Web Server (11) that enables the user to scan the CATH database with a newly determined protein structure and identify possible fold similarities or evolutionary relationships. There are also plans to incorporate sequence searches (using BLAST or PSI-BLAST) (12) to identify a probable fold for a new sequence. The latest release of CATH (version 1.4, April 1998) contains 9342 protein chains from the PDB (13), which divide into domain folds. Currently 32 different architectures are recognised. Since the last release, three new architectures have been described, including the five-bladed α β propeller. Grouping proteins on the basis of sequence, structure and functional similarity gives 827 evolutionary homologous families (H-level). Recognising more distant structural similarity, with no accompanying sequence or functional similarity, gives rise to 593 different fold groups (T-level). The population of the different levels in the CATH hierarchy is illustrated by the CATH wheel shown in Figure 3.
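The outer ring of the wheel is essentially a count of homologous families per fold. A minimal sketch of that count follows, assuming dotted C.A.T.H codes (the codes below are hypothetical) and using "three or more homologous families" as the superfold cut-off; that cut-off is exposed as a parameter because it is a working definition rather than a fixed rule.

```python
from collections import Counter


def families_per_fold(h_codes) -> Counter:
    """Count distinct homologous families (H) within each fold (C.A.T prefix)."""
    per_fold = Counter()
    for code in set(h_codes):  # each H-level code is counted once
        c, a, t, _h = code.split(".")
        per_fold[f"{c}.{a}.{t}"] += 1
    return per_fold


def summarise(h_codes, superfold_min: int = 3) -> dict:
    """Fraction of folds with a single family, plus folds hosting >= superfold_min families."""
    per_fold = families_per_fold(h_codes)
    single = sum(1 for n in per_fold.values() if n == 1)
    return {
        "folds": len(per_fold),
        "single_family_fraction": single / len(per_fold),
        "superfolds": sorted(f for f, n in per_fold.items() if n >= superfold_min),
    }


# Hypothetical H-level codes, for illustration only.
families = ["3.20.20.1", "3.20.20.2", "3.20.20.3", "3.40.50.1", "2.60.40.1", "1.10.10.1"]
print(summarise(families))
# {'folds': 4, 'single_family_fraction': 0.75, 'superfolds': ['3.20.20']}
```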
It can be seen that several highly populated fold families still account for nearly 30% of non-homologous structures. We describe these as superfolds (6), as they support a diverse range of sequences and more than three different functions.

IMPLICATIONS FOR STRUCTURAL GENOMICS

As the sequence databases grow rapidly, the need to interpret these sequences and assign functions to specific genes becomes increasingly important. Many techniques exist for matching protein sequences and thereby inheriting functional information. However, for very distant homologues there is often no detectable sequence similarity, despite conservation of 3D structure and function. In these cases, evolutionary relationships, and thereby functions, can only be assigned by comparing structures. A number of structural genomics initiatives have therefore been proposed (14), which aim to identify all the folds in nature, with the ultimate goal of being able to predict the function of a new protein from its known or probable structure. The important questions are: how many more folds must we determine before we have the complete set, and how confident can we be in assigning function between proteins having similar structures?

In the current genomes, on average only 30–46% of sequences can be assigned to a structural family by recognising sequence similarity to a protein of known structure (15,16). With only 600 unique structures currently in the PDB, compared with sequence families, it is clear that we still need to determine many more structures if we are to understand biology at the molecular level. However, analysis of recently deposited structural data is very revealing. Figure 4a illustrates the distribution of 2159 new structural domains classified in the 10 months from June 1997 to March 1998. A large proportion of these (79%) were clearly homologous (≥30% identity) to proteins of known structure. Of the remaining 443 structures (Fig. 4b), corresponding to new sequences, we found that only 8% were novel folds; the remainder resembled a previously determined structure. Many of these, 199 (45%), could be identified as clear homologues by having significant structure and sequence similarity (SSAP ≥80 and ≥20% sequence identity). A further 169 (38%) were probable homologues: although their sequence identity was below 20%, they had functional similarity and/or gave significant scores with sequence search methods designed to detect very distant homologues (PSI-BLAST) (12). A further 40 (9%) proteins were analogous, i.e., they had the same fold as a previous entry, but neither the sequence nor the function gave definite evidence of a common ancestor.

Figure 4. Pie charts showing the proportion of 2159 recently deposited structures that match structures in CATH. (a) Proportion of new structures matching by sequence alignment (21) or structure alignment (SSAP) (3). (b) Proportion of new non-homologous structures (<30% sequence identity to any previous CATH entry) that match previous CATH entries by structure. Those with more than 20% sequence identity, measured after structural alignment, or with functional similarity are assigned as homologues. The remaining structures are analogues, having no clear evolutionary relationship.

RELATIONSHIP BETWEEN PROTEIN STRUCTURE AND FUNCTION

We now need to consider at what levels of structural similarity or evolutionary distance it is reasonable to inherit functional information within a protein family. Data on the CATH evolutionary families and structural groupings are stored in a Postgres relational database (11), with links to a ligand database containing information about protein–ligand interactions (2). This allows us to analyse the relationship between 3D structure and function, using stored data on EC identifiers, SWISS-PROT keywords and protein–ligand interactions (11). Considering the degree of functional similarity observed in structures with similar folds, the vast majority (>96%) of fold groups in the PDB derive from a single homologous family, with similar or closely related functions within the family.
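The categories used for the recently deposited structures (Fig. 4) amount to a small decision procedure, and the quoted percentages follow directly from the counts. In the sketch below, the `Match` record, its field names and the use of SSAP ≥70 to mean "has a structural match" are assumptions made for illustration. The identity and SSAP thresholds are those quoted in the text, and the scores are taken as inputs rather than computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Match:
    """Best match of a new domain against existing CATH entries (hypothetical record)."""
    seq_identity: float                 # % identity from sequence alignment
    ssap_score: Optional[float] = None  # None if no structural match was found
    struct_identity: float = 0.0        # % identity measured after structural alignment
    functional_similarity: bool = False
    psiblast_hit: bool = False


def categorise(m: Match) -> str:
    """Assign a new domain to one of the Fig. 4 categories."""
    if m.seq_identity >= 30.0:
        return "homologue (sequence)"
    if m.ssap_score is None or m.ssap_score < 70.0:  # assumed fold-level cut-off
        return "novel fold"
    if m.ssap_score >= 80.0 and m.struct_identity >= 20.0:
        return "clear homologue (structure)"
    if m.functional_similarity or m.psiblast_hit:
        return "probable homologue"
    return "analogue"


def percentages(counts: dict) -> dict:
    """Convert category counts to whole-number percentages of their total."""
    total = sum(counts.values())
    return {k: round(100 * v / total) for k, v in counts.items()}


# Counts for the 443 non-homologous new structures. The novel-fold count is
# derived here as the remainder; it is not quoted directly in the text.
fig4b = {
    "clear homologue (structure)": 199,
    "probable homologue": 169,
    "analogue": 40,
    "novel fold": 443 - 199 - 169 - 40,
}
print(percentages(fig4b))
# {'clear homologue (structure)': 45, 'probable homologue': 38, 'analogue': 9, 'novel fold': 8}
```

The derived remainder of 35 structures gives 8%, which agrees with the figure quoted for novel folds. The 199 + 169 = 368 clear or probable homologues correspond to the 83% cited later for structures with novel sequences.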
However, for the very common folds (superfolds, see above), which derive from three or more apparently unrelated homologous families, the proteins can perform quite unrelated functions even though they share the same fold. We have described these as analogous folds, which may or may not have a common ancestor. At the homologous superfamily level, a more detailed analysis of enzyme functions showed that the majority of homologous enzyme families in CATH (>90%) contained proteins whose first three EC identifiers were the same. Considering those families in which homologues have significant sequence identity (≥20%) after structural alignment, 95% were found to have a single EC identifier, whilst for families in which proteins have more than 30% sequence similarity, 98% had a single EC code. Although assigning function on the basis of homology is common practice, some caution should clearly be exercised, particularly where there is little or no sequence similarity. There are also clear examples where homologues with significant sequence similarity perform different functions. The role of gene recruitment is especially clear in the eye lens proteins, which function as enzymes in other cellular environments but are used as structural proteins in the lens (17). The extent of such gene recruitment and context-sensitive function is not yet known. For enzymes, it is clear that catalytic function can change and evolve, usually to act on a different but related substrate. Similarly, within the lipocalin family (CATH id #: ), several proteins with very similar structures bind different ligands (e.g., retinol, bilin, biotin) in the same region at the base of the β-barrel. Nearly half of the homologous families in which two or more different EC numbers were observed belong to the superfolds.
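The EC-based comparison above can be sketched as follows. The sketch assumes that each homologous family is represented simply as a list of the EC numbers of its relatives; the family names and EC assignments are hypothetical. A family counts as having a "single EC identifier" when all four fields agree, and as functionally close when the first three fields agree.

```python
from __future__ import annotations


def ec_prefix(ec: str, depth: int) -> tuple[str, ...]:
    """First `depth` fields of an EC number, e.g. ('3', '4', '21') for depth 3."""
    return tuple(ec.split(".")[:depth])


def shares_ec(ecs: list[str], depth: int = 4) -> bool:
    """True if every EC number in a family agrees on its first `depth` fields."""
    return len({ec_prefix(e, depth) for e in ecs}) == 1


def family_stats(families: dict[str, list[str]]) -> dict[str, float]:
    """Fraction of families with one EC code, and with the same first three fields."""
    n = len(families)
    return {
        "single_ec": sum(shares_ec(e, 4) for e in families.values()) / n,
        "same_first_three": sum(shares_ec(e, 3) for e in families.values()) / n,
    }


# Hypothetical family-to-EC assignments, for illustration only.
families = {
    "fam_1": ["3.4.21.62", "3.4.21.62"],  # identical EC codes
    "fam_2": ["3.4.21.4", "3.4.21.5"],    # differ only in the fourth field
    "fam_3": ["1.1.1.1", "4.2.1.11"],     # unrelated functions
    "fam_4": ["2.7.11.1"],                # single member
}
print(family_stats(families))
# {'single_ec': 0.5, 'same_first_three': 0.75}
```

Partial EC numbers such as "3.4.21.-" are compared field by field as plain strings here. The text does not say how such entries were treated in the original analysis.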
This suggests that if a new protein is assigned to a superfold family, more caution should be used when inheriting functional information, as these families appear to show greater tolerance to changes in sequence and, ultimately, function. However, it is interesting to note that many of these were TIM barrel or Rossmann folds. In these superfolds the substrate or ligand commonly binds in the same place: at the base of the β-barrel for the TIM barrels, and at the crossover of the polypeptide chain for the doubly wound Rossmann structures.

ASSIGNMENT OF FUNCTION THROUGH STRUCTURE

One of the reasons for determining structures is to obtain more information to facilitate the assignment of function. From our analysis of proteins in CATH, we suggest that structural data can help to assign function in several ways:

(i) Structural data allow recognition of more distant homologues than sequence data. In our analysis, 83% of structures with novel sequences could be assigned as homologues in this way (note that such assignment of function is again subject to the caveats imposed by gene recruitment discussed above).
(ii) Structural data allow detailed inspection of the functional site to suggest if and how the function may have evolved. For example, if an enzyme has evolved to act on a different substrate, the binding site may reveal, or at least suggest, possible changes in the substrate.
(iii) For the superfolds, similarity of structure does not necessarily mean similarity of function. However, the active sites or binding sites are often conserved. For example, in the TIM barrel and Rossmann fold structures, the ligand always binds at the same end of the barrel or sheet.
(iv) Some methods that aim to predict function ab initio from structure have already been developed, and these will increasingly be the focus of attention over the next few years. For example, enzymes can often be identified by the presence of a major cleft, which also locates the active site (18). Similarly, critical surface patches used for molecular recognition in binding other proteins or ligands may be identified using knowledge-based approaches (19,20).

In summary, extrapolating the data from Figure 4 to a new genome, we can expect that, of the 54–70% of sequences that currently have no obvious sequence match in the PDB, nearly 80–90% will be found to be homologous to a known family using the structural data alone. For the singlet folds, this will almost certainly reveal some clues to function. For the superfolds, some folds will reveal information on the functional class (e.g., enzyme for TIM barrels) or the location of the active site, if not the specific function. Only 10–20% are expected to be novel folds, and for these the ab initio methods referred to above may provide some clues to guide experiments. It is therefore clear that determining structures, for example as part of a structural genomics initiative, will make a major contribution to interpreting genome data.

REFERENCES

1. Orengo,C.A., Flores,T.P., Taylor,W.R. and Thornton,J.M. (1993) Protein Engng., 6.
2. Laskowski,R.A., Hutchinson,E.G., Michie,A.D., Wallace,A.C., Jones,M.L. and Thornton,J.M. (1997) Trends Biochem. Sci., 22.
3. Taylor,W.R. and Orengo,C.A. (1989) J. Mol. Biol., 208.
4. Orengo,C.A., Brown,N.P. and Taylor,W.R. (1992) Proteins, 14.
5. Chothia,C. (1993) Nature, 357.
6. Orengo,C.A., Jones,D.T. and Thornton,J.M. (1994) Nature, 372.
7. Levitt,M. and Chothia,C. (1976) Nature, 261.
8. Jones,S., Swindells,M.B., Stewart,M., Michie,A.D., Orengo,C.A. and Thornton,J.M. (1998) Protein Sci., 7.
9. Orengo,C.A., Michie,A.D., Jones,S., Jones,D.T., Swindells,M.B. and Thornton,J.M. (1997) Structure, 5.
10. Westhead,D.R., Hatton,D.C. and Thornton,J.M. (1998) Trends Biochem. Sci., 23.
11. Martin,A.C.R., Orengo,C.A., Hutchinson,E.G., Jones,S., Karmirantzou,M., Laskowski,R.A., Mitchell,J.B.O., Taroni,C. and Thornton,J.M. (1998) Structure, 6.
12. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25.
13. Abola,E.E., Bernstein,F.C., Bryant,S.H., Koetzle,T.F. and Weng,J. (1987) In Allen,F.H., Bergerhoff,G. and Sievers,R. (eds), Crystallographic Databases: Information Content, Software Systems, Scientific Applications. Data Commission of the International Union of Crystallography, Bonn/Cambridge/Chester.
14. Pennisi,L. (1998) Science, 279.
15. Huynen,M., Doerks,T., Eisenhaber,F., Orengo,C.A., Sunyaev,S., Yuan,Y. and Bork,P. (1998) J. Mol. Biol., 280.
16. Jones,D.T. (1998) J. Mol. Biol., in press.
17. Piatigorsky,J. and Wistow,G. (1991) Science, 252.
18. Laskowski,R.A., Luscombe,N.M., Swindells,M.B. and Thornton,J.M. (1996) Protein Sci., 5.
19. Jones,S. and Thornton,J.M. (1997) J. Mol. Biol., 272.
20. Jones,S. and Thornton,J.M. (1997) J. Mol. Biol., 272.
21. Needleman,S.B. and Wunsch,C.D. (1970) J. Mol. Biol., 48.


More information

Introduction to Evolutionary Concepts

Introduction to Evolutionary Concepts Introduction to Evolutionary Concepts and VMD/MultiSeq - Part I Zaida (Zan) Luthey-Schulten Dept. Chemistry, Beckman Institute, Biophysics, Institute of Genomics Biology, & Physics NIH Workshop 2009 VMD/MultiSeq

More information

Bioinformatics. Proteins II. - Pattern, Profile, & Structure Database Searching. Robert Latek, Ph.D. Bioinformatics, Biocomputing

Bioinformatics. Proteins II. - Pattern, Profile, & Structure Database Searching. Robert Latek, Ph.D. Bioinformatics, Biocomputing Bioinformatics Proteins II. - Pattern, Profile, & Structure Database Searching Robert Latek, Ph.D. Bioinformatics, Biocomputing WIBR Bioinformatics Course, Whitehead Institute, 2002 1 Proteins I.-III.

More information

ALL LECTURES IN SB Introduction

ALL LECTURES IN SB Introduction 1. Introduction 2. Molecular Architecture I 3. Molecular Architecture II 4. Molecular Simulation I 5. Molecular Simulation II 6. Bioinformatics I 7. Bioinformatics II 8. Prediction I 9. Prediction II ALL

More information

Some Problems from Enzyme Families

Some Problems from Enzyme Families Some Problems from Enzyme Families Greg Butler Department of Computer Science Concordia University, Montreal www.cs.concordia.ca/~faculty/gregb gregb@cs.concordia.ca Abstract I will discuss some problems

More information

remembering Secondary Structures Does everyone know what the backbone and residue/side chains are? Clear about 1, 2 3 structures?

remembering Secondary Structures Does everyone know what the backbone and residue/side chains are? Clear about 1, 2 3 structures? remembering Secondary Structures add blast Does everyone know what the backbone residue/side chains are? Clear about 1, 2 3 structures? Heteropolymer - + Mostly in regular secondary structure + - Secondary

More information

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ Proteomics Chapter 5. Proteomics and the analysis of protein sequence Ⅱ 1 Pairwise similarity searching (1) Figure 5.5: manual alignment One of the amino acids in the top sequence has no equivalent and

More information

Amino Acid Structures from Klug & Cummings. 10/7/2003 CAP/CGS 5991: Lecture 7 1

Amino Acid Structures from Klug & Cummings. 10/7/2003 CAP/CGS 5991: Lecture 7 1 Amino Acid Structures from Klug & Cummings 10/7/2003 CAP/CGS 5991: Lecture 7 1 Amino Acid Structures from Klug & Cummings 10/7/2003 CAP/CGS 5991: Lecture 7 2 Amino Acid Structures from Klug & Cummings

More information

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 1 1 Introduction Grundlagen der Bioinformatik Summer semester 2010 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a)

More information

and structural variability. One of the principal objectives the structures are subjected to evolutionary pressure for

and structural variability. One of the principal objectives the structures are subjected to evolutionary pressure for Protein Engineering vol.14 no.4 pp.219 226, 2001 Use of a database of structural alignments and phylogenetic trees in investigating the relationship between sequence and structural variability among homologous

More information

Functional diversity within protein superfamilies

Functional diversity within protein superfamilies Functional diversity within protein superfamilies James Casbon and Mansoor Saqi * Bioinformatics Group, The Genome Centre, Barts and The London, Queen Mary s School of Medicine and Dentistry, Charterhouse

More information

METABOLIC PATHWAY PREDICTION/ALIGNMENT

METABOLIC PATHWAY PREDICTION/ALIGNMENT COMPUTATIONAL SYSTEMIC BIOLOGY METABOLIC PATHWAY PREDICTION/ALIGNMENT Hofestaedt R*, Chen M Bioinformatics / Medical Informatics, Technische Fakultaet, Universitaet Bielefeld Postfach 10 01 31, D-33501

More information

A General Model for Amino Acid Interaction Networks

A General Model for Amino Acid Interaction Networks Author manuscript, published in "N/P" A General Model for Amino Acid Interaction Networks Omar GACI and Stefan BALEV hal-43269, version - Nov 29 Abstract In this paper we introduce the notion of protein

More information

Protein Structure & Motifs

Protein Structure & Motifs & Motifs Biochemistry 201 Molecular Biology January 12, 2000 Doug Brutlag Introduction Proteins are more flexible than nucleic acids in structure because of both the larger number of types of residues

More information

From Sequence to Function (I): - Protein Profiling - Case Studies in Structural & Functional Genomics

From Sequence to Function (I): - Protein Profiling - Case Studies in Structural & Functional Genomics BCHS 6229 Protein Structure and Function Lecture 6 (Oct 27, 2011) From Sequence to Function (I): - Protein Profiling - Case Studies in Structural & Functional Genomics 1 From Sequence to Function in the

More information

CAP 5510 Lecture 3 Protein Structures

CAP 5510 Lecture 3 Protein Structures CAP 5510 Lecture 3 Protein Structures Su-Shing Chen Bioinformatics CISE 8/19/2005 Su-Shing Chen, CISE 1 Protein Conformation 8/19/2005 Su-Shing Chen, CISE 2 Protein Conformational Structures Hydrophobicity

More information

Protein Structure Analysis and Verification. Course S Basics for Biosystems of the Cell exercise work. Maija Nevala, BIO, 67485U 16.1.

Protein Structure Analysis and Verification. Course S Basics for Biosystems of the Cell exercise work. Maija Nevala, BIO, 67485U 16.1. Protein Structure Analysis and Verification Course S-114.2500 Basics for Biosystems of the Cell exercise work Maija Nevala, BIO, 67485U 16.1.2008 1. Preface When faced with an unknown protein, scientists

More information

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha Neural Networks for Protein Structure Prediction Brown, JMB 1999 CS 466 Saurabh Sinha Outline Goal is to predict secondary structure of a protein from its sequence Artificial Neural Network used for this

More information

PROMALS3D: a tool for multiple protein sequence and structure alignments

PROMALS3D: a tool for multiple protein sequence and structure alignments Nucleic Acids Research Advance Access published February 20, 2008 Nucleic Acids Research, 2008, 1 6 doi:10.1093/nar/gkn072 PROMALS3D: a tool for multiple protein sequence and structure alignments Jimin

More information

Research Proposal. Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family.

Research Proposal. Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family. Research Proposal Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family. Name: Minjal Pancholi Howard University Washington, DC. June 19, 2009 Research

More information

Sequence analysis and comparison

Sequence analysis and comparison The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species

More information

Protein Science (1997), 6: Cambridge University Press. Printed in the USA. Copyright 1997 The Protein Society

Protein Science (1997), 6: Cambridge University Press. Printed in the USA. Copyright 1997 The Protein Society 1 of 5 1/30/00 8:08 PM Protein Science (1997), 6: 246-248. Cambridge University Press. Printed in the USA. Copyright 1997 The Protein Society FOR THE RECORD LPFC: An Internet library of protein family

More information

Basics of protein structure

Basics of protein structure Today: 1. Projects a. Requirements: i. Critical review of one paper ii. At least one computational result b. Noon, Dec. 3 rd written report and oral presentation are due; submit via email to bphys101@fas.harvard.edu

More information

Protein Structures. Sequences of amino acid residues 20 different amino acids. Quaternary. Primary. Tertiary. Secondary. 10/8/2002 Lecture 12 1

Protein Structures. Sequences of amino acid residues 20 different amino acids. Quaternary. Primary. Tertiary. Secondary. 10/8/2002 Lecture 12 1 Protein Structures Sequences of amino acid residues 20 different amino acids Primary Secondary Tertiary Quaternary 10/8/2002 Lecture 12 1 Angles φ and ψ in the polypeptide chain 10/8/2002 Lecture 12 2

More information

BMD645. Integration of Omics

BMD645. Integration of Omics BMD645 Integration of Omics Shu-Jen Chen, Chang Gung University Dec. 11, 2009 1 Traditional Biology vs. Systems Biology Traditional biology : Single genes or proteins Systems biology: Simultaneously study

More information

Gene Ontology and overrepresentation analysis

Gene Ontology and overrepresentation analysis Gene Ontology and overrepresentation analysis Kjell Petersen J Express Microarray analysis course Oslo December 2009 Presentation adapted from Endre Anderssen and Vidar Beisvåg NMC Trondheim Overview How

More information

Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007

Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007 Molecular Modeling Prediction of Protein 3D Structure from Sequence Vimalkumar Velayudhan Jain Institute of Vocational and Advanced Studies May 21, 2007 Vimalkumar Velayudhan Molecular Modeling 1/23 Outline

More information

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche The molecular structure of a protein can be broken down hierarchically. The primary structure of a protein is simply its

More information

MSAT a Multiple Sequence Alignment tool based on TOPS

MSAT a Multiple Sequence Alignment tool based on TOPS MSAT a Multiple Sequence Alignment tool based on TOPS Te Ren, Mallika Veeramalai, Aik Choon Tan and David Gilbert Bioinformatics Research Centre Department of Computer Science University of Glasgow Glasgow,

More information

Protein Structure Prediction

Protein Structure Prediction Page 1 Protein Structure Prediction Russ B. Altman BMI 214 CS 274 Protein Folding is different from structure prediction --Folding is concerned with the process of taking the 3D shape, usually based on

More information

Advanced Certificate in Principles in Protein Structure. You will be given a start time with your exam instructions

Advanced Certificate in Principles in Protein Structure. You will be given a start time with your exam instructions BIRKBECK COLLEGE (University of London) Advanced Certificate in Principles in Protein Structure MSc Structural Molecular Biology Date: Thursday, 1st September 2011 Time: 3 hours You will be given a start

More information

Protein Structure and Function Prediction using Kernel Methods.

Protein Structure and Function Prediction using Kernel Methods. Protein Structure and Function Prediction using Kernel Methods. A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Huzefa Rangwala IN PARTIAL FULFILLMENT OF THE

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S3 (box) Methods Methods Genome weighting The currently available collection of archaeal and bacterial genomes has a highly biased distribution of isolates across taxa. For example,

More information

Supporting Online Material for

Supporting Online Material for www.sciencemag.org/cgi/content/full/309/5742/1868/dc1 Supporting Online Material for Toward High-Resolution de Novo Structure Prediction for Small Proteins Philip Bradley, Kira M. S. Misura, David Baker*

More information

Data Mining in Protein Binding Cavities

Data Mining in Protein Binding Cavities In Proc. GfKl 2004, Dortmund: Data Mining in Protein Binding Cavities Katrin Kupas and Alfred Ultsch Data Bionics Research Group, University of Marburg, D-35032 Marburg, Germany Abstract. The molecular

More information

Understanding Sequence, Structure and Function Relationships and the Resulting Redundancy

Understanding Sequence, Structure and Function Relationships and the Resulting Redundancy Understanding Sequence, Structure and Function Relationships and the Resulting Redundancy many slides by Philip E. Bourne Department of Pharmacology, UCSD Agenda Understand the relationship between sequence,

More information

Automatic Epitope Recognition in Proteins Oriented to the System for Macromolecular Interaction Assessment MIAX

Automatic Epitope Recognition in Proteins Oriented to the System for Macromolecular Interaction Assessment MIAX Genome Informatics 12: 113 122 (2001) 113 Automatic Epitope Recognition in Proteins Oriented to the System for Macromolecular Interaction Assessment MIAX Atsushi Yoshimori Carlos A. Del Carpio yosimori@translell.eco.tut.ac.jp

More information

Detection of Protein Binding Sites II

Detection of Protein Binding Sites II Detection of Protein Binding Sites II Goal: Given a protein structure, predict where a ligand might bind Thomas Funkhouser Princeton University CS597A, Fall 2007 1hld Geometric, chemical, evolutionary

More information

PROMALS3D web server for accurate multiple protein sequence and structure alignments

PROMALS3D web server for accurate multiple protein sequence and structure alignments W30 W34 Nucleic Acids Research, 2008, Vol. 36, Web Server issue Published online 24 May 2008 doi:10.1093/nar/gkn322 PROMALS3D web server for accurate multiple protein sequence and structure alignments

More information

Computational approaches for functional genomics

Computational approaches for functional genomics Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding

More information

Biophysics 101: Genomics & Computational Biology. Section 8: Protein Structure S T R U C T U R E P R O C E S S. Outline.

Biophysics 101: Genomics & Computational Biology. Section 8: Protein Structure S T R U C T U R E P R O C E S S. Outline. Biophysics 101: Genomics & Computational Biology Section 8: Protein Structure Faisal Reza Nov. 11 th, 2003 B101.pdb from PS5 shown at left with: animated ball and stick model, colored CPK H-bonds on, colored

More information

Sequence and Structure Alignment Z. Luthey-Schulten, UIUC Pittsburgh, 2006 VMD 1.8.5

Sequence and Structure Alignment Z. Luthey-Schulten, UIUC Pittsburgh, 2006 VMD 1.8.5 Sequence and Structure Alignment Z. Luthey-Schulten, UIUC Pittsburgh, 2006 VMD 1.8.5 Why Look at More Than One Sequence? 1. Multiple Sequence Alignment shows patterns of conservation 2. What and how many

More information

A Structure-Centric View of Protein Evolution, Design and Adaptation

A Structure-Centric View of Protein Evolution, Design and Adaptation A Structure-Centric View of Protein Evolution, Design and Adaptation Eric J. Deeds* and Eugene I. Shakhnovich *Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02138,

More information

Bioinformatics: Secondary Structure Prediction

Bioinformatics: Secondary Structure Prediction Bioinformatics: Secondary Structure Prediction Prof. David Jones d.jones@cs.ucl.ac.uk LMLSTQNPALLKRNIIYWNNVALLWEAGSD The greatest unsolved problem in molecular biology:the Protein Folding Problem? Entries

More information

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction CMPS 6630: Introduction to Computational Biology and Bioinformatics Tertiary Structure Prediction Tertiary Structure Prediction Why Should Tertiary Structure Prediction Be Possible? Molecules obey the

More information

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting. Genome Annotation Bioinformatics and Computational Biology Genome Annotation Frank Oliver Glöckner 1 Genome Analysis Roadmap Genome sequencing Assembly Gene prediction Protein targeting trna prediction

More information

Large-Scale Genomic Surveys

Large-Scale Genomic Surveys Bioinformatics Subtopics Fold Recognition Secondary Structure Prediction Docking & Drug Design Protein Geometry Protein Flexibility Homology Modeling Sequence Alignment Structure Classification Gene Prediction

More information

CMPS 3110: Bioinformatics. Tertiary Structure Prediction

CMPS 3110: Bioinformatics. Tertiary Structure Prediction CMPS 3110: Bioinformatics Tertiary Structure Prediction Tertiary Structure Prediction Why Should Tertiary Structure Prediction Be Possible? Molecules obey the laws of physics! Conformation space is finite

More information

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE Examples of Protein Modeling Protein Modeling Visualization Examination of an experimental structure to gain insight about a research question Dynamics To examine the dynamics of protein structures To

More information

Modeling for 3D structure prediction

Modeling for 3D structure prediction Modeling for 3D structure prediction What is a predicted structure? A structure that is constructed using as the sole source of information data obtained from computer based data-mining. However, mixing

More information

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic

More information

BIOINFORMATICS LAB AP BIOLOGY

BIOINFORMATICS LAB AP BIOLOGY BIOINFORMATICS LAB AP BIOLOGY Bioinformatics is the science of collecting and analyzing complex biological data. Bioinformatics combines computer science, statistics and biology to allow scientists to

More information

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program) Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program) Course Name: Structural Bioinformatics Course Description: Instructor: This course introduces fundamental concepts and methods for structural

More information

Visualization of Macromolecular Structures

Visualization of Macromolecular Structures Visualization of Macromolecular Structures Present by: Qihang Li orig. author: O Donoghue, et al. Structural biology is rapidly accumulating a wealth of detailed information. Over 60,000 high-resolution

More information

Christian Sigrist. November 14 Protein Bioinformatics: Sequence-Structure-Function 2018 Basel

Christian Sigrist. November 14 Protein Bioinformatics: Sequence-Structure-Function 2018 Basel Christian Sigrist General Definition on Conserved Regions Conserved regions in proteins can be classified into 5 different groups: Domains: specific combination of secondary structures organized into a

More information

Curriculum Links. AQA GCE Biology. AS level

Curriculum Links. AQA GCE Biology. AS level Curriculum Links AQA GCE Biology Unit 2 BIOL2 The variety of living organisms 3.2.1 Living organisms vary and this variation is influenced by genetic and environmental factors Causes of variation 3.2.2

More information

Available online at Analele Stiintifice ale Universitatii Al. I. Cuza din Iasi Seria Geologie 58 (1) (2012) 53 58

Available online at   Analele Stiintifice ale Universitatii Al. I. Cuza din Iasi Seria Geologie 58 (1) (2012) 53 58 Available online at http://geology.uaic.ro/auig/ Analele Stiintifice ale Universitatii Al. I. Cuza din Iasi Seria Geologie 58 (1) (2012) 53 58 AUI GEOLOGIE GIS database for mineral resources: case study

More information

Homologous proteins have similar structures and structural superposition means to rotate and translate the structures so that corresponding atoms are

Homologous proteins have similar structures and structural superposition means to rotate and translate the structures so that corresponding atoms are 1 Homologous proteins have similar structures and structural superposition means to rotate and translate the structures so that corresponding atoms are as close to each other as possible. Structural similarity

More information

Ch. 9 Multiple Sequence Alignment (MSA)

Ch. 9 Multiple Sequence Alignment (MSA) Ch. 9 Multiple Sequence Alignment (MSA) - gather seqs. to make MSA - doing MSA with ClustalW - doing MSA with Tcoffee - comparing seqs. that cannot align Introduction - from pairwise alignment to MSA -

More information

ProtoNet 4.0: A hierarchical classification of one million protein sequences

ProtoNet 4.0: A hierarchical classification of one million protein sequences ProtoNet 4.0: A hierarchical classification of one million protein sequences Noam Kaplan 1*, Ori Sasson 2, Uri Inbar 2, Moriah Friedlich 2, Menachem Fromer 2, Hillel Fleischer 2, Elon Portugaly 2, Nathan

More information