Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.

Similar documents
Protein Modeling Methods. Knowledge. Protein Modeling Methods. Fold Recognition. Knowledge-based methods. Introduction to Bioinformatics

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Protein structure analysis. Risto Laakso 10th January 2005

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)

Bioinformatics. Macromolecular structure

Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007

HOMOLOGY MODELING. The sequence alignment and template structure are then used to produce a structural model of the target.

CAP 5510 Lecture 3 Protein Structures

Introduction to Comparative Protein Modeling. Chapter 4 Part I

CMPS 3110: Bioinformatics. Tertiary Structure Prediction

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

DATE A DAtabase of TIM Barrel Enzymes

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

Protein Structure: Data Bases and Classification Ingo Ruczinski

CS612 - Algorithms in Bioinformatics

Modeling for 3D structure prediction

Heteropolymer. Mostly in regular secondary structure

2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity.

EBI web resources II: Ensembl and InterPro. Yanbin Yin Spring 2013

Supporting Online Material for

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

CS612 - Algorithms in Bioinformatics

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Getting To Know Your Protein

Can protein model accuracy be. identified? NO! CBS, BioCentrum, Morten Nielsen, DTU

Basics of protein structure

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University

Protein structure alignments

Protein Structure Prediction

ALL LECTURES IN SB Introduction

Design of a Novel Globular Protein Fold with Atomic-Level Accuracy

Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

Protein Structures: Experiments and Modeling. Patrice Koehl

Physiochemical Properties of Residues

114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009

SCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like

Homology modeling of Ferredoxin-nitrite reductase from Arabidopsis thaliana

D Dobbs ISU - BCB 444/544X 1

Ramachandran Plot. 4ysz Phi (degrees) Plot statistics

Programme Last week s quiz results + Summary Fold recognition Break Exercise: Modelling remote homologues

Conformational Geometry of Peptides and Proteins:

EBI web resources II: Ensembl and InterPro

Copyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years.

Chapter 2 Structures. 2.1 Introduction Storing Protein Structures The PDB File Format

The Structure and Functions of Proteins

Homology Modeling I. Growth of the Protein Data Bank PDB. Basel, September 30, EMBnet course: Introduction to Protein Structure Bioinformatics

7.91 Amy Keating. Solving structures using X-ray crystallography & NMR spectroscopy

Introduction to" Protein Structure

Analysis and Prediction of Protein Structure (I)

Template-Based Modeling of Protein Structure

Structure to Function. Molecular Bioinformatics, X3, 2006

Protein Structure Basics

Comparing Protein Structures. Why?

Sequence and Structure Alignment Z. Luthey-Schulten, UIUC Pittsburgh, 2006 VMD 1.8.5

Secondary Structure. Bioch/BIMS 503 Lecture 2. Structure and Function of Proteins. Further Reading. Φ, Ψ angles alone determine protein structure

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

Computational Molecular Biology. Protein Structure and Homology Modeling

Molecular Modeling Lecture 7. Homology modeling insertions/deletions manual realignment

Useful background reading

Identification of Representative Protein Sequence and Secondary Structure Prediction Using SVM Approach

Protein Structure Prediction and Display

Advanced Certificate in Principles in Protein Structure. You will be given a start time with your exam instructions

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

Protein Science (1997), 6: Cambridge University Press. Printed in the USA. Copyright 1997 The Protein Society

PROTEIN STRUCTURE PREDICTION II

Bioinformatics. Proteins II. - Pattern, Profile, & Structure Database Searching. Robert Latek, Ph.D. Bioinformatics, Biocomputing

Building 3D models of proteins

Study of Mining Protein Structural Properties and its Application

SUPPLEMENTARY INFORMATION

Biol Introduction to Bioinformatics

Protein structure. Protein structure. Amino acid residue. Cell communication channel. Bioinformatics Methods

Protein Modeling. Generating, Evaluating and Refining Protein Homology Models

Protein Structure Prediction 11/11/05

Biophysics 101: Genomics & Computational Biology. Section 8: Protein Structure S T R U C T U R E P R O C E S S. Outline.

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Presenter: She Zhang

Packing of Secondary Structures

Prediction and refinement of NMR structures from sparse experimental data

Amino Acid Structures from Klug & Cummings. 10/7/2003 CAP/CGS 5991: Lecture 7 1

Motif Prediction in Amino Acid Interaction Networks

Protein Structure and Function Prediction using Kernel Methods.

Homology and Information Gathering and Domain Annotation for Proteins

09/06/25. Computergestützte Strukturbiologie (Strukturelle Bioinformatik) Non-uniform distribution of folds. Scheme of protein structure predicition

Secondary and sidechain structures

Get familiar with PDBsum and the PDB Extract atomic coordinates from protein data files Compute bond angles and dihedral angles

Protein Structure Analysis and Verification. Course S Basics for Biosystems of the Cell exercise work. Maija Nevala, BIO, 67485U 16.1.

Prediction of protein function from sequence analysis

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

Computational Molecular Modeling

Structure of proteins modeling and drug design

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Protein Structures. 11/19/2002 Lecture 24 1

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

Announcements. Primary (1 ) Structure. Lecture 7 & 8: PROTEIN ARCHITECTURE IV: Tertiary and Quaternary Structure

Contact map guided ab initio structure prediction

Protein Structure. Hierarchy of Protein Structure. Tertiary structure. independently stable structural unit. includes disulfide bonds

Orientational degeneracy in the presence of one alignment tensor.

Transcription:

Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics Iosif Vaisman Email: ivaisman@gmu.edu ----------------------------------------------------------------- Bond labeling Value sigma ----------------------------------------------------------------- C-N C-NH1 (except Pro) 1.329 0.014 C-N (Pro) 1.341 0.016 C-O C-O 1.231 0.020 Calpha-C CH1E-C (except Gly) 1.525 0.021 CH2G*-C (Gly) 1.516 0.018 Calpha-Cbeta CH1E-CH3E (Ala) 1.521 0.033 CH1E-CH1E (Ile,Thr,Val) 1.540 0.027 CH1E-CH2E (the rest) 1.530 0.020 N-Calpha NH1-CH1E (except Gly,Pro) 1.458 0.019 NH1-CH2G* (Gly) 1.451 0.016 N-CH1E (Pro) 1.466 0.015 ----------------------------------------------------------------- Bond angles (Procheck) ------------------------------------------------------------------ Angle labeling Value sigma ------------------------------------------------------------------ C-N-Calpha C-NH1-CH1E (except Gly,Pro) 121.7 1.8 C-NH1-CH2G* (Gly) 120.6 1.7 C-N-CH1E (Pro) 122.6 5.0 Calpha-C-N CH1E-C-NH1 (except Gly,Pro) 116.2 2.0 CH2G*-C-NH1 (Gly) 116.4 2.1 CH1E-C-N (Pro) 116.9 1.5 Calpha-C-O CH1E-C-O (except Gly) 120.8 1.7 CH2G*-C-O (Gly) 120.8 2.1 ------------------------------------------------------------------ Procheck output a. Ramachandran plot quality - percentage of the protein's residues that are in the core regions of the Ramachandran plot. b. Peptide bond planarity - standard deviation of the protein structure's omega torsion angles. c. Bad non-bonded interactions - number of bad contacts per 100 residues. d. Cα tetrahedral distortion - standard deviation of the ζ torsion angle (Cα, N, C, and Cβ). e. Main-chain hydrogen bond energy - standard deviation of the hydrogen bond energies for main-chain hydrogen bonds. f. Overall G-factor - average of different G- factors for each residue in the structure. Procheck output Procheck output

Procheck output Procheck output - backbone G factors Procheck output - all atom G factors Dynamics of Database Growth 100000000 1000000 10000 EMBL PDB 100 1983 1987 1991 1995 1999 2003 PDB Current Holdings (11/07/07) PDB Structures by Resolution Proteins Nucleic Acids Molecule Type Protein/ NA Compl Other Total X-ray 37254 1000 1726 24 40004 Exp. Meth. NMR EM Other Total 5948 109 83 43394 789 11 4 1804 136 40 4 1906 7 0 2 33 6880 160 93 47137

PDB Redundancy PDB Growth Method Description 95% identity (one chain) 90% identity (one chain) 70% identity (one chain) 50% identity (one chain) 40% identity (one chain) 30% identity (one chain) # of Clusters 18106 17367 15507 13201 11457 9458 Growth of New Folds in PDB Growth of New Folds in PDB "new folds" (blue) and "old folds" (orange) Secondary Structure: Computational Problems Structural classes of proteins Secondary structure characterization Secondary structure assignment Secondary structure prediction Protein structure classification all α all β α/β

Protein Structure Classification SCOP - Structural Classification of Proteins FSSP - Fold classification based on Structure-Structure alignment of Proteins CATH - Class, architecture, topology and homologous superfamily Current release: 1.75 38221 PDB Entries (June 2009). 110800 Domains. http://scop.mrc-lmb.cam.ac.uk/scop/ The SCOP database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known. Proteins are classified to reflect both structural and evolutionary relatedness. Many levels exist in the hierarchy; the principal levels are family, superfamily and fold Family: Clear evolutionarily relationship Superfamily: Probable common evolutionary origin Fold: Major structural similarity Family: Clear evolutionarily relationship Proteins clustered together into families are clearly evolutionarily related. Generally, this means that pairwise residue identities between the proteins are 30% and greater. However, in some cases similar functions and structures provide definitive evidence of common descent in the absense of high sequence identity; for example, many globins form a family though some members have sequence identities of only 15%. Superfamily: Probable common evolutionary origin Proteins that have low sequence identities, but whose structural and functional features suggest that a common evolutionary origin is probable are placed together in superfamilies. For example, actin, the ATPase domain of the heat shock protein, and hexakinase together form a superfamily. Fold: Major structural similarity Proteins are defined as having a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections. Different proteins with the same fold often have peripheral elements of secondary structure and turn regions that differ in size and conformation. In some cases, these differing peripheral regions may comprise half the structure. Proteins placed together in the same fold category may not have a common evolutionary origin: the structural similarities could arise just from the physics and chemistry of proteins favoring certain packing arrangements and chain topologies. SCOP Statistics Class Folds Super Families families All alpha proteins 179 299 480 All beta proteins 126 248 462 Alpha and beta proteins (a/b) 121 199 542 Alpha and beta proteins (a+b) 234 349 567 Multi-domain proteins 38 38 53 Membrane and cell surface proteins36 66 73 Small proteins 66 95 150 Total 800 1294 2327

SCOP Statistics Protein Modeling Methods Class All alpha proteins All beta proteins Alpha/beta proteins (a/b) Alpha+beta proteins (a+b) Multi-domain proteins Membrane and cell surface proteins Small proteins Total Number of folds 284 174 147 376 66 58 90 1195 Number of superfamilies 507 354 244 552 66 110 129 1962 Number of families 871 742 803 1055 89 123 219 3902 Ab initio methods: solution of a protein folding problem search in conformational space Energy-based methods: energy minimization molecular simulation Knowledge-based methods: homology modeling fold recogniion Knowledge Knowledge is... a pattern that exceeds certain threshold of interestingness. Factors that contribute to interestingness: coverage confidence statistical significance simplicity unexpectedness actionability Knowledge-based methods Finding patterns in known structures Deriving rules (usually in the form of PMF) Applying the rules Fold Recognition Threading Pattern searching sequence patterns structure patterns residue composition patterns Threading sequence-structure compatibility structure-sequence compatibility Sequence-structure compatibility (fold recognition) Structure-sequence compatibility (inverse folding)

Threading Only the local environment is taken into account Non-local contacts are assumed with generic peptide No gaps are allowed in the alignment Homology Modeling Identification of structurally conserved regions (using multiple alignment) Backbone construction (based on SCR) Loop construction (KB or conformational search) Side-chain restoration (KB, rotamer, or MM) Structure verification and evaluation Structure refinement (energy minimization) Homology Modeling Programs Modeller (http://guitar.rockefeller.edu/modeller) Swiss-Model (http://www.expasy.ch/swissmod) Whatif (http://www.cmbi.kun.nl/whatif) Swiss-Model Method: Knowledge-based approach. Requirements: At least one known 3D-structure of a related protein. Good quality sequence alignements. Procedures: Superposition of related 3D-structures. Generation of a multiple a alignement. Generation of a framework for the new sequence. Rebuild lacking loops. Complete and correct backbone. Correct and rebuild side chains. Verify model structure quality and check packing. Refine structure by energy minimisation and molecular dynamics. Methods and Programs used by Swiss-Model Sequence Alignment BLAST (Altschul S.F., J. Mol. Biol. 215:403, 1990) SIM (Huang, X., Miller, M. Adv. Appl. Math.12:337, 1991) ProModII (Peitsch, M.C. Unpublished, Server-specific tool) Knowledge Based Protein Modelling ProMod (Peitsch M.C. Biochem Soc Trans 24:274, 1996) Energy Minimisation Gromos96 (van Gunsteren W.F. http://igc.ethz.ch/gromos/) Model evaluation Swiss-PdbViewer (http://www.expasy.ch/spdbv/mainpage.html) Swiss-Model Request Types First Approach mode. Optimise mode. Combine mode. GPCR mode.

Model Confidence Factors The Model B-factors are determined as follows: The number of template structures used for model building. The deviation of the model from the template structures. The Distance trap value used for framework building. The Model B-factor is computed as: 85.0 * (1/ # selected template str.) * (Distance trap / 2.5) and 99.9 for all atoms added during loop and side-chain building