Protein structure analysis. Risto Laakso 10th January 2005

Protein structure analysis
Risto Laakso (risto.laakso@hut.fi)
10th January 2005

1 Summary

Various methods of protein structure analysis were examined. Two proteins, 1HLB (sea cucumber hemoglobin) and 1HLM (sea cucumber hemoglobin, cyano-met form), were selected for analysis. They are structurally quite similar (both are sea cucumber hemoglobins), but there is also a significant difference between them: 1HLM has a bound cyanide ion.

[Figure: structures of 1HLB, HEMOGLOBIN (SEA CUCUMBER), and 1HLM, HEMOGLOBIN (CYANO-MET) (SEA CUCUMBER)]

Cyanmethemoglobin: "a tightly bound complex of methemoglobin with the cyanide ion." [Dorland's Medical Dictionary]

2 Structure comparison

2.1 Root mean square deviation

The structures of two proteins can be compared using the root mean square deviation (RMSD). The RMSD measures the difference between the Cα atom positions of the two proteins. The smaller the deviation, the more spatially equivalent the two proteins are. Ideally it would be 0.0 for two copies of the same protein, but measurement errors and other variations cause a nonzero deviation. The RMSD is defined as

$$\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\| r_i^{\mathrm{model}} - r_i^{\mathrm{real}} \right\|^2}$$

where $r_i^{\mathrm{model}}$ and $r_i^{\mathrm{real}}$ are the positions of the i-th Cα atom in the model and in the real protein, and $N$ is the number of Cα atoms compared.

The RMSD was tested using the Cα atom positions of the 1HLB hemoglobin. Because the calculation requires two sets of coordinates, the original atom positions were duplicated and some random variation was introduced into the second set (this could represent, e.g., measurement error). Below is the plot of these two sample sets and the calculated RMSD (≈ 0.8).

[Figure: Root mean square deviation example. Series: 1HLB Cα atom positions; 1HLB Cα atom positions with little variation. Axes: x, y. RMSD = 0.81157]

2.2 Distance-matrix alignment (DALI)

As proteins evolve, their structure changes. Because the structures then differ spatially, a simple root mean square deviation will not give very good results for two related proteins. However, the patterns of contacts between residues tend to stay similar between related proteins. Therefore, by analyzing the patterns of contacts, we should be able to identify related proteins.
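As a minimal sketch of the RMSD test described in Section 2.1, the following Python code computes the RMSD between a coordinate set and a noisy copy of itself. The coordinates are made-up stand-ins rather than real 1HLB data, and the helper names `rmsd` and `perturb` as well as the noise level `sigma` are assumptions chosen for illustration. No superposition step is performed, because both sets already share the same frame.

```python
import math
import random


def rmsd(model, real):
    """Root mean square deviation between two equally long lists of (x, y, z) points."""
    if len(model) != len(real) or not model:
        raise ValueError("need two non-empty coordinate sets of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, real):
        # Squared Euclidean distance between corresponding atoms
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))


def perturb(coords, sigma, seed=0):
    """Copy coords, adding Gaussian noise (std dev sigma) to every coordinate."""
    rng = random.Random(seed)
    return [tuple(c + rng.gauss(0.0, sigma) for c in point) for point in coords]


# Hypothetical stand-in coordinates, NOT real 1HLB C-alpha positions
ca_positions = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 3.8)]
noisy = perturb(ca_positions, sigma=0.5)

print(rmsd(ca_positions, ca_positions))  # identical sets give 0.0
print(rmsd(ca_positions, noisy))         # small positive value caused by the noise
```

With real data, the two coordinate lists would come from the Cα atoms of the structures being compared, in matching residue order.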

Distance-matrix alignment (DALI) was developed by Liisa Holm and Chris Sander to analyze these patterns of contacts. A paper on DALI is available on the internet at http://www.ebi.ac.uk/dali/dali jmb.html. In DALI, the three-dimensional coordinates of each protein are used to calculate residue-residue (Cα-Cα) distance matrices.

Running DALI for 1HLB and 1HLM gave the following results:

No  Chain  raw-score  Z-score  %id  lali  rmsd  Description
1   1hlb   1932.5     30.7     100  157   0.0   HEMOGLOBIN (SEA CUCUMBER)
11  1hlm   1036.6     15.5     58   155   2.9   HEMOGLOBIN (CYANO-MET) (SEA CUCUMBER)

Raw-score = the sum of weighted similarities of intramolecular distances that Dali maximizes.
Z-score = (score - mean) / deviation.

The Z-score is normalized. In this comparison the identical structure scores about Z = 30, and the quite similar 1HLM scores about Z = 15. The %id column gives the percentage of identical residues: 100% for the same protein, and 58% for 1HLB vs. 1HLM. The lali column is the number of aligned residues, and rmsd is the root mean square deviation of the aligned Cα atoms.
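The first step named above, building a Cα-Cα distance matrix, can be sketched as follows. This is only an illustration of the distance matrix itself, not of DALI's alignment or scoring. The coordinates and the helper names `distance_matrix` and `max_abs_difference` are hypothetical. The example also shows why distance matrices are convenient for comparison: they do not change when a structure is rotated or translated as a whole.

```python
import math


def distance_matrix(ca_coords):
    """Return the symmetric matrix of pairwise distances between C-alpha positions."""
    return [[math.dist(a, b) for b in ca_coords] for a in ca_coords]


def max_abs_difference(m1, m2):
    """Largest element-wise difference between two equally sized matrices."""
    return max(abs(x - y) for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2))


# Hypothetical coordinates, NOT taken from 1HLB or 1HLM
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]

# The same shape moved rigidly: rotated 90 degrees about z, then translated
moved = [(-y + 10.0, x - 5.0, z + 2.0) for x, y, z in coords]

d_original = distance_matrix(coords)
d_moved = distance_matrix(moved)

print(d_original[0][1])                          # distance between the first two points
print(max_abs_difference(d_original, d_moved))   # essentially zero: rigid motion changes nothing
```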

3 Stereochemical quality

3.1 Ramachandran plot

The Ramachandran plot shows the φ-ψ torsion angles for all residues in the structure (except those at the chain termini). A fragment of polypeptide chain common to all protein structures is shown below. Rotation is permitted around the N-Cα and Cα-C single bonds of all residues. The angles φ and ψ around these bonds, and the angle of rotation around the peptide bond, ω, define the conformation of a residue. The peptide bond itself tends to be planar, with two allowed states: trans, ω = 180° (usual), and cis, ω = 0° (rare). The sequence of φ, ψ and ω angles of all residues in a protein defines the backbone conformation.

[Figure: Conformational angles of the polypeptide backbone.]

Ramachandran plots for the two proteins 1HLB and 1HLM were generated. The resulting plots can be seen below. The background color of each area denotes the probability that a residue has these angles, with red being most probable, followed by brown, dark yellow and light yellow as least probable.

[Figure: Ramachandran plots for 1HLB, HEMOGLOBIN (SEA CUCUMBER), and 1HLM, HEMOGLOBIN (CYANO-MET) (SEA CUCUMBER)]
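The torsion angles plotted in a Ramachandran plot can each be computed from four consecutive backbone atoms. The sketch below uses a standard atan2-based torsion formula. The function name `torsion_angle` and the test points are hypothetical and are not taken from the analysis above. For a residue i, φ would be the torsion over C(i-1), N(i), Cα(i), C(i), and ψ the torsion over N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in the range -180..180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical test points around a bond along the z axis
cis_like = torsion_angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
trans_like = torsion_angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
quarter_turn = torsion_angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))

print(abs(cis_like), abs(trans_like), quarter_turn)
```

The cis-like and trans-like arrangements correspond to the two planar states of ω described in Section 3.1, with magnitudes of 0 and 180 degrees.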

4 Structure classification

The CATH database is a hierarchical domain classification of protein structures in the Brookhaven protein databank. There are four major levels in this hierarchy: Class, Architecture, Topology (fold family) and Homologous superfamily. Comparing 1HLB and 1HLM, one can observe that their classification is almost the same (as it should be): the two entries are identical down to the sequence family level and differ only at the two most specific levels.

Level                   1HLB                        1HLM
Class                   1 Mainly alpha              1 Mainly alpha
Architecture            1.10 Orthogonal bundle      1.10 Orthogonal bundle
Topology                1.10.490 Globin-like        1.10.490 Globin-like
Homologous superfamily  1.10.490.10 Globins         1.10.490.10 Globins
Sequence family         1.10.490.10.1 Globins       1.10.490.10.1 Globins
Non-identical           1.10.490.10.1.2 Globins     1.10.490.10.1.1 Globins
Identical               1.10.490.10.1.2.1 Globins   1.10.490.10.1.1.1 Globins
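The comparison in the table can also be done mechanically by splitting the dotted classification codes and walking down the hierarchy until they diverge. The following sketch takes the level names and the two full codes from the table above. The function name `shared_levels` is a hypothetical helper, not part of any CATH tool.

```python
# Level names in hierarchy order, as listed in the table above
CATH_LEVELS = [
    "Class",
    "Architecture",
    "Topology",
    "Homologous superfamily",
    "Sequence family",
    "Non-identical",
    "Identical",
]


def shared_levels(code_a, code_b):
    """Return the names of the leading hierarchy levels on which two codes agree."""
    shared = []
    for name, a, b in zip(CATH_LEVELS, code_a.split("."), code_b.split(".")):
        if a != b:
            break  # once the codes diverge, all deeper levels differ too
        shared.append(name)
    return shared


code_1hlb = "1.10.490.10.1.2.1"
code_1hlm = "1.10.490.10.1.1.1"

common = shared_levels(code_1hlb, code_1hlm)
print(common)      # the levels on which 1HLB and 1HLM agree
print(common[-1])  # the deepest shared level
```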

5 Structure verification

Protein structure verification is meant for both depositors and users of PDB structures: depositors can check whether their structure is good enough to be submitted, and users can judge whether the quality of a PDB entry is good enough for their use.

The WHAT IF program, available on the internet at http://www.cmbi.kun.nl/gv/whatcheck/, can be used to verify a protein structure. It performs consistency checks (chain naming, atom weights, etc.), symmetry checks, geometry checks (chirality, bond lengths, bond angles, torsion angles, etc.) and structural checks.

The WHAT IF report on 1HLB can be run on the internet at http://www.cmbi.kun.nl/cgibin/nonotes?pdbid=1hlb. The final summary for users of the structure states:

Structure Z-scores, positive is better than average:
  1st generation packing quality : -0.664
  2nd generation packing quality : -2.396
  Ramachandran plot appearance   : -2.973
  chi-1/chi-2 rotamer normality  : -3.072 (poor)
  Backbone conformation          : -1.099

RMS Z-scores, should be close to 1.0:
  Bond lengths                   : 0.826
  Bond angles                    : 1.796
  Omega angle restraints         : 0.854
  Side chain planarity           : 0.988
  Improper dihedral distribution : 1.235
  Inside/Outside distribution    : 0.965

6 References

[1] Lesk A. M. Introduction to bioinformatics. Oxford University Press, 2002.
[2] Holm L., Sander C. Protein structure comparison by alignment of distance matrices. J. Mol. Biol. 233: 123-138, 1993. http://www.ebi.ac.uk/dali/dali jmb.html.
[3] PDB Lite: Find Macromolecules, http://oca.ebi.ac.uk/oca-bin/pdblite.
[4] CATH Protein Structure Classification, http://www.biochem.ucl.ac.uk/bsm/cath/.
[5] WHAT IF, http://swift.cmbi.kun.nl/whatif/.