Protein structure analysis

Risto Laakso
risto.laakso@hut.fi
10th January 2005
1 Summary

Various methods of protein structure analysis were examined. Two proteins, 1HLB (sea cucumber hemoglobin) and 1HLM (sea cucumber hemoglobin, cyano-met form), were selected for analysis. They are structurally quite similar, since both are sea cucumber hemoglobins, but there is also a significant difference between them: 1HLM has a bound cyanide ion.

[Figure panels: 1HLB HEMOGLOBIN (SEA CUCUMBER); 1HLM HEMOGLOBIN (CYANO-MET) (SEA CUCUMBER)]

Cyanmethemoglobin: a tightly bound complex of methemoglobin with the cyanide ion. [Dorland's Medical Dictionary]
2 Structure comparison

2.1 Root mean square deviation

The structures of two proteins can be compared using the root mean square deviation (RMSD). The RMSD measures the difference between the positions of corresponding Cα atoms in the two proteins. The smaller the deviation, the more spatially equivalent the two proteins are. Ideally, it should be 0.0 for two identical structures, but measurement errors and other variations cause some deviation.

The root mean square deviation is defined as

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert r_i^{\mathrm{model}} - r_i^{\mathrm{real}}\right\rVert^{2}}$$

where $r_i^{\mathrm{model}}$ and $r_i^{\mathrm{real}}$ are the positions of the i-th Cα atoms in the model and the real protein, and N is the number of Cα atoms compared.

The root mean square deviation was tested using the Cα atom positions of 1HLB hemoglobin. Because the calculation requires two sets of positions, the original atom positions were duplicated and some random variation was introduced into the second set (this could represent, for example, measurement error). The plot of these two sample sets gives a calculated RMSD of approximately 0.8.

[Figure: Root mean square deviation example. Series: 1HLB Cα atom positions; 1HLB Cα atom positions with little variation. Axes: x, y. RMSD = 0.81157]

2.2 Distance-matrix alignment (DALI)

As proteins evolve, their structure changes. Because related proteins then differ spatially, a simple root mean square deviation does not give very good results when comparing them. However, the patterns of contacts between residues tend to stay similar between related proteins. Therefore, by analyzing the patterns of contacts, we should be able to identify related proteins.
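The RMSD test described in Section 2.1 can be sketched in a few lines of Python. This is a minimal illustration and not the code used for the plot: the coordinates are hypothetical stand-ins for Cα positions, the helper names `rmsd` and `perturb` are invented here, and the noise level `sigma` is an arbitrary assumption. No superposition is performed, so the two sets are assumed to be in the same coordinate frame already, as they are when one set is a perturbed copy of the other.

```python
import math
import random


def rmsd(model, real):
    """Root mean square deviation between two equally long lists of (x, y, z) positions."""
    if len(model) != len(real) or not model:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for a, b in zip(model, real):
        # Squared Euclidean distance between corresponding atoms.
        total += sum((p - q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(model))


def perturb(coords, sigma, seed=0):
    """Return a copy of coords with Gaussian noise added to every coordinate."""
    rng = random.Random(seed)
    return [tuple(c + rng.gauss(0.0, sigma) for c in atom) for atom in coords]


# Hypothetical stand-in coordinates (not taken from the 1HLB entry).
ca_positions = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.7, 3.3, 0.0), (9.5, 3.3, 0.5)]
noisy = perturb(ca_positions, sigma=0.5)  # sigma is an assumed noise level

print(rmsd(ca_positions, ca_positions))  # 0.0 for identical sets
print(rmsd(ca_positions, noisy))         # greater than 0 for the perturbed copy
```

For two independently determined structures, the coordinate sets would first have to be superimposed, which this sketch deliberately leaves out.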
The distance-matrix alignment method (DALI) was developed by Liisa Holm and Chris Sander to analyze these patterns of contacts. A paper on DALI is available on the internet at http://www.ebi.ac.uk/dali/dali jmb.html. In DALI, the three-dimensional coordinates of each protein are used to calculate residue-residue (Cα-Cα) distance matrices.

Using DALI for 1HLB and 1HLM, the following results were obtained:

No  Chain  raw-score  Z-score  %id  lali  rmsd  Description
1   1hlb   1932.5     30.7     100  157   0.0   HEMOGLOBIN (SEA CUCUMBER)
11  1hlm   1036.6     15.5     58   155   2.9   HEMOGLOBIN (CYANO-MET) (SEA CUCUMBER)

Raw-score = the sum of weighted similarities of intramolecular distances that Dali maximizes.

$$Z\text{-score} = \frac{\text{score} - \text{mean}}{\text{deviation}}$$

The Z-score is normalized. In this comparison, the identical protein scores about Z = 30 and the quite similar protein about Z = 15. The %id column gives the percentage identity: 100% for the same protein, and 58% for 1HLB vs. 1HLM.
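To illustrate why intramolecular distances are a useful basis for comparison, the following sketch builds Cα-Cα distance matrices and compares them element by element. It is not the DALI algorithm: DALI searches for the best alignment and maximizes a weighted similarity score, whereas this example assumes the two chains already have the same length and residue order. The coordinates and the function names `distance_matrix`, `mean_distance_difference` and `rotate_z_and_shift` are hypothetical.

```python
import math


def distance_matrix(coords):
    """All pairwise distances between the given (x, y, z) positions."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def mean_distance_difference(d_a, d_b):
    """Mean absolute difference over the upper triangles of two distance matrices.

    Illustrative measure only; this is not the DALI raw score.
    """
    n = len(d_a)
    if n != len(d_b):
        raise ValueError("matrices must have the same size")
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    return sum(abs(d_a[i][j] - d_b[i][j]) for i, j in pairs) / len(pairs)


def rotate_z_and_shift(coords, angle_deg, shift):
    """Rigidly move a structure: rotate about the z axis, then translate."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]


# Hypothetical stand-in Ca trace.
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.7, 3.3, 0.0), (9.5, 3.3, 0.5)]
moved = rotate_z_and_shift(trace, 40.0, (5.0, -2.0, 1.0))   # same shape, new pose
stretched = [(1.1 * x, y, z) for x, y, z in trace]          # genuinely different shape

reference = distance_matrix(trace)
print(mean_distance_difference(reference, distance_matrix(moved)))      # close to 0
print(mean_distance_difference(reference, distance_matrix(stretched)))  # clearly above 0
```

The rigidly moved copy gives essentially the same distance matrix, so no superposition step is needed before comparing; only a real change in shape alters the distances.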
3 Stereochemical quality

3.1 Ramachandran plot

The Ramachandran plot shows the φ-ψ torsion angles for all residues in the structure (except those at the chain termini).

A fragment of the polypeptide chain, common to all protein structures, is shown in the figure. Rotation is permitted around the N-Cα and Cα-C single bonds of all residues. The angles φ and ψ around these bonds, and the angle of rotation around the peptide bond, ω, define the conformation of a residue. The peptide bond itself tends to be planar, with two allowed states: trans, ω = 180° (usual), and cis, ω = 0° (rare). The sequence of φ, ψ and ω angles of all residues in a protein defines the backbone conformation.

[Figure: Conformational angles of the polypeptide backbone.]

Ramachandran plots were generated for the two proteins 1HLB and 1HLM. In the resulting plots, the background color of each area denotes how probable it is for a residue to have those angles: red is the most probable, followed by brown, dark yellow, and light yellow as the least probable.
[Figure: Ramachandran plots. Panels: 1HLB HEMOGLOBIN (SEA CUCUMBER); 1HLM HEMOGLOBIN (CYANO-MET) (SEA CUCUMBER)]
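The φ and ψ values shown in a Ramachandran plot are dihedral angles over four consecutive backbone atoms: φ over C(i-1), N(i), Cα(i), C(i), and ψ over N(i), Cα(i), C(i), N(i+1). The sketch below computes them with the common atan2 formulation. It is an illustration under stated assumptions, not the program used to generate the plots: the backbone coordinates are hypothetical, and the names `dihedral` and `phi_psi` and the dictionary layout are invented for this example.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, in the range -180..180, defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone):
    """(phi, psi) for every residue that has both neighbours.

    The first and last residues are skipped, matching the exclusion of
    the chain termini in the plot.
    """
    angles = []
    for i in range(1, len(backbone) - 1):
        prev, res, nxt = backbone[i - 1], backbone[i], backbone[i + 1]
        phi = dihedral(prev["C"], res["N"], res["CA"], res["C"])
        psi = dihedral(res["N"], res["CA"], res["C"], nxt["N"])
        angles.append((phi, psi))
    return angles


# Hypothetical three-residue backbone (not taken from 1HLB or 1HLM).
backbone = [
    {"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0), "C": (2.0, 1.4, 0.0)},
    {"N": (3.3, 1.6, 0.3), "CA": (4.0, 2.8, 0.8), "C": (5.4, 2.5, 1.3)},
    {"N": (6.0, 3.5, 2.0), "CA": (7.4, 3.4, 2.4), "C": (8.0, 4.8, 2.6)},
]
for phi, psi in phi_psi(backbone):
    print(f"phi = {phi:.1f}, psi = {psi:.1f}")
```

Plotting the resulting (φ, ψ) pairs as points with φ on one axis and ψ on the other gives the scatter that is overlaid on the colored probability regions.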
4 Structure classification

The CATH database is a hierarchical domain classification of protein structures in the Brookhaven protein databank. There are four major levels in this hierarchy: Class, Architecture, Topology (fold family) and Homologous superfamily.

Comparing 1HLB and 1HLM, one can observe that their classification is almost the same, as it should be: the two entries agree down to the sequence family level and differ only from the Non-identical level onwards.

Level                   1HLB                          1HLM
Class                   1 Mainly alpha                1 Mainly alpha
Architecture            1.10 Orthogonal bundle        1.10 Orthogonal bundle
Topology                1.10.490 Globin-like          1.10.490 Globin-like
Homologous superfamily  1.10.490.10 Globins           1.10.490.10 Globins
Sequence family         1.10.490.10.1 Globins         1.10.490.10.1 Globins
Non-identical           1.10.490.10.1.2 Globins       1.10.490.10.1.1 Globins
Identical               1.10.490.10.1.2.1 Globins     1.10.490.10.1.1.1 Globins
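Because each CATH code is a dotted path through the hierarchy, the level at which two entries diverge can be found by comparing the codes component by component. The following is a small sketch of that idea; the function name `shared_levels` is hypothetical, and the level names and codes are those listed in the table above.

```python
# Level names in the order used by the table above.
CATH_LEVELS = [
    "Class",
    "Architecture",
    "Topology",
    "Homologous superfamily",
    "Sequence family",
    "Non-identical",
    "Identical",
]


def shared_levels(code_a, code_b):
    """Names of the leading hierarchy levels on which two dotted codes agree."""
    shared = []
    for level, a, b in zip(CATH_LEVELS, code_a.split("."), code_b.split(".")):
        if a != b:
            break
        shared.append(level)
    return shared


common = shared_levels("1.10.490.10.1.2.1", "1.10.490.10.1.1.1")
print(common)
# ['Class', 'Architecture', 'Topology', 'Homologous superfamily', 'Sequence family']
```

The components are compared as whole strings rather than character by character, so that, for example, 1.10 and 1.100 are not mistaken for a partial match.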
5 Structure verification

Protein structure verification serves both depositors and users of PDB structures: depositors can check whether their structure is good enough to be submitted, and users can assess whether the quality of a PDB entry is good enough for their purpose.

The WHAT IF program, available on the internet at http://www.cmbi.kun.nl/gv/whatcheck/, can be used to verify a protein structure. It performs consistency checks (chain naming, atom weights, etc.), symmetry checks, geometry checks (chirality, bond lengths, bond angles, torsion angles, etc.) and structural checks.

The WHAT IF report on 1HLB can be run on the internet at http://www.cmbi.kun.nl/cgibin/nonotes?pdbid=1hlb. The final summary for users of the structure states:

Structure Z-scores, positive is better than average:
  1st generation packing quality   : -0.664
  2nd generation packing quality   : -2.396
  Ramachandran plot appearance     : -2.973
  chi-1/chi-2 rotamer normality    : -3.072 (poor)
  Backbone conformation            : -1.099

RMS Z-scores, should be close to 1.0:
  Bond lengths                     : 0.826
  Bond angles                      : 1.796
  Omega angle restraints           : 0.854
  Side chain planarity             : 0.988
  Improper dihedral distribution   : 1.235
  Inside/Outside distribution      : 0.965
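As a rough illustration of why an RMS Z-score "should be close to 1.0", the sketch below assumes that the RMS Z-score is the root mean square of the individual Z-scores, where each Z-score is the deviation of an observed value from its ideal value in units of the expected standard deviation. The bond lengths, ideal value and standard deviation are hypothetical numbers chosen for the example, not values from the 1HLB report, and `rms_z` is a name invented here.

```python
import math


def rms_z(measurements):
    """Root mean square of Z-scores for (observed, ideal, sigma) triples."""
    if not measurements:
        raise ValueError("at least one measurement is required")
    z_values = [(observed - ideal) / sigma for observed, ideal, sigma in measurements]
    return math.sqrt(sum(z * z for z in z_values) / len(z_values))


# Hypothetical bond lengths with an assumed ideal of 1.32 and sigma of 0.02.
typical = [(1.34, 1.32, 0.02), (1.30, 1.32, 0.02)]   # each one sigma off
too_tight = [(1.321, 1.32, 0.02), (1.319, 1.32, 0.02)]  # far closer than expected
too_loose = [(1.38, 1.32, 0.02), (1.26, 1.32, 0.02)]    # three sigma off

print(round(rms_z(typical), 3))    # about 1: deviations match the expected spread
print(round(rms_z(too_tight), 3))  # well below 1
print(round(rms_z(too_loose), 3))  # well above 1
```

Under this reading, a value well above 1.0 indicates larger deviations from ideal geometry than expected, while a value well below 1.0 indicates geometry that is restrained more tightly than the expected spread.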
6 References

[1] Lesk A. M. Introduction to bioinformatics. Oxford University Press, 2002.
[2] Holm L., Sander C. Protein structure comparison by alignment of distance matrices. J. Mol. Biol. 233: 123-138, 1993. http://www.ebi.ac.uk/dali/dali jmb.html.
[3] PDB Lite: Find Macromolecules, http://oca.ebi.ac.uk/oca-bin/pdblite.
[4] CATH Protein Structure Classification, http://www.biochem.ucl.ac.uk/bsm/cath/.
[5] WHAT IF, http://swift.cmbi.kun.nl/whatif/.