Algorithm for Rapid Reconstruction of Protein Backbone from Alpha Carbon Coordinates

MARIUSZ MILIK,1,* ANDRZEJ KOLINSKI,1,2 and JEFFREY SKOLNICK1
1 The Scripps Research Institute, Department of Molecular Biology, 10666 North Torrey Pines Road, La Jolla, California 92037; and 2 Department of Chemistry, University of Warsaw, Pasteura 1, 02093 Warsaw, Poland
*Author to whom all correspondence should be addressed.
Received 5 September 1995; accepted 4 April 1996
Journal of Computational Chemistry, Vol. 18, No. 1, 80-85 (1997). © 1997 by John Wiley & Sons, Inc. CCC 0192-8651/97/010080-06

ABSTRACT

A method for generating a full backbone protein structure from the coordinates of α-carbons is presented. The method extracts information from known protein structures to generate statistical positions for the reconstructed atoms. Tests on a set of protein structures show the algorithm to be of comparable accuracy to existing procedures. However, the basic advantage of the presented method is its simplicity and speed. In a test run, the present program is shown to be much faster than existing database searching algorithms, and reconstructs about 8000 residues per second. Thus, it may be included as an independent procedure in protein folding algorithms to rapidly generate approximate coordinates of backbone atoms.

Introduction

Many molecular modeling protocols require algorithms for the reconstruction of protein backbone atoms from the positions of Cα carbons. Recently, several algorithms for full-atom backbone reconstruction [1-8] have been published. Basically, two kinds of approaches are used: exploitation of similarity to small fragments of known protein structures, and minimization of the local molecular energy. Some of the methods combine both of these approaches.

The method proposed here was designed to be called repeatedly within the flow of protein simulation algorithms, so the emphasis was placed on minimizing the calculation time. Fortunately, in spite of its simplicity, the method is of comparable accuracy to existing protocols. Calculation of the predicted backbone atom positions requires access to rather small tables based on statistical data and a few arithmetical calculations per residue. Using the proposed method, frequent switching between a reduced (Cα-based) representation and an
approximate full-atom representation becomes computationally tractable.

Description of Method

The method is based on the observation that, for a given set of three consecutive Cα-Cα vectors, the position of the central peptide plate is well defined. This probably reflects the specific interplay of angle restraints (see Fig. 1). The idea presented above is not new. It was originally invoked by Purisima and Scheraga [9] and then used by Payne [1] to develop a method for reconstructing the protein backbone and side-chain direction.

In the presented implementation, the local conformation of the peptide backbone fragment is represented by a combination of three internal distances: $d_{i-1,i+1}$, $d_{i,i+2}$, and $d_{i-1,i+2}$, where $d_{i,j}$ denotes the distance between the Cα carbons of the ith and jth residues. Because most of the amino acids are chiral, the representation of the local conformation must contain information about the chirality of the system to distinguish between left- and right-handed structures. Thus, $d_{i,i+3}$ is defined by:

$d_{i,i+3} = \operatorname{sign}[(v_{i-1} \times v_i) \cdot v_{i+1}] \, |v_{i-1} + v_i + v_{i+1}|$   (1)

where the sign factor carries the chirality, and $v_i$ denotes a virtual Cα-to-Cα vector from the ith to the (i+1)th residue.

Figure 2 presents histograms of $d_{i,i+2}$ and $d_{i,i+3}$ obtained by statistical analysis of the coordinates of 430 known protein structures from the Brookhaven Protein Data Bank (PDB) [10]. Most of the values for $d_{i,i+2}$ lie between 4.6 and 7.6 Å, and the $d_{i,i+3}$ values lie between 4.2 and 11.0 Å.

FIGURE 2. Histograms of values of $d_{i,i+3}$ (A) and $d_{i,i+2}$ (B) in 430 protein structures chosen from the Brookhaven Protein Data Bank (see text).

We clustered the local conformations of four-residue peptide fragments using the values of their three internal distances, $d_{i,i+2}$, $d_{i+1,i+3}$, and $d_{i,i+3}$ (defined above). Knowing that the average precision of atomic coordinates in the PDB is about 0.2 Å, we used a 0.3-Å grid in our definition of the local conformation class.
This grid gave 10 possible values for $d_{i,i+2}$ and $d_{i+1,i+3}$ and 48 values for $d_{i,i+3}$ (one must remember that the $d_{i,i+3}$ distance is chiral). As a result, we have 10 × 10 × 48 = 4800 possible local conformations for four-residue peptide fragments in the presented representation. For every local conformation, we calculated the average positions of the C, O, N, and Cβ atoms in a local coordinate system, defined below. The three Cα-Cα vectors provide a reference frame in which the coordinates of the peptide bond can be represented. The basic idea of the reference system is shown in Figure 3.

FIGURE 1. Definition of the central peptide plate, $d_{i,i+3}$, and $d_{i,i+2}$. $C_i$ denotes the position of the Cα atom of the ith residue.

FIGURE 3. Schematic view of the local coordinate system used in the present work.
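As a rough illustration of the classification described above, the sketch below computes the signed three-bond distance and bins the three internal distances on a 0.3-Å grid. The bin origins (4.6 Å and 4.2 Å, taken from the quoted histogram ranges), the even split of the 48 chiral values into 24 per handedness, the clamping of out-of-range distances, and all function names are assumptions made for illustration; the text does not specify them.

```python
import numpy as np

GRID = 0.3      # grid step in angstroms (from the text)
D2_MIN = 4.6    # assumed origin of the two-bond distance bins
D3_MIN = 4.2    # assumed origin of the three-bond distance bins
N2 = 10         # bins per two-bond distance (from the text)
N3 = 24         # assumed bins per handedness; 2 * 24 = 48 chiral values


def chiral_distance(v_a, v_b, v_c):
    """Three-bond C-alpha distance, signed by the handedness of the bond triple."""
    triple = np.dot(np.cross(v_a, v_b), v_c)
    sign = -1.0 if triple < 0.0 else 1.0  # planar fragments count as positive here
    return sign * np.linalg.norm(v_a + v_b + v_c)


def grid_bin(value, origin, n_bins):
    """Index of the grid cell holding value, clamped to the table range."""
    return int(np.clip((value - origin) // GRID, 0, n_bins - 1))


def conformation_class(ca4):
    """Map four consecutive C-alpha positions (4x3) to a triple of bin indices."""
    ca4 = np.asarray(ca4, dtype=float)
    v = np.diff(ca4, axis=0)                      # three virtual bond vectors
    d_first = np.linalg.norm(ca4[2] - ca4[0])     # first two-bond distance
    d_second = np.linalg.norm(ca4[3] - ca4[1])    # second two-bond distance
    d_chiral = chiral_distance(v[0], v[1], v[2])  # signed three-bond distance
    k = grid_bin(abs(d_chiral), D3_MIN, N3)
    if d_chiral < 0.0:
        k += N3                                   # second half holds the mirror images
    return grid_bin(d_first, D2_MIN, N2), grid_bin(d_second, D2_MIN, N2), k


# Toy fragments with 3.8-angstrom bonds at right angles (not realistic geometry).
fragment = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [3.8, 3.8, 3.8]]
mirror = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [3.8, 3.8, -3.8]]
print(conformation_class(fragment), conformation_class(mirror))
```

The two toy fragments share the same unsigned distances and differ only in the last index, which is the role the chiral distance plays in the scheme.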

The first vector in the local coordinate system is constructed according to the formula:

$x = (v_i \times v_p) / |v_i \times v_p|$   (2)

where $v_p$ is the sum of two consecutive backbone vectors: $v_p = v_i + v_{i+1}$. The second axis is defined as the normalized cross-product of $v_p$ and the x-axis defined above:

$y = (v_p \times x) / |v_p \times x|$   (3)

and the third axis is orthogonal to x and y:

$z = (x \times y) / |x \times y|$   (4)

The local coordinate system defined this way depends on the secondary structure of the peptide fragment. An analogous convention for the local coordinate system was proposed by Rey and Skolnick [6]. A table containing the average positions of the C, N, O, and Cβ atoms for all types of local peptide structures was stored and used as a starting point in the process of backbone reconstruction. The case of cis-proline was treated separately, as it is an amino acid with unusual internal geometry. The data for these residues were stored in a separate database.

The algorithm for reconstruction of all backbone atoms for a given residue, i, proceeds as follows:

1. Calculate the values of the internal distances, $d_{i-1,i+1}$, $d_{i,i+2}$, and $d_{i-1,i+2}$, and the chirality (see the definitions in Fig. 1).
2. Find the class of local conformation, according to the values of the internal distances and chirality, using the grid defined above.
3. For the given local conformation class, find in the database the average Cα-C, Cα-N, Cα-O, and Cα-Cβ vectors in the local coordinate system. When the local conformation class is not in our database (this may happen for rarely populated states), the algorithm takes the values for the closest represented local conformation class. For cis-prolines, these vectors are taken from a separate database, created only for this case. Of course, the Cα-Cβ vector is not calculated for glycines. In the case of the Cα-Cβ vector, the statistics are collected for the first Cα atom in the central peptide plane (the $C_{i+1}$ position in Fig. 1).
4. Build the local coordinate system according to Eqs. (2)-(4) (see Fig. 3 and its text description).
5. Rotate the vectors from the local to the laboratory coordinate system and calculate the actual values of the atomic coordinates.

Using the above procedure, which starts from Cα coordinates, it is possible to calculate predicted coordinates of all backbone atoms for residues from 2 to N-2 (N is the number of residues in the protein chain under consideration). The problem of calculating the backbone atomic coordinates for the first and the two last residues of the chain was approximately solved by using virtual residues on both ends of the chain. The coordinates of the dummy residues were calculated by repetition of the first (for the beginning of the chain) and last two vectors (for the end).

The proposed algorithm is a step toward applying homology modeling ideas to the backbone reconstruction problem. The idea of using information about known protein structures in the process of backbone reconstruction is not new (e.g., see Levitt [4]). The basic difference is that our method eliminates the necessity of on-line PDB database searching, and therefore speeds up the reconstruction process without a significant loss in precision. This opens up new areas of application, such as the inclusion of the method in Monte Carlo protein structure modeling algorithms.

Results and Discussion

The method was tested on a set of 15 protein structures from the PDB. We have chosen the structures that were used in previous works [1-7] to compare our algorithm with already published methods. The structures used in the preparation of our statistical database were excluded from the testing set. In the testing process, the coordinates of the backbone atoms and Cβ were calculated using our algorithm on the basis of the Cα coordinates from the PDB file.
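The frame construction and the local-to-laboratory rotation in steps 4 and 5 of the reconstruction algorithm can be sketched as follows. This is a minimal illustration, assuming the axes are x ∝ v_i × v_p with v_p = v_i + v_{i+1}, y ∝ v_p × x, and z = x × y. The helper names and the example table entry are hypothetical; real averaged vectors would come from the statistical tables described in the text.

```python
import numpy as np


def unit(v):
    """Return v scaled to unit length."""
    return v / np.linalg.norm(v)


def local_frame(v_i, v_next):
    """Rows are the x, y, z axes built from two consecutive virtual bond vectors.

    Collinear bond vectors give a zero cross product and are not handled here.
    """
    v_p = v_i + v_next
    x = unit(np.cross(v_i, v_p))
    y = unit(np.cross(v_p, x))
    z = unit(np.cross(x, y))
    return np.vstack([x, y, z])


def to_lab(origin, frame, local_vec):
    """Express a local-frame vector in laboratory coordinates, attached to origin."""
    return origin + frame.T @ local_vec


# Two toy bond vectors and a made-up table entry (hypothetical values,
# not taken from the statistical database of the paper).
v_i = np.array([3.8, 0.0, 0.0])
v_next = np.array([0.0, 3.8, 0.0])
ca_position = np.array([10.0, 5.0, 2.0])
local_entry = np.array([0.2, -0.6, 1.3])

frame = local_frame(v_i, v_next)
atom_position = to_lab(ca_position, frame, local_entry)
print(atom_position)
```

Storing the axes as rows makes the inverse mapping equally cheap: `frame @ (atom_position - ca_position)` recovers the local vector, which is how averaged table entries could be collected from known structures under the same assumptions.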
The predicted coordinates were then compared with the crystallographic ones from the PDB, and the distances between the crystallographic and predicted coordinates were calculated. To compare with other methods, we have also calculated root-mean-squared (RMS) distances for these atoms and the average RMS for backbone atoms. The results are summarized in Table I. For comparison, the same table also presents results obtained for these structures using the previously published algorithms cited in the present work [1-7].

Table I presents the values of RMS calculated individually for the N, C, O, and Cβ atoms, and the average RMS for all backbone atoms. The comparison with other works shows that, despite its speed and simplicity, our algorithm gives a level of prediction accuracy similar to that of other approaches. The best reconstruction precision is obtained for the nitrogen and carbon atoms, where the RMS is in the range 0.08-0.30 Å. This means that the average error of reconstruction for these atoms is on the level of the average error of the experimental data in the PDB. The precision of reconstruction of the Cβ atom coordinates is at a similar level, with the RMS in the range 0.140-0.475 Å. The worst reconstruction precision is obtained for the carbonyl oxygen atoms, whose RMS values range from 0.341 to 0.959 Å; however, the diminished reconstruction accuracy of the carbonyl oxygen atom is typical of other methods as well [6].

Additional information about the test results is presented in Figure 4, which shows histograms of the errors of reconstruction for the backbone and Cβ atoms for all the molecules from the testing set. Most reconstruction errors for the N and C atoms are less than 0.5 Å, and there is no example where the error is larger than 1.0 Å. A similar situation is obtained for most cases of Cβ atom reconstruction. However, the error is larger than 1.0 Å for several atoms and, in one case, it reached 2.5 Å. The distribution of errors is broader for oxygen atom reconstruction and, in a few cases, errors are larger than 2.0 Å, with a maximal reconstruction error of about 3.5 Å.

FIGURE 4. Histograms of errors of reconstruction for backbone atoms and Cβ, for the 15 testing proteins, obtained using the present algorithm.
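The RMS distances discussed above can be computed with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and it assumes no superposition step is needed because the predicted atoms are built directly on the crystallographic Cα positions. The pooling helper reflects an observation from the reported numbers rather than a statement in the text: for the 2alp entry of Table I, the square root of the mean of the squared N, C, and O values is close to the listed backbone average of 0.453 Å.

```python
import numpy as np


def rms_distance(predicted, reference):
    """Root-mean-squared distance between matching atoms in two (n, 3) arrays."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    squared = np.sum((predicted - reference) ** 2, axis=1)  # per-atom squared error
    return float(np.sqrt(squared.mean()))


def pooled_rms(per_type_rms):
    """Combine per-atom-type RMS values, assuming equal atom counts per type."""
    values = np.asarray(per_type_rms, dtype=float)
    return float(np.sqrt(np.mean(values ** 2)))


# Two toy atoms displaced by 0.3 and 0.4 angstroms.
toy_rms = rms_distance([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
                       [[0.0, 0.0, 0.3], [1.0, 0.4, 0.0]])

# N, C, and O values reported for 2alp in Table I.
backbone_2alp = pooled_rms([0.182, 0.243, 0.724])
print(toy_rms, backbone_2alp)
```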
The large values of reconstruction errors occur mostly in loop or turn fragments and on the surface of proteins. The reconstruction precision is much better for fragments within well-defined secondary structures; this result is typical when a statistical method is used. Additionally, the quality of the crystallographic structure affects the error of reconstruction. Table II presents the errors of reconstruction obtained for different structures of chicken (Gallus gallus) triosephosphate isomerase (TIM). The worst reconstruction quality is obtained for the structure with the poorest resolution (1tim). The precision of backbone reconstruction for the other structures is better, and is on the level of the average for the tested set of structures (see Table I).

Figure 5A and B presents individual reconstruction errors for the backbone atoms of the testing structures with the best (2wrp, Fig. 5A) and the worst (1tim, Fig. 5B) average RMS. In the case of the best structure, 2wrp (Fig. 5A), one can see the correlation between errors in the reconstruction of nitrogen and carbon atoms and errors in the reconstruction of carbonyl oxygens. There is only one position where the error of oxygen reconstruction is larger than 2 Å (residue 60); all other backbone atoms are reconstructed with an error on the level of the average experimental error. Figure 5B presents the worst case, the 1tim structure. The reconstruction of the nitrogen and carbon atoms is satisfactory: even in this case, most of the errors are smaller than 0.5 Å. However, the errors of reconstruction of carbonyl oxygens are much larger and, in eight cases, they exceed 2 Å. Most of the large errors occur in residues situated on the surface of the 1tim structure.

In conclusion, the method described here is of comparable accuracy to existing algorithms, but it is much faster.
In the test run, on a Sparc 10 workstation with a GNU C compiler without optimization, our algorithm reconstructs about 8000 residues per second. The presented results offer the possibility of several new applications. On the most trivial level, it can be used to generate good starting points for much more expensive and accurate procedures. However, the main area of application is expected to be in folding simulation algorithms employing a reduced Cα representation of protein conformations, where the energy evaluation [11, 12] may require approximate coordinates of the backbone atoms [13]. In such algorithms, the backbone reconstruction has to be performed on the order of 10^7 times in a single simulation run. The proposed protocol makes this kind of task feasible.

TABLE I. Values of RMS Distances for N, C, O, and Cβ Atoms Calculated for Testing Protein Structures Using the Present Algorithm. (a)

Columns N through Backbone average: RMS distances from this work (Å). Remaining columns: RMS distances from prior works (Å).

Protein  N      C      O      Cβ     Backbone  Correa  Classens    Holm and    Levitt  Payne  Reid and      Rey and
                                     average   [2]     et al. [5]  Sander [3]  [4]     [1]    Thornton [7]  Skolnick [6]
2alp     0.182  0.243  0.724  0.241  0.453     0.19    x           x           x       0.30   x             x
3app     0.188  0.230  0.657  0.279  0.416     x       x           x           x       x      x             x
5cpa     0.245  0.267  0.749  0.320  0.480     x       0.61        x           x       0.33   x             x
1crn     0.141  0.208  0.660  0.194  0.408     x       x           x           0.56    0.20   x             x
1ctf     0.212  0.265  0.721  0.267  0.461     x       x           x           0.29    0.19   x             x
2cts     0.162  0.197  0.587  0.249  0.369     x       0.54        x           x       0.33   x             x
4fxn     0.164  0.204  0.668  0.223  0.414     x       x           x           x       0.39   x             x
3fxn     0.194  0.270  0.842  0.345  0.522     0.49    x           x           0.44    x      0.51          0.77
2mhr     0.208  0.269  0.714  0.321  0.457     x       x           x           x       0.22   x             0.78
2prk     0.145  0.190  0.672  0.224  0.358     x       x           0.49        x       0.26   x             x
6pti     0.149  0.186  0.616  0.400  0.381     x       x           x           0.51    0.32   x             x
1tim     0.231  0.301  0.959  0.475  0.595     x       0.55        x           x       0.50   x             0.68
1ubq     0.112  0.171  0.523  0.198  0.324     x       x           0.42        x       0.25   x             x
9wga     0.166  0.233  0.725  0.359  0.450     x       x           x           x       x      x             x
2wrp     0.085  0.108  0.341  0.140  0.212     x       x           0.46        x       0.18   x             x

(a) The backbone atom reconstruction results obtained for these structures in previously published studies are presented for comparison.

TABLE II. Comparison of Reconstruction Errors for Different Structures of Triosephosphate Isomerase (TIM).

Structure  Resolution (Å)  N RMS (Å)  C RMS (Å)  O RMS (Å)  Cβ RMS (Å)  Backbone average (Å)
1tph       1.8             0.124      0.164      0.522      0.189       0.324
1tim       2.5             0.231      0.301      0.959      0.475       0.595
1tpu       1.9             0.125      0.151      0.475      0.172       0.297
1tpb       1.9             0.121      0.156      0.503      0.180       0.312
1tpc       1.9             0.135      0.180      0.574      0.261       0.356
1tpv       1.9             0.129      0.161      0.505      0.174       0.315
1tpw       1.9             0.143      0.186      0.591      0.193       0.367

FIGURE 5. Individual reconstruction errors for backbone atoms for 2wrp (A) and the first 140 residues of 1tim (B) structures.

Acknowledgments

The list of protein structures and the statistical data used for these calculations are available via anonymous ftp from scripps.edu, directory pub/milik/backbone. The research was supported by NIH Grant No. GM-37408. A. Kolinski was partly supported by the University of Warsaw Grant BST 502-34 95. A.K. is an International Research Scholar of the Howard Hughes Medical Institute.

References

1. P. W. Payne, Prot. Sci., 2, 315-324 (1993).
2. P. E. Correa, Proteins, 7, 366 (1990).
3. L. Holm and C. Sander, J. Mol. Biol., 218, 183 (1991).
4. M. Levitt, J. Mol. Biol., 226, 507 (1992).
5. M. Classens, E. van Cutsem, I. Lasters, and S. Wodak, Prot. Eng., 2, 335 (1989).
6. A. Rey and J. Skolnick, J. Comput. Chem., 13, 443 (1992).
7. L. Reid and J. Thornton, J. Prot., 5, 170 (1990).
8. A. Liwo, M. R. Pincus, R. J. Wawak, S. Rackovsky, and H. A. Scheraga, Prot. Sci., 2, 1697 (1993).
9. E. O. Purisima and H. Scheraga, Biopolymers, 23, 1207 (1984).
10. F. C. Bernstein, et al., J. Mol. Biol., 112, 535 (1977).
11. A. Kolinski and J. Skolnick, Proteins, 18, 353 (1994).
12. A. Kolinski and J. Skolnick, Proteins, 18, 338 (1994).
13. A. Kolinski, M. Milik, J. Rycombel, and J. Skolnick, J. Chem. Phys., 103, 4312 (1995).