Helmholtz free energies of atom pair interactions in proteins Manfred J Sippl, Maria Ortner, Markus Jaritz, Peter Lackner and Hannes Flöckner

Similar documents
Lecture 2 and 3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability

Lecture 2-3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability

Biochemistry Prof. S. DasGupta Department of Chemistry Indian Institute of Technology Kharagpur. Lecture - 06 Protein Structure IV

THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION

From Amino Acids to Proteins - in 4 Easy Steps

Biochemistry,530:,, Introduc5on,to,Structural,Biology, Autumn,Quarter,2015,

PROTEIN STRUCTURE AMINO ACIDS H R. Zwitterion (dipolar ion) CO 2 H. PEPTIDES Formal reactions showing formation of peptide bond by dehydration:

Section Week 3. Junaid Malek, M.D.

Protein Structure. W. M. Grogan, Ph.D. OBJECTIVES

Dental Biochemistry EXAM I

Computer simulations of protein folding with a small number of distance restraints

Solutions and Non-Covalent Binding Forces

AN AB INITIO STUDY OF INTERMOLECULAR INTERACTIONS OF GLYCINE, ALANINE AND VALINE DIPEPTIDE-FORMALDEHYDE DIMERS

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Many proteins spontaneously refold into native form in vitro with high fidelity and high speed.

LS1a Fall 2014 Problem Set #2 Due Monday 10/6 at 6 pm in the drop boxes on the Science Center 2 nd Floor

Protein Folding & Stability. Lecture 11: Margaret A. Daugherty. Fall How do we go from an unfolded polypeptide chain to a

Chem. 27 Section 1 Conformational Analysis Week of Feb. 6, TF: Walter E. Kowtoniuk Mallinckrodt 303 Liu Laboratory

BCH 4053 Exam I Review Spring 2017

Dana Alsulaibi. Jaleel G.Sweis. Mamoon Ahram

Packing of Secondary Structures

Using Higher Calculus to Study Biologically Important Molecules Julie C. Mitchell

Introduction to Computational Structural Biology

Lecture 34 Protein Unfolding Thermodynamics

arxiv: v1 [cond-mat.soft] 22 Oct 2007

1. What is an ångstrom unit, and why is it used to describe molecular structures?

Proteins polymer molecules, folded in complex structures. Konstantin Popov Department of Biochemistry and Biophysics

Short Announcements. 1 st Quiz today: 15 minutes. Homework 3: Due next Wednesday.

Modeling Background; Donald J. Jacobs, University of North Carolina at Charlotte Page 1 of 8

CHAPTER 29 HW: AMINO ACIDS + PROTEINS

Supplementary Information

Polypeptide Folding Using Monte Carlo Sampling, Concerted Rotation, and Continuum Solvation

Protein Struktur (optional, flexible)

Problem Set 1

THE UNIVERSITY OF MANITOBA. PAPER NO: _1_ LOCATION: 173 Robert Schultz Theatre PAGE NO: 1 of 5 DEPARTMENT & COURSE NO: CHEM / MBIO 2770 TIME: 1 HOUR

titin, has 35,213 amino acid residues (the human version of titin is smaller, with only 34,350 residues in the full length protein).

Molecular Interactions F14NMI. Lecture 4: worked answers to practice questions

Dental Biochemistry Exam The total number of unique tripeptides that can be produced using all of the common 20 amino acids is

Lecture 11: Protein Folding & Stability

Protein Folding & Stability. Lecture 11: Margaret A. Daugherty. Fall Protein Folding: What we know. Protein Folding

Read more about Pauling and more scientists at: Profiles in Science, The National Library of Medicine, profiles.nlm.nih.gov

Introduction to" Protein Structure

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

12/6/12. Dr. Sanjeeva Srivastava IIT Bombay. Primary Structure. Secondary Structure. Tertiary Structure. Quaternary Structure.

Aqueous solutions. Solubility of different compounds in water

Biophysics II. Hydrophobic Bio-molecules. Key points to be covered. Molecular Interactions in Bio-molecular Structures - van der Waals Interaction

EXAM I COURSE TFY4310 MOLECULAR BIOPHYSICS December Suggested resolution

CAP 5510 Lecture 3 Protein Structures

Biomolecules: lecture 10

The protein folding problem consists of two parts:

Biomolecules: lecture 9

Physiochemical Properties of Residues

Protein Folding. I. Characteristics of proteins. C α

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Theory and Applications of Residual Dipolar Couplings in Biomolecular NMR

Monte Carlo simulations of polyalanine using a reduced model and statistics-based interaction potentials

Lecture 26: Polymers: DNA Packing and Protein folding 26.1 Problem Set 4 due today. Reading for Lectures 22 24: PKT Chapter 8 [ ].

NMR, X-ray Diffraction, Protein Structure, and RasMol

Proton Acidity. (b) For the following reaction, draw the arrowhead properly to indicate the position of the equilibrium: HA + K + B -

Due in class on Thursday Sept. 8 th

BIBC 100. Structural Biochemistry

BSc and MSc Degree Examinations

Ranjit P. Bahadur Assistant Professor Department of Biotechnology Indian Institute of Technology Kharagpur, India. 1 st November, 2013

Free energy, electrostatics, and the hydrophobic effect

THE UNIVERSITY OF MANITOBA. PAPER NO: 409 LOCATION: Fr. Kennedy Gold Gym PAGE NO: 1 of 6 DEPARTMENT & COURSE NO: CHEM 4630 TIME: 3 HOURS

What binds to Hb in addition to O 2?

arxiv:cond-mat/ v1 2 Feb 94

6 Hydrophobic interactions

Protein Structure Basics

A) at equilibrium B) endergonic C) endothermic D) exergonic E) exothermic.

Outline. Levels of Protein Structure. Primary (1 ) Structure. Lecture 6:Protein Architecture II: Secondary Structure or From peptides to proteins

Molecular Mechanics. I. Quantum mechanical treatment of molecular systems

Molecular Modeling lecture 2

Protein Structure Prediction and Molecular Forces

Biochemistry Quiz Review 1I. 1. Of the 20 standard amino acids, only is not optically active. The reason is that its side chain.

Properties of amino acids in proteins

Thermodynamics. Entropy and its Applications. Lecture 11. NC State University

Basics of protein structure

Bulk behaviour. Alanine. FIG. 1. Chemical structure of the RKLPDA peptide. Numbers on the left mark alpha carbons.

Interparticle interaction

Central Dogma. modifications genome transcriptome proteome

Monte Carlo simulation of proteins through a random walk in energy space

Resonance assignments in proteins. Christina Redfield

Secondary Structure. Bioch/BIMS 503 Lecture 2. Structure and Function of Proteins. Further Reading. Φ, Ψ angles alone determine protein structure

Protein Dynamics. The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron.

Why Proteins Fold. How Proteins Fold? e - ΔG/kT. Protein Folding, Nonbonding Forces, and Free Energy

The Select Command and Boolean Operators

Bioinformatics. Macromolecular structure

Exam I Answer Key: Summer 2006, Semester C

Why Is Molecular Interaction Important in Our Life

Conformational Geometry of Peptides and Proteins:

Supporting Online Material for

CHEM 3653 Exam # 1 (03/07/13)

Molecular dynamics simulations of anti-aggregation effect of ibuprofen. Wenling E. Chang, Takako Takeda, E. Prabhu Raman, and Dmitri Klimov

Denaturation and renaturation of proteins

Key Terms (1 point each). Fill in the blank with the proper term. Each term may be used only once.

Chapter 4. Glutamic Acid in Solution - Correlations

Protein Structure Bioinformatics Introduction

Copyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years.

Overview. The peptide bond. Page 1

Transcription:

Research Paper 289 Helmholtz free energies of atom pair interactions in proteins Manfred J Sippl, Maria Ortner, Markus Jaritz, Peter Lackner and Hannes Flöckner Background: Proteins fold to unique three-dimensional structures, but how they achieve this transition and how they maintain their native folds is controversial. Information on the functional form of molecular interactions is required to address these issues. The basic building blocks are the free energies of atom pair interactions in dense protein solvent systems. In a dense medium, entropic effects often dominate over internal energies but free estimates are notoriously difficult to obtain. A prominent example is the peptide hydrogen bond (H-bond). It is still unclear to what extent H-bonds contribute to protein folding and stability of native structures. Results: Radial distribution functions of atom pair interactions are compiled from a database of known protein folds. The functions are transformed to Helmholtz free energies using a recipe from the statistical mechanics of dense interacting systems. In particular we concentrate on the features of the free functions of peptide H-bonds. Differences in Helmholtz free energies correspond to the reversible work required or gained when the distance between two particles is changed. Consequently, the functions directly display the energetic features of the respective thermodynamic process, such as H-bond formation or disruption. Address: Center for Applied Molecular Engineering, Institute for Chemistry and Biochemistry, University of Salzburg, Jakob Haringer Straße 1, A-5020 Salzburg, Austria. Correspondence: Manfred J Sippl e-mail: sippl@irina.came.sbg.ac.at Key words: potential of mean force, protein folding, protein structure, radial distribution function, structure prediction Received: 13 May 1996 Accepted: 13 May 1996 Published: 08 Jul 1996 Electronic identifier: 1359-0278-001-00289 Folding & Design 08 Jul 1996, 1:289 298 Current Biology Ltd ISSN 1359-0278 Conclusions: In the H-bond potential, a high barrier isolates a deep narrow minimum at H-bond contact from large distances, but the free difference between H-bond contact and large distances is close to zero. The barrier plays an intriguing role in H-bond formation and disruption: both processes require activation in the order of 2kT. H-bond formation opposes folding to compact states, but once formed, H-bonds act as molecular locks and a network of such bonds keeps polypeptide chains in a precise spatial configuration. On the other hand, peptide H-bonds do not contribute to the thermodynamic stability of native folds, because the balance of H-bond formation is close to zero. Introduction How and why proteins fold to unique structures and how they maintain their native folds are controversial issues [1]. The general consensus is that folding is driven by molecular forces, that differences in free between folded and unfolded states are in favour of native structures, that folding and unfolding transitions are cooperative, and that hydrophobic effects, electrostatic interactions and hydrogen bonds (H-bonds) play important roles. But there is virtually no agreement on the relative significance, strength and functional form of the various interactions observed in proteins and how their combined action results in the formation and maintenance of native structures [1]. For example, there is no conclusive answer to the issue of whether hydrophobic and/or electrostatic effects favour formation of ordered structures. In particular, the role of H-bonds in protein folding has remained controversial [1 3]. In vacuo formation of H-bonds is driven by strong electrostatic forces [4], but in protein solvent systems water molecules compete with protein H-bond donors and acceptors and it is still unclear whether the net effect of intramolecular H-bond formation stabilizes or destabilizes protein folds. In dense molecular systems, entropic effects are so strong that they often dominate the functional form of the free of atomic interactions, and the calculation of free energies in protein solvent systems is a complicated task. A frequently used approach is to specify the internal, also called potential, of the system and to estimate the entropic contribution to the free via molecular dynamics or Monte Carlo simulations. This approach requires a number of approximations and assumptions. Consequently, quantitative estimates are difficult to obtain [5]. Nevertheless, interesting conclusions have been drawn from theoretical studies. In particu-

290 Folding & Design Vol 1 No 4 lar, Yang and Honig [6,7] recently concluded on the basis of Zimm Bragg theory and molecular mechanics calculations that the free balance of H-bond formation is close to zero or perhaps slightly unfavourable. Alternatively, the functional form of free energies of atom pair interactions can be obtained from experimental data. Here, we derive Helmholtz free energies, also called potentials of mean force, for atom pair interactions in soluble globular proteins from a database of structures determined to atomic resolution. The potentials obtained uncover the variation of free as a function of particle separation in an average protein solvent environment. As differences in Helmholtz free energies correspond to the reversible work required or gained when the distance between two particles is changed, the functions directly display the energetics of the respective process, such as H-bond formation or disruption. The potential functions obtained for interactions involving peptide H-bond donors and acceptors indicate that H- bonds act as molecular locks. A large free barrier separates a deep narrow minimum at H-bond contact from large distances. The barrier plays an intriguing role in H- bond formation and disruption: both processes require activation in the order of 2kT, but the free balance of H-bond formation is close to zero. The shape of peptide H-bond potentials has important consequences. H- bond formation opposes folding to compact states, but once formed, H-bonds act as kinetic traps and a network of such bonds keeps polypeptide chains in a precise spatial configuration. On the other hand, H-bonds do not contribute to the thermodynamic stability of native folds, because the balance of H-bond formation is close to zero. Compared to other interactions, H-bond potentials involving peptide donors and acceptors are unique. Helmholtz free energies obtained for attractive hydrophobic and electrostatic interactions have relatively broad minima. They are long ranged, they have no barrier and the free balance of contact formation is in the order of 2kT for electrostatic and 1kT for hydrophobic interactions. On average, two hydrophobic interactions or one ion pair interaction can supply the activation required for H-bond formation. Results and discussion Helmholtz free energies from radial distribution functions The Helmholtz free, w(r), of atomic interactions is related to the radial distribution function, g(r), by equation 1 [8,9]: w(r) = kt ln[g(r)] (1) where kt is Boltzmann s constant times absolute temperature. In terms of statistical mechanics, the functions g(r) and w(r) are thermodynamic averages over the associated canonical ensemble at constant density and temperature. w(r), representing an ensemble average, is also called potential of mean force. There is another important interpretation for w(r) it is the reversible work exchanged in a process where the relative distances between two particles of the system are changed. Hence, Helmholtz free, potential of mean force, and reversible work are different names for the same physical quantity. Radial distribution functions g(r) can be determined from the structure factor of liquid samples obtained in diffraction experiments [8 10]. g(r) is a superposition of all interatomic distances in the sample unless specific atoms or atom pairs are selectively labelled as scatterers. Such fractional radial distribution functions g ab (r) for specific atom types a and b have been determined for liquids of simple composition [9], but the approach seems impractical for complex molecular systems. Experimental determination of g ab (r) for peptide H-bonds of proteins in aqueous solution, for example, requires that only peptide N and O atoms act as scattering centres, with all other atoms in the sample being transparent to the radiation used. This is hard to achieve. On the other hand, radial distribution functions can be compiled when the structures of proteins dissolved in the sample are known to atomic resolution. Equivalence of distribution functions Radial distribution functions g ab (r) and potentials of mean force w ab (r) have been compiled previously for mainchain atoms of proteins [11 17] and applied successfully in the recognition of errors in experimentally determined structures [18] and in the prediction of protein folds by fold recognition techniques [19 24]. It has been pointed out previously that the definition of an appropriate reference state and sparse data corrections are vital components in the compilation of mean force potentials [11,15,17]. Here, we extend this approach to the calculation of arbitrary sets of potentials of mean force for atom pair interactions occurring in proteins. Formally, distribution functions g ab (r) obtained from a database of structures and g ab (r) obtained from diffraction experiments are equivalent and the well developed statistical mechanical theory of dense interacting systems (e.g. [8]) is applicable in both cases. Free energies are always tied to the average molecular environment in the respective system. In this context, the actual similarity between g ab (r) and g ab (r) is important. If the functions turn out to be equivalent for practical purposes, then the corresponding potentials w ab (r) are excellent approximations to free energies w ab (r) in solution. To address this point we set up a Gedankenexperiment for the determination of g ab (r) of H-bond interactions, we

Research Paper Helmholtz free energies of atom pair interactions Sippl et al. 291 present the compilation of knowledge-based potentials in a second step and finally discuss the main differences between the two radial distribution functions. Imagine a sample of many distinct soluble globular proteins in aqueous solution. To prevent aggregation, all species are present in very low but equal concentrations. In all proteins, specific atoms have been labelled as scatterers, e.g. the peptide N and O atoms, with all other atoms, including solvent molecules, being transparent. The diffraction experiment yields a fractional distribution function g ab (r) and the associated potential w ab (r) for the N O interaction. As the Helmholtz free w ab (r) is the reversible work required to bring two tagged particles a and b from large separations to a mutual distance r, w ab (r) is the potential function for H-bond formation in an average protein solvent environment. We now determine g ab (r) from a library of protein folds. We take exactly the same set of proteins that was used to prepare our hypothetical sample. We calculate the number density ab (r) as a sum of all N O distances in the fold library by equation 2: ab (r) = Σ ab (r r pij ) ( 2) pij where each distance is represented by a delta function ab (r r pij ). The sum is taken over all N O atom pairs i and j and proteins p and the radial distribution function is obtained from equation 3: g ab (r) = ab (r)/ (3) with the bulk density of N O interactions. The bulk density is the distribution of a and b atoms when no interactions are present. This state, equivalent to an ideal gas composed of a and b atoms, serves as the reference state. As shown in the statistical mechanics of dense interacting systems, the corresponding free is the Helmholtz free of the interaction (e.g. [8]). To calculate the bulk density, the volume V of the sample has to be specified. As the number density ab (r) is compiled from a library of structures up to a cut-off distance R, the appropriate volume is R O 4 r 2 dr. Hence, the bulk number density of interactions involving a and b atoms is calculated from equation 4: = R O ab (r)dr / R O 4 r 2 dr (4) and the free is calculated from equation 5: w(r) = kt ln[g(r)] = kt ln[ ab (r)] + kt ln[ ] ( 5) The effect of the normalization constant on w(r) is a shift of the whole function by kt ln[ ] relative to = 1 and has no effect on differences among any two states r 1 and r 2, defined by equation 6: w ab (r 1,r 2 ) = w(r 2 ) w(r 1 ) ( 6) and it does not affect mean forces, equation 7: F ab (r) = d w ab (r) dr Formally, Helmholtz free energies can be calculated from both distributions g ab (r) and g ab (r). These functions are equivalent, provided g ab (r) is a good approximation to g ab (r). This condition is satisfied if the distance distributions of ab atom pairs of both systems are similar. g ab (r) is a superimposition of intramolecular distances of individual protein molecules and intermolecular distances. In contrast, g ab (r) contains intramolecular terms only. In dilute solutions, average intermolecular distances are large and their contributions have little effect on the short distance range. Moreover, the intermolecular distances are uncorrelated and the associated Helmholtz free vanishes. Hence, in the short distance range the effect of intermolecular contributions to g ab (r) can be neglected. The remaining issue is to what extent g ab (r) can be considered as an approximation to the intramolecular part of g ab (r). The functions will be identical in practice, provided the distribution of intramolecular distances in protein folds determined by X-ray analysis does not deviate substantially from the corresponding distribution in a dilute solution. Noticeable differences can occur only when the library of structures is indeed fundamentally distinct from the same set of structures in solution, since g ab (r) and g ab (r) are averages over a large number of distances where individual effects are wiped out. Nuclear magnetic resonance studies indicate that folds in solution are more flexible but nevertheless most similar to the structures obtained from X-ray diffraction. Hence, the ensemble averages g ab (r) and g ab (r) must be closely related. Mean force potentials of peptide H-bond donor and acceptor interactions Figure 1 shows the H-bond potential as a function of peptide N O separation in the distance range from 0 to 14 Å compiled from a library of unrelated X-ray structures. The function has a deep potential well centered around r c = 2.9 Å corresponding to H-bond contact. The superimposed oscillations, familiar from diffraction studies and computer simulations of liquids, are due to near neighbour effects [8 10]. Before we discuss the features of this potential we have to address the practical issue of whether the current fold library of unrelated X-ray structures contains enough (7)

292 Folding & Design Vol 1 No 4 Figure 1 Figure 2 Helmholtz free or mean force potential for the H-bond interaction N O of peptide N and O atoms separated by more than 8 peptide units along the chain. [w NO (r)] is in units of kt and r in Å. The plot contains two functions. The first is compiled from the fold library of 289 proteins, the second from a subset of 150 proteins chosen randomly from the fold library. The functions are practically identical demonstrating that evaluation of w NO (r) is statistically reliable. Other randomly chosen subsets yield identical results (MJ Sippl, unpublished data). information to determine g ab (r) with sufficient precision. The statistical reliability depends on the product of the number densities N a (r) and N b (r) of atom types a and b. In fact, Figure 1 shows two potential functions. The first is compiled from a set of 289 proteins resulting in 10 6 distances in the range of observation r < 14 Å. The second is compiled from a subset of 150 proteins randomly chosen from the larger set. The functions are practically indistinguishable, demonstrating that the amount of data available is sufficient for reliable statistics. Compilations performed on several independent subsets confirm this result (MJ Sippl, unpublished data). Hence, the N O interactions can be estimated with considerable precision. Protein molecules are linear and asymmetric chains of amino acids. Both features affect the potentials. First, the accessible range of N O distances between two peptides depends on the number, k, of intervening peptide units along the chain and second, potentials are inherently asymmetric. Both effects diminish with increasing k and their effect on w ab (r) should vanish for large separations. The potential in Figure 1 is compiled for k > 8 and does not take into account the asymmetry of polypeptide chains. Consequently, the figure represents a long-range H-bond potential that is an average over the two individual w NO (r) and w ON (r) potentials. Figure 2 superimposes the individual potentials w NO (r) and w ON (r). The difference between both potentials (and those shown in Fig. 1) is negligible and the assumption of symmetry, i.e. using a single potential function, is justified. The situation is quite different when small separations along the chain are included in w ab (r). The potentials Mean force potentials w NO (r) (bold line) and w ON (r) (thin line) for the H- bond interaction N O for k > 8. Free energies are in units of kt and r in Å. The functions are very similar, indicating that the assumption of symmetry (i.e. neglect of chain direction) is valid for k > 8. w NO (r) and w ON (r) compiled for all k > 1 (Fig. 3) are grossly distinct. Distances associated with small separations along the sequence have only a small effect on w NO (r), which is quite similar to the functions shown in Figures 1 and 2. But the -helix O i N i+4 H-bond interaction, favoured by the local steric geometry of polypeptide chains, changes w ON (r) dramatically. If the O i N i+4 distances are removed from w ON (r), the function is most similar to w NO (r) (MJ Sippl, unpublished data). The constraints imposed by the geometry of covalently linked peptide units affect the free energies of individual atom atom interactions. Clearly, entropic effects in the O i N i+3 interaction are different from those in the O i N i+4 pair and a detailed description of the energetic Figure 3 Free - Mean force potentials w ON (r) (bold line) and w NO (r) (thin line) for the H- bond interaction N O for k > 1. Free energies are in units of kt and r in Å. For comparison, the function shown in Fig. 1 is superimposed (fine line). The functions are grossly distinct and the assumption of symmetry is invalid for small values of k. Short-range effects (i.e. small k-values) have little effect on w NO (r), but the unique -helical H-bond favoured by the local stereochemistry has a strong influence on w ON (r).

Research Paper Helmholtz free energies of atom pair interactions Sippl et al. 293 Figure 4 Figure 5 Mean force potential w NN (r) (bold line) for the peptide N N interaction (k > 8). Free energies are in units of kt and r in Å. For comparison, the function shown in Fig. 1 is superimposed (thin line). Mean force potential w OO (r) (bold line) for the peptide O O interaction (k > 8). Free energies are in units of kt and r in Å. For comparison, the function shown in Fig. 1 is superimposed (thin line). features of protein chains requires the compilation of individual potentials, w ab k (r), for small separations k. In principle this can be done, but the problem of sparse data has to be addressed appropriately. Here, perhaps with a few exceptions below, we restrict the analysis to types of interactions whose statistical reliability is easy to judge. Each peptide unit contains one H-bond donor and acceptor in close spatial proximity and invariant relative orientation. Formation of H-bonds between two peptide units always results in relatively close contacts of N N and O O pairs. These interactions are repulsive, but they are strongly affected by the attraction of N O pairs. Each peptide H-bond enforces an N N interaction with an associated free minimum at 5 Å (Fig. 4). The O O interaction shows similar effects (Fig. 5). The functions in Figures 1 5 emphasize the basic nature of mean force potentials. They always depend on the specific context in which they are defined because they are averages over the surrounding medium. Features that are invariant in the given context have a strong effect on the potentials, variable features are averaged out. Characteristic features of peptide H-bond interactions The H-bond potential w NO (r) (Fig. 1) has two intriguing features. First, the minimum at H-bond contact is separated by a high barrier from larger distances. Hence, H-bond formation requires input that is immediately regained when H-bond contacts are formed. Once formed, H-bonds are locked in a deep narrow minimum. The reverse process, breaking H-bonds, also requires activation to surmount the barrier. Second, the Helmholtz free difference between H- bond contact w NO (r c ) and large distances (r > 8 Å) is very small. Consequently, H-bonds do not contribute to the thermodynamic stability of native folds. The activation required to form H-bonds opposes folding, and the to drive the folding process has to be supplied by other interactions. But the deep narrow valley at H-bond contact acts as a kinetic trap. Once formed, H- bonds resist unfolding and a network of such bonds locks the protein backbone in a precise configuration. Formation of peptide H-bond networks in proteins involves a transition of the polypeptide chain from an unordered to an ordered state. Docking of H-bond partners requires that two sections of the chain that are far apart in sequence approach each other in a concerted action. In this approach, a high entropy barrier has to be surmounted. The form of w NO (r) displays the combined effects of entropic and electrostatic terms. The entropic repulsion increases with decreasing distance and only at H-bond contact is the electrostatic attraction strong enough to compensate for the entropic repulsion, but both terms merely cancel. H-bonds between peptide units and sidechain atoms Several amino acid sidechains have functional groups that can form H-bonds among each other or to peptide units. As sidechains are more flexible, there are less geometric constraints and more degrees of freedom as compared to the approach of two peptide units, which are always associated with the bulky backbone. Consequently, the entropic repulsion of H-bond formation involving sidechains should be smaller. Figure 6 shows the H-bond potential for the peptide O and asparagine N atoms. The entropic barrier is only half the size as compared to the w NO (r) interaction, but again the difference between large distances and close contacts is vanishingly small. Compared to w NO (r), the compilation of free energies involving sidechain atoms is less reliable due to lower counts. The amount of noise present can be qualitatively judged by comparing the asymmetric potentials for Asn-N O and O Asn-N,

294 Folding & Design Vol 1 No 4 Figure 6 Figure 7 Mean force potentials for the Pep-O Asn-N (bold line) and Asn- N Pep-O (thin line) interactions (k > 8). Pep denotes the peptide function in Fig. 1 is superimposed (fine line). Mean force potentials for the Pep-N Asn-O (bold line) and Asn- O Pep-N (thin line) interactions (k > 8). Pep denotes the peptide function in Fig. 1 is superimposed (fine line). although differences in these functions may point to distinct energetic features of these interactions. As shown in Figure 6, there are slight variations but the general features are quite similar. Asparagine can form another H-bond to the peptide N atom through its O oxygen (Fig. 7). The depth of the minimum at H-bond contact is 0.5kT higher than in the peptide O and Asn-N interactions resulting in a slightly positive balance for H-bond formation. According to Figure 7, this H-bond is thermodynamically unfavourable. On the other hand, the difference between barrier height and minimum is 1kT for both types of asparagine peptide H-bonds resulting in comparable activation energies for H-bond disruption. With the exception of an additional CH 2 group, glutamine is identical to asparagine. Figures 8 and 9 show that the respective potential functions of glutamine and those of asparagine are practically indistinguishable (Figs 6,7). In particular, potentials for analogous interactions such as Asn-O peptide-o and Gln-O peptide-o have similar energies at H-bond contact. Another interaction with a strong electrostatic contribution is between ionic aspartic carboxyl oxygens and peptide N atoms (Fig. 10). Perhaps surprisingly, the respective mean force potential is quite similar to the asparagine O and glutamine O interactions (Figs 8,9), the main difference being the longer repulsive tail in the Asp-O N potential. Figure 11 shows the glutamine N interaction with asparagine O atoms. This is an example of H-bonds between sidechain atoms. The functions seem to indicate that there is no entropic barrier, but the data set is very sparse and the potentials are statistically unreliable. Figure 8 Figure 9 Mean force potentials for the Pep-O Gln-N (bold line) and Gln- N Pep-O (thin line) interactions (k > 8). Pep denotes the peptide function shown in Fig. 1 is superimposed (fine line). Mean force potentials for the Pep-N Gln-O (bold line) and Gln- O Pep-N (thin line) interactions (k > 8). Pep denotes the peptide function shown in Fig. 1 is superimposed (fine line).

Research Paper Helmholtz free energies of atom pair interactions Sippl et al. 295 Figure 10 Figure 11 Mean force potentials for the Pep-N Asp-O (bold line) and Asp- O Pep-N (thin line) interactions (k > 8). Pep denotes the peptide function shown in Fig. 1 is superimposed (fine line). Mean force potentials for the Gln-N Asn-O (bold line) and Asn- O Gln-N (thin line) interactions (k > 8). Free energies are in units of kt and r in Å. For comparison, the function shown in Fig. 1 is superimposed (fine line). Instead, we investigate the average potential of all nitrogen N-H 2 and oxygen =O sidechain atoms that can form H-bonds among each other (Fig. 12). Still there are less data available, as compared to H-bond interactions involving peptide donors and acceptors, but two important features emerge. First, there is no sign of a significant barrier for H-bond formation and second, the balance of H-bond formation is roughly 1kT, indicating that H-bonds between sidechain atoms contribute to the thermodynamic stability of native folds. Peptide H-bonds in relation to other interactions Taken together, the differences in the potentials indicate that H-bonds of peptide units are special, their potentials having the following characteristic features: a deep narrow well at H-bond contact, and a small or vanishing free balance of H-bond formation. The consequence is that H-bond formation involving donors and acceptors in the protein backbone does not drive folding to compact states. The driving force for folding to native structures has to be supplied by other interactions. Prominent candidates are hydrophobic and electrostatic forces. Figures 13 and 14 show that the functional form of the free corresponding to these interactions is indeed fundamentally different from those of peptide H-bonds. The potentials have broad valleys, they have no significant barrier separating the global minimum from large distances and, most important, free differences between large distances and close contacts are substantial. For the methyl methyl interaction between the CH 3 groups of valine C and isoleucine C atoms, a typical hydrophobic interaction, this difference is in the order of 1kT. Moreover, the interaction is extremely long ranged, the two methyl groups attracting each other to distances of 14 Å and beyond. The ionic interaction between aspartic carboxyl oxygens O and amino nitrogen N of lysine has a shorter range, extending approximately to 10 Å. The gain of contact formation being 2kT is roughly twice as large as a typical hydrophobic contact. As a rough estimate, one ionic interaction or two hydrophobic contacts can supply the activation to push two peptide H-bond partners over the barrier. This is immediately regained when a H-bond contact is formed and is available for stabilization or subsequent folding events. Other important factors in protein folding are repulsive forces, exemplified by the interaction between aspartic carboxyl O atoms and peptide O atoms (Fig. 15). These oxygen atoms repel each other to 10 Å. The small dip near 5 Å originates in the attractive interaction of carboxyl O and peptide N atoms (Fig. 10). Figure 12 Mean force potentials for the N-H 2 O= and O= N-H 2 (both shown with thin lines) interactions (k > 8). The symmetric mean force potential obtained by adding the respective distribution functions is shown in bold. Free energies are in units of kt and r in Å. For comparison, the function shown in Fig. 1 is superimposed (fine line).

296 Folding & Design Vol 1 No 4 Figure 13 Figure 14 - Mean force potentials for the Ile-C Val-C (bold line) and Val-C Ile- C (thin line) interactions (k > 8). Free energies are in units of kt and r in Å. For comparison, the function shown in Fig. 1 is superimposed (fine line). - Mean force potentials for the Asp-O Lys-N (bold line) and Lys- N Asp-O (thin line) interactions (k > 8). Free energies are in units of kt and r in Å. For comparison, the function shown in Fig. 1 is superimposed (fine line). Role of H-bonds in protein folding and stability In summary, the Helmholtz free energies or mean force potentials obtained in this study shed some light on the peculiarities of protein folding and the energetic architecture of native structures. Spontaneous folding to a unique conformation requires that the free difference favours the native state. Since the free balance of peptide H-bond formation is close to zero, these interactions do not drive folding, in agreement with the conclusions reached by Yang, Honig and Cohen [1,5,6]. In the compact state, typical for native structures, peptide H- bonds constrain the possible geometries of polypeptide backbones, but a specific fold is selected primarily by sidechain interactions [1]. The barriers in the H-bond potentials keep peptide units apart, impeding H-bond formation. On the other hand, hydrophobic and other attractive but unspecific interactions drive the chain to a compact state. The result of these opposing effects may resemble a molten globule. Every peptide unit carries a H-bond donor and acceptor so that the total barrier for the collapse to a compact native conformation is high. Formation of H- bond networks characteristic of all native folds requires that the interactions that drive this transition are highly cooperative. Of course this does not imply that there are no intermediates. The main events in folding depend on the force field of a specific polypeptide chain which in turn is a function of the amino acid sequence. In most cases, force fields of random sequences will lack this cooperativity. The H-bond potential shown in Figure 1 is a general peptide H-bond in the sense that it is an average over all proteins in the database and an average over all amino acid types. This does not imply that sidechains have no effect on H-bond formation, but simply means that we have averaged over sidechain effects of particular amino acids. The peptide H-bond potential for specific pairs of amino acids is modulated by the presence of sidechains. In the case of glycine pairs, for example, we find that the barrier height is only 1kT as compared to 2kT for the average peptide H-bond, but the functional form is similar to the average H-bond and in particular the balance of H- bond formation is close to zero. On the other hand, the alanine alanine interaction is practically identical to the average H-bond (MJ Sippl, unpublished data). A more detailed analysis of individual sidechain effects is possible but more difficult due to the very small data set for atom pairs of rare amino acids. Finally, we comment on some general aspects of potentials of mean force. The potentials discussed here derive from a database of X-ray structures using the theory of dense interacting systems in equilibrium. With the excep- Figure 15 Mean force potentials for the Pep-O Asp-O (bold line) and Asp- O Pep-O (thin line) interactions (k > 8). Pep denotes the peptide function shown in Fig. 1 is superimposed (fine line).

Research Paper Helmholtz free energies of atom pair interactions Sippl et al. 297 tion of sparse data corrections for some types of potentials, no additional assumptions or approximations are involved. The functions obtained are Helmholtz free energies and as such they reflect the average energetic situation of a particular interaction in this medium. In the discussion of potentials we have frequently referred to dynamic processes, such as H-bond formation. The potentials derive from static structures, and conclusions regarding dynamic processes may seem difficult to accept. However, differences in Helmholtz free energies and mean force potentials correspond to the reversible work exchanged in a thermodynamic process. In terms of atom atom interactions this corresponds to the free change in the process r 1 r 2, i.e. w 1 2 = w(r 2 ) w(r 1 ). As always in the application of equilibrium thermodynamics, this change in the relative separation of the two particles has to proceed on a reversible path. Materials and methods In the present study, Helmholtz free energies are compiled from a fold library. Distances between all atom pairs a and b found in the fold library are calculated in the distance range 0 r 14 Å sampled in intervals of 0.3 Å. The resulting number density ab (r) is normalized yielding the radial distribution function g ab (r) as shown in equation 8: g ab (r) = ab (r)/ (8) is the number density of atom pairs divided by the observed volume, as shown in equation 9: = R O ab (r)dr / R O 4 r2 dr (9) (R = 14 Å), and the Helmholtz free or potential of mean force is calculated from equation 10: w ab (r) = kt ln[g(r)] (10) A sparse data correction [11] was applied using the same value σ = 1/10 in all cases. This correction has only a small effect on the potentials. Statistical fluctuations are only slightly suppressed but numerical problems are avoided. In particular, ρ ab (r) is zero for very small distances corresponding to atomic overlap resulting in ln[0] =. In this situation, sparse data correction yields a small numerical value for ab (r) and a large but finite value of ln[ ab (r)]. Smaller values for suppress statistical fluctuations more efficiently, but our goal here is the explicit demonstration of the statistical reliability of various potentials. The library of folds is prepared from the Brookhaven Protein Data Bank [25] in the following way. All X-ray structures from the October 1995 release having a resolution better than 2.5 Å are extracted. The sequence homology of each protein to all others in this subset is computed. All proteins having amino acid identities > 25% define a family of related proteins. From each family we take the structure of highest resolution. These proteins (289 monomer structures) define the fold library of unrelated proteins used for the compilation of potentials where any pair of structures has sequence identity < 25%. Quality of structure determinations and possible errors in structures affect the quality of potentials w ab (r). Since the potentials are averages over a library of folds, low quality and occasional errors are negligible provided the average quality is good and errors are infrequent. The major problems we encountered are atomic overlaps and bad stereochemistry. It is somewhat surprising that a substantial number of protein folds in the Brookhaven Protein Data Bank have one or several atomic overlaps and/or unrealistic covalent geometry. Since close contacts map into the distance range that is unoccupied in a proper radial distribution function, such data give rise to artificially low energies at close contacts. In the present study, we removed all proteins that would affect the potentials in this way. Sometimes protein folds are erroneous due to frame shift errors in amino acid assignment and/or misinterpretation of electron densities. These errors are usually confined to low-resolution studies. New folds are checked by PROSA-II [18], a program that can detect faulty structures, before they are added to our fold library. Suspicious structures are rejected. Coordinates obtained from X-ray analysis are often refined using molecular mechanics force fields. These functions model atom atom interactions in proteins and the question is to what extent refinement affects mean force potentials derived from a library of folds. As a rule, protein conformations are not grossly changed by refinement. The main results are correction of atom atom clashes and distorted covalent geometries. The resulting effect on mean force potentials is small for several reasons. First, small changes in individual conformations result in even smaller changes in ab (r). Second, mean force potentials are averages over structures obtained by a variety of different refinement protocols. In summary, refinement may have some influence on the fine details of potentials, but their general form is not affected. Acknowledgements This work was supported by grant P11601 of the FWF/Austria. A set of Helmholtz free functions (Program ProMPT), the database of protein structures and other material is available electronically from URL http://lore.came.sbg.ac.at/extern/software/prompt/prompt.html. References 1. Honig, B. & Cohen, F.E. (1996). Adding backbone to protein folding: why proteins are polypeptides. Folding & Design 1, R17 R20; 1359-0278-001-R0017. 2. Baldwin, R.L. (1993). In Outstanding Papers in Biology. pp. 1. Current Biology Ltd, London. 3. Schellman, J.A. (1955). The stability of hydrogen-bonded peptide structures in aqueous solution. Comptes rendus des Travaux du Laboratoire Carlsberg Serie Chimique 29, 230 259. 4. Schuster, P. (1992). Hydrogen bonds. In Encyclopedia of Physical Science and Technology 7, 727 761. 5. Honig, B. & Yang, A.-S. (1995). Free balance in protein folding. Adv. Protein Chem. 46, 27 58. 6. Yang, A.-S. & Honig, B. (1995). Free determinants of secondary structure formation. 1. Alpha-helices. J. Mol. Biol. 252, 351 365. 7. Yang, A.-S. & Honig, B. (1995). Free determinants of secondary structure formation. 1. Antiparallel beta-sheets. J. Mol. Biol. 252, 366 376. 8. McQuarrie, D.A. (1976). Statistical Mechanics. Harper Collins Publishers, New York. 9. March, N.H. & Tosi, M.P. (1976). Atomic Dynamics in Liquids. Macmillan Press, London. 10. Gingrich, N.S. (1943). The diffraction of X-rays by liquid elements. Rev. Mod. Phys. 15, 90 110. 11. Sippl, M.J. (1990). Calculation of conformational ensembles from potentials of mean force. J. Mol. Biol. 213, 859 883. 12. Hendlich, M., et al., & Sippl, M.J. (1990). Identification of native protein folds amongst a large number of incorrect models. J. Mol. Biol. 216, 167 180. 13. Casari, G. & Sippl, M.J. (1992). Structure derived hydrophobic potential. A hydrophobic potential derived from X-ray structures of globular proteins is able to identify native folds. J. Mol. Biol. 224, 725 732. 14. Sippl, M.J. (1993). Boltzmann s principle, knowledge-based mean fields and protein folding. J. Comput. Aided Mol. Design 7, 473 501. 15. Sippl, M.J., Jaritz, M., Hendlich, M., Ortner, M. & Lackner, P. (1994). Application of knowledge based mean fields in the determination of protein structures. In Statistical Mechanics, Protein Structure and Protein Substrate Interactions. (Doniach, S., ed.), pp 297 315, Plenum Publishers.

298 Folding & Design Vol 1 No 4 16. Sippl, M.J. & Jaritz M. (1994). Predictive power of mean force pair potentials. In Protein Structure by Distance Analysis. (Bohr, H., Brunak, S., eds.), pp 113 134, IOS Press, Amsterdam. 17. Sippl, M.J. (1995). Knowledge-based potentials for proteins. Curr. Opin. Struct. Biol. 5, 229 235. 18. Sippl, M.J. (1993). Recognition of errors in three-dimensional structures of proteins. Proteins 17, 355 362. 19. Sippl, M.J. & Weitckus, S. (1992). Detection of native like models for amino acid sequences of unknown three dimensional structure in a data base of known protein conformations. Proteins 13, 258 271. 20. Sippl, M.J., Weitckus, S. & Floeckner, H. (1994). In search of protein folds. In The Protein Folding Problem and Tertiary Structure Prediction. (Merz, K.H., LeGrand, S., eds.), pp. 353 407, Birkhaeuser, Boston. 21. Sippl, M.J., Weitckus, W. & Floeckner, H. (1995). Fold recognition. In Modelling of Biomolecular Structures and Mechanisms. (Pullman, A., Jortner, J., Pullman, B., eds.), pp. 107 118, Kluwer, Amsterdam. 22. Braxenthaler, M. & Sippl, M.J. (1995). Screening genome sequences for known folds. In Protein Folds: a Distance Based Approach. (Bohr, H., Brunak, S, eds.), pp. 80 84, CRC Press, Boca Raton. 23. Floeckner, H., Braxenthaler, M., Lackner, P., Jaritz, M., Ortner, M. & Sippl, M.J. (1995). Progress in fold recognition. Proteins 23, 376 386. 24. Sippl, M.J. & Floeckner, H. (1996). Threading thrills and threats. Structure 4, 15 19. 25. Bernstein, F.C., et al., & Tasumi, M. (1977). The Protein Data Bank: a computer-assisted archival file for macromolecular structures. J. Mol. Biol. 112, 535 542. Because Folding & Design operates a Continuous Publication System for Research Papers, this paper has been published via the internet before being printed. The paper can be found in the BioMedNet library at http://biomednet.com/ for further information, see the explanation on the contents page.