Research Paper 577. Correspondence: Nikolay V Dokholyan Key words: Go model, molecular dynamics, protein folding


Research Paper 577

Discrete molecular dynamics studies of the folding of a protein-like model
Nikolay V Dokholyan, Sergey V Buldyrev, H Eugene Stanley and Eugene I Shakhnovich

Background: Many attempts have been made to resolve in time the folding of model proteins in computer simulations. Different computational approaches have emerged. Some of these approaches suffer from insensitivity to the geometrical properties of the proteins (lattice models), whereas others are computationally heavy (traditional molecular dynamics).

Results: We used the recently proposed approach of Zhou and Karplus to study the folding of a protein model based on the discrete time molecular dynamics algorithm. We show that this algorithm resolves with respect to time the folding-unfolding transition. In addition, we demonstrate the ability to study the core of the model protein.

Conclusions: The algorithm along with the model of interresidue interactions can serve as a tool for studying the thermodynamics and kinetics of protein models.

Addresses: Center for Polymer Studies, Physics Department, Boston University, Boston, MA 02215, USA. Department of Chemistry, Harvard University, 12 Oxford Street, Cambridge, MA 02138, USA.

Correspondence: Nikolay V Dokholyan
E-mail: dokh@bu.edu

Key words: Go model, molecular dynamics, protein folding

Received: 8 May 1998
Revisions requested: 3 June 1998
Revisions received: 7 October 1998
Accepted: December 1998
Published: 5 December 1998

http://biomednet.com/elecref/359783577

5 December 1998, 3:577–587
Current Biology Ltd ISSN 1359-0278

Introduction
The vast dimensionality of the protein conformational space [] makes the folding time too long to be reachable by direct computational approaches [ 4].
Simplified models [5 4] became popular due to their ability to reach reasonable time scales and to reproduce the basic thermodynamic and kinetic properties of real proteins [3,5,6]: firstly, a unique native state, that is, there should exist a single conformation with the lowest potential energy; secondly, a cooperative folding transition (resembling a first-order transition); thirdly, thermodynamic stability of the native state; and fourthly, kinetic accessibility, that is, the native state should be reachable in a biologically reasonable time [,7].

Monte Carlo (MC) simulations on lattice models (e.g. see [4 7] and references therein) appear to be useful for studying theoretical aspects of protein folding. The MC algorithm is based on a set of rules for the transition from one conformation to another. These transitions are weighted by a transition matrix that reflects the phenomena under study. The simplicity of the algorithm and the significantly smaller conformational space of the protein models (due to the lattice constraints) make MC on-lattice simulations powerful tools for studying the equilibrium dynamics of protein models; however, lattice models impose strong constraints on the angles between the covalent bonds, thereby greatly restricting the conformational space of the protein-like model. An additional drawback of this restriction lies in the poor capability of these models to discern the geometrical properties of the proteins. The time in MC algorithms is estimated as the average number of moves (over an ensemble of the folding-unfolding transitions) made by a model protein.
It was pointed out [8] that MC simulations are equivalent to the solution of the master equation for the dynamics, so there is a relation between physical time and computer time, which is counted as the number of MC steps; however, a number of delicate issues, such as the dependence of the dynamics on the set of allowed MC moves, remain outstanding, so an independent test of the dynamics using the molecular dynamics (MD) approach is needed. To address aspects that are sensitive to geometrical details, it is useful to study off-lattice models of protein folding. Thus far, several off-lattice simulations have been performed [9 ] that demonstrate the ability of the simplified models to study protein folding.

Here, we study the three-dimensional molecular dynamics of a simplified model of proteins [6,7]. The potential of interactions between pairs of residues is modeled by a square well, which allows us to increase the speed of the simulations [,3]. We estimate folding time based on the collision event list, which, besides increasing the speed of the simulation, allows for the tracking of realistic (not discretized) time. We show that such an algorithm

can be a useful compromise between computationally heavy traditional MD and fast but restrictive MC. We demonstrate that the model protein reproduces the principal features of folding phenomena described above. We also address the issue of whether we can study the equilibrium properties of the core. The core is a small subset of the residues that maintains the backbone of the structure at temperatures close to the folding transition temperature (here the Θ-temperature, $T_\theta$). We emphasize the difference between the core and the nucleus of a protein: whereas the core is a persistent part of the structure at equilibrium, the nucleus is a fragment of this structure that is assembled in the transition state (TS), that is, at the folding-unfolding barrier (see Figure in [4]). Based on simple arguments, we estimate $T_\theta$ for our model [4] and compare it with the value found in the simulations.

The model
We study a beads-on-a-string model of a protein. We model the residues as hard spheres of unit mass. The potential of interaction between residues is a square well. We follow the Go model [5 7], where the attractive potential between residues is assigned to the pairs that are in contact ($\Delta_{ij}$, defined below) in the native state and a repulsive potential is assigned to the pairs that are not in contact in the native state. Thus, the potential energy is:

$$E = \frac{1}{2}\sum_{i,j=1}^{N} U_{i,j} \qquad (1)$$

where i and j denote residues i and j.

Figure 1. The potential of interaction U(r) between (a) specific residues and (b) neighboring residues (covalent bond). $a_0$ is the diameter of the hard sphere and $a_1$ is the diameter of the attractive sphere. $[b_0, b_1]$ is the interval in which residues that are neighbors on the chain can move freely.

The pair interaction and the contact matrix are:

$$U_{i,j} = \begin{cases} +\infty, & |\mathbf{r}_i - \mathbf{r}_j| \le a_0 \\ -\epsilon\,\mathrm{sign}(\Delta_{ij}), & a_0 < |\mathbf{r}_i - \mathbf{r}_j| \le a_1 \\ 0, & |\mathbf{r}_i - \mathbf{r}_j| > a_1 \end{cases} \qquad (2)$$

Here $a_0/2$ is the radius of the hard sphere, $a_1/2$ is the radius of the attractive sphere (Figure 1a) and $\epsilon$ sets the energy scale. $\Delta$ is a matrix of contacts with elements:

$$\Delta_{ij} = \begin{cases} 1, & |\mathbf{r}_i^{NS} - \mathbf{r}_j^{NS}| \le a_1 \\ -1, & |\mathbf{r}_i^{NS} - \mathbf{r}_j^{NS}| > a_1 \end{cases} \qquad (3)$$
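As a rough illustration of the square-well Go potential described above, the following Python sketch builds a contact matrix from a native structure and sums the pair energies of a conformation. The function names, the NumPy implementation and the exclusion of chain neighbors (`min_sep=2`) are assumptions made for this example and are not taken from the paper; the default well parameters are the values the text quotes for this model, read from a partly garbled source.

```python
import numpy as np

# Well parameters quoted in the text: hard-core diameter, attractive range, energy scale.
A0, A1, EPS = 9.8, 19.5, 1.0

def pair_distances(coords):
    """All pairwise distances for an (N, 3) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def contact_matrix(native, a1=A1):
    """Delta_ij = +1 for pairs within a1 in the native state, -1 otherwise."""
    return np.where(pair_distances(native) <= a1, 1, -1)

def go_energy(coords, delta, a0=A0, a1=A1, eps=EPS, min_sep=2):
    """Square-well Go energy, counting each pair with |i - j| >= min_sep once.

    min_sep=2 (skipping bonded neighbors) is an assumption of this sketch.
    """
    i, j = np.triu_indices(len(coords), k=min_sep)
    d = pair_distances(coords)[i, j]
    if np.any(d <= a0):           # hard-core overlap is forbidden
        return float("inf")
    in_well = d <= a1             # pairs inside the square well
    return float(-eps * np.sum(delta[i, j][in_well]))
```

For a three-bead chain bent so that beads 0 and 2 are within the well in the native state, `go_energy(native, contact_matrix(native))` gives one native contact, that is, an energy of -1 in units of the well depth.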
In Equation 2, $U_{i,j}$ is the matrix of pair interactions, and in Equation 3, $\mathbf{r}_i^{NS}$ is the position of the $i$th residue when the protein is in the native conformation. Note that we penalize the non-native contacts by imposing $\Delta_{ij} < 0$. The parameters are chosen as follows: $\epsilon = 1$, $a_0 = 9.8$ and $a_1 = 19.5$. The covalent bonds are also modeled by a square-well potential (Bellemans bonds):

$$V_{i,i+1} = \begin{cases} 0, & b_0 < |\mathbf{r}_i - \mathbf{r}_{i+1}| < b_1 \\ +\infty, & |\mathbf{r}_i - \mathbf{r}_{i+1}| \le b_0 \ \text{or} \ |\mathbf{r}_i - \mathbf{r}_{i+1}| \ge b_1 \end{cases} \qquad (4)$$

The values of $b_0 = 9.9$ and $b_1 = 10.1$ are chosen so that the average covalent bond length is equal to 10 (Figure 1b). The original configuration of the protein ($N = 65$ residues) was designed by collapse of a homopolymer at low temperature [,5,6]. It contains $n^* = 328$ native contacts, so $E_{NS} = -328$. The 65 × 65 matrix of contacts of the globule in the native state is shown in Figure 2a. Note that the large number of native contacts (328/65 ≈ 5 contacts per residue) is due to the choice of the parameter $a_1 \approx 2a_0$, so that residues are able to establish contacts with the residues in the second neighboring shell. The radius of gyration of the globule in the native state is R G.7. A snapshot of the globule in the native state is shown in Figure 2b.

The program employs the discrete MD algorithm, which is based on the collision list and is similar to the one recently used by Zhou et al. [] to study equilibrium thermodynamics of homopolymers and by Zhou and Karplus [3] to study equilibrium thermodynamics of folding of a model of Staphylococcus aureus protein A. A detailed description of the algorithm can be found in [7 3]. To control the temperature of the protein, we introduce 935 particles that do not interact with the protein or with each other in any way but via regular collisions, serving as a heat bath. Thus, by changing the kinetic energy of those ghost particles, we are able to control the temperature of the environment. The ghost particles are hard spheres of the same radii as the chain residues and have unit mass. Temperature is measured in units of $\epsilon/k_B$. The time unit (tu) is estimated

from the shortest time between two consecutive collisions in the system between any two particles.

Figure 2. (a) The 65 × 65 contact matrix of the model protein in the native state. Black boxes indicate the matrix elements of those residue pairs that have a contact (their relative distance is between $a_0$ and $a_1$). (b) A snapshot of the protein of 65 residues in the native state obtained at temperature T =..

Results
The simulations reveal that the protein undergoes a folding-unfolding transition as we increase the temperature to the proximity of the Θ-temperature $T_\theta$, which in this model is $T_\theta \approx T_f \approx 1.46$. At $T_\theta$, the distribution of energy has three peaks (Figure 4a). The left peak corresponds to the folded state, the right peak corresponds to the unfolded state and the middle one corresponds to the partially folded state (PFS), with a 19-residue unfolded tail. This trimodality of the energy distribution is also seen in Figure 3b. The energy profile at temperature T =.4 (close to $T_\theta$) also reflects these three states. Since $T < T_\theta$, only two states are mostly present in Figure 3b. Thus, the energy distribution has only two peaks (Figure 5), corresponding to the folded state and the PFS. Above $T_\theta$, the globule starts to explore energetic wells other than the native well (see Figure 3 in [3]).

To show that the PFS is the cause of the middle peak in the energy distribution (Figure 4a), we eliminate the 19-residue tail and plot the energy distribution for the 46-mer at its Θ-temperature $T^*_\theta$ =.44 (Figure 6). We expect to see only two states, folded and unfolded, because the 19-residue tail, which is the cause of the PFS, is eliminated. Figure 6 confirms our expectations.

In order to study the thermodynamics, we performed MD simulations of the chain at various temperatures. We start with the globule in the native state at temperature T =. and then raise the temperature of the heat bath to the desired one.
Then we allow the system to equilibrate. At the final temperature, we let the protein relax for $10^6$ time units. The typical behavior of the energy E and the radius of gyration $R_G$ as functions of time is shown in Figure 3 for three different temperatures. In the present model, the non-native contacts (NNCs) are penalized, that is, the pairwise interaction between NNCs is repulsive (this corresponds to g = 2 in [3]), so their number increases as the temperature increases. At high temperatures (above $T_\theta$), however, the number of NNCs varies only due to the random temperatures. The maximum number of NNCs occurs at $T_\theta$ and does not exceed 35, which is roughly 10% of the total number of native contacts (NCs).

The folding-unfolding transition is further quantified in Figure 7. The energy and the radius of gyration increase most rapidly near $T_f = T_\theta$, resembling the order parameter jump in a phase transition (see discussion below). This rapid increase of E and $R_G$ reaches its maximum at the Θ-point, where the potential of interaction is compensated by the thermal motion of the particles. Above $T_\theta$, interactions between residues do not hold them together any more and the chain becomes unfolded (see Figure 8a). Note that as all the attractive interactions are specific, the transition is described by one temperature, $T_f$. The presence of the PFS is observed in the temperature range.4 to.48, in which the collapse transition occurs. Thus, in this particular region, $T_f$ and $T_\theta$ are indistinguishable within the accuracy of their definitions.

Remarkably, a simple Flory-type model of an excluded volume chain predicts $T_\theta$ within 20%. To demonstrate this, let us write the probability that the end-to-end type distance of the chain is R [4]:

$$P(R) \propto p(R)\,\exp\!\left(-\frac{N^2 v}{R^3} - \frac{E(R)}{T}\right) \qquad (5)$$

where $v = (4\pi/3)(a_0/2)^3$ is the volume of the monomer and $p(R) \propto R^2 \exp\!\left(-3R^2/(2N(a/2)^2)\right)$ is the probability that the end-to-end distance of the chain is R for the random walk model.
For $T = T_f$, the repulsive excluded volume term $N^2 v/R^3$ balances the attractive term $-E(R)/T > 0$. Thus:

$$T_f = -\frac{R^3 E}{N^2 v} \approx 1.7 \qquad (6)$$

where E. 3 and R. 4 are taken for a certain configuration at the Θ-point.

We also compute the heat capacity $C_V$ from the relation [3]:

$$C_V = \frac{\langle (\delta E)^2 \rangle}{T^2} \qquad (7)$$

where $\delta E \equiv E - \langle E \rangle$ and $\langle \ldots \rangle$ denotes a time average. The time average is computed over $10^6$ tu of equilibration at a fixed temperature. The dependence of the heat capacity on temperature is shown in Figure 7b. There is a pronounced peak of $C_V(T)$ for $T = T_f$.

Figure 3. The dependence on time of (a) energy E and (b) radius of gyration $R_G$, plotted against $t/10^5$. The globule is maintained at three different temperatures T =.78 < $T_f$, T =.4, and T =.63 > $T_f$ for $10^6$ tu. For T =.78, the fluctuations of both energy E and $R_G$ are small, that is, the globule is found in one folded configuration. At high temperatures (T =.63) the fluctuations of E and $R_G$ are large; the globule is mostly found in the unfolded state. At the temperature T =.4, which is close to $T_f$, the globule is mostly present in two states. The lower energy configuration corresponds to the folded state: the globule is compact. The other configuration has large fluctuations: the globule is in the PFS. There is an additional state: the unfolded state. At T =.4 the protein model is rarely present in the unfolded state. Thus, the behavior of the globule at temperatures close to $T_f$ indicates the presence of three distinct states: folded, unfolded and PFS.

We note that below the folding temperature $T_f$, the globule (Figure 8b) spends time in a state structurally similar to the native state (Figure 2b); however, one can see that even though the globule maintains approximately the same structure, that is, the same set of NCs, the distances between residues are much larger than in the native state. Due to the fact that the potential of interaction between like residues is a square well, there is no penalty for these residues to be maximally separated, yet they remain within the range of attractive

interaction. This allows the globule to have more NNCs and, thus, still maintain its similarity to the native structure, yet to have energy larger than the energy of the native state. This structure can be identified as the highest in energy that still maintains its core. As the temperature increases, the ratio $(R_G - R_G^{NS})/R_G^{NS}$ increases until the temperature reaches $T = T_f$, where the ratio becomes roughly.87.

Figure 4. The probability distribution of (a) the energy states E and (b) the radius of gyration $R_G$ of the globule maintained at $T_f = 1.46$ for $10^6$ tu. The trimodal distributions indicate the presence of three states: the folded state, the PFS, and the unfolded state.

Figure 5. The probability distribution of the energy states E of the globule maintained at three different temperatures: T =.5,.4 and.73. Note that at T =.4, close to $T_f$, the distribution has two expressed peaks. The right peak of this (T =.4) distribution corresponds to the PFS whereas the left peak corresponds to the energetic well of the native state.

Figure 6. The probability distribution of the energy states E of the 46-residue globule maintained at $T^*_f$ =.44 during $10^6$ tu. The bimodal distribution of energy indicates that the 19-residue tail is responsible for the PFS of the 65-residue globule: after eliminating the 19-residue tail, the trimodal energy distribution of the 65-residue globule becomes a bimodal energy distribution of the 46-residue globule.
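The fluctuation relation for the heat capacity and the energy distributions shown in these figures can both be computed directly from a sampled energy trace. The sketch below is a minimal illustration, assuming the relation has the standard form $C_V = \langle(\delta E)^2\rangle/T^2$ with $k_B = 1$ (the equation is garbled in the source) and that the energies are stored as a plain array; the function names are hypothetical.

```python
import numpy as np

def heat_capacity(energies, temperature):
    """C_V = <(dE)^2> / T^2 with dE = E - <E>, in units where k_B = 1."""
    e = np.asarray(energies, dtype=float)
    return float(np.mean((e - e.mean()) ** 2) / temperature ** 2)

def energy_histogram(energies, bins=50):
    """Normalized energy distribution P(E): returns bin centers and densities."""
    density, edges = np.histogram(energies, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density
```

Evaluating `heat_capacity` on traces recorded at a series of bath temperatures gives a curve whose peak locates the transition, and a histogram with two or three separated maxima corresponds to the bimodal or trimodal distributions discussed in the text.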

Figure 7. The dependence on temperature of (a) the energy E, (b) the heat capacity $C_V$ and (c) the radius of gyration $R_G$. The error bars are the standard deviation of fluctuations. The rapid increase of energy as well as the sharp peak in heat capacity at $T = T_f$ indicates a first-order phase transition.

Figure 8. A snapshot of the protein in (a) the unfolded state, obtained at high temperature T =.8; and (b) the transition state, obtained at the folding transition temperature $T_f = 1.46$ (green), overlapped with the globule at low temperature T =.4 (red). Note that the TS globule has a close visual similarity to those maintained at low temperature and in the native state (see also Figure 2b). It is more dispersed, however, which makes all the NCs easily breakable. To compare the globule at the TS with the one maintained at temperature T =.4, we perform the transformation proposed by Kabsch [4] to minimize the relative distance between the residues in the TS and the state at T =.4. The cold residues (grey spheres) denote residues whose rms displacement is smaller than a.

To confirm the presence of the core, we calculate $f \equiv N_{NC}/N_C$ at temperatures below $T = T_f$. The attractive interresidue interaction term $-E/T$ dominates the excluded volume repulsion term $N^2 v/R^3$ (see Equation 5), so:

$$-\frac{E}{T_f} > \frac{N^2 v}{R^3} \qquad (8)$$

The total energy E has contributions from both NC and NNC contacts, so:

$$E = -\epsilon\,[N_{NC} - N_{NNC}] = -\epsilon\,N_C\,(2f - 1) \qquad (9)$$

At a temperature slightly below $T_f$, f f.3, the residues are maximally separated within their potential wells, yet they still maintain contacts. Therefore, the volume $\tilde{v}$ spanned by one residue is roughly $\tilde{v} \approx (4\pi/3)(a_1/2)^3 \approx 8v$. $N_C$ is the product of the probability $\tilde{v}/R^3$ of having a bond (NC or NNC) and the total number

of possible arrangements of the pair contacts between residues, $N(N-1)/2 \approx N^2/2$. Thus:

$$N_C = \frac{\tilde{v}}{R^3}\,\frac{N^2}{2} \qquad (10)$$

From Equations 8–10 we can estimate f, the fraction of NCs at the temperature.4 < $T_f$:

$$f > \frac{1}{2} + \frac{T_f\,v}{\tilde{v}} \approx 0.68 \qquad (11)$$

Due to the fact that the globule maintains roughly the same volume at temperatures slightly below the Θ-point, Equation 11 implies that approximately 70% of all native contacts stay intact in the folded phase (see Figure 5). This result is supported by the simulations: at T =.4 the number of NNCs is roughly $N_{NNC} \approx 28$, and the energy is $E = -206$. Therefore, the number of NCs is $N_{NC} \approx 234$, and the fraction of NCs is $f \approx 0.89$, which is even higher than the lower limit set by Equation 11. Note that at a temperature higher than $T_f$, the fraction of native contacts becomes small due to the fact that in this regime the interactions are dominated by the excluded volume repulsion. This change in the number of NCs from 70% to close to zero indicates the presence of the core structure maintained by these 70% of NCs (see Figure 8b and discussion below). Above the Θ-point, the globule is completely unfolded (Figure 8a).

The formation of a specific nucleus during the folding transition was suggested by many theoretical [,4,,3,33 36] and experimental works [37 4]. The presence of the core at $T_f$ may arise from a nucleation process driving the system from the unfolded state to the native state. We find indications of a first-order transition. We also offer theoretical reasoning for the presence of a core (Equation 11) that might indicate the presence of a nucleus.

Next, we identify the core. We calculate the mean square displacement $\sigma(T)$ of the globule at a certain temperature from a globule at the native state, that is:

$$\sigma^2(T) \equiv \frac{1}{N}\sum_{i=1}^{N} \left\langle \left[\mathbf{r}_i(t) - \mathcal{T} R\,\mathbf{r}_i^{NS}\right]^2 \right\rangle = \frac{1}{N}\sum_{i=1}^{N} \sigma_i^2(T) \qquad (12)$$

where $\mathbf{r}_i$ and $\mathbf{r}_i^{NS}$ are the coordinates of the residues of the globules at two conformations: at some conformation at the temperature T and at the native conformation, respectively.
$\mathcal{T}$ is a translation matrix, which sets the centers of mass of these configurations at the same point in space. R is a rotation matrix that minimizes the relative distance between the residues of the two configurations (for details, see [4 44]). The $\sigma_i(T)$ values in Equation 12 are the root-mean-square (rms) displacements for each individual residue. The plot of $\sigma_i(T)$ is presented in Figure 9a. From the roughness of the landscape, we can select a group of residues whose rms displacements are significantly smaller than the rms displacements of the other group of residues. We denote the former group as cold residues and the latter group as hot residues. The rms displacement strongly depends on the temperature near the folding transition and grows slowly below $T_f$. Note that the average numbers of NCs of the residues correlate with the average rms displacement of these residues, that is, the peaks on the $N_{NC,i}$ isothermal lines of Figure 9b correspond to the cold residues.

Next we calculate the rms displacement $\sigma_C(T)$ for the selected 5% coldest residues (the core) and $\sigma_O(T)$ for the rest of the residues. Figure 10 shows their dependence on temperature, as well as the dependence of the rms displacement for all residues $\sigma(T)$. There is a pronounced difference in the behavior of the rms displacement of the core residues and the rest of the residues below $T_f$. At $T_f$ their behavior is the same, due to the fact that all the attractive interactions are balanced by the repulsion of the excluded volume. Above $T_f$, the difference between $\sigma_C(T)$ and $\sigma_O(T)$ is due only to the fact that the core residues have most of the NCs and, therefore, are more likely to spend time together even at $T > T_f$.

To study the behavior of the globule at $T_f$, we subdivide the probability distribution of the energy states E of the globule maintained at $T_f = 1.46$ during $10^6$ tu into five regions: A, B, C, D, and E (Figure 11a).
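The superposition and per-residue rms displacement used in this section can be sketched in a few lines of NumPy. This is an illustrative implementation of the standard Kabsch least-squares fit, not the authors' code: the function names are hypothetical, and the sketch assumes that each sampled frame is fitted onto the native structure before the squared displacements are time-averaged.

```python
import numpy as np

def kabsch_superpose(mobile, reference):
    """Rigidly fit mobile (N, 3) onto reference (N, 3) by least squares."""
    ref_center = reference.mean(axis=0)
    p = mobile - mobile.mean(axis=0)       # translation: match centers of mass
    q = reference - ref_center
    u, _, vt = np.linalg.svd(p.T @ q)      # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T)) # guard against improper rotations
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rot.T + ref_center

def per_residue_rms(frames, native):
    """sigma_i: rms displacement of each residue from the native structure."""
    sq = [np.sum((kabsch_superpose(f, native) - native) ** 2, axis=1)
          for f in frames]
    return np.sqrt(np.mean(sq, axis=0))
```

Sorting the output of `per_residue_rms` and taking the smallest values is one simple way to pick out the cold residues; the cutoff used for the core is a modelling choice.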
Region A corresponds to the folded state; region B corresponds to the transitional state between the folded state and the PFS; region C corresponds to the PFS; region D corresponds to the transitional state between the PFS and the completely unfolded state (region E). Next we plot the rms displacement for each residue for each of the above regions (Figure 11a). Note that in region A, all residues stay in contact; in region C, both N- and C-terminal tails break away, forming a PFS; in region D, there are only a few core residues that still stay intact; and in region E none of the residues is in contact. In region B, we observe that some of the C-terminal tail residues are not in contact, indicating the formation of a PFS. Next, we plot the dependence of the rms displacement of the selected core residues (see legend to Figure 9) on the average energy of the window of the corresponding region (Figure 11c). We observe that core residues remain close to one another even in the

second transitional state D between the PFS and the completely unfolded state.

Figure 9. (a) The contour plot of the rms displacement $\sigma_i(T)$ for each residue i = 0, 1, 2, …, 64 at temperatures T =.3,.97,.34,.46 (bold line) and.54, averaged over $10^6$ tu. Note that there is a distinct difference between the cold (small values of $\sigma_i(T)$) and hot (large values of $\sigma_i(T)$) residues. The horizontal line indicates the breaking point of the NCs, that is, when $\sigma_i(T)$ is of the size of the average relative position between pairs of residues, $\sigma_i(T) = (a_0 + a_1)/2 \approx 15$. The bold lines in both (a) and (b) indicate the folding transition temperature line $T_f$. It is worth noting that 11 residues are still in contact, marked by circles on (a): 6, 3, 4, 5, 6, 7, 8, 9, 37, 38 and 39. (b) An analogous plot to (a) of the average number of NCs for each residue. Note that the number of NCs correlates strongly with the rms displacement: the local minima of the $\sigma_i(T)$ plots correspond to the local maxima of the number of NCs.

Figure 10. The dependence of the rms displacement of the core residues $\sigma_C(T)$, the rest of the residues $\sigma_O(T)$ and all the residues $\sigma(T)$ on temperature. The above quantities are averaged over $10^6$ tu. Note that for the ideal first-order phase transition, one would expect $\sigma_C(T)$ to be a step function; however, as we consider a transition that would be first order in the limit of infinite size, $\sigma_C(T)$ exhibits only step-function-like behavior. The difference between core residues and other residues is that at $T_f$ the average rms displacement of the core residues is smaller than 15, which indicates that they are in contact (see the legend to Figure 9). On the contrary, the average rms displacement of the noncore residues is greater than 15, indicating that these residues are not in contact.

We also study the system by cooling it from the high temperature state.
This technique corresponds to simulated annealing, due to the fact that the temperature control is governed by the ghost particles that are present in the system. We find that if the target temperature is above., the globule always reaches the state corresponding to the native state; however, if the target temperature is.96, the globule reaches the state corresponding to the native state in only 7% of cases, in the time interval of 5 time units. As an example, we demonstrate in Figure 12 the cooling of the model protein from the high temperature state T = 3. to the low temperature state T =.. The model protein collapses after tu. What is particularly remarkable about Figure 12 is that we can follow the kinetics of the collapse. First, the globule gets trapped in some misfolded conformation, where it stays for about tu (see Figure 12a), and then it collapses to the native state. The time behavior of the energy, however, can look a bit puzzling. After the rms displacement drops to close to, indicating the native state, the energy is still higher than that of the native state for about 4 tu (see Figure 12b). The key to resolving this puzzle is the fact that after the collapse of the model protein, its potential energy transforms to kinetic energy,

which slowly decreases by thermal equilibration with the bath of the ghost particles.

Discussion

We find that the classical model of the self-avoiding chain with excluded volume shows good agreement with the simulations. We show from simple arguments and simulations that the fraction of NCs at the folding temperature T_f is larger than 7%, consistent with the presence of the core. The nucleus forms in the unstable transition state.

Figure 11. (a) The probability distribution of the energy states E of the globule maintained at T_f = .46 during 6 tu. The probability distribution is divided into five regions: A, B, C, D and E. Region A corresponds to the folded state; region B corresponds to the transitional state between the folded state and the PFS; region C corresponds to the PFS; region D corresponds to the transitional state between the PFS and the completely unfolded state (region E). (b) The plot of the rms displacement σ_i(T) for each residue i = 1, 2, …, 64 for regions A, B, C, D and E from the plot in (a), averaged over 6 tu. Note that in region A, all residues stay in contact; in region C, both N- and C-terminal tails break away, forming a PFS; in region D, only a few of the core residues are still intact; and in region E, none of the residues is in contact. (c) The dependence of the rms displacement of the core residues (circles) and the other residues (squares) on the average energy E of the window of the corresponding region. Note that core residues stay intact even in the second transitional state D between the PFS and the completely unfolded state. The horizontal lines in (b,c) indicate the breaking point of the NCs (see Figure 9).
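The grouping of conformations into energy windows described in the legend above (regions A to E), followed by a per-residue rms displacement within each window, can be sketched as follows. This is an illustrative sketch only: the window boundaries, the three-region labeling, the function names and the toy data are hypothetical and are not the values used in the simulations.

```python
import math

def region_of(energy, edges, labels):
    """Label of the energy window that contains `energy`.

    edges: ascending window boundaries
    labels: len(edges) + 1 names, ordered from lowest to highest energy
    """
    for edge, label in zip(edges, labels):
        if energy < edge:
            return label
    return labels[-1]

def rms_by_region(energies, displacements, edges, labels):
    """Per-residue rms displacement and occupation probability per window.

    energies: one energy value per frame
    displacements: per frame, each residue's distance from its native position
    """
    sq_sums = {}
    counts = {}
    for energy, disp in zip(energies, displacements):
        label = region_of(energy, edges, labels)
        if label not in sq_sums:
            sq_sums[label] = [0.0] * len(disp)
            counts[label] = 0
        for i, d in enumerate(disp):
            sq_sums[label][i] += d * d
        counts[label] += 1
    rms = {label: [math.sqrt(s / counts[label]) for s in sums]
           for label, sums in sq_sums.items()}
    prob = {label: counts[label] / len(energies) for label in counts}
    return rms, prob

# Toy data (hypothetical): three windows, two residues, four frames.
edges = [-100.0, -50.0]
labels = ["A", "C", "E"]
energies = [-120.0, -110.0, -70.0, -10.0]
displacements = [[1.0, 1.0], [1.0, 3.0], [2.0, 6.0], [8.0, 10.0]]

rms, prob = rms_by_region(energies, displacements, edges, labels)
print(rms, prob)
```

Comparing the per-window values with a contact-breaking threshold then shows which residues remain in contact in each region.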
From the transition state, the globule jumps either to the completely unfolded conformation or to the folded conformation. Our simulations are in agreement with the recent work of Zhou and Karplus [23]. They performed discrete MD simulations of S. aureus protein A, the interresidue interactions of which were modeled based on the Go model [5–7]. The pairs of residues of the model protein that form native contacts had a square-well interaction potential with well depth B_N, whereas all other pairs of residues had a square-well interaction potential with well depth B_O. They characterized the difference between NCs and non-native contacts (NNCs) by the bias gap, g: g = 1 − B_O/B_N. Zhou and Karplus found that when g = 1.3, that is, when the interaction between NNCs is of the opposite sign to the interaction between NCs, there is a strong first-order-like transition from the random coil to the ordered globule. The case of our globule corresponds to g = 1, where, according to the work of Zhou and Karplus, there should exist a strong first-order-like transition from the random coil to the ordered globule without an intermediate. We also select the core residues and show that their rms displacement differs significantly from that of the rest of the residues and exhibits step-function-like behavior as the temperature changes. Our findings are in agreement with the recent experimental study of the equilibrium hydrogen exchange behavior of cytochrome c by Bai et al. [39], who investigated the exposure of the amide hydrogens (NH) in cytochrome c to solvent (due to local and global unfolding fluctuations). The experiments were based on the properties of the amide hydrogens

that are involved in hydrogen-bonded structure and can exchange with solvent hydrogens. Bai et al. [39] demonstrated that proteins undergo the folding–unfolding transition through intermediate forms. They also selected these intermediate forms (cooperative units), which are 5 5 residues in size. The presence of PFSs in our simulations is thus in agreement with the finding by Bai et al. [39] of the intermediate forms in cytochrome c. The relation between the core residues that we find and the nucleus is hard to establish because the TS is very unstable. Recent amide hydrogen exchange experiments on the CheY protein from Escherichia coli by Lacroix et al. [40] provided evidence for the residues being involved in the folding nucleus; furthermore, the lattice MC simulations of Abkevich et al. [11] also demonstrate that the presence of the nucleus is a necessary and sufficient condition for subsequent rapid folding to the native state. The crucial difference between the nucleus and the rest of the structure is in dynamics, which is manifest also in equilibrium fluctuations. All local unfolding fluctuations (i.e. the ones after which the chain returns rapidly back to the native state) keep the nucleus intact, whereas fluctuations that disrupt the nucleus lead to global unfolding: descent to the unfolded free energy minimum [4,]. This view is consistent with the hydrogen exchange experiments [39,40]. Such behavior of the globule is consistent with a possible first-order phase transition in a system of finite size.

Figure 12. The time evolution of (a) the rms displacement per residue of the globule from its native state and (b) the energy when we cool the system from the high-temperature (T = 3.) unfolded state down to the low-temperature (T = .) state. Plot annotations: Misfolded, Folded, Thermal equilibration; axes: rmsd against time (tu). The model protein gets trapped in the misfolded conformation after tu and then proceeds to its native state after tu. Although the rms displacement of the globule from its native state is close to 0 after tu, the energy of the globule is higher than the energy of the native state for about 4 tu. This effect is due to the thermal bath ghost particles, which thermally equilibrate the system over the relaxation time t_relax, about 4 tu.

Acknowledgements

We would like to thank G.F. Berriz for designing the globule, V.I. Abkevich, L. Mirny, M.R. Sadr-Lahijani and S. Erramilli for helpful discussions, and R.S. Dokholyan for help in editing the manuscript. N.V.D. is supported by NIH NRSA molecular biophysics predoctoral traineeship (GM89-9). E.I.S. is supported by NIH grant RO-56.

References

1. Levinthal, C. (1968). Are there pathways for protein folding? J. Chim. Phys. 65, 44.
2. Go, N. (1983). Theoretical studies of protein folding. Annu. Rev. Biophys. Bioeng., 83-.
3. Karplus, M. & Shakhnovich, E.I. (1994). Protein folding: theoretical studies of thermodynamics and dynamics. In Protein Folding. (Creighton, T., ed.), pp. 7 96, W.H. Freeman and Company, New York.
4. Shakhnovich, E.I. (1997). Theoretical studies of protein-folding thermodynamics and kinetics. Curr. Opin. Struct. Biol. 7, 9-4.
5. Taketomi, H., Ueda, Y. & Go, N. (1975). Studies on protein folding, unfolding and fluctuations by computer simulations. Int. J. Peptide Protein Res. 7, 445.
6. Go, N. & Abe, H. (98). Noninteracting local-structure model of folding and unfolding transition in globular proteins. I. Formulation. Biopolymers, 99-.
7. Abe, H. & Go, N. (98). Noninteracting local-structure model of folding and unfolding transition in globular proteins. II. Application to two-dimensional lattice proteins. Biopolymers, 3-3.
8. Dill, K.A. (1985). Theory for the folding and stability of globular proteins. Biochemistry 4, 5-59.
9. Bryngelson, J.D. & Wolynes, P.G. (1989). Intermediates and barrier crossing in a random energy model (with applications to protein folding). J. Phys. Chem. 93, 69-695.
10. Shakhnovich, E.I. (1994). Proteins with selected sequences fold into unique native conformation. Phys. Rev. Lett. 7, 397-39.
11. Abkevich, V.I., Gutin, A.M. & Shakhnovich, E.I. (1994). Specific nucleus as the transition state for protein folding: evidence from the lattice model. Biochemistry 33, 6-36.
12. Gutin, A.M., Abkevich, V.I. & Shakhnovich, E.I. (1995). Evolution-like selection of fast-folding model proteins. Proc. Natl Acad. Sci. USA 9, 8-86.
13. Shakhnovich, E.I., Abkevich, V.I. & Ptitsyn, O. (1996). Conserved residues and the mechanism of protein folding. Nature 379, 96-98.
14. Dill, K.A. (99). Dominant forces in protein folding. Biochemistry 9, 733-755.
15. Creighton, T. (99). Protein Structure and Molecular Properties. Freeman, San Francisco.
16. Privalov, P.L. (1989). Thermodynamic problems of protein structure. Annu. Rev. Biophys. Chem. 8, 47-69.
17. Klimov, D.K. & Thirumalai, D. (1996). Criterion that determines the foldability of proteins. Phys. Rev. Lett. 76, 47-473.
18. Baumgartner, A. (1987). Simulations of polymer models. In Applications of the Monte-Carlo Simulations in Statistical Physics. pp. 8-39, Springer, New York.
19. Irbäck, A. & Schwarze, H. (1995). Sequence dependence of self-interacting random chains. J. Phys. A: Math. Gen. 8, -3.
20. Berriz, G.F., Gutin, A.M. & Shakhnovich, E.I. (1997). Cooperativity and stability in a Langevin model of protein like folding. J. Chem. Phys. 6, 976-985.
21. Guo, Z. & Brooks III, C.L. (1997). Thermodynamics of protein folding: a statistical mechanical study of a small all-β-protein. Biopolymers 4, 745-757.
22. Zhou, Y., Karplus, M., Wichert, J.M. & Hall, C.K. (1997). Equilibrium thermodynamics of homopolymers and clusters: molecular dynamics and Monte-Carlo simulations of system with square-well interactions. J. Chem. Phys. 7, 69-78.
23. Zhou, Y. & Karplus, M. (1997). Folding thermodynamics of a three-helix-bundle protein. Proc. Natl Acad. Sci. USA 94, 449-443.
24. Doi, M. (1996). Introduction to Polymer Physics. Clarendon Press, Oxford.
25. Shakhnovich, E.I. & Gutin, A.M. (1993). Engineering of stable and fast folding sequences of model proteins. Proc. Natl Acad. Sci. USA 9, 795-799.
26. Abkevich, V.I., Gutin, A.M. & Shakhnovich, E.I. (1996). Improved design of stable and fast-folding model proteins. Fold. Des., -3.
27. Alder, B.J. & Wainwright, T.E. (1959). Studies in molecular dynamics. I. General method. J. Chem. Phys. 3, 459-466.
28. Grosberg, A.Yu. & Khokhlov, A.R. (1997). Giant Molecules. Appendix, Academic Press, Boston.
29. Allen, M.P. & Tildesley, D.J. (1987). Molecular dynamics. In Computer Simulation of Liquids. pp. -, Clarendon Press, Oxford.
30. Rapaport, D.C. (1997). Step potential. In The Art of Molecular Dynamics Simulation. pp. 85-36, Cambridge University Press, Cambridge.
31. Mirny, L.A., Abkevich, V. & Shakhnovich, E.I. (1996). Universality and diversity of the protein folding scenarios: a comprehensive analysis with the aid of a lattice model. Fold. Des., 3-6.
32. Landau, L.D. & Lifshitz, E.M. (98). Statistical Physics. Pergamon, London.
33. Wetlaufer, D.B. (1973). Nucleation, rapid folding, and globular interchain regions in proteins. Proc. Natl Acad. Sci. USA 7, 69-7.
34. Karplus, M. & Weaver, D.L. (1976). Protein-folding dynamics. Nature 6, 44-46.
35. Abkevich, V.I., Gutin, A.M. & Shakhnovich, E.I. (1995). Domains in folding of model proteins. Protein Sci. 4, 67-77.
36. Lazaridis, T. & Karplus, M. (1997). New view of protein folding reconciled with the old through multiple unfolding simulations. Science 78, 98-93.
37. Anfinsen, C.B. (1973). Principles that govern the folding of the protein chains. Science 8, 3-3.
38. Tsong, T.Y. & Baldwin, R.L. (1978). Effects of solvent viscosity and different guanidine salts on the kinetics of ribonuclease A chain folding. Biopolymers 7, 669-678.
39. Bai, Y., Sosnick, T.R., Mayne, L. & Englander, S.W. (1995). Protein folding intermediates: native-state hydrogen exchange. Science 69, 9-97.
40. Lacroix, E., Bruix, M., López-Hernández, E., Serrano, L. & Rico, M. (1997). Amide hydrogen exchange and internal dynamics in the chemotactic protein CheY from Escherichia coli. J. Mol. Biol. 7, 47-487.
41. Kabsch, W. (1978). A discussion of the solution for the best rotation to relate two sets of vectors. Acta Crystallogr. A 34, 87-88.
42. Brooks, C.L. III. (99). Characterization of native apomyoglobin by molecular dynamics simulations. J. Mol. Biol. 7, 375-38.
43. Daggett, V. & Levitt, M. (99). A model of the molten globule state from molecular dynamics simulations. Proc. Natl Acad. Sci. USA 89, 54-546.
44. Sheinerman, F.B. & Brooks, C.L. III. (1997). A molecular dynamics simulation study of segment B of protein G. Proteins 9, 9-.

Because Folding & Design operates a Continuous Publication System for Research Papers, this paper has been published on the internet before being printed. The paper can be accessed from http://biomednet.com/cbiology/fad; for further information, see the explanation on the contents pages.