Energy landscapes, global optimization and dynamics of the polyalanine Ac ala 8 NHMe

JOURNAL OF CHEMICAL PHYSICS VOLUME 114, NUMBER 14 8 APRIL 2001 Energy landscapes, global optimization and dynamics of the polyalanine Ac ala 8 NHMe Paul N. Mortenson and David J. Wales University Chemical Laboratories, Lensfield Road, Cambridge CB2 1EW, United Kingdom Received 17 October 2000; accepted 1 December 2000 A parallel searching algorithm using eigenvector-following is used to generate databases of minima and transition states for an all-atom model of Ac ala) 8 NHMe. The AMBER force field of Cornell et al. J. Am. Chem. Soc. 117, 5179 1995 is employed both with and without a simple implicit solvent. We use a master equation approach to analyze the dynamics of both systems, and relate the results to the potential energy landscapes using disconnectivity graphs. 2001 American Institute of Physics. DOI: 10.1063/1.1343486 I. INTRODUCTION The difficulty a protein faces in finding its native structure among an exponentially large number of alternatives was first recognized by Levinthal in 1969. 1 He estimated the number of possible conformations a polypeptide chain could adopt, and showed that, for the time scale on which a typical protein folds, only an infinitesimally small fraction of these conformations could be searched. This apparent discrepancy has become known as Levinthal s paradox. It has since been realized that such a random search corresponds to a flat energy landscape, 2 and that the topology of realistic potential energy surfaces PES s holds the key to resolving the paradox. 3,4 In order to be a reliable folder, a protein must fulfill two criteria. First, the native conformation must be thermodynamically stable at the temperature of interest. This condition leads to Anfinsen s thermodynamic hypothesis, 5 which states that the native conformation of a protein is the global free energy minimum structure at in vivo temperatures. Second, the protein has to be able to find this conformation on a reasonable time scale at the same temperature. We denote the highest temperature at which the first criterion is fulfilled as T f and the temperature below which the dynamics of the protein slow down dramatically, and it becomes stuck in local minima, as T g it has been suggested that this temperature is analogous to the glass transition temperature for solids. T g is usually defined as the temperature at which the folding rate passes below a certain threshold. A good folder should have a high ratio T f /T g. 6,7 Socci et al. and others have suggested that these criteria could be fulfilled if the free energy surface for folding is funnel-shaped, 8 i.e., consisting of a convergent set of reaction pathways. Unfortunately, free energy surfaces are not easy to calculate. They require one or more order parameters to be defined that describe the transition from the unfolded to the folded states, and it is then necessary to average over all other degrees of freedom. The radius of gyration and the fraction of native amino acid contacts are two order parameters that are often used. 9 PES s, in contrast, are easier to define, being functions only of the atomic coordinates of the system. Doye and Wales showed how efficient relaxation may occur in terms of the underlying PES. 4 It is often assumed that the global free energy minimum structure at in vivo temperatures corresponds to a very low potential energy minimum, if not the global minimum. Much work has been done recently attempting to define a criterion that could be applied to a PES to determine the global relaxation dynamics. Sali, Karplus and Shakhnovich, among others, have performed several studies using lattice models. 10 14 They calculated conformational energy distributions for various models and concluded that a significant energy gap between the group of nativelike structures and the lowest energy non-native structures is a sufficient condition for the sequence to be resistant to mutations and a fast folder. The fastest folder in this context is the one that requires the fewest Monte Carlo MC steps to find the native state starting from a random configuration. Real dynamics are obviously not possible with lattice models as there are no energy barriers between the states. Our group has recently published details of the potential energy surface of an off-lattice model protein. 15 The model in question was the three-color, 46 bead model proposed by Honeycutt and Thirumalai, 16,17 which had previously been described as both a good structure-seeker 18 and also as frustrated. 19 Here, frustration arises from an inability to satisfy all favorable interactions simultaneously. Frustration exhibits itself on the PES in the form of roughness high barriers between local minima. Roughness can be thought of in terms of the ratio of the average barrier height between minima to the average energy difference between them. Nymeyer et al. also ran molecular dynamics MD simulations for the same system with a Gō-like model, where the frustration is partly removed, and showed this to be a much better folder. The landscape of the original model contains a number of low energy minima, with similar structures but large barriers between them. The Gō-like model, in contrast, has a funnel-shaped landscape, 15 which explains the differing dynamics of the two systems. In this paper, we analyze the dynamics of two different all-atom models of Ac(ala) 8 NHMe using a master equation approach. 20,21 We generate global optimization statistics for both models, and present disconnectivity graphs for each one. We use these graphs to explain the differing dynamics, and the efficiency of the global optimization algorithm. 0021-9606/2001/114(14)/6443/12/$18.00 6443 2001 American Institute of Physics

6444 J. Chem. Phys., Vol. 114, No. 14, 8 April 2001 P. N. Mortenson and D. J. Wales Throughout this paper, unless otherwise stated, distances are measured in Å and energies are in kcal mol 1. Thus forces are measured in kcal mol 1 Å 1 and second derivatives in kcal mol 1 Å 2. The terms population and occupation probability will be used interchangeably when referring to minima, and correspond to the instantaneous probability of the system being in the catchment basin for a particular minimum, i.e., in a conformation such that local minimization would lead to that minimum. II. METHODS A. The Potentials The force fields used were two versions of the AMBER95 22 potential of Cornell et al., which we programed ourselves. The potential energy is given by FIG. 1. Probability of a given dihedral angle being shifted in any one move, as a function of position in the polypeptide chain. E bonds k r r r 0 2 angles k 0 2 torsions k 1 cos n i j A ij r B ij 12 ij r q iq j 6 ij ij r ij, where r, r 0, and 0 are the bond length, equilibrium bond length, bond angle and equilibrium bond angle, respectively. Bond lengths and angles are held near their equilibrium values by simple harmonic potentials. There is an explicit torsion angle term, taking the form of a truncated Fourier series. For the majority of dihedral angles, this series consists of one term as in Eq. 1, with periodicity n, and phase shifted by. is the actual dihedral angle. For the main backbone dihedrals there are up to three such terms, each with different n and. This term also covers improper dihedrals, which are groups of three atoms bonded to a central atom, such as a carbon sp 2 center. In this case, the angle is defined such that there is an energetic penalty for the system to move away from planarity. The last term in Eq. 1 covers the nonbonded interactions for two atoms i and j separated by at least three bonds and a distance r ij apart. The dispersion energy is represented by a 6-12 potential as shown, with coefficients A ij and B ij, which depend on the atom types i and j. Electrostatics are modeled by the Coulombic interaction of atom-centered point charges q i and q j with dielectric constant ij. The difference between the two versions of the potential used is in ij. To generate sample 1, we used a constant dielectric ij 1 and for sample 2 we used a distance dependent dielectric ij r ij. The latter choice has the effect of screening the electrostatic potential and has been used in the past as a method for representing a polar solvent such as water. 23 The parm96.dat set of parameters were used; 24 these are an improved set of parameters that differ from the original set in the main backbone dihedral angle terms, based on some high level ab initio calculations by Beachy et al. 25 1 We also coded analytic first and second derivatives of the potential with respect to atomic Cartesian coordinates, for use in geometry optimization and calculating harmonic densities of states. B. Global optimization The global optimization algorithms used in this study are based on the basin-hopping algorithm of Wales and Doye, 26 which is in turn based on the Monte Carlo with minimization MCM algorithm of Li and Scheraga. 27 Here we will present only a brief reminder of this approach. A starting configuration is generated, and the energy is minimized until the root mean square rms force is below 0.005 using the BFGS algorithm, 28 which we have found to be more efficient than both Nocedal s limited memory BFGS algorithm 29 and standard conjugate gradient routines 28 for small systems with the AMBER force field. The conformation is then changed a step as described below, and the energy of this new conformation is also minimized. The energy of the new minimum is compared with that of the old one and the move is accepted or rejected according to a Metropolis criterion. 30 This process is then repeated until either a fixed number of steps have been taken or some other end criterion is met. 1. Taking steps in conformation space In this study, steps were taken in dihedral angle space. A number of backbone dihedrals were selected and then twisted. The mechanisms for selecting and twisting the dihedrals are described below for two algorithms that we have tested. Changing a dihedral angle near the middle of the chain is likely to have a far greater effect on the overall structure, and hence on the potential energy. We therefore opted to change dihedrals near the ends of the chain more frequently than those near the center. The relative probabilities were defined using parameters P min and P max, as illustrated in Fig. 1. P min is the probability of a dihedral at the exact center of the chain being shifted, and P max is the corresponding probability for a dihedral at the end of the chain. The probability varies lin-

J. Chem. Phys., Vol. 114, No. 14, 8 April 2001 Energy landscapes and dynamics of a polyalanine 6445 early between the ends and the middle of the chain. In this study, values of P min 0.1 and P max 0.2 were used, based on a number of preliminary trials. No upper limit was placed on the number of dihedral angles that could be shifted, and the lower limit was a single dihedral. Both of the algorithms described below shift dihedral angles in this way. The two step-taking algorithms considered here differ in the angles through which the dihedrals are shifted. In the first algorithm, for each dihedral that is to be changed, an angle is chosen randomly from the range and the dihedral is rotated by this amount. The second algorithm takes each selected dihedral in turn and twists it through the range to in increments of 0.0872 5 deg. The dihedral is then set to the angle in this range with the lowest energy provided that getting to this point from the original conformation involves going over an energy barrier. The latter condition simply ensures that in cases where the point with the lowest energy is actually the original conformation, the dihedral is still shifted. We refer to the first algorithm as the basic algorithm, and the second as the smart algorithm. The idea of taking steps in dihedral angle space is not new, and neither is the use of biased MC moves. Abagyan and Totrov 31 and Derreumaux 32 34 have both used statistically obtained Ramachandran plots i.e., prior knowledge to guide their MC steps. Our smart algorithm only uses information that it generates itself, and is therefore unbiased. 2. Global optimization statistics We generated 100 starting conformations for each dielectric constant, using a high temperature MCM run, so that essentially every move was accepted. The runs were 150 steps long, and the lowest energy 100 conformations with each force field were kept and used as the starting conformations for the global optimization runs. With each force field, we then performed two basinhopping runs from each starting conformation, one using the basic algorithm and one using the smart algorithm. Each run was terminated after 5000 steps or when it first encountered the global minimum structure, whichever occurred first. The runs using the basic algorithm were performed at k B T 2 kcal mol 1 for accepting or rejecting steps, and those using the smart algorithm were run at 3 kcal mol 1. We have previously determined these to be approximately optimal temperatures for each algorithm in trial runs. C. Generation of databases The master equation approach to dynamics relies on having a database containing an exhaustive set of minima as well as the transition states connecting them. It has been estimated that the number of stationary points i.e., points where E 0) on the PES increases exponentially with the number of atoms, 35 37 so for all except very small systems it is not possible to generate such a database. Instead, we attempt to generate a sample of stationary points that is representative of the whole surface. The generation of this database can be broken down into two parts; algorithms that locate stationary points on the surface, and a method for propagating the search over the surface. The present system is small enough for a reasonably representative database to be generated, although it is by no means exhaustive; Westerburg and Floudas, for example, found over 60 000 minima for a tetra-alanine. 38 1. Locating stationary points We define a minimum as a stationary point with no negative Hessian eigenvalues, i.e., no imaginary normal mode frequencies. We define a transition state as a stationary point with exactly one negative eigenvalue. 39 All searches described here were performed using our program OPTIM. Transition states and minima were located using the eigenvector-following EF method. 40 43 For systems of the size considered here, recently developed hybrid eigenvectorfollowing methods 44 are not required, and we used analytic second derivatives and Hessian diagonalization at every step, as described elsewhere. 45 Starting from a minimum, we first took a step along the chosen Hessian eigenvector. The size of this step can seriously affect the efficiency of the search. For the present searches, a fixed eigenvector-following step size of 0.2 was used until a negative Hessian eigenvector was obtained. Subsequent steps used the dynamic trust radius scaling scheme described elsewhere. 45 Having located a transition state, we then took a smaller step (0.1) parallel to the reaction coordinate the Hessian eigenvector with the unique imaginary frequency and followed this eigenvector downhill using EF energy minimization until we reached a minimum. This procedure usually gives a reasonable approximation to the steepest descent path. We repeated the minimization taking a step antiparallel to the reaction coordinate to generate a complete pathway. At the end of this process, we have a transition state and the two minima that are connected to it. Both minima and transition states were considered converged when the rms force dropped below 10 6. 2. Propagating the search Having performed a search as described above, we then have to decide how to proceed. First, it is possible that the search for a transition state fails i.e., does not converge within a fixed number of steps. In this case, we search along the next Hessian eigenvector of the minimum from which we started. A set number of directions 15 in this study were searched from each minimum and once all of these had been considered we moved to the next minimum by index in the database. If all directions of all minima had been searched then the algorithm terminated, but this condition was not reached for either of the present potentials. Having found a transition state and connected minima, we must decide whether or not to add these to our database. As we use the sample to perform master equation dynamics, it is crucial that all minima in the sample are connected, i.e., that a pathway can be constructed between any two minima in the database using only transition states and minima that are also in the database. Thus if neither minimum was already in the database, the details of the path were recorded separately, but not included. As the searches proceeded, the databases became larger, so we periodically checked the discarded paths and if they had become connected they were

6446 J. Chem. Phys., Vol. 114, No. 14, 8 April 2001 P. N. Mortenson and D. J. Wales then included. If either minimum found in the search was already in the database, then the path was added, and the new minimum used as a starting point for subsequent transition state searches. This algorithm is easily parallelizable, as the results of all the searches are independent; it scales almost linearly with the number of processors used. D. Calculation of equilibrium populations and rate constants Here we will give only a brief description of the methods used; more details are available elsewhere. 46 Canonical equilibrium populations were calculated using the harmonic superposition approximation, 47 51 where the partition function is written as the sum of the partition functions for each minimum: Z T i Z i T, 2 with Z i (T) the partition function for the catchment basin associated with minimum i. To calculate each Z i, we perform a Taylor expansion of the potential energy around the minimum in normal mode coordinates and truncate it at the second term, E E i 1 2 2 j Q 2 j, j 1 3 TABLE I. Dihedral angles for the global minima obtained with both versions of the dielectric constant. All values in degrees. where W ij is the rate constant for transitions from state minimum j to state i, and P i (t) is the population of state i at time t. The well-known analytic solution to Eq. 6 is: 20,21 P i t P i eq j,k ij 1 U ij U kj P k 0 P k eq e j t, ij r ij Residue 1 153.239 143.180 49.841 50.162 2 50.256 120.044 56.543 49.705 3 61.193 65.573 60.663 48.365 4 45.953 41.084 55.121 51.552 5 136.416 62.248 55.717 50.960 6 60.422 76.906 56.003 48.693 7 142.285 82.591 59.969 45.628 8 153.443 144.643 62.313 45.747 7 where E i is the potential energy of minimum i, Q j is a displacement along the jth normal mode of the minimum, with angular frequency j, and is the number of degrees of freedom (3N 6, where N is the number of atoms. We thereby obtain the usual harmonic approximation: Z i T exp E i, 4 h i where i is the geometric mean normal mode frequency for minimum i and 1/k B T. Canonical rate constants were calculated using Rice Ramsperger Kassel Marcus RRKM theory, 52 giving k i T i e (E 1 Ei ), 5 where k i (T) is the rate constant out of minimum i through transition state at temperature T, E is the potential energy of the transition state and is the geometric mean normal mode frequency transition state. The rate constant for the process i j is simply the sum of Eq. 5 over all transition states linking i and j. Both rate constants and equilibrium populations were calculated at 300 K. E. Dynamics using the master equation approach Having calculated the equilibrium populations and rate constants as in Sec. II D, we can write down a differential equation for P(t), the vector of minima populations: dp i t dt W ij P j t W ji P i t, 6 j FIG. 2. Color Global minimum structure with ij 1. Top diagram shows backbone carbon green and nitrogen blue atoms, as well as the capping methyl groups gray. Bottom diagram also shows oxygen atoms red, polar hydrogens white and sidechain methyl groups. Hydrogen bonds are indicated by dotted lines. Diagrams generated using Xmakemol Ref. 66.

J. Chem. Phys., Vol. 114, No. 14, 8 April 2001 Energy landscapes and dynamics of a polyalanine 6447 FIG. 3. Color A right-handed -helix, the global minimum for ij r ij. Top diagram shows backbone carbon green and nitrogen blue atoms, as well as the capping methyl groups. Bottom diagram also shows oxygen atoms red, polar hydrogens white and sidechain methyl groups. Hydrogen bonds are indicated by dotted lines. Diagrams generated using Xmakemol Ref. 66. where U is the orthogonal matrix that diagonalizes the symmetric matrix W with elements P j eq W ij P i eq W ij ij W k ki. 8 We used Eq. 7 to calculate P at times from 10 12 sto 100 s. For P(0) we took the equilibrium population at a temperature of about 2500 K, calculated as described in Sec. II D. This starting point simply provides a convenient nonequilibrium distribution. In order to visualize the potential energy surfaces of the two samples, we constructed disconnectivity graphs for both. These graphs have been extensively used recently for displaying key features of the landscape, 15,53 58 and also to interpret dynamics and thermodynamic properties. III. RESULTS A. Global optimization Structures for the global minima for both force fields are given in Table I in terms of the backbone dihedral angles. Figure 2 shows the global minimum structure for ij 1 which is a loop structure, and Fig. 3 shows the global minimum structure for ij r ij, which is a right-handed -helix. Table II shows the success rates for global optimization using the two step-taking algorithms described in Sec. II B and both force fields. The number of runs in each case that found the global minimum within 5000 steps is given, along with the average number of steps the successful runs took before first encountering the global minimum mean first encounter time. We emphasize that as the average has only been taken over the successful runs, this is simply a lower bound to the mean first encounter time. B. Samples Both runs were terminated when it became apparent that very few new low energy minima were being found. Sample 1 ( ij 1) contains 7400 minima and 10 266 transition states, sample 2 ( ij r ij ) contains 6075 minima and 11 661 transition states. Although far from exhaustive, we believe these results are complete enough to give a useful picture of the full PES. Figure 4 shows the disconnectivity graph for sample 1, and Fig. 5 shows the graph for sample 2. Only the 250 TABLE II. Global optimization statistics for both potentials with the two step-taking algorithms. n hits is the number of runs that found the global minimum within 5000 steps. 100 runs were performed in total. n steps is the average number of steps taken to find the global minimum by those runs that were successful. Algorithm ij 1 ij r ij n hits n steps n hits n steps Basic 37 2057 100 686 Smart 25 2328 99 363

6448 J. Chem. Phys., Vol. 114, No. 14, 8 April 2001 P. N. Mortenson and D. J. Wales FIG. 4. Disconnectivity graph for sample 1 showing the 250 lowest energy minima. Vertical scale is energy in kcal mol 1. minima lowest in energy from each sample are shown, for clarity. C. Dynamics Minima are grouped together within each sample according to the number of native residues they contain. A residue is defined as native if both of its backbone dihedral angles are within 20 deg of the values for that residue in the global minimum structure. Minima are then classified in three different ways; non-native if they contain 0 or 1 native residues, part-native if they contain between 2 and 6 native residues and near-native if they contain 7 or 8 native residues. Table III shows the numbers of minima in each group for both samples. The populations of these groups of minima were calculated as a function of time by summing the occupation probabilities of the minima within each group, calculated using the master equation approach. Results are shown in Fig. 6 for both samples. IV. DISCUSSION The two samples obtained with and without the distance dependent dielectric constant are very different in character. In sample 1 ( ij 1) there are far more structures close to the global minimum in energy than in sample 2 ( ij r ij ). This difference is reflected by the fact that in sample 1 there are 77 minima within 5 kcal mol 1 of the global minimum, and 1244 within 10 kcal mol 1, whereas for sample 2 there are 7 and 107 such minima, respectively. Figure 7 shows the energy of minima in the two samples plotted against their rank in an energy ordered list; the graph for sample 2 is initially much steeper. Interestingly, after about the first 300 minima, the slopes become similar, suggesting that the underlying densities of minima for the samples are approximately equal. The samples are, of course, not exhaustive. However, we believe that they are almost comprehensive for the low energy regions of the PES, and reasonably representative of the whole surface. It is impossible to know if this is true without further calculations, but we can speculate on the effect that the incomplete nature of the samples will have on the dynamics of the system. Assuming that our samples contain most of the low energy minima on the PES, there are two possible shortcomings. First, we almost certainly have only a tiny fraction of the higher energy minima on the PES. However, we would expect these to have very small equilibrium occupations and that the escape time from any of these minima would be very short. Thus we would not expect these minima to have a large effect on the dynamics. The other possibility is that we have underestimated the connectivity of the low energy minima, i.e., we are missing transition states for low energy minima. Looking at the ratio of transition states to minima for the samples ( 1.4 for sample 1, and 1.9 for sample 2 this is probably true. We would expect that this underestimate of the connectivity will lead to an overestimate of the time required for the system to evolve. However, for low energy minima we tend to have more transition states, so our representation of the PES and its connectivity improves as the system relaxes. Figure 8 shows the distribution of minima with energy relative to the global minimum for both samples. The distribution for sample 1 is centered around 13 kcal mol 1 and that for sample 2 around 18 kcal mol 1. The distribution for sample 2 also has a longer tail on the low energy side of the peak than sample 1. It has been suggested that an important criterion for the stability of the native state of a protein is a large energy gap between the small set of nativelike structures and the lowest in energy of the much larger set of non-native structures. 10 12,14,59 This criterion is well satisfied in the case of sample 2, but not for sample 1, so we would expect the native state for ij r ij to be much more stable thermodynamically than that for ij 1. Table IV shows the equilibrium occupation probabilities for the three groups of minima non-native, part-native, near-native for each sample at 300 K and confirms the above conclusion, with near-native minima having an occupation probability of 0.998 in sample 2, but only 0.332 in sample 3. Figure 9 shows the equilibrium occupation probabilities for the nearnative group of minima of both samples as a function of temperature, again clearly illustrating the large difference in thermodynamic stability between the native states of the two potentials. The results from the global optimization runs are also very different for the two different dielectric constants. Those for ij r ij have success rates of 100% basic algorithm and 99% smart algorithm, compared to 37% and

J. Chem. Phys., Vol. 114, No. 14, 8 April 2001 Energy landscapes and dynamics of a polyalanine 6449 FIG. 5. Color Disconnectivity graph for sample 2 showing the 250 lowest energy minima. Vertical scale is energy in kcal mol 1. The funnel leading to the global minimum is marked in red, while that leading to a competitive minimum is marked in blue. 25%, respectively, for ij 1. These results can be partly explained by the much larger number of competitive, nonnative minima in sample 1, but we would also expect the topology of the PES to play its part, and this effect is visible in the disconnectivity graphs. It is also noteworthy that the two step-taking algorithms had differing degrees of success with the two force fields. The smart algorithm fared better for sample 2, scoring almost the same number of hits successful runs, but needed only about half as many steps as for the basic algorithm. For sample 1, in contrast, the basic algorithm was both more reliable more hits and quicker if it was successful. We can use the graphs to explain the differing success of the two algorithms for each sample. We postulate that the smart algorithm is very good at finding the bottom of any funnel, or superbasin, 53 that it finds itself in. Therefore in sample 2, where the whole landscape leads toward the global minimum with one small diversion into the blue funnel, it is much quicker at finding the global minimum than the basic algorithm. However, in sample 1, where there are many funnels leading to low energy non-

6450 J. Chem. Phys., Vol. 114, No. 14, 8 April 2001 P. N. Mortenson and D. J. Wales TABLE III. Number of minima within each classification for both samples. Non-native structures contain 0 or 1 native residues, part-native structures contain between 2 and 6 native residues and near-native structures contain 7 or 8 native residues. Classification Sample 1 ( ij 1) Sample 2 ( ij r ij ) Non-native 6719 4897 Part-native 680 1170 Near-native 1 8 native minima, this tendency to find the bottom of any funnel rapidly is undesirable when the starting point is in the wrong region of conformation space. In this case, the basic algorithm is actually more successful, and even in sample 2, the FIG. 7. Graph plotting the energy of minima relative to the global minimum in each sample against their rank by energy in the sample. Sample 1 is shown by the solid line and sample 2 by the dashed line. smart algorithm failed in one run to find the global minimum within 5000 steps. The results of the master equation dynamics for both samples are shown in Fig. 6. For sample 2, the distribution stays approximately fixed, until about 10 9 s, with approximately 95% of the probability in non-native minima and the remainder in part-native states. At about 10 9 s there is a flow of probability from non-native states to native states, stopping at around 10 7 s when the populations are about 0.6 and 0.4, respectively. During this phase the population of part-native states also peaks, suggesting that these states act as intermediates for the transfer of probability. The populations of all groups then stay approximately constant until around 10 5 s, when the population of the near-native group of minima starts to increase again. This time there is no concurrent rise in the population of the part- FIG. 6. Graph showing the time evolution of the occupation probabilities of three groups of minima for sample 1 top and sample 2 bottom. The long-dashed line represents the non-native minima 0 or 1 native residues, the short-dashed line represents the part-native minima between 2 and 6 native residues and the solid line follows the near-native minima 7 or8 native residues. FIG. 8. Histograms showing the number of minima lying between E and E 1.0 kcal mol -1 for sample 1 black and sample 2 gray.

J. Chem. Phys., Vol. 114, No. 14, 8 April 2001 Energy landscapes and dynamics of a polyalanine 6451 TABLE IV. Equilibrium occupation probabilities at 300 K for the three groups of minima for each sample. Group Sample 1 Sample 2 Non-native 0.413 0.000 191 Part-native 0.255 0.001 51 Near-native 0.332 0.998 FIG. 9. Graph showing the equilibrium population of the near-native group of minima for sample 1 solid and sample 2 dashed at various temperatures. FIG. 10. Graph comparing the occupation probabilities from the full dashed and pruned solid versions of sample 2. The pruned sample contains 4191 minima and 9339 transition states. native minima, so here it appears that these states do not act as intermediates, or are very short-lived if they do. By 10 2 s, the distribution approaches equilibrium, and shortly thereafter the populations of all three groups of minima appear to fall rapidly not shown. The latter effect is due to the accumulation of numerical errors within the algorithm. The sum of the components of the probability vector drops below 0.999 after 0.017 s and below 0.99 after 0.18 s. The matrix W see Sec. II E should have exactly one zero eigenvalue, corresponding to the eigenvector with the equilibrium probability distribution. However, the nonzero eigenvalues of W range over 14 orders of magnitude, and at 300 K the lowest one is quite close to zero. This distribution makes it hard to accurately resolve the smallest nonzero eigenvalue, and as there is a term in Eq. 7 that is exponential in both the eigenvalues and time, the formula loses precision at long time scales. We attempted to resolve this problem by pruning the sample and rerunning the master equation dynamics. Pruning involves removing all dead-end minima within the sample whose equilibrium population is below some threshold value, 60 in this case 10 6. A dead-end minimum is one that is only connected to one other minimum in the sample. The results for this pruned sample are shown in Fig. 10. It can be seen that although the algorithm still loses accuracy, it maintains integrity for about another order and a half of magnitude. The sum of the components of the probability vector now drops below 0.999 after 0.43 s and below 0.99 after 3.8 s. These times are long enough to see the distribution reach its equilibrium position. The graph also shows that the pruning process does not qualitatively affect the dynamics of the system, even though we have removed just over 30% of the minima. This is a useful observation, as the computer time required to diagonalize W scales as M 3 where M is the number of minima, so pruning the sample in this fashion decreases the computation time required for this step by a factor of 3. It also tends to increase the ratio of transition states to minima in the sample, thus increasing their average connectivity, which may give a more realistic view of the surface. The two stage flow of probability into near-native minima seen for sample 2 suggests the existence of a kinetic trap, i.e., a non-native minimum or group of minima that is are low enough in energy to be metastable on a short timescale. It should be possible to spot such a feature by looking at the corresponding disconnectivity graph, as discussed below. The dynamics for sample 1 as shown in Fig. 6 occur over a longer timescale than for sample 2. The system appears to reach equilibrium between 1 and 10 s, which is only two orders of magnitude longer than sample 2. However, the populations of the three groups stay virtually constant until 10 6 s and a significant flow of probability does not begin until between 10 5 and 10 4 s. The corresponding probability flow occurs at a time of about 10 8 s for sample 2. Again, at longer time scales there are numerical problems and the populations of all three groups of minima start to drop significantly after about 10 s. The sum of the components of P dropped below 0.999 after 1.6 s and below 0.99 after 17.6 s. We tried pruning the sample using the same criteria as for sample 2, but on this occasion it did not enable us to significantly extend the accessible time scale. The disconnectivity graphs for the two samples Figs. 4 and 5 reveal two rather different landscapes. The graph for sample 1 very much resembles the banyan tree motif, 55 which implies a relatively flat landscape with minima of similar energy separated by high barriers. The graph for

6452 J. Chem. Phys., Vol. 114, No. 14, 8 April 2001 P. N. Mortenson and D. J. Wales FIG. 12. Color Graph showing the time evolution of the occupation probabilities of the two competing funnels for sample 2. The red line follows the native funnel and the blue line follows the competing funnel. FIG. 11. Color Structure of the minimum at the bottom of the blue funnel in the disconnectivity graph for sample 2. Top diagram shows backbone carbon atoms and nitrogen blue atoms, and terminal methyl groups gray. Bottom diagram shows all backbone atoms, polar hydrogens white and sidechain methyl groups. Hydrogen bonds are indicated by dotted lines. Diagrams generated using Xmakemol Ref. 66. sample 2 consists of a funnel leading to the global minimum red, another leading to a minimum at E 15.674 blue and the rest of the landscape gently sloping toward the main funnel, which is considerably lower in energy. Figure 11 shows the minimum at the bottom of the blue funnel, and Table V gives its dihedral angles. This description in terms TABLE V. Dihedral angles for the minimum at the bottom of the blue funnel in the disconnectivity graph for sample 2. All values in degrees. Residue 1 59.95 141.23 2 49.11 121.24 3 65.39 47.76 4 53.79 37.34 5 137.57 48.93 6 150.40 94.04 7 140.96 23.28 8 68.59 113.12 of a double funnel may be justified from the intermediate population observed in the dynamics. The above results explain why sample 1 takes about two orders of magnitude longer to reach an equilibrium distribution than sample 2. It is also possible to see why it takes four orders of magnitude longer for any significant movement of probability between the groups, as there are much higher barriers between basins than there are in sample 2, and it is this inter-basin flow that is most significant in Fig. 6. In sample 1 we can also see a number of low energy minima, competitive with the global minimum but separated from it by large barriers. We would expect this system to be a poor structure-seeker : at a temperature low enough for the global minimum to be significantly populated, the barriers between competing low energy minima and the global minimum are difficult to surmount. In sample 2, the minimum at the bottom of the blue funnel contains only one native residue as defined in Sec. III B and so is structurally very different from the global minimum. We might expect the former minimum to act as a good kinetic trap, which could explain the two stage increase in the near-native population with time. In order to determine if this is the case, we performed further master equation dynamics and this time followed the populations of the blue and red funnels over time. The results are shown in Fig. 12. The populations of the two funnels are quite similar for the first half of the graph, remaining almost static and then increasing quite sharply between 10 9 and 10 7 s. Then, between 10 5 and 10 4 s, there is a flow of probability from minima in the blue funnel to minima in the red, native funnel. We can thus explain the two stage character of the dynamics in terms of the potential energy landscape; the overall landscape is funnel-shaped, but behaves like a double funnel at low energies. The first stage in relaxation from a high temperature distribution is the rapid increase in the populations of both funnels as probability flows from higher energy minima, and the second stage is the much slower process of

J. Chem. Phys., Vol. 114, No. 14, 8 April 2001 Energy landscapes and dynamics of a polyalanine 6453 the system moving from the non-native funnel to the lower energy native one. V. CONCLUSIONS We have shown through master equation dynamics that the polyalanine Ac(ala) 8 NHMe modeled with a screened Coulombic term relaxes more efficiently to the global minimum than for a fixed dielectric constant ij 1. The global minimum for ij r ij is a right-handed -helix, while for ij 1 the global minimum has a looped structure. For ij r ij the system equilibrates on a reasonable time scale at a temperature where the global minimum and closely related structures are the only significantly populated states at equilibrium. We have related this finding to the very different energy landscapes the two force fields produce, one which is broadly funnellike but exhibits a secondary funnel corresponding to a deep kinetic trap, and the other a flat landscape with high barriers, which has been shown to provide a very difficult problem for global optimization. 58,61 We describe the first landscape in two ways because it exhibits characteristics of both kinds of landscapes. Relaxation to the global minimum is relatively fast, so the surface is broadly funnellike. However, we have shown that the blue minimum from Fig. 5 acts as a kinetic trap, so we also feel justified in describing the surface as a double funnel. We have also shown that in this instance, the existence of a significant energy gap between nativelike and non-native structures greatly improves the thermodynamic stability of the native state, but that the global connectivity of the PES contributes to the ability of the system to relax to this native state. This same feature of the landscape also determines the relative ease with which an unbiased global optimization algorithm will succeed. We have detailed a dihedral step-taking algorithm that possesses the useful ability to find the bottom of funnels quickly, but which tends to get stuck in low energy local minima. The ability to escape such features more quickly would greatly increase the usefulness of this algorithm. Parallel runs employing jump-walking, 62 replicas 63,64 or simulated tempering 65 might be useful in this regard. Our results are, of course, limited by the accuracy of the empirical force field employed, and by the harmonic approximations made in deriving model densities of states for the dynamics and thermodynamics. However, we have probably managed to survey the low energy region of the corresponding potential energy surfaces quite thoroughly, and the principal features of our analysis should only depend on the fundamental structure of the energy landscape in question. The most interesting result is probably the more efficient relaxation in the system with implicit solvent, modeled by a distance dependent dielectric. These characteristics are also reflected in the efficiency of global optimization. It is noteworthy that the calculated landscapes for a tetrapeptide with the same two potentials revealed more efficient relaxation for the unscreened Coulomb potential. 46 The present results for Ac(ala) 8 NHMe are in better agreement with our expectation that common elements of secondary structure could readily form in solvent. ACKNOWLEDGMENTS We thank Dr. Mark Miller for providing the Fair_Phyllis program that we adapted for our parallel searching algorithm. We gratefully acknowledge financial support from the EPSRC. 1 C. Levinthal, Mossbauer Spectroscopy in Biological Systems Proceedings of a Meeting Held at Allerton House, Monticello, Illinois, University of Illinois Press 1969. 2 J. D. Bryngelson, J. N. Onuchic, N. D. Socci, and P. G. Wolynes, Proteins: Struct., Funct., Genet. 21, 167 1995. 3 P. Leopold, M. Montal, and J. Onuchic, Proc. Natl. Acad. Sci. U.S.A. 89, 8721 1992. 4 J. P. K. Doye and D. J. Wales, J. Chem. Phys. 105, 8428 1996. 5 C. B. Anfinsen, Science 181, 223 1973. 6 N. D. Socci and J. N. Onuchic, J. Chem. Phys. 101, 1519 1994. 7 N. D. Socci, J. N. Onuchic, and P. G. Wolynes, J. Chem. Phys. 104, 5860 1996. 8 N. D. Socci, J. N. Onuchic, and P. G. Wolynes, Proteins: Struct., Funct., Genet. 32, 136 1998. 9 B. D. Bursulaya and C. L. Brooks III, J. Am. Chem. Soc. 121, 9947 1999. 10 A. Sali, E. I. Shakhnovich, and M. Karplus, Nature London 369, 248 1994. 11 A. Sali, E. I. Shakhnovich, and M. Karplus, J. Mol. Biol. 235, 1614 1994. 12 E. I. Shakhnovich, Phys. Rev. Lett. 72, 3907 1994. 13 M. Karplus and A. Sali, Curr. Opin. Struct. Biol. 5, 58 1995. 14 H. Li, R. Helling, C. Tang, and N. Wingreen, Science 273, 666 1996. 15 M. A. Miller and D. J. Wales, J. Chem. Phys. 111, 6610 1999. 16 J. D. Honeycutt and D. Thirumalai, Proc. Natl. Acad. Sci. U.S.A. 87, 3526 1990. 17 J. D. Honeycutt and D. Thirumalai, Biopolymers 32, 695 1992. 18 R. S. Berry, N. Elmaci, J. P. Rose, and B. Vekhter, Proc. Natl. Acad. Sci. U.S.A. 94, 9520 1997. 19 H. Nymeyer, A. E. García, and J. N. Onuchic, Proc. Natl. Acad. Sci. U.S.A. 95, 5921 1998. 20 N. G. van Kampen, Stochastic Processes in Physics and Chemistry North Holland, Amsterdam, 1981. 21 R. E. Kunz, Dynamics of First-Order Phase Transitions Deutsch, Thun, 1995. 22 W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. W. Merz Jr., D. M. Ferguson, D. C. Spellmeyer, T. Fox, J. W. Caldwell, and P. A. Kollman, J. Am. Chem. Soc. 117, 5179 1995. 23 B. R. Gelin and M. Karplus, Proc. Natl. Acad. Sci. U.S.A. 72, 2002 1975. 24 P. Kollman, R. Dixon, W. Cornell, T. Fox, C. Chipot, and A. Pohorille, Computer Simulation of Biomolecular Systems 3, 83 1997. 25 M. D. Beachy, D. Chasman, R. B. Murphy, T. A. Halgren, and R. A. Friesner, J. Am. Chem. Soc. 119, 5908 1997. 26 D. J. Wales and J. P. K. Doye, J. Phys. Chem. A 101, 5111 1997. 27 Z. Li and H. A. Scheraga, Proc. Natl. Acad. Sci. U.S.A. 84, 6611 1987. 28 W. Press, B. Flannery, S. Teukolsky, and W. Vetterling, Numerical Recipes Cambridge University Press, Cambridge, 1986. 29 D. Liu and J. Nocedal, Mathematical Programming B 45, 503 1989. 30 N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, J. Chem. Phys. 21, 1087 1953. 31 R. Abagyan and M. Totrov, J. Mol. Biol. 235, 983 1994. 32 P. Derreumaux, J. Chem. Phys. 106, 5260 1997. 33 P. Derreumaux, J. Chem. Phys. 107, 1941 1997. 34 P. Derreumaux, J. Chem. Phys. 111, 2301 1999. 35 F. H. Stillinger and T. A. Weber, Phys. Rev. A 25, 978 1982. 36 F. H. Stillinger and T. A. Weber, Science 225, 983 1984. 37 F. H. Stillinger, Phys. Rev. E 59, 48 1999. 38 K. M. Westerberg and C. A. Floudas, J. Chem. Phys. 110, 9259 1999. 39 J. N. Murrell and K. J. Laidler, J. Chem. Soc., Faraday Trans. 64, 371 1968. 40 G. M. Crippen and H. A. Scheraga, Arch. Biochem. Biophys. 144, 462 1971. 41 J. Pancír, Collect. Czech. Chem. Commun. 40, 1112 1974. 42 R. L. Hilderbrandt, Comput. Chem. Oxford 1, 179 1977. 43 C. J. Cerjan and W. H. Miller, J. Chem. Phys. 75, 2800 1981.

6454 J. Chem. Phys., Vol. 114, No. 14, 8 April 2001 P. N. Mortenson and D. J. Wales 44 L. J. Munro and D. J. Wales, Phys. Rev. B 59, 3969 1999. 45 D. J. Wales and T. R. Walsh, J. Chem. Phys. 105, 6957 1996. 46 D. Wales, J. Doye, M. Miller, P. Mortenson, and T. Walsh, Adv. Chem. Phys. 115, 1 2000. 47 D. J. McGinty, J. Chem. Phys. 55, 580 1971. 48 J. J. Burton, J. Chem. Phys. 56, 3133 1972. 49 M. R. Hoare, Adv. Chem. Phys. 40, 49 1979. 50 G. Franke, E. R. Hilf, and P. Borrmann, J. Chem. Phys. 98, 3496 1993. 51 D. J. Wales, Mol. Phys. 78, 151 1993. 52 R. G. Gilbert and S. C. Smith, Theory of Unimolecular and Recombination Reactions Blackwell, Oxford, 1990. 53 O. M. Becker and M. Karplus, J. Chem. Phys. 106, 1495 1997. 54 O. M. Becker, J. Mol. Struct.: THEOCHEM 398-399, 507 1997. 55 D. J. Wales, M. A. Miller, and T. R. Walsh, Nature London 394, 758 1998. 56 Y. Levy and O. M. Becker, Phys. Rev. Lett. 81, 1126 1998. 57 M. A. Miller, J. P. K. Doye, and D. J. Wales, J. Chem. Phys. 110, 328 1999. 58 J. P. K. Doye, M. A. Miller, and D. J. Wales, J. Chem. Phys. 110, 6896 1999. 59 M.-H. Hao and H. A. Scheraga, J. Chem. Phys. 107, 8089 1997. 60 M. A. Miller, J. P. K. Doye, and D. J. Wales, Phys. Rev. E 60, 3701 1999. 61 J. P. K. Doye, D. J. Wales, and M. Miller, J. Chem. Phys. 109, 8143 1998. 62 D. D. Frantz, D. L. Freeman, and J. D. Doll, J. Chem. Phys. 93, 2769 1990. 63 R. H. Swedensen and J.-S. Wang, Phys. Rev. Lett. 57, 2607 1986. 64 D. Gront, A. Kolinski, and J. Skolnick, J. Chem. Phys. 113, 5065 2000. 65 E. Marinari and G. Parisi, Europhys. Lett. 19, 451 1992. 66 M. P. Hodges, XMakemol: A Program for Visualizing Atomic and Molecular Systems, Version 4., Nottingham 1999.