Computer Modeling of Protein Folding: Conformational and Energetic Analysis of Reduced and Detailed Protein Models


J. Mol. Biol. (1995) 247

Alessandro Monge, Elizabeth J. P. Lathrop, John R. Gunn, Peter S. Shenkin and Richard A. Friesner*

Department of Chemistry and Center for Biomolecular Simulation, Columbia University, New York NY 10027, U.S.A.

*Corresponding author

Recently we developed methods to generate low-resolution protein tertiary structures using a reduced model of the protein where secondary structure is specified and a simple potential based on a statistical analysis of the Protein Data Bank is employed. Here we present the results of an extensive analysis of a large number of detailed, all-atom structures generated from these reduced model structures. Following side-chain addition, minimization and simulated annealing simulations are carried out with a molecular mechanics potential including an approximate continuum solvent treatment. By combining reduced model simulations with molecular modeling calculations we generate energetically competitive, plausible misfolded structures which provide a more significant test of the potential function than current misfolded models based on superimposing the native sequence on the folded structures of completely different proteins. The various contributions to the total energy and their interdependence are analyzed in detail for many conformations of three proteins (myoglobin, the C-terminal fragment of the L7/L12 ribosomal protein, and the N-terminal domain of phage 434 repressor). Our analysis indicates that the all-atom potential performs reasonably well in distinguishing the native structure. It also reveals inadequacies in the reduced model potential, which suggests how this potential can be improved to yield greater accuracy. Preliminary results with an improved potential are presented.
Keywords: protein folding; computer modeling; potential functions

Abbreviations used: r.m.s., root-mean-square; PDB, Protein Data Bank; RSA, rotamer simulated annealing; MBO, myoglobin; CTF, C-terminal fragment of the L7/L12 ribosomal protein; R69, N-terminal domain of phage 434 repressor; CPU, central processor unit.

Academic Press Limited

Introduction

The thermodynamics of protein folding has been the subject of intense experimental and theoretical study for many years (Privalov, 1989; Yang et al., 1992). From a theoretical point of view, the problem is extraordinarily difficult because the free energy of folding is a small fraction of the total free energy of the protein; consequently, one needs to calculate a small energy difference with empirical potential functions whose quality for this purpose is not known. Because globally different protein conformations must be compared, the cancellation of error that is often relied upon in free energy perturbation calculations is problematic. An adequate evaluation of the total energy of the protein is therefore necessary. Furthermore, the size and complexity of the molecule and associated solvent make the usual procedures of molecular mechanics computationally expensive; this difficulty is compounded by the existence of a huge number of local minima (even in the neighborhood of the global minimum), which impedes rapid conformational sampling of phase space.

Over the past several years, a few papers have appeared which have examined the total molecular mechanics energy of a small number of different protein conformations (Levitt & Sharon, 1988; Daggett & Levitt, 1991; Mark & van Gunsteren, 1992; Novotny et al., 1984, 1988; Bryant & Lawrence, 1993). While many of these results have been interesting, the amount of data obtained is not sufficient to address fully the problem described above. It is relatively straightforward to generate large numbers of plausible conformations that are very close (e.g. 1 to 2 Å r.m.s. deviation) to the native structure, for example starting from the X-ray structure and using molecular dynamics or simulated annealing algorithms (Daggett & Levitt, 1991; Mark & van Gunsteren, 1992). It is also straightforward to thread the sequence of one protein through the structure of another, as was done by Karplus and co-workers in a pioneering study of this type (Novotny et al., 1984) and more recently, for instance, by Bryant & Lawrence (1993). However, in order to design a methodology for protein folding by computer, it is crucial to examine a large number of low energy structures with conformations significantly different (at least in the 4 to 6 Å r.m.s. deviation range, which is estimated to be in the molten globule regime) from the native. Such structures can only be produced via simulations capable of rapidly traversing configuration space, which at the same time utilize a potential function that is a plausible approximation to the actual potential.

Many studies have also been carried out using reduced protein models and approximate potentials (Friedrichs et al., 1991; Hinds & Levitt, 1992; Sun, 1993; Skolnick & Kolinski, 1990; Kolinski et al., 1993; Lau & Dill, 1989; Shakhnovich et al., 1991; Covell & Jernigan, 1990; Covell, 1992). Most of this work has been concerned with sequence alignment and homology modeling, i.e. identifying the native structure from the set of structures in the Protein Data Bank (PDB: Bernstein et al., 1977), given the sequence. Again, however, the restriction to PDB structures is qualitatively inadequate if one wishes to understand how the native conformation is selected as compared with alternatives that are actually suitable to the sequence. To begin with, a realistic treatment of excluded volume constraints is required and most of the database potentials in the literature simply ignore such constraints, as PDB structures have them built in automatically.
Furthermore, the quality of these reduced model potentials is even more of an issue than that of the molecular mechanics potentials, which at least have a performance record that can be evaluated for small molecules.

In our earlier work on computer modeling of protein folding (Monge et al., 1994; Gunn et al., 1994), we have used a reduced model of the protein where we fix secondary structure and employ a simple potential based on a statistical analysis of PDB structures. The idea of fixing secondary structure was proposed in a number of previous works (Ptitsyn & Rashin, 1975; Warshel & Levitt, 1976; Cohen et al., 1979), but while promising results were reported, the methods used have not been of general applicability. Our efforts have been toward developing a genuinely automated algorithm to fold proteins of arbitrary complexity using the secondary structure as a starting point. Such an algorithm could be applied to proteins of unknown structure when combined with NMR spectroscopy. In fact, the NMR method can provide a precise characterization of the protein secondary structure at an early stage of a structure determination and quite independently of the complete structure calculation (Wüthrich et al., 1984, 1991; Wishart et al., 1992).

In this paper, we combine an extensive set of reduced model simulations, using a potential with a primitive but reasonably effective set of excluded volume constraints, with molecular modeling calculations using the AMBER* force field (Weiner et al., 1984; McDonald & Still, 1992) for the protein and the generalized Born (GB) continuum solvent model of Still and co-workers (Still et al., 1990) to represent the aqueous environment.
Large numbers of energetically competitive reduced model structures are generated, as described in previous papers (Monge et al., 1994; Gunn et al., 1994), for three proteins: myoglobin, an eight α-helix protein (PDB code 1MBO); the C-terminal fragment of the L7/L12 ribosomal protein, a small mixed α/β protein (PDB code 1CTF); and the amino-terminal domain of phage 434 repressor, a small helical protein (PDB code 1R69). A subset of structures is then selected for further study: side-chains are added via the rotamer simulated annealing (RSA) program of Shenkin and co-workers (Farid et al., 1992), and minimization and simulated annealing runs are carried out with the AMBER*/GB potential using the MacroModel/BatchMin molecular modeling program (Mohamadi et al., 1990).

Recently, Vieth et al. (1994) have studied the GCN4 leucine zipper (a dimer of two helices each containing 33 residues) using a hierarchical approach similar to ours, where a lattice model is used first and then all-atom structures are generated. They report very good agreement with the crystal structure and their results appear to be promising. However, a validation of their method must come in the context of larger and more complex proteins.

Our results on three proteins allow us, for the first time, to systematically investigate the major issues described above. Can a molecular mechanics potential with a continuum solvent pick out the native structure when compared with a set of genuinely competitive alternatives, as opposed to structures of a completely different protein? What sort of energy gap is there (if any) between the native and other structures, and which terms in the potential contribute to it most substantially? How good is the correlation between reduced model and molecular mechanics potential for the total energy and for each of the component parts of the energy?
While the conclusions that emerge from this investigation are in accordance with several previous speculations, the systematic trends, which can readily be observed in all three test proteins, are quite striking. The results suggest a strategy for constructing a significantly improved reduced model potential by identifying critical flaws in the potentials that have been produced to date. Work in this direction is currently in progress and an initial preliminary result is presented.

This paper is organized into four sections. In section two, we review our reduced model and associated computational algorithms, and present new results for CTF and R69 (results for MBO can be found in Gunn et al., 1994). CTF is the first β-strand containing protein that we have studied; the results are quite satisfactory with the addition of a hydrogen-bonding potential to generate strand pairing (we do not otherwise bias how the strands pair). R69 is the first protein that we have studied for which our current model potential is grossly inadequate. As is often the case, one can learn as much or more from failure as from success; here, the difficulties in the potential, which are to some extent reflected even in the AMBER*/GB model, provide a key to understanding significant problems with the underlying physics of the approximate model. Section three describes the AMBER*/GB molecular mechanics calculations, including technical details of the simulations, statistical summaries of the results and discussion of the implications of these results for protein folding. Finally, section four contains conclusions and directions for future work.

The Reduced Model

Overview

In our studies of protein folding we use a hierarchical approach in which we represent the polypeptide chain at different levels of detail. The crudest level consists of cylinders connected by spheres. The cylinders contain either α-helices or β-strands and the spheres enclose loop regions. The next level of detail incorporates explicitly the backbone atoms and represents side-chains as spheres centered at the β-carbon atomic positions. The use of ideal geometries for helices and strands and of precalculated loop lists for loop segments results in a one-to-one correspondence between the two levels; in particular, each sphere at the coarse level corresponds to a possible loop at the more detailed level. Fixing the protein secondary structure provides a simplification of the problem and can be viewed as a computational technique with no implications for protein folding kinetics. In practice, secondary structure might be specified from NMR experimental data, as suggested by Wüthrich et al. (1991). In this section we describe the two levels of representation which define our reduced model.
This model was introduced in our earlier work on myoglobin (Gunn et al., 1994). Here we emphasize recent algorithmic developments and extensions of the model to include β-strands, and report new results for CTF and R69.

Model representation and algorithms

The geometric representation of the molecule is based on the assignment of each residue to one of 18 possible states which specify the backbone dihedral angles, with all other internal coordinates assuming standard values. These states are chosen to span the allowed regions of the Ramachandran map, but are otherwise not weighted according to local energy and are not residue-specific. This is to allow maximum flexibility for the loops by eliminating only impossible conformations. Each segment of repeated dihedral angle state (α-helices or β-strands) can be represented by a cylinder described by the axis and the radial vectors of the terminal residues. Each loop can be represented by a vector connecting the end-points of adjacent cylinders, with its geometry specified by the internal coordinates (angles and dihedral angles) formed with the axes and radii of the cylinders. The first level of representation (cylinders and spheres) thus consists of the segmented chain formed by the cylinder axes and radii with the connecting loops. The second level consists of the sequence of dihedral angle states which uniquely specifies the positions of all Cα and Cβ atoms.

Trial moves are carried out by replacing the loop segments with new values selected from a pre-calculated list. The remainder of the structure can be pivoted into the new conformation simply by making use of the stored values of the internal coordinates corresponding to each loop. This allows for very fast construction and evaluation of trial structures using the cylinder-sphere representation.
The more detailed representation of the molecule can then be constructed at periodic intervals by using the sequence of dihedral angle states which were used to construct each loop and are stored along with the corresponding geometry in the loop list. The entire chain can thus be rebuilt from the sequence of dihedral angles when required.

The minimization procedure consists of an inner loop of trial moves in which the loops are randomly replaced by loops from the loop list. These structures are checked for self-avoidance and rejected if any of the secondary structure elements, modeled as hard cylinders and spheres, are closer than a minimum allowed distance. This effective radius is a parameter which describes an impenetrable core, but which does not enclose all atoms. Rejection at this level rules out grossly self-overlapping structures. After a number of iterations the resulting structure is used as the trial move for evaluation with the complete residue-residue potential function. In this way, the more expensive potential, which is the quantity to be minimized by simulated annealing, is only evaluated for structures which have been selected by what is effectively a short minimization of a simpler model. The structures at this level are checked for self-avoidance with a cutoff distance for each pair of Cα or Cβ atoms. The minimum distance for each possible pair of residues is determined by taking the shortest distance observed in a survey of the PDB. For each overlap in the structure a constant penalty is added to the total energy used to accept or reject the structure. If the structure is rejected, the previously accepted structure is used for the next cycle of the inner loop.

The success of the algorithm depends significantly on the choice of the overlap penalty. If it is too high at the start, most trial moves would be rejected, since the cylinder-sphere potential does not completely prevent atomic overlaps from occurring. This is because hard spheres and cylinders which eliminate possible overlaps by enclosing all Cα and Cβ atoms would grossly over-estimate the excluded volume and prevent the formation of compact structures. However, since there is a hard core in the simple model which does prevent impossible folding topologies, most overlaps can be alleviated with relatively small changes in the structure. This is achieved by gradually increasing the value of the penalty during the simulation so that those structures with fewer overlaps are progressively selected out. This method generates final structures with all overlaps removed without a significant increase in the energy.

The minimization is carried out simultaneously for a large number of structures and includes periodic implementation of a genetic algorithm. In this step, a loop is chosen as a splice point and a number of hybrids are created by taking parts of different structures and connecting them together at the splice point with a new loop selected from the loop list. Each hybrid undergoes a minimization cycle using the simple model to select a reasonable loop to connect the two parts. For each parent, defined as the structure contributing the larger segment, the lowest energy hybrid is selected and used as a new trial move for the complete potential, following the procedure described above for the mutation steps. Further refinement is carried out by selecting the lowest energy structures in the ensemble, replicating them, and continuing the simulation.

The potential function used in these simulations is based on a statistical analysis of the PDB (Casari & Sippl, 1992). Only pairs of residues far apart in the sequence are considered, so that the potential does not depend on the local geometry, but rather represents the overall packing of hydrophobic and hydrophilic residues and the formation of a hydrophobic core. In addition, the potential is long-ranged so that it can be used to evaluate non-compact structures.
The potential has the form

$$E = \sum_{|i-j| \ge 20}^{N} (h_i + h_j + 2h_0)\,|\mathbf{r}_i - \mathbf{r}_j| \qquad (1)$$

where the coefficients $h_i$ correspond to the relative hydrophobicities of the residues and $h_0$ is a net hydrophobicity of the molecule, which provides a driving force for compactness. For use with the cylinder-sphere representation, the potential for a pair of secondary structure segments can be expanded around the center-center distance. The first two terms of this expansion can be interpreted as a net hydrophobicity interaction and a hydrophobic dipole interaction. This approximation is sufficiently accurate to provide a useful estimate of the total energy in the inner loop of the minimization.

To improve the performance of the potential in differentiating similar compact structures, the all-residue potential also contains a contact term which consists of a residue-dependent constant energy for each pair of C atoms within a cutoff distance (Maiorov & Crippen, 1992). This contact potential is added to the hydrophobic potential with a coefficient that is treated as an adjustable parameter. This allows the contact term to be smoothly turned on during the simulation. Since this term provides an additional driving force towards compactness, the net hydrophobicity $h_0$ is also reduced during the simulation. This parameter annealing, combined with the increasing overlap penalty discussed above, allows the potential to become less smooth and more rugged during minimization, along with the usual lowering of the effective temperature. It should be remarked that the same functional form of the potential is used for different proteins. The net hydrophobicity $h_0$ and the weighting for the contact potential depend solely on the sequence, and the parameter annealing can be regarded as a computational artifice used to achieve an efficacious minimization.

The above potential proved to be rather poor at generating structures with the correct pairing of β-strands.
The strands tended to clump together in bundles much like helices, rather than maintain a parallel arrangement. In order to describe the inter-strand hydrogen bonding, an additional term was added to the potential designed to mimic an attraction between backbone O and H atoms with an appropriate geometry. It would be very computationally expensive to consider interactions among all pairs of atoms in a pair of residues, so only a very simple strand-strand potential was considered. Since the potential in this case involves short-range interactions between long extended segments, it is impossible to systematically approximate an all-atom function with an effective center-center interaction for use with the cylinders as in the case of the hydrophobic potential. Instead, an ad hoc function was constructed which provides a crude approximation of a hydrogen-bonding potential for many relative orientations of two strands, with the antiparallel configuration being favored. This has the form

$$E \propto \left(\log R_{ij} + (\mathbf{R}_{ij}\cdot\mathbf{L}_i)^2 + (\mathbf{R}_{ij}\cdot\mathbf{L}_j)^2 - 2.4\right)\,\frac{1 - 2(\mathbf{L}_i\cdot\mathbf{L}_j)^5}{R_{ij}} \qquad (2)$$

where $\mathbf{R}_{ij}$ is the center-center vector and the $\mathbf{L}_i$ are the axial vectors of the cylinders. This potential was found to provide an improved correlation between energy and r.m.s. deviation for the strand pairing, and therefore was subsequently used for both the cylinder-sphere and all-residue representations of the molecule. It should be emphasized that the form of this function is essentially arbitrary and is intended only to generate the simplest features of strand pairing. This is clearly only a first step towards developing a realistic hydrogen-bonding potential. In order to compensate for the increased attraction of the strands towards one another, the contact potential was set to zero for the strand residues. Note that the additional strand-strand potential does not in any way specify the way in which strands must combine to form a given β-sheet, but simply requires nearby strands to assume an antiparallel configuration.
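As an illustration of the long-range hydrophobic term of equation (1), the following minimal Python sketch sums the pairwise contributions over residues that are far apart in the sequence. It is not the authors' code: the function name, the argument names and the `min_sep` parameter are hypothetical, the sequence-separation rule (at least 20 residues apart) is an assumed reading of the summation limits, and the hydrophobicity values in the usage line are invented for demonstration.

```python
import math

def hydrophobic_energy(coords, h, h0, min_sep=20):
    """Sum (h_i + h_j + 2*h0) * |r_i - r_j| over residue pairs with j - i >= min_sep.

    coords: one (x, y, z) tuple per residue (hypothetical input layout)
    h:      per-residue relative hydrophobicities
    h0:     net hydrophobicity driving compaction
    """
    n = len(coords)
    energy = 0.0
    for i in range(n):
        # Only pairs far apart in the sequence contribute (assumed rule).
        for j in range(i + min_sep, n):
            distance = math.dist(coords[i], coords[j])
            energy += (h[i] + h[j] + 2.0 * h0) * distance
    return energy

# Two residues 5 Å apart with made-up hydrophobicities; min_sep lowered so the pair counts.
demo = hydrophobic_energy([(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)], [1.0, 2.0], 0.5, min_sep=1)
```

Because the term is linear in distance, it stays informative for non-compact structures, which is the property the text relies on for the early stages of the simulation.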

Figure 3. Distribution of reduced model structures plotted with r.m.s. deviation versus total energy for CTF. Structures considered in the all-atom analysis are identified by their ID number.

Figure 5. Distribution of reduced model structures plotted with r.m.s. deviation versus total energy for R69. Structures considered in the all-atom analysis are identified by their ID number.

of the native structure, with the exception of the details of the strand pairing, which requires additional refinement with a more detailed model to adequately represent. This structure is shown superimposed on the native in Figure 4.

Both for myoglobin and for CTF, the results of the simulations indicate that for our reduced model representation with secondary structure fixed the number of potential minima is drastically reduced and the native-like topology is one of a small number of distinct low-energy conformations. Since the description of the protein chain is very coarse, it is not surprising that we find misfolded structures energetically competitive with the native one.

The final example discussed here is R69 (60 residues, five helices), for which the results are shown in Figure 5. Although there are a few structures generated with relatively low r.m.s. deviation from the native, they are no lower in energy than the average for the distribution. More importantly, the native structure itself exhibits a very high energy relative to misfolded ones. This implies that while further annealing of the structures shown might be expected to lower the energy of the ensemble, there is no reason to expect lower-energy structures to be any more native-like, and in fact the lowest r.m.s. deviation of the ensemble may well increase. This is the first case we have studied where the current potential function appears to make significant errors in distinguishing the native fold from reasonable compact alternatives encountered in the simulation. It thus provides an important test for the further understanding of the potential and the evaluation of alternatives.

Figure 4. Superimposition of Cα worms for the native (yellow) and calculated (blue) structures of CTF. The r.m.s. deviation between the two structures is 5.0 Å.
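The gradually increasing overlap penalty used in the reduced-model minimization can be sketched as follows. This is only an illustration under stated assumptions: the text says that a constant penalty per overlap is added to the energy used to accept or reject a structure and that the penalty grows during the run, but the linear ramp, the Metropolis acceptance rule, and all names and default values below are hypothetical choices, not the published protocol.

```python
import math
import random

def overlap_penalty(step, n_steps, start=0.1, end=10.0):
    """Per-overlap penalty, ramped linearly from start to end (assumed schedule)."""
    frac = step / max(n_steps - 1, 1)
    return start + frac * (end - start)

def accept(e_old, n_old, e_new, n_new, penalty, temperature, rng=random):
    """Metropolis test on the energy plus a constant penalty per overlapping pair.

    e_old, e_new: potential energies of the current and trial structures
    n_old, n_new: number of overlapping atom pairs in each structure
    """
    delta = (e_new + penalty * n_new) - (e_old + penalty * n_old)
    if delta <= 0.0:
        return True
    return rng.random() < math.exp(-delta / temperature)
```

With a small penalty early on, trial structures that still contain a few overlaps can survive; as the penalty rises, structures with fewer overlaps are progressively favored, which mirrors the behavior described in the text.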

The Detailed Model

Overview

Structures generated with the reduced model of the previous section can be further analyzed by introducing another level in the hierarchical framework. The detailed model is a standard united atom representation of the protein molecule which can then be simulated employing traditional molecular mechanics force fields. This model serves a twofold purpose: a detailed all-atom representation is clearly required if accurate structures at the 1 to 2 Å resolution level are to be obtained; furthermore, detailed analysis of competitive reduced model structures should help understand the strengths and limitations of the simplified potentials.

Another important issue that can be addressed in the context of detailed model simulations is the quality of molecular mechanics potentials and of continuum solvent treatments. Our reduced model is capable of generating diverse protein conformations that are competitive alternatives to the native. Analysis of the energetics of these structures and comparison with the native will allow critical evaluation of the force field.

We first present the representation of the detailed all-atom model, the procedure used to map a minimized reduced structure onto an all-atom one, and the potential employed in the simulations. We then describe in detail the calculations for the detailed model, which were carried out using the MacroModel/BatchMin modeling package (Mohamadi et al., 1990). Finally, we report the results for our three test proteins.

Model representation and algorithms

All-atom structures are generated from the reduced model main-chain fold by adding explicit side-chains. This is a well known and difficult problem (Janin et al., 1978; Lee & Subbiah, 1991; Desmet et al., 1992), whose complexity is due to the astronomical number of possible structural permutations.
For our purposes it is not actually necessary to achieve an accurate prediction of side-chain conformations, but rather some reasonable initial guess that can then be manipulated via minimization and/or simulated annealing. This is even truer in view of the fact that one would like to let the main-chain relax along with the side-chains.

To generate detailed atomic models we used the RSA program developed by Peter Shenkin and co-workers (Farid et al., 1992). Reduced model structures are initially dressed with planar side-chains and random χ1 torsion angles. Side-chain rotamer space is then explored with the RSA code, which uses a Monte Carlo algorithm to sample from a rotamer library. A simulated annealing scheme is used to minimize bumps between side-chains. The RSA code produces all-atom structures which might still have bad contacts. This could be due to inadequacies in the optimization procedure and/or to the restriction to rotamers in describing side-chain conformations.

In the reduced model, the φ and ψ dihedral angles of each residue are not restricted to any particular region of the Ramachandran map, i.e. each residue can sample the same discrete set of main-chain dihedral angles, regardless of amino-acid type. This choice is motivated by the fact that in this way the reduced model is more flexible, and consequently capable of traversing configuration space more efficaciously. However, the all-atom structure modeling of proline residues requires the φ angle to be nearly fixed. In the present studies we have treated prolines as alanines (there are four proline residues in myoglobin, one in CTF and two in R69), judging that even at this stage of refinement added flexibility for these residues could be beneficial.

Terminal loops are not modeled in the reduced structures. In order to avoid spurious interactions that might derive from charged and/or polar ends, we capped the all-atom structures with an acetyl group at the N terminus and N-methyl amide at the C terminus.
These two groups are modeled from the two Cα atoms at either end of the reduced representation.

Calculations for the detailed model are carried out using the AMBER* force field, the original AMBER potential of Kollman and co-workers (Weiner et al., 1984) with additional parameters for organic functionality (McDonald & Still, 1992). A united-atom scheme was used, whereby hydrogen atoms are explicitly considered only for polar atoms. Solvent was treated with the GB/SA continuum solvation model (Still et al., 1990). This model is based on a continuum dielectric for solvent polarization and solvent-accessible surface area treatment of the cavity and van der Waals solvation components.

Structures generated with the RSA scheme are subjected to minimization using the conjugate gradient method in MacroModel/BatchMin. Minimization is carried out including solvation and employing an analytical approximation of surface areas. To speed up the calculation we also employ cutoffs for non-bonded interactions: 7 Å for van der Waals interactions and 12 Å for electrostatic interactions. With these cutoffs, 27%, 65% and 70% of the non-bonded pair interactions are included in the calculation for myoglobin, CTF and R69, respectively. A convergence criterion is set by requiring that the gradient be less than or equal to 0.05 kJ mol⁻¹ Å⁻¹. In practice we have run minimization with a prefixed maximum number of iterations of 10,000, achieving convergence in most cases (typically after 7000 iterations for myoglobin, 3500 iterations for CTF and 4000 iterations for R69). The energy for the final structures is evaluated with essentially infinite cutoffs (i.e. cutoffs for which no significant change in the energy is observed by increasing them) and by using accurate numerical areas for solvation. The computational cost of conjugate gradient minimization is substantial; we ran our calculations on IBM RS/ and 370 workstations with an average CPU time of 13.5 hours for myoglobin, two hours for CTF and 2.5 hours for R69. Nonetheless, these calculations are orders of magnitude less expensive than calculations involving explicit solvent, where hundreds or thousands of discrete solvent molecules are used to model solvent effects.

A subset of the minimized structures was further optimized using simulated annealing. Conformational sampling is performed by means of stochastic dynamics using the SHAKE protocol to constrain bonds to hydrogen atoms and a 1.5 fs time step. An initial equilibration is carried out at 300 K for 10 ps. Typically, the temperature is then lowered to 50 K in 40 ps with a linear cooling schedule. This is followed by a 3000 iteration conjugate gradient minimization. We experimented with different cooling rates and higher initial temperatures for CTF, and found that the annealing protocol described above produced structures with the lowest energy. The results of this analysis are described below in Simulated annealing, of section three, where we also discuss convergence for our simulated annealing procedure. In the following, when we refer to simulated annealing calculations, we always mean the combination of equilibration, annealing and minimization described above. The CPU time for simulated annealing runs is determined by the time required for each step of stochastic dynamics (3, 0.9, and 1 seconds for MBO, CTF, and R69, respectively) and the time of each conjugate gradient minimization iteration (7, 2, and 2.3 seconds for MBO, CTF, and R69, respectively).

Results

Overview

The procedures described above were used to produce a large number of energetically plausible but substantially different conformations of the three proteins considered in this study.
While the complexity of the potential energy functions and the procedure itself make the analysis of the results nontrivial, it is still possible to ask and provide at least preliminary answers to a number of important questions. These are as follows:

(1) What is the performance of the AMBER*/GB potential in ranking the native structure as compared to the alternatives that we have generated? Can anything be inferred about the strengths or weaknesses of the molecular mechanics force field and solvation model used?

(2) What are the uncertainties at each step of the dressing process (addition of side-chains, conjugate gradient minimization, simulated annealing), i.e. what sort of energetic and geometrical variations are obtained in the final structure if one starts from a given reduced model structure and repeats the dressing procedure many times?

(3) Are there systematic differences in any of the components of the energy for the native structure as compared to alternative structures? Can anything be inferred about the driving forces for protein folding from such systematic behavior?

(4) How well does the reduced model potential correlate with the AMBER*/GB potential? Are particular terms in either potential good predictors of native-like structures?

(5) Can a better reduced model potential (i.e. one capable of higher resolution and reliability) be designed after the above analysis is completed?

In addition to presenting the raw data in various schematic forms, we shall attempt to address each of these questions. The computations presented here are rather expensive, so the procedures described in Model representation and algorithms, of section three, were not applied to all the structures generated with the reduced model. Instead, we have tried to mix a broad survey of many structures with an in-depth analysis of the computational procedures for a small subset of these structures.
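To give a rough sense of that expense, the simulated annealing protocol quoted earlier (10 ps of equilibration at 300 K, 40 ps of linear cooling to 50 K, a 1.5 fs time step, then 3000 minimization iterations) can be tallied from the per-step timings given for each protein. The sketch below is a back-of-the-envelope estimate, not a reproduction of the authors' accounting: the function names are hypothetical, and it assumes that every stochastic dynamics step and every minimization iteration costs the quoted average time and that all 3000 iterations are used.

```python
def annealing_cpu_hours(sec_per_md_step, sec_per_min_iter,
                        equil_ps=10.0, cool_ps=40.0, dt_fs=1.5, min_iters=3000):
    """Estimate CPU hours for one equilibrate, cool and minimize cycle."""
    # Convert the total simulated time from ps to fs, then to a step count.
    md_steps = round((equil_ps + cool_ps) * 1000.0 / dt_fs)
    seconds = md_steps * sec_per_md_step + min_iters * sec_per_min_iter
    return seconds / 3600.0

def linear_temperature(t_ps, t_start=300.0, t_end=50.0, cool_ps=40.0):
    """Temperature in K at time t_ps into the linear cooling stage."""
    frac = min(max(t_ps / cool_ps, 0.0), 1.0)
    return t_start + frac * (t_end - t_start)

# (seconds per dynamics step, seconds per minimization iteration) from the text
timings = {"MBO": (3.0, 7.0), "CTF": (0.9, 2.0), "R69": (1.0, 2.3)}
for name, (md, cg) in timings.items():
    print(f"{name}: about {annealing_cpu_hours(md, cg):.1f} CPU hours per annealing run")
```

Under these assumptions a single annealing run comes to roughly 34 CPU hours for MBO, 10 for CTF and 11 for R69, which is consistent with restricting the full procedure to a subset of structures.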
In analyzing in detail the components of the AMBER*/GB potential energy, we focus on the van der Waals, electrostatic Coulombic, electrostatic solvation and surface area terms. The remaining terms (stretches, bends and torsions) are critical in maintaining the connectivity of the protein and its local geometry, but exhibit very small differences from structure to structure and hence are unimportant, at least at the level of resolution examined here, in the ranking of conformations.

Detailed model data sets

Detailed model calculations were performed on sets of structures selected from the reduced model distributions (Figures 1, 3 and 5) and on the corresponding X-ray structures. We typically selected structures with low reduced energy or with low r.m.s. deviation from the native, but we have also analyzed structures in the middle of the distribution or with high energy and high r.m.s. deviation. For myoglobin we have also studied two extended structures that do not appear in the reduced model distribution; these were obtained by running the reduced model code for just a few steps so that each of the loop regions, initially in helical conformation, is assigned a loop from the loop list.

The results of conjugate gradient minimization and simulated annealing calculations for MBO, CTF and R69 are presented in Table 1. Each structure is identified by a code corresponding to its sequential position in the reduced model distribution (which contains 1024 not necessarily distinct structures). The r.m.s. deviations reported in the Table are based on the positions of the α-carbon atoms only and are relative to the minimized native structure. For consistency with the generated structures, the native structure was stripped of the loops at each end and capped as described in Model representation and algorithms of section three. The prosthetic heme group of myoglobin was neglected in the calculations.
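An α-carbon r.m.s. deviation of the kind reported in Table 1 can be computed along the following lines. The sketch assumes that the two structures are optimally superimposed with the Kabsch algorithm before the deviation is measured; the text does not state which superposition method was actually used, and the function name `ca_rmsd` is hypothetical. It relies on NumPy for the singular value decomposition.

```python
import numpy as np


def ca_rmsd(coords, ref):
    """R.m.s. deviation (same units as the input, e.g. Å) between two
    equally ordered sets of alpha-carbon positions after optimal
    rigid-body superposition (Kabsch algorithm; an assumed choice)."""
    p = np.asarray(coords, dtype=float)
    q = np.asarray(ref, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("both inputs must be N x 3 arrays of equal size")

    # Remove the translational component by centering both structures.
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)

    # Covariance matrix and its singular value decomposition.
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)

    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    # Rotate the centered coordinates onto the reference and measure.
    diff = (rot @ p.T).T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

In use, `coords` would hold the α-carbon positions of a generated structure and `ref` those of the minimized native structure, restricted to the same residues and listed in the same order.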
Figure 6 plots the reduced model energy versus the total AMBER*/GB energy for the minimized MBO structures. A triangular distribution is observed, indicating that structures with high reduced energy also have high AMBER*/GB energy, while structures with low reduced energy do not necessarily have a good AMBER*/GB energy. The r.m.s. deviation from the minimized native structure as a function of the total AMBER*/GB energy is plotted for MBO in Figure 7. A gap is present between native and generated structures (a feature that is absent in the distribution for the reduced potential). We observe that the lowest-energy structures in the high-r.m.s. and low-r.m.s. clusters have comparable energies.

Analysis of the different energetic components listed in Table 1 reveals that a correlation exists between surface area and van der Waals energies and between electrostatic Coulombic and solvation energies. This is shown in Figures 8 and 9, respectively, for the MBO structures of Table 1A. The sum of the internal electrostatic energy and of the electrostatic solvation energy has a much lower variance over structures (hundreds of kJ/mol) than the individual components, which vary by thousands of kJ/mol. The r.m.s. deviation versus the total electrostatic energy is plotted in Figure 14 for all of the structures of Table 1A.

Table 1. Minimization and simulated annealing results for MBO, CTF and R69 (A, MBO; B, CTF; C, R69). Columns: ID, IRG, r.m.s., RG, SA, vdw, ESC, ESS, TES, E. Uncapped structure. ID is an identifier for the structure, IRG is the radius of gyration of the starting structure in Å, r.m.s. is the Cα r.m.s. deviation from the native in Å, RG is the final radius of gyration in Å, SA is the surface area energy, vdw is the van der Waals energy, ESC is the electrostatic Coulombic energy, ESS is the electrostatic solvation energy, TES is the total electrostatic energy and E is the total energy. All energies are in kJ/mol. Each structure's ID consists of its sequential number as obtained from the reduced model distributions (compare Figures 1, 3, and 5), 0 being used for the native, a second number (1, 2,...) if different side-chain additions were considered, and the suffix SA if the structure was the result of a simulated annealing run (different simulated annealing runs with the same initial structure are indexed SA-1, SA-2,...). E1 and E2 refer to the two MBO extended structures (see text for details).

Side-chain addition

Our most detailed study of variability in the side-chain addition procedure has been carried out for the native backbone conformation of CTF. The RSA procedure of Shenkin and co-workers was run 52 times, followed by conjugate gradient minimization. The average Cα r.m.s. deviation of all runs from the native structure is 0.65 Å and the average all-atom r.m.s. is 1.65 Å; this is competitive with results from other procedures reported in the literature for side-chain addition (Lee & Subbiah, 1991). With regard to packing (as measured by the van der Waals energy), our side-chain addition method does rather well; indeed, there is little variation in this component over the 52 runs (the average difference for the van der Waals energy between the generated side-chains and the native is kJ/mol, with a standard deviation of 2.32 kJ/mol). For electrostatics (Coulombic plus solvation), the average difference of all runs from the native is kJ/mol, with a standard deviation of 5.37 kJ/mol. This is not surprising because electrostatics and solvation are not included in the approximate side-chain potential function used in the RSA procedure.
Such potentials can be added to the side-chain dressing procedure and work along these lines is currently in progress. Although the relative difference of both van der Waals and total electrostatic energies of all runs with respect to those of the native is comparable (of the order of 1 to 2%), it is the actual value of the energy that matters in the ranking of structures. The overall energetic gap observed over the 52 runs between the native and the generated structures is almost entirely due to the total electrostatic component. Further evidence of this fact is that some of the generated structures show van der Waals energies lower than the native, while this is never observed for the electrostatic energy.

In addition to this extensive study employing the native backbone conformation of CTF, we have also carried out a number of experiments where the same backbone conformation generated with the reduced model was dressed to give different initial all-atom structures. Examples of this analysis are reported in Table 1 for myoglobin structures 18, 994, 51, and 475, and CTF structures 249 and 579.

The above results suggest that electrostatic stabilization of the generated structures may be underestimated by the side-chain addition/conjugate gradient minimization procedure as compared to the native structure. Again, this is no surprise, considering the overlap-only nature of the RSA method. The ability of simulated annealing to close this gap is addressed in the next subsection.

Figure 6. Plot of reduced model energy versus AMBER*/GB energy for minimized MBO structures.

Figure 7. Plot of r.m.s. deviation versus AMBER*/GB energy for minimized MBO structures.

Figure 8. Plot of surface area energy versus van der Waals energy for all MBO structures in Table 1A. In this Figure, as well as in the following ones, squares label structures obtained via minimization and triangles label structures obtained via minimization followed by simulated annealing.

Figure 9. Plot of electrostatic Coulombic energy versus electrostatic solvation energy for all MBO structures in Table 1A.

Simulated annealing

In view of the extreme jaggedness of the potential energy surface in the all-atom model, conjugate gradient minimization is not the most satisfactory energy optimization technique. Minimized structures might be surrounded by structures with lower energy which even relatively small barriers can make inaccessible when a conjugate gradient method is used. To investigate this scenario we subjected some of the minimized structures to simulated annealing. The effects of the simulated annealing procedure (described in Model representation and algorithms of section three) are presented in Figure 10, which plots the Cα r.m.s. deviation of the starting structure from the native versus the r.m.s. deviation of the final structure from the starting one and versus the change in total energy, respectively, for MBO. Although non-native structures tended to move by 2 Å r.m.s. in the course of simulated annealing, this procedure altered the r.m.s. deviation to the native structure by only about 0.5 Å.

To ensure that our simulated annealing scheme is appropriate for the study of medium to large molecules, we used CTF as an example to investigate the effect of the choice of parameters (cooling rate, initial temperature, random seed selection) on the final energy. All runs start from the same minimized solvated native structure and in all cases simulated annealing is preceded by a 10 ps equilibration at 300 K and is followed by 3000 conjugate gradient minimization steps. We tried five different linear cooling rates from 300 to 50 K. The final energies are reported in Table 2, which shows that the largest energy difference observed is 85 kJ/mol. Although most of our simulations were performed with an initial temperature of 300 K, we also studied the effect of varying the initial temperature. Experiments starting from 600 K yield final energies about 500 kJ/mol higher than energies obtained via conjugate gradient minimization, indicating that excessively high initial temperatures lead to other, less stable local energy minima. On the other hand, runs starting from lower temperatures (200, 300 and 400 K) give comparable final energies.

How random seed selection affects the final energy was investigated using both the native and one of the dressed structures (249). For both structures we performed ten runs, each one with a different seed initialization. For the native we obtained a mean energy of kJ/mol with a standard deviation of kJ/mol, and for the 249 structure a mean energy of kJ/mol with a standard deviation of kJ/mol. Based on these results, we expect energies obtained via simulated annealing to be accurate only up to approximately 15 kJ/mol.

Figure 10. Conformational and energetic effects of simulated annealing for MBO structures: plots of r.m.s. deviation of starting structure from native MBO versus r.m.s. deviation of final annealed structure from starting one (a) and versus change in total energy upon simulated annealing (b).

Table 2. Final energies of native CTF for five simulated annealing runs with different cooling rates. Columns: Run, Length (ps), Rate (K/step), Final energy (kJ/mol). In all runs the initial and final temperatures were 300 K and 50 K, respectively.

Total energies
How well does the AMBER*/GB potential do at discriminating the native structure from misfolded structures generated by our reduced model simulations? The r.m.s. deviation from the native versus the total AMBER*/GB energy for the structures presented in Table 1 is plotted in Figure 11, where different symbols are used for minimized structures and for minimized structures followed by simulated annealing. First, it is noteworthy that the minimized native structure does not have a lower energy than alternative structures subjected to simulated annealing. The conformational adjustments generated by simulated annealing are responsible for a significant lowering of the energy. After simulated annealing, the native structure is lowest for myoglobin (by 40 kJ/mol) and for CTF (by 140 kJ/mol). These energy gaps are not unreasonable estimates for the stabilization of the native structure as compared to plausible compact alternatives. For R69, some misfolded structures are very close in energy to the native even after simulated annealing has been carried out. The conclusion is therefore that the AMBER*/GB potential performs reasonably well (certainly much better than the present reduced model potential), but that improvements are necessary to render it completely reliable for an arbitrary protein.

With regard to the remaining structures, native-like structures (e.g. 6 Å r.m.s. structures for myoglobin) do not have energies that are substantially better on average than non-native basins, in particular the basin centered around 12.5 Å r.m.s. deviation. We defer a discussion of the implications of this observation until the various components of the energies have been analyzed in detail. On the other hand, structures with very poor scores from the reduced model potential typically have substantially higher energies with the AMBER*/GB potential; the correlation between the two potentials is plotted in


More information

Biochemistry Prof. S. DasGupta Department of Chemistry Indian Institute of Technology Kharagpur. Lecture - 06 Protein Structure IV

Biochemistry Prof. S. DasGupta Department of Chemistry Indian Institute of Technology Kharagpur. Lecture - 06 Protein Structure IV Biochemistry Prof. S. DasGupta Department of Chemistry Indian Institute of Technology Kharagpur Lecture - 06 Protein Structure IV We complete our discussion on Protein Structures today. And just to recap

More information

Introduction The gramicidin A (ga) channel forms by head-to-head association of two monomers at their amino termini, one from each bilayer leaflet. Th

Introduction The gramicidin A (ga) channel forms by head-to-head association of two monomers at their amino termini, one from each bilayer leaflet. Th Abstract When conductive, gramicidin monomers are linked by six hydrogen bonds. To understand the details of dissociation and how the channel transits from a state with 6H bonds to ones with 4H bonds or

More information

Protein Structure Determination from Pseudocontact Shifts Using ROSETTA

Protein Structure Determination from Pseudocontact Shifts Using ROSETTA Supporting Information Protein Structure Determination from Pseudocontact Shifts Using ROSETTA Christophe Schmitz, Robert Vernon, Gottfried Otting, David Baker and Thomas Huber Table S0. Biological Magnetic

More information

Many proteins spontaneously refold into native form in vitro with high fidelity and high speed.

Many proteins spontaneously refold into native form in vitro with high fidelity and high speed. Macromolecular Processes 20. Protein Folding Composed of 50 500 amino acids linked in 1D sequence by the polypeptide backbone The amino acid physical and chemical properties of the 20 amino acids dictate

More information

Conformational Searching using MacroModel and ConfGen. John Shelley Schrödinger Fellow

Conformational Searching using MacroModel and ConfGen. John Shelley Schrödinger Fellow Conformational Searching using MacroModel and ConfGen John Shelley Schrödinger Fellow Overview Types of conformational searching applications MacroModel s conformation generation procedure General features

More information

Outline. The ensemble folding kinetics of protein G from an all-atom Monte Carlo simulation. Unfolded Folded. What is protein folding?

Outline. The ensemble folding kinetics of protein G from an all-atom Monte Carlo simulation. Unfolded Folded. What is protein folding? The ensemble folding kinetics of protein G from an all-atom Monte Carlo simulation By Jun Shimada and Eugine Shaknovich Bill Hawse Dr. Bahar Elisa Sandvik and Mehrdad Safavian Outline Background on protein

More information

Discrimination of Near-Native Protein Structures From Misfolded Models by Empirical Free Energy Functions

Discrimination of Near-Native Protein Structures From Misfolded Models by Empirical Free Energy Functions PROTEINS: Structure, Function, and Genetics 41:518 534 (2000) Discrimination of Near-Native Protein Structures From Misfolded Models by Empirical Free Energy Functions David W. Gatchell, Sheldon Dennis,

More information

Computational protein design

Computational protein design Computational protein design There are astronomically large number of amino acid sequences that needs to be considered for a protein of moderate size e.g. if mutating 10 residues, 20^10 = 10 trillion sequences

More information

Docking. GBCB 5874: Problem Solving in GBCB

Docking. GBCB 5874: Problem Solving in GBCB Docking Benzamidine Docking to Trypsin Relationship to Drug Design Ligand-based design QSAR Pharmacophore modeling Can be done without 3-D structure of protein Receptor/Structure-based design Molecular

More information

Folding of small proteins using a single continuous potential

Folding of small proteins using a single continuous potential JOURNAL OF CHEMICAL PHYSICS VOLUME 120, NUMBER 17 1 MAY 2004 Folding of small proteins using a single continuous potential Seung-Yeon Kim School of Computational Sciences, Korea Institute for Advanced

More information

Multi-Scale Hierarchical Structure Prediction of Helical Transmembrane Proteins

Multi-Scale Hierarchical Structure Prediction of Helical Transmembrane Proteins Multi-Scale Hierarchical Structure Prediction of Helical Transmembrane Proteins Zhong Chen Dept. of Biochemistry and Molecular Biology University of Georgia, Athens, GA 30602 Email: zc@csbl.bmb.uga.edu

More information

3. Solutions W = N!/(N A!N B!) (3.1) Using Stirling s approximation ln(n!) = NlnN N: ΔS mix = k (N A lnn + N B lnn N A lnn A N B lnn B ) (3.

3. Solutions W = N!/(N A!N B!) (3.1) Using Stirling s approximation ln(n!) = NlnN N: ΔS mix = k (N A lnn + N B lnn N A lnn A N B lnn B ) (3. 3. Solutions Many biological processes occur between molecules in aqueous solution. In addition, many protein and nucleic acid molecules adopt three-dimensional structure ( fold ) in aqueous solution.

More information

Context of the project...3. What is protein design?...3. I The algorithms...3 A Dead-end elimination procedure...4. B Monte-Carlo simulation...

Context of the project...3. What is protein design?...3. I The algorithms...3 A Dead-end elimination procedure...4. B Monte-Carlo simulation... Laidebeure Stéphane Context of the project...3 What is protein design?...3 I The algorithms...3 A Dead-end elimination procedure...4 B Monte-Carlo simulation...5 II The model...6 A The molecular model...6

More information

HOMOLOGY MODELING. The sequence alignment and template structure are then used to produce a structural model of the target.

HOMOLOGY MODELING. The sequence alignment and template structure are then used to produce a structural model of the target. HOMOLOGY MODELING Homology modeling, also known as comparative modeling of protein refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental

More information

Abstract. Introduction

Abstract. Introduction In silico protein design: the implementation of Dead-End Elimination algorithm CS 273 Spring 2005: Project Report Tyrone Anderson 2, Yu Bai1 3, and Caroline E. Moore-Kochlacs 2 1 Biophysics program, 2

More information

Course Notes: Topics in Computational. Structural Biology.

Course Notes: Topics in Computational. Structural Biology. Course Notes: Topics in Computational Structural Biology. Bruce R. Donald June, 2010 Copyright c 2012 Contents 11 Computational Protein Design 1 11.1 Introduction.........................................

More information

Why Proteins Fold? (Parts of this presentation are based on work of Ashok Kolaskar) CS490B: Introduction to Bioinformatics Mar.

Why Proteins Fold? (Parts of this presentation are based on work of Ashok Kolaskar) CS490B: Introduction to Bioinformatics Mar. Why Proteins Fold? (Parts of this presentation are based on work of Ashok Kolaskar) CS490B: Introduction to Bioinformatics Mar. 25, 2002 Molecular Dynamics: Introduction At physiological conditions, the

More information

Protein Modeling. Generating, Evaluating and Refining Protein Homology Models

Protein Modeling. Generating, Evaluating and Refining Protein Homology Models Protein Modeling Generating, Evaluating and Refining Protein Homology Models Troy Wymore and Kristen Messinger Biomedical Initiatives Group Pittsburgh Supercomputing Center Homology Modeling of Proteins

More information

Applications of Molecular Dynamics

Applications of Molecular Dynamics June 4, 0 Molecular Modeling and Simulation Applications of Molecular Dynamics Agricultural Bioinformatics Research Unit, Graduate School of Agricultural and Life Sciences, The University of Tokyo Tohru

More information

Crystal Structure Prediction using CRYSTALG program

Crystal Structure Prediction using CRYSTALG program Crystal Structure Prediction using CRYSTALG program Yelena Arnautova Baker Laboratory of Chemistry and Chemical Biology, Cornell University Problem of crystal structure prediction: - theoretical importance

More information

PROTEIN STRUCTURE AMINO ACIDS H R. Zwitterion (dipolar ion) CO 2 H. PEPTIDES Formal reactions showing formation of peptide bond by dehydration:

PROTEIN STRUCTURE AMINO ACIDS H R. Zwitterion (dipolar ion) CO 2 H. PEPTIDES Formal reactions showing formation of peptide bond by dehydration: PTEI STUTUE ydrolysis of proteins with aqueous acid or base yields a mixture of free amino acids. Each type of protein yields a characteristic mixture of the ~ 20 amino acids. AMI AIDS Zwitterion (dipolar

More information

Homework Problem Set 4 Solutions

Homework Problem Set 4 Solutions Chemistry 380.37 Dr. Jean M. Standard omework Problem Set 4 Solutions 1. A conformation search is carried out on a system and four low energy stable conformers are obtained. Using the MMFF force field,

More information

Supersecondary Structures (structural motifs)

Supersecondary Structures (structural motifs) Supersecondary Structures (structural motifs) Various Sources Slide 1 Supersecondary Structures (Motifs) Supersecondary Structures (Motifs): : Combinations of secondary structures in specific geometric

More information

Monte Carlo simulations of polyalanine using a reduced model and statistics-based interaction potentials

Monte Carlo simulations of polyalanine using a reduced model and statistics-based interaction potentials THE JOURNAL OF CHEMICAL PHYSICS 122, 024904 2005 Monte Carlo simulations of polyalanine using a reduced model and statistics-based interaction potentials Alan E. van Giessen and John E. Straub Department

More information

The Molecular Dynamics Method

The Molecular Dynamics Method H-bond energy (kcal/mol) - 4.0 The Molecular Dynamics Method Fibronectin III_1, a mechanical protein that glues cells together in wound healing and in preventing tumor metastasis 0 ATPase, a molecular

More information

Orientational degeneracy in the presence of one alignment tensor.

Orientational degeneracy in the presence of one alignment tensor. Orientational degeneracy in the presence of one alignment tensor. Rotation about the x, y and z axes can be performed in the aligned mode of the program to examine the four degenerate orientations of two

More information

The Dominant Interaction Between Peptide and Urea is Electrostatic in Nature: A Molecular Dynamics Simulation Study

The Dominant Interaction Between Peptide and Urea is Electrostatic in Nature: A Molecular Dynamics Simulation Study Dror Tobi 1 Ron Elber 1,2 Devarajan Thirumalai 3 1 Department of Biological Chemistry, The Hebrew University, Jerusalem 91904, Israel 2 Department of Computer Science, Cornell University, Ithaca, NY 14853

More information

Can protein model accuracy be. identified? NO! CBS, BioCentrum, Morten Nielsen, DTU

Can protein model accuracy be. identified? NO! CBS, BioCentrum, Morten Nielsen, DTU Can protein model accuracy be identified? Morten Nielsen, CBS, BioCentrum, DTU NO! Identification of Protein-model accuracy Why is it important? What is accuracy RMSD, fraction correct, Protein model correctness/quality

More information

Molecular Mechanics. I. Quantum mechanical treatment of molecular systems

Molecular Mechanics. I. Quantum mechanical treatment of molecular systems Molecular Mechanics I. Quantum mechanical treatment of molecular systems The first principle approach for describing the properties of molecules, including proteins, involves quantum mechanics. For example,

More information

CS273: Algorithms for Structure Handout # 2 and Motion in Biology Stanford University Thursday, 1 April 2004

CS273: Algorithms for Structure Handout # 2 and Motion in Biology Stanford University Thursday, 1 April 2004 CS273: Algorithms for Structure Handout # 2 and Motion in Biology Stanford University Thursday, 1 April 2004 Lecture #2: 1 April 2004 Topics: Kinematics : Concepts and Results Kinematics of Ligands and

More information

Conformational Geometry of Peptides and Proteins:

Conformational Geometry of Peptides and Proteins: Conformational Geometry of Peptides and Proteins: Before discussing secondary structure, it is important to appreciate the conformational plasticity of proteins. Each residue in a polypeptide has three

More information

Automated Assignment of Backbone NMR Data using Artificial Intelligence

Automated Assignment of Backbone NMR Data using Artificial Intelligence Automated Assignment of Backbone NMR Data using Artificial Intelligence John Emmons στ, Steven Johnson τ, Timothy Urness*, and Adina Kilpatrick* Department of Computer Science and Mathematics Department

More information

Energy landscapes of model polyalanines

Energy landscapes of model polyalanines JOURNAL OF CHEMICAL PHYSICS VOLUME 117, NUMBER 3 15 JULY 2002 Energy landscapes of model polyalanines Paul N. Mortenson, David A. Evans, and David J. Wales University Chemical Laboratories, Lensfield Road,

More information

Computer design of idealized -motifs

Computer design of idealized -motifs Computer design of idealized -motifs Andrzej Kolinski a) University of Warsaw, Department of Chemistry, Pasteura 1, 02-093 Warsaw, Poland and The Scripps Research Institute, Department of Molecular Biology,

More information

Conformational Analysis of the Crystal Structure for MDI/ BDO Hard Segments of Polyurethane Elastomers

Conformational Analysis of the Crystal Structure for MDI/ BDO Hard Segments of Polyurethane Elastomers Conformational Analysis of the Crystal Structure for MDI/ BDO Hard Segments of Polyurethane Elastomers CHRIS W. PATTERSON, DAVID HANSON, ANTONIO REDONDO, STEPHEN L. SCOTT, NEIL HENSON Theoretical Division,

More information

Protein Structure Determination

Protein Structure Determination Protein Structure Determination Given a protein sequence, determine its 3D structure 1 MIKLGIVMDP IANINIKKDS SFAMLLEAQR RGYELHYMEM GDLYLINGEA 51 RAHTRTLNVK QNYEEWFSFV GEQDLPLADL DVILMRKDPP FDTEFIYATY 101

More information

Computational Protein Design

Computational Protein Design 11 Computational Protein Design This chapter introduces the automated protein design and experimental validation of a novel designed sequence, as described in Dahiyat and Mayo [1]. 11.1 Introduction Given

More information

Kd = koff/kon = [R][L]/[RL]

Kd = koff/kon = [R][L]/[RL] Taller de docking y cribado virtual: Uso de herramientas computacionales en el diseño de fármacos Docking program GLIDE El programa de docking GLIDE Sonsoles Martín-Santamaría Shrödinger is a scientific

More information

Importance of chirality and reduced flexibility of protein side chains: A study with square and tetrahedral lattice models

Importance of chirality and reduced flexibility of protein side chains: A study with square and tetrahedral lattice models JOURNAL OF CHEMICAL PHYSICS VOLUME 121, NUMBER 1 1 JULY 2004 Importance of chirality and reduced flexibility of protein side chains: A study with square and tetrahedral lattice models Jinfeng Zhang Department

More information

Protein Folding & Stability. Lecture 11: Margaret A. Daugherty. Fall How do we go from an unfolded polypeptide chain to a

Protein Folding & Stability. Lecture 11: Margaret A. Daugherty. Fall How do we go from an unfolded polypeptide chain to a Lecture 11: Protein Folding & Stability Margaret A. Daugherty Fall 2004 How do we go from an unfolded polypeptide chain to a compact folded protein? (Folding of thioredoxin, F. Richards) Structure - Function

More information

Outline. Levels of Protein Structure. Primary (1 ) Structure. Lecture 6:Protein Architecture II: Secondary Structure or From peptides to proteins

Outline. Levels of Protein Structure. Primary (1 ) Structure. Lecture 6:Protein Architecture II: Secondary Structure or From peptides to proteins Lecture 6:Protein Architecture II: Secondary Structure or From peptides to proteins Margaret Daugherty Fall 2004 Outline Four levels of structure are used to describe proteins; Alpha helices and beta sheets

More information

From Amino Acids to Proteins - in 4 Easy Steps

From Amino Acids to Proteins - in 4 Easy Steps From Amino Acids to Proteins - in 4 Easy Steps Although protein structure appears to be overwhelmingly complex, you can provide your students with a basic understanding of how proteins fold by focusing

More information

Universal Similarity Measure for Comparing Protein Structures

Universal Similarity Measure for Comparing Protein Structures Marcos R. Betancourt Jeffrey Skolnick Laboratory of Computational Genomics, The Donald Danforth Plant Science Center, 893. Warson Rd., Creve Coeur, MO 63141 Universal Similarity Measure for Comparing Protein

More information