
Generalized ensemble methods for de novo structure prediction

Alena Shmygelska¹ and Michael Levitt¹

Department of Structural Biology, Stanford University, Stanford, CA

Contributed by Michael Levitt, December 11, 2008 (sent for review October 12, 2008)

Current methods for predicting protein structure depend on two interrelated components: (i) an energy function that should have a low value near the correct structure and (ii) a method for searching through different conformations of the polypeptide chain. Identification of the most efficient search methods is essential if we are to apply such methods broadly and with confidence. In addition, efficient search methods provide a rigorous test of existing energy functions, which are generally knowledge-based and contain different terms added together with arbitrary weights. Here, we test different search methods with one of the most accurate and predictive energy functions, namely Rosetta, the knowledge-based force field from Baker's group [Simons K, Kooperberg C, Huang E, Baker D (1997) J Mol Biol 268]. We use an implementation of a generalized ensemble search method to scale relevant parts of the energy function. This method, known as Hamiltonian Replica Exchange Monte Carlo, outperforms the original Monte Carlo Simulated Annealing used in the Rosetta package in terms of sampling low-energy states. It also outperforms another widely used generalized ensemble search method known as Temperature Replica Exchange Monte Carlo. Our results reveal clear deficiencies in the low-resolution Rosetta energy function in that the lowest-energy structures are not necessarily the most native-like. By using a set of nonnative low-energy structures found by our extensive sampling, we discovered that the long-range and short-range backbone hydrogen-bonding energy terms of the Rosetta energy discriminate between the nonnative and native-like structures significantly better than the low-resolution score used in Rosetta.
conformational search | protein folding | Rosetta force field

Predicting the functional 3-dimensional structure (the native state) of a protein from its amino acid sequence is of central importance to structural and functional biology and has enormous applications in alleviating human disease. Even if the structures of all proteins were known, we would still not be able to answer questions related to diseases directly caused by protein misfolding, such as certain types of cancer and Alzheimer's and Parkinson's disease. For this we would need to understand the physical basis of the energy terms that make the native state so special. Such understanding of the energetics of the system would also lead to more efficient and comprehensive drug design.

Structure prediction depends on solving two problems: (i) describing the energy function with sufficient accuracy and (ii) searching the conformational space sufficiently well. These problems are particularly severe for proteins of biologically relevant lengths (>150 aa). In this work we focus on conformational sampling, which has been recognized as the critical step in high-resolution structure prediction (1-3). The most widely used standard methods for de novo structure prediction are based on variants of the Monte Carlo method (4-6) and are unable to explore low-energy regions efficiently because of the ruggedness of the potential energy surface. To overcome these problems, a number of generalized ensemble Monte Carlo methods have been developed (7-10). These methods strive to search energy space better by computing the density of states, sampling expanded ranges of temperatures, or computing other physical quantities affecting transitions between the states during the search.
In particular, advanced methods such as Temperature Replica Exchange Monte Carlo (TREM) (8) and Hamiltonian Replica Exchange Monte Carlo (HREM) (10) have been shown to outperform standard Monte Carlo in terms of sampling for both simplified and all-atom force fields of small proteins (8, 10, 11). For longer proteins, the computational cost and ruggedness of the all-atom energy function make solving this problem particularly challenging, as evidenced by the modest success of full-atom refinement (12-14). For this reason, multiscale approaches have been developed (4, 6, 12, 13) that start with low-resolution or reduced-model energy functions and then use all-atom energy functions on a few selected conformations [often relying on additional steps such as use of sequence homologs (2) or clustering (3, 4)]. These approaches often fail to generate low-resolution models within the radius of convergence (rmsd < 3 Å) of the native state necessary for the success of subsequent full-atom refinement (2).

In this work, we test whether enhanced conformational sampling of low-resolution models can improve structure prediction. Specifically, we apply generalized Monte Carlo methods to one of the most powerfully predictive de novo protein potential energy functions, the low-resolution Rosetta force field (1). We compare the performance of two of the best-performing search methods, Temperature Replica Exchange Monte Carlo and Hamiltonian Replica Exchange Monte Carlo, with the four-stage Monte Carlo Simulated Annealing protocol used in the original Rosetta algorithm. We show that for a representative set of 40 proteins containing α, β, α+β, and α/β folds, both HREM and, to a lesser degree, TREM enhance sampling of low-energy states as compared with the original Rosetta method. More importantly, we are able to use the nonnative-like low-energy structures sampled by generalized ensemble methods to suggest improvements of the low-resolution scoring function used in Rosetta.
Our analysis of energy landscapes and structure clusters shows that HREM outperforms other search methods, not only in terms of finding more low-energy states, but also in sampling a more diverse set of compact structures for use in optimization of energy functions.

Author contributions: A.S. and M.L. designed research; A.S. performed research; A.S. analyzed data; and A.S. and M.L. wrote the paper.
The authors declare no conflict of interest.
Freely available online through the PNAS open access option.
¹To whom correspondence may be addressed. E-mail: alena.shmygelska@stanford.edu or michael.levitt@stanford.edu.
This article contains supporting information online at /DCSupplemental.
© The National Academy of Sciences of the USA. PNAS | February 3, 2009 | vol. 106 | BIOPHYSICS

Fig. 1. Energy and rmsd differences for HREM and TREM as compared to ROSETTA. (A) Difference in energy between conformations sampled during 20,000 independent runs by HREM and TREM and those independently generated by ROSETTA for 40 selected proteins from the four structural classes of SCOP (55-208 aa). In each case we show differences for (i) the lowest energy values (min), (ii) the cutoff energy value for the 90th percentile of low-energy structures (p90, best 10% of structures), and (iii) the lowest energy values from the five largest clusters (Cbest). In all cases, HREM gets lower energy values than ROSETTA (energy differences < 0), whereas TREM is better than ROSETTA in just 50% of the cases. (B) Difference in Cα root mean square deviation (rmsd) between the same conformations. In each case we show differences for (i) the rmsd of the lowest-energy structure (min), (ii) the mean rmsd for the 90th percentile of low-energy structures (p90), and (iii) the cluster centroid rmsd from the five largest clusters (Cbest).

Results and Discussion

Four Stages of the Rosetta Scoring Function. Rosetta's low-resolution Monte Carlo method (known here as ROSETTA) employs a hierarchical protocol consisting of four sequential searches that involve swapping fragments of length 9 and then 3 residues. Each stage employs a different scoring function. These four different energy-scoring functions involve (i) replacement of the extended chain (score0), (ii) buildup of the secondary structure (score1), (iii) alternation of high (score2) and low (score5) sheet weights, and (iv) low-resolution centroid refinement (score3) (15). Finally, structures are selected according to another low-resolution centroid refinement score (score4). Each subsequent scoring function used in ROSETTA adds new terms while leaving many energy contributions unchanged; this provides significant overlap of the energy values of conformations sampled by different scoring functions. In addition, the cumulative nature of the energy functions used consecutively in ROSETTA allows one to represent each scoring function as a scaled variant of the full energy function (score3).
Additional information about specific energy contributions and scaling parameters for each energy component used by ROSETTA is provided in Materials and Methods. Our observation of overlap between the scoring functions used in ROSETTA led us to introduce a new HREM implementation for Rosetta. The overlap provides a number of similar Hamiltonians that are related by scaling. In our implementation of HREM, we assign each replica to one of the four scoring functions and attempt exchanges between the replicas. We find that HREM's low-effective-temperature replicas (replicas that use the full, nonscaled energy potential) sample lower energies than those sampled by the final stage of the low-resolution protocol in ROSETTA. Moreover, the overlap between the distributions of conformations sampled by the four different scoring functions is increased by our HREM scheme [supporting information (SI) Fig. S1].

Low-Energy and Low-rmsd Conformational Sampling. To study energy landscape features of each of the three search methods, we examined the energy and rmsd values of conformations sampled during 20,000 runs starting from an extended state. Fig. 1 shows results for 40 sequences of different lengths (55-208 aa) belonging to the four structural classes α, β, α/β, and α+β. Analyzing the low energies sampled (Fig. 1A), we found that the HREM search method generally outperforms the other search methods in terms of sampling low-energy states on all sequences. In particular, performance differences between the generalized ensemble methods, HREM and TREM, and ROSETTA (the lowest energy, the energy level below which 10% of the structures lie, and the lowest energy among the five highly populated clusters) become more marked as the length of the protein increases and seem to be larger for -folds. In comparison with ROSETTA, HREM (consistently) and TREM (often) gave rise to significant improvement in terms of lower energy values.
This did not always lead to an improvement in rmsd because of false minima in the energy landscapes (Fig. 1B).

Energy Landscapes Sampled by ROSETTA and HREM. To gain additional insight into the energy landscape encountered during the search for a given protein, we examined the 2-dimensional distribution of conformations as a function of Rosetta's low-resolution score (score4) on the y axis and the Cα rmsd or Cα global distance test total score [GDT_TS (16)] to the native state on the x axis. We used the density of states to reveal the free energy of the underlying landscape when folding with HREM and ROSETTA for all 40 proteins. Particular insight comes from comparing results obtained starting from an extended state and from the native state: for most proteins, simulations from both starting states converge to a similar structure in the lower-energy range. However, simulations from the native state, which show the location of the near-native states (rmsd < 3 Å), usually reveal a false region of attraction with rmsd from 3.5 to 17.4 Å (average value, Å) having energies kT lower than the near-native conformations (energy differences range from 14.4 to kT). Longer proteins (>90 aa) tend to be at the upper end of this range for both rmsd and energy differences. By contrast, shorter proteins tend to have energy landscapes with a flat, false lowest-energy region: many states with a wide range of rmsd values have almost the same low energy value, which is lower than the energy values of the near-native states.
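The landscape analysis described above, in which the density of sampled conformations over (rmsd, score4) is read as a measure of free energy, can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes that the relative free energy of a bin is estimated as the negative logarithm of its population (in units of kT), and the function name `free_energy_surface`, the bin count, and the synthetic decoy data are all hypothetical.

```python
import numpy as np

def free_energy_surface(rmsd, score, bins=(40, 40)):
    """Estimate a relative free-energy surface (in kT) from decoy populations.

    Assumption: F(bin) = -ln P(bin), shifted so the most populated bin is 0.
    Empty bins are reported as +inf.
    """
    counts, rmsd_edges, score_edges = np.histogram2d(rmsd, score, bins=bins)
    prob = counts / counts.sum()
    with np.errstate(divide="ignore"):
        free = -np.log(prob)               # log(0) -> -inf, so empty bins -> +inf
    free -= free[np.isfinite(free)].min()  # most populated bin becomes 0 kT
    return free, rmsd_edges, score_edges

# Synthetic stand-in for 20,000 decoys (values are illustrative only).
rng = np.random.default_rng(0)
rmsd = np.clip(rng.normal(8.0, 2.0, size=20_000), 0.0, None)
score = rng.normal(-40.0, 10.0, size=20_000)

free, rmsd_edges, score_edges = free_energy_surface(rmsd, score)
```

Plotting `free` with rmsd on the x axis and score on the y axis gives a picture in the spirit of the landscapes discussed here; a false region of attraction would appear as a well-populated, low-score basin at high rmsd.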

Fig. 2. Distributions of conformations (blue to red for low to high density, a measure of the underlying free energy) generated by HREM and ROSETTA as a function of the low-resolution Rosetta score (score4) and the fit to the native structure as measured by either the Cα root mean square deviation (rmsd) or the Cα Global Distance Test Total Score (GDT_TS). A total of 20,000 structures were generated for each method starting from (i) an extended state or (ii) the native state of the all-β protein 1e43a1 (α-amylase, C-terminal β-sheet domain from Bacillus licheniformis), containing 90 aa. Clearly, HREM generates a much better sampling of conformations than ROSETTA.

Fig. 2 shows typical energy landscapes sampled from the extended and the native state by HREM and ROSETTA. ROSETTA tends to sample only a local part of the energy landscape, whereas HREM samples much more extensively. Nevertheless, because of false regions of attraction, HREM simulations from an extended state do not sample near-native conformations of interest (rmsd < 3 Å). Together with the analysis presented in the previous section, these results show that the low-resolution energy function in Rosetta cannot reliably recognize near-native states.

Differences in Top Cluster Centers for ROSETTA and HREM. In principle, an accurate energy function should always recognize near-native conformations and discriminate them from nonnative conformations. In practice, scoring functions are inaccurate, and de novo structure prediction methods must use structural clustering to identify native-like structures (2, 4, 13).
This makes two assumptions: (i) that the native conformation should have more structural neighbors than any other conformation because of the loss in configurational entropy on folding; and (ii) that this near-native energy basin is detected by the knowledge-based scoring functions used in Rosetta in that the basin results from the long-range hydrophobic interactions associated with native globular proteins (17). In our work, we used the LEADER clustering algorithm, which has been extensively tested with the Rosetta protocol in the Critical Assessment of Structure Prediction (CASP) competition (2, 12, 13, 18). We clustered the 5,000 lowest-energy structures of the 20,000 lowest-energy conformations found in independent runs by each method. As seen in Fig. 3, HREM finds more diverse and larger clusters than TREM or ROSETTA. It is worth noting that agreement within HREM clusters is stronger than for TREM or ROSETTA (see Tables S4 and S5). A possible explanation for these observations, also supported by the energy landscape analysis in the previous section, is that HREM samples a more diverse set of highly populated low-energy basins of conformations. This indicates that the basins associated with false local minima are highly populated and thus represent conformations against which the scoring functions should be improved.

Fig. 3. Differences in the top five clusters sampled by ROSETTA, HREM, and TREM. (A) Dependence of the average rmsd of the top five clusters on protein length. (B) Dependence of the average cluster size (for the five largest clusters) on protein length. In both cases we present data for the entire set of 40 proteins for the three methods: ROSETTA, HREM, and TREM. Superior performance of HREM is clear for both measures.

Table 1. Native-like average Z score (top portion of the table) and Pearson's correlation coefficient between rmsd and energy score (bottom portion of the table) for low-rmsd_nat vs. low-score4_ext and low-rmsd_ext vs. low-score4_ext discrimination. Columns, for each of the two discrimination tasks: score4, hb_srbb, hb_lrbb, rama. Rows: Z and r for each of the four fold classes and for all proteins. Mean and standard deviation are provided. (Table values are not available in this transcription.)

Deficiencies of the Rosetta Low-Resolution Scoring Function. To further understand how the individual energy terms of the Rosetta energy function discriminate near-native states from incorrect low-energy states, we examined two sets of 1,000 low-rmsd states and one set of 1,000 low-energy (score4) states generated by all three methods for each of the 40 proteins in our dataset. The first set of low-rmsd states (low-rmsd_nat, mean rmsd 3.98 Å) was generated by sampling from the native state (structures closer than 1.0 Å were removed to prevent possible artifacts in recognition by an energy function parameterized on high-resolution crystal structures); the second set of low-rmsd states (low-rmsd_ext, mean rmsd 8.56 Å) and the set of low score4 energy states (low-score4_ext, mean rmsd Å) were generated by sampling from an extended state. With its much greater efficiency, HREM contributed most (85%) to the set of low-energy structures found by starting with an extended state. Because these low-scoring decoys were produced by rigorously sampling the energy function, they represent a challenging set of local minima of Rosetta's low-resolution energy function.

We calculated two independent statistical measures that capture the ability of the scoring function to discriminate native-like conformations from nonnative-like ones: (i) the Z score of the rmsd values of native-like conformations; and (ii) the Pearson correlation coefficient between rmsd and score. In Table 1, we give these values for the following discrimination tasks: (i) discriminate low-rmsd_nat from low-score4_ext, and (ii) discriminate low-rmsd_ext from low-score4_ext, for the original low-resolution score (score4) as well as a selected set of Rosetta's low-resolution energy score terms that were identified as having the most discriminatory power. We analyze each of the four structural classes separately, in addition to a combined analysis for all proteins in our dataset.
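The two discrimination measures can be illustrated with a short sketch. The text does not spell out the exact Z-score formula, so the definition below is an assumption: the Z score is taken as the gap between the mean score of the nonnative set and the mean score of the native-like set, in units of the nonnative standard deviation, so that a positive value means native-like conformations score lower. The function names and the synthetic decoy sets are hypothetical; only `numpy.corrcoef` is relied on for the Pearson coefficient.

```python
import numpy as np

def native_like_z(native_scores, decoy_scores):
    """Assumed definition: how far (in decoy standard deviations) the mean
    native-like score lies below the mean nonnative score."""
    native_scores = np.asarray(native_scores, dtype=float)
    decoy_scores = np.asarray(decoy_scores, dtype=float)
    return (decoy_scores.mean() - native_scores.mean()) / decoy_scores.std(ddof=1)

def rmsd_score_correlation(rmsd, score):
    """Pearson correlation coefficient between rmsd and score."""
    return float(np.corrcoef(rmsd, score)[0, 1])

# Synthetic stand-ins for one low-rmsd set and one low-score set (1,000 each).
rng = np.random.default_rng(1)
rmsd_native = rng.uniform(1.0, 4.0, size=1000)
rmsd_decoy = rng.uniform(8.0, 15.0, size=1000)
score_native = rng.normal(-50.0, 5.0, size=1000)   # illustrative energies
score_decoy = rng.normal(-40.0, 5.0, size=1000)

z = native_like_z(score_native, score_decoy)
r = rmsd_score_correlation(
    np.concatenate([rmsd_native, rmsd_decoy]),
    np.concatenate([score_native, score_decoy]),
)
```

With a score term that discriminates well, as in this synthetic example, both measures are clearly positive; for a term that favors the nonnative set, the Z score would turn negative and the correlation would weaken or reverse.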
As seen in Table 1, good native-like average Z scores (Z > 1.0) and higher Pearson's correlation coefficients (r > 0.3) indicate the enhanced discrimination power of the backbone-backbone hydrogen bond scores for both tasks and for all structural classes. For discrimination between perturbed native states (low-rmsd_nat) and incorrectly scored low-energy states (low-score4_ext), the long-range hydrogen bond term (hb_lrbb), in which the donor and acceptor of a backbone-backbone hydrogen bond are separated by at least 5 amino acids along the sequence, is more successful. For the more challenging discrimination between near-native states sampled during ab initio folding (low-rmsd_ext) and incorrectly scored low-energy states (low-score4_ext), the short-range backbone-backbone hydrogen bond term (hb_srbb), in which the donor and acceptor of a hydrogen bond are 4 or fewer amino acids apart along the sequence, is more successful. Although this holds for all protein folds, it is less marked for α/β proteins; in our dataset these folds have longer lengths and larger rmsd values for both near-native and low-energy sets. Enhanced discrimination is also shown by the Ramachandran score (rama).

We observed that low-rmsd_nat conformations differed from the low-rmsd_ext conformations in having more favorable long-range hydrogen bonds for,, and / folds (mean Z score is ) and lower Ramachandran energies for -folds (Z score is ) as well as having less favorable short-range hydrogen bonds for -folds (Z score is ) and higher contact order for all folds (mean Z score is ). Thus, the low-rmsd_ext decoys are less native-like and have fewer nonlocal interactions, resulting in less favorable long-range and more favorable short-range hydrogen bonds; this suggests that, as folding from an extended state proceeds to form more long-range interactions, the discriminatory power of hydrogen bonds shifts from short-range to long-range.

Fig. 4 shows how the long-range and short-range backbone-backbone hydrogen-bonding potential transforms the original low-resolution Rosetta energy (score4) landscape, assigning lower energies to closer-to-native (low-rmsd) conformations. This is shown separately for all of the proteins in each of the four fold classes: α, β, α+β, and α/β. In Fig. 4A, we show discrimination of the low-rmsd_nat set (started from the native state) from the low-score4_ext set (started from an extended state) by the original low-resolution energy (score4) and by the long-range hydrogen bond score (hb_lrbb). In Fig. 4B, we show a more difficult discrimination test for the low-rmsd_ext and low-score4_ext decoys (both started from an extended state). Decoys in the low-rmsd_ext set are less native-like than those in the low-rmsd_nat set and are thus harder to distinguish from the low-score4_ext decoys. In Fig. 4B, we see that the score4 energy function is unable to distinguish structures with low rmsd values. In fact, the structures with the lowest score4 energies are generally >10 Å from the native structure; there is a distinct pattern of anticorrelation, with the energy becoming more favorable as the rmsd increases. A different energy term, the short-range backbone-backbone hydrogen bond energy (hb_srbb) shown in Fig. 4B (Right), is generally able to reverse this anticorrelation, but the energy of the low-rmsd decoys is now about the same as that of the decoys with higher rmsd.

These results indicate that with efficient search methods such as HREM the discrimination power of low-resolution energy functions can be improved. A promising methodology to improve the discrimination power further is to use efficient methods like HREM to locate decoys that are energy minima and then to optimize the energy function against these decoys. This paradigm, pioneered in 1996 (19, 20), proved important in the formulation of Rosetta (1) and will likely be as important for future improvement of methods for structure prediction.
Fig. 4. Discrimination between native-like and nonnative-like conformations. (A) Comparison of the ability of the low-resolution Rosetta scoring function (score4) and the long-range backbone-backbone hydrogen-bonding potential (hb_lrbb) to discriminate between native-like low-rmsd_nat (starting from the native state) and nonnative low-score4_ext (starting from an extended state) conformations. Colors used for each fold class are as follows: all-α, bright and dark red; all-β, bright and dark green; α/β, bright and dark orange; and α+β, bright and dark purple. The darker color is always used for the near-native decoys (low rmsd). Note how the hb_lrbb score is better at discrimination than score4 is. (B) Comparison of the ability of the low-resolution Rosetta scoring function (score4) and the short-range backbone-backbone hydrogen-bonding potential (hb_srbb) to discriminate between native-like low-rmsd_ext and nonnative low-score4_ext conformations (both starting from an extended state). The hb_srbb score generally raises the energy of the nonnative-like conformations (low-score4_ext, high rmsd) relative to the near-native conformations (low-rmsd_ext); this makes the lower edge of the distribution slope toward rather than away from the native state.

An orientation-dependent hydrogen-bonding energy term was first added to the Rosetta force field to enhance discrimination between native-like and nonnative conformations just before and during full-atom refinement (21). This energy is a linear combination of four terms that are parameterized by using a set of high-resolution protein crystal structures: (i) a distance-dependent energy term derived from the distribution of distances between the hydrogen and acceptor atoms (distances range from 1.4 to 2.6 Å), (ii) an angular energy measuring the angle at the hydrogen atom, (iii) an angular energy measuring the angle at the acceptor atom, and (iv) a dihedral angle term corresponding to rotation around the acceptor-acceptor base bond in the case of an sp2-hybridized acceptor (21). It has also been shown that this knowledge-based hydrogen-bonding potential in Rosetta is consistent with quantum mechanical calculations, unlike molecular mechanics force fields, including CHARMM27, OPLS-AA, and MM (22). Recently, a modified version of Rosetta's hydrogen-bonding potential was successfully used for protein structure refinement of homology models (23).
That study showed that the modified Rosetta hydrogen-bonding potential in combination with two other statistical potentials can discriminate near-native models (obtained by using temperature replica exchange molecular dynamics) with an accuracy comparable to Rosetta's full-atom score (23). In our work, we show that backbone-backbone hydrogen-bonding energy terms significantly enhance discrimination of near-native and misfolded native-like states in the ab initio protocol.

Conclusion

In this work we have shown that development of search methods that more efficiently sample local minima is important for two reasons: (i) better protein structure prediction and (ii) better optimization of the energy function. We have found that the Hamiltonian Replica Exchange Monte Carlo method is the most promising search method for de novo protein structure prediction with low-resolution force fields; it outperforms Temperature Replica Exchange Monte Carlo and the original Rosetta Monte Carlo method. A better set of local minima provides a more challenging decoy set against which the energy function can be optimized. Thus, our results reveal some of the deficiencies of the existing energy terms in Rosetta, including the presence of false local minima and a general flatness of the energy landscape near the native states. Only through better understanding of these deficiencies, as revealed by our very powerful search method, will we be able to develop better energy functions and representations. We used an implementation of HREM that utilizes the four scoring functions from the existing Rosetta protocol; in future work we will investigate other implementations of HREM that scale the individual contributions of different energy terms. Our results confirm that the Hamiltonian Replica Exchange Monte Carlo method and its variants are promising and deserve further study.

Materials and Methods

Protein Dataset Used. To evaluate and compare the algorithms, a set of 40 nonhomologous folds (ranging in length from 55 to 208 aa) was selected from the Structural Classification of Proteins (SCOP) (24) structural domain database. Protein families in the test set span four SCOP class categories (all-α, all-β, α/β, and α+β) and are of different sequence lengths to ensure the generality of the reported results. We generated six independent sets of 20,000 decoys for each protein sequence for each search method, starting from the completely extended state and starting from the native state.

The Rosetta Energy Function. All of the search methods developed and implemented were tested with Rosetta's low-resolution protein structure representation and scoring functions. Rosetta is a protein structure prediction program developed in Baker's group and made freely available to the academic community (1). Rosetta incorporates (i) a low-resolution representation of a protein that uses the main chain atoms and a side-chain centroid and (ii) a high-resolution representation that uses all atoms. The low-resolution Rosetta energy function includes the van der Waals hard sphere repulsion (vdw), environment (env), pair (pair), Cβ packing density (cb), secondary structure packing [helix-helix pairing (hh), helix-strand pairing (hs), strand-strand pairing (ss), strand pair distance/register (rsigma), and strand arrangement into sheets (sheet)], radius of gyration (rg) energetic contributions, contact order (co), and Ramachandran torsion angle filters (rama) (2, 14). Additional hydrogen-bonding energy terms [short-range (hb_srbb) and long-range (hb_lrbb) backbone-backbone hydrogen bonds] are added right before (score6) and used during (score12) full-atom refinement. All of the components of Rosetta's energy score are described in detail elsewhere (15).

TREM. The standard implementation (8) of Temperature Replica Exchange Monte Carlo (TREM) is used here. Eight replicas, run at exponentially distributed temperatures (1.40, 1.95, 2.72, 3.79, 5.29, 7.38, 10.31, and kT) to ensure efficient exchanges, underwent four different stages of Monte Carlo interrupted by attempted exchanges after each stage (see Additional Methods Description, TREM, in the SI). These specific temperature settings were optimized in a number of short preliminary runs. Following the general criterion of choosing the exchange frequency between replicas by integrating the autocorrelation time of the higher-temperature simulation (25), exchanges between replicas were attempted after every 2,000 steps.

HREM. Hamiltonian Replica Exchange Monte Carlo uses several related Hamiltonians for different replicas, where only some of the terms of the potential energy function, U(X), are modified across replicas through scaling parameters $\lambda_i$ (10). As in TREM, exchanges between pairs of replicas are attempted with a certain frequency; scaling allows the interactions responsible for the ruggedness of the landscape to be weakened. In regular TREM, the number of replicas required to guarantee optimal overlap scales with the square root of the total number of degrees of freedom; in HREM, it scales with the square root of the number of degrees of freedom of only the relevant subsystem, so HREM is preferable for large systems. The key difference between the standard implementation of HREM (10) and our work is that $\lambda_i$ is a vector of weights and not a scalar:

$$U_i(X) = U_A(X) + \lambda_i \cdot U_B(X),$$

where

$$U_A(X) = U_{vdw}(X),$$
$$U_B(X) = \left(U_{env}(X),\ U_{pair}(X),\ U_{sheet}(X),\ U_{hs}(X),\ U_{ss}(X),\ U_{cb}(X),\ U_{rsigma}(X),\ U_{rg}(X)\right).$$

The weights are:

$$\lambda_i = \left(\lambda_{i,env},\ \lambda_{i,pair},\ \lambda_{i,sheet},\ \lambda_{i,hs},\ \lambda_{i,ss},\ \lambda_{i,cb},\ \lambda_{i,rsigma},\ \lambda_{i,rg}\right), \quad i \in (0, 1, 2, 3).$$

In order for HREM to be effective, the Hamiltonians should differ in only a limited number of energy components.
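As an illustration of the scaled Hamiltonians and of a replica swap, the sketch below evaluates $U_i(X)$ as the van der Waals term plus a weighted sum of the remaining terms and applies the usual Metropolis criterion for Hamiltonian replica exchange. It is a minimal sketch under stated assumptions, not Rosetta code: the weight vectors are the score0 to score3 scaling parameters reported in this section, the inverse temperature of 2.0 follows the value quoted for ROSETTA, and the function names and the `(vdw, terms)` conformation layout are hypothetical.

```python
import math

TERMS = ("env", "pair", "sheet", "hs", "ss", "cb", "rsigma", "rg")

# Weight vectors lambda_i for score0..score3, in the order given by TERMS.
WEIGHTS = [
    (0, 0, 0, 0, 0, 0, 0, 0),      # score0
    (1, 1, 1, 1, 0.3, 0, 0, 0),    # score1
    (1, 1, 1, 1, 1, 0.5, 0, 0),    # score2
    (1, 1, 1, 1, 1, 1, 1, 1),      # score3 (full low-resolution energy)
]

def scaled_energy(i, vdw, terms):
    """U_i(X) = U_vdw(X) + sum_k lambda_{i,k} * U_k(X).

    `vdw` is the unscaled van der Waals energy; `terms` maps each name in
    TERMS to that energy component for conformation X (hypothetical layout).
    """
    return vdw + sum(w * terms[name] for w, name in zip(WEIGHTS[i], TERMS))

def exchange_probability(i, j, conf_i, conf_j, beta=2.0):
    """Metropolis probability of swapping the conformations of replicas i and j.

    delta = beta * [U_i(X_j) - U_j(X_j) + U_j(X_i) - U_i(X_i)]
    """
    delta = beta * (
        scaled_energy(i, *conf_j) - scaled_energy(j, *conf_j)
        + scaled_energy(j, *conf_i) - scaled_energy(i, *conf_i)
    )
    if delta <= 0.0:          # favorable swaps are always accepted
        return 1.0
    return math.exp(-delta)

# Two illustrative conformations: (vdw energy, dict of scaled components).
conf_high = (2.0, {name: -1.0 for name in TERMS})   # component sum = -8
conf_low = (2.0, {name: -2.0 for name in TERMS})    # component sum = -16

p_down = exchange_probability(0, 3, conf_low, conf_high)  # low-energy state moves to score3
p_up = exchange_probability(0, 3, conf_high, conf_low)    # low-energy state leaves score3
```

In this toy case a swap that hands the lower-energy conformation to the full-energy replica is always accepted, whereas the reverse swap is accepted only with probability exp(-16); a sampler would compare a uniform random number against the returned probability.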
The four low-resolution scores of the Rosetta low-resolution energy function differ in only a limited number of energy components and therefore satisfy this condition, with the following sets of Rosetta scaling parameters:

score $i$   $\lambda_{i,\mathrm{env}}$   $\lambda_{i,\mathrm{pair}}$   $\lambda_{i,\mathrm{sheet}}$   $\lambda_{i,\mathrm{hs}}$   $\lambda_{i,\mathrm{ss}}$   $\lambda_{i,\mathrm{cb}}$   $\lambda_{i,\mathrm{rsigma}}$   $\lambda_{i,\mathrm{rg}}$
score 0     0     0     0     0     0     0     0     0
score 1     1     1     1     1     0.3   0     0     0
score 2     1     1     1     1     1     0.5   0     0
score 3     1     1     1     1     1     1     1     1

To satisfy the condition of detailed balance, the probability of an attempted pairwise exchange between replicas follows the equation

$$W(X_i, X_j \to X_j, X_i) = \min\bigl(1,\ e^{-\Delta(X_i, X_j \to X_j, X_i)}\bigr),$$

where

$$\Delta(X_i, X_j \to X_j, X_i) = \beta\,\bigl[U_i(X_j) + U_j(X_i) - U_j(X_j) - U_i(X_i)\bigr].$$

The exchange frequency between replicas was chosen by integrating the autocorrelation time of the highest effective temperature (score0) simulation (25); exchanges between replicas were attempted after every 2,000 Monte Carlo steps. The inverse temperature, $\beta$, was set to 2.0 kT as in ROSETTA.

ACKNOWLEDGMENTS. We thank members of the Levitt lab for helpful discussions. This work was supported by Natural Sciences and Engineering Council of Canada Postdoctoral Fellowship PGS-D (to A.S.) and National Institutes of Health Grant GM (to M.L.). National Science Foundation Award CNS provided computer resources.

1. Simons K, Kooperberg C, Huang E, Baker D (1997) Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol 268:
2. Bradley P, Misura K, Baker D (2005) Toward high-resolution de novo structure prediction for small proteins. Science 309:
3. Schueler-Furman O, Wang C, Bradley P, Misura K, Baker D (2005) Progress in modeling of protein structures and interactions. Science 310:
4. Zhang Y, Arakaki AK, Skolnick J (2005) TASSER: An automated method for the prediction of protein tertiary structures in CASP6. Proteins Suppl 7:
5. Ortiz AR, Kolinski A, Rotkiewicz P, Ilkowski B, Skolnick J (1999) Ab initio folding of proteins using restraints derived from evolutionary information. Proteins Suppl 3:
6. Zhang Y, Kihara D, Skolnick J (2002) Local energy landscape flattening: Parallel hyperbolic Monte Carlo sampling of protein folding. Proteins 48:
7. Swendsen RH, Wang JS (1986) Replica Monte Carlo simulation of spin-glasses. Phys Rev Lett 57:
8. Okamoto Y (2004) Generalized-ensemble algorithms: Enhanced sampling techniques for Monte Carlo and molecular dynamics simulations. J Mol Graphics Model 22:
9. Hansmann UHE (1999) Protein folding simulations in a deformed energy landscape. Eur Phys J B 12:
10. Fukunishi H, Watanabe O, Takada S (2002) On the Hamiltonian replica exchange method for efficient sampling of biomolecular systems: Application to protein structure prediction. J Chem Phys 116:
11. Liu P, Kim B, Friesner RA, Berne BJ (2005) Replica exchange with solute tempering: A method for sampling biological systems in explicit water. Proc Natl Acad Sci USA 102:
12. Das R, Baker D (2008) Macromolecular modeling with Rosetta. Annu Rev Biochem 77:
13. Misura KMS, Baker D (2005) Progress and challenges in high-resolution refinement of protein structure models. Proteins 59:
14. Jagielska A, Wroblewska L, Skolnick J (2008) Protein model refinement using an optimized physics-based all-atom force field. Proc Natl Acad Sci USA 105:
15. Rohl CA, Strauss CEM, Misura KMS, Baker D (2004) Protein structure prediction using Rosetta. Methods Enzymol 383:
16. Zemla A (2003) LGA: A method for finding 3D similarities in protein structures. Nucleic Acids Res 31:
17. Shortle D, Simons K, Baker D (1998) Clustering of low-energy conformations near the native structures of small proteins. Proc Natl Acad Sci USA 95:
18. Das R, et al. (2007) Structure prediction for CASP7 targets using extensive all-atom refinement with Rosetta@home. Proteins 69(Suppl 8):
19. Huang ES, Subbiah S, Tsai J, Levitt M (1996) Using a hydrophobic contact potential to evaluate native and near-native folds generated by molecular dynamics simulations. J Mol Biol 257(33):
20. Park B, Levitt M (1996) Energy functions that discriminate X-ray and near-native folds from well-constructed decoys. J Mol Biol 258:
21. Kortemme T, Morozov AV, Baker D (2003) An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. J Mol Biol 326:
22. Morozov AV, Kortemme T, Tsemekhman K, Baker D (2004) Close agreement between the orientation dependence of hydrogen bonds observed in protein structures and quantum mechanical calculations. Proc Natl Acad Sci USA 101:
23. Zhu J, Fan H, Periole X, Honig B, Mark AE (2008) Refining homology models by combining replica-exchange molecular dynamics and statistical potentials. Proteins 72:
24. Murzin A, Brenner SE, Hubbard TJP, Chothia C (1995) SCOP: A structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247:
25. Newman MEJ, Barkema GT (1999) Monte Carlo Methods in Statistical Physics (Clarendon, Oxford).
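As a supplementary illustration of the replica exchange criterion given in the Methods, the following Python sketch applies the Metropolis-style acceptance rule to a pair of replicas. It is a minimal sketch under stated assumptions rather than the code used in this work: the function names, the toy one-dimensional energy functions `u_full` and `u_flat`, and the choice to pass `beta` as an argument are all hypothetical.

```python
import math
import random
from typing import Callable, Optional, Tuple

Energy = Callable[[float], float]


def exchange_probability(u_i: Energy, u_j: Energy,
                         x_i: float, x_j: float, beta: float) -> float:
    """W = min(1, exp(-delta)) with
    delta = beta * [U_i(x_j) + U_j(x_i) - U_i(x_i) - U_j(x_j)]."""
    delta = beta * (u_i(x_j) + u_j(x_i) - u_i(x_i) - u_j(x_j))
    return 1.0 if delta <= 0.0 else math.exp(-delta)


def attempt_exchange(u_i: Energy, u_j: Energy, x_i: float, x_j: float,
                     beta: float, rng: Optional[random.Random] = None
                     ) -> Tuple[float, float, bool]:
    """Swap the conformations of replicas i and j with probability W.

    Returns the (possibly swapped) conformations and an acceptance flag.
    """
    rng = rng or random.Random()
    if rng.random() < exchange_probability(u_i, u_j, x_i, x_j, beta):
        return x_j, x_i, True
    return x_i, x_j, False


# Toy Hamiltonians (hypothetical): a fully weighted energy and a flattened one.
def u_full(x: float) -> float:
    return x * x


def u_flat(x: float) -> float:
    return 0.0


# Moving the low-energy conformation into the fully weighted replica lowers
# the combined energy, so this exchange is always accepted.
p_downhill = exchange_probability(u_full, u_flat, x_i=2.0, x_j=0.0, beta=1.0)
# The reverse exchange raises the combined energy and is accepted only rarely.
p_uphill = exchange_probability(u_full, u_flat, x_i=0.0, x_j=2.0, beta=1.0)
```

Only the energies of the two conformations under the two Hamiltonians enter the rule, so an exchange attempt costs four energy evaluations regardless of how long each replica has been running.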


More information

proteins Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field

proteins Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field proteins STRUCTURE O FUNCTION O BIOINFORMATICS Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field Dong Xu1 and Yang Zhang1,2* 1 Department

More information

Improving De novo Protein Structure Prediction using Contact Maps Information

Improving De novo Protein Structure Prediction using Contact Maps Information CIBCB 2017 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology Improving De novo Protein Structure Prediction using Contact Maps Information Karina Baptista dos Santos

More information

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics Jianlin Cheng, PhD Department of Computer Science University of Missouri, Columbia

More information

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics. Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics Iosif Vaisman Email: ivaisman@gmu.edu ----------------------------------------------------------------- Bond

More information

Biomolecules: lecture 10

Biomolecules: lecture 10 Biomolecules: lecture 10 - understanding in detail how protein 3D structures form - realize that protein molecules are not static wire models but instead dynamic, where in principle every atom moves (yet

More information

Hydrophobic Aided Replica Exchange: an Efficient Algorithm for Protein Folding in Explicit Solvent

Hydrophobic Aided Replica Exchange: an Efficient Algorithm for Protein Folding in Explicit Solvent 19018 J. Phys. Chem. B 2006, 110, 19018-19022 Hydrophobic Aided Replica Exchange: an Efficient Algorithm for Protein Folding in Explicit Solvent Pu Liu, Xuhui Huang, Ruhong Zhou,, and B. J. Berne*,, Department

More information

Can protein model accuracy be. identified? NO! CBS, BioCentrum, Morten Nielsen, DTU

Can protein model accuracy be. identified? NO! CBS, BioCentrum, Morten Nielsen, DTU Can protein model accuracy be identified? Morten Nielsen, CBS, BioCentrum, DTU NO! Identification of Protein-model accuracy Why is it important? What is accuracy RMSD, fraction correct, Protein model correctness/quality

More information

Lecture 11: Protein Folding & Stability

Lecture 11: Protein Folding & Stability Structure - Function Protein Folding: What we know Lecture 11: Protein Folding & Stability 1). Amino acid sequence dictates structure. 2). The native structure represents the lowest energy state for a

More information

Protein Folding & Stability. Lecture 11: Margaret A. Daugherty. Fall Protein Folding: What we know. Protein Folding

Protein Folding & Stability. Lecture 11: Margaret A. Daugherty. Fall Protein Folding: What we know. Protein Folding Lecture 11: Protein Folding & Stability Margaret A. Daugherty Fall 2003 Structure - Function Protein Folding: What we know 1). Amino acid sequence dictates structure. 2). The native structure represents

More information

FlexPepDock In a nutshell

FlexPepDock In a nutshell FlexPepDock In a nutshell All Tutorial files are located in http://bit.ly/mxtakv FlexPepdock refinement Step 1 Step 3 - Refinement Step 4 - Selection of models Measure of fit FlexPepdock Ab-initio Step

More information

Packing of Secondary Structures

Packing of Secondary Structures 7.88 Lecture Notes - 4 7.24/7.88J/5.48J The Protein Folding and Human Disease Professor Gossard Retrieving, Viewing Protein Structures from the Protein Data Base Helix helix packing Packing of Secondary

More information

Lecture 2 and 3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability

Lecture 2 and 3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability Lecture 2 and 3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability Part I. Review of forces Covalent bonds Non-covalent Interactions: Van der Waals Interactions

More information

Novel Monte Carlo Methods for Protein Structure Modeling. Jinfeng Zhang Department of Statistics Harvard University

Novel Monte Carlo Methods for Protein Structure Modeling. Jinfeng Zhang Department of Statistics Harvard University Novel Monte Carlo Methods for Protein Structure Modeling Jinfeng Zhang Department of Statistics Harvard University Introduction Machines of life Proteins play crucial roles in virtually all biological

More information

Free Energy Landscape of Protein Folding in Water: Explicit vs. Implicit Solvent

Free Energy Landscape of Protein Folding in Water: Explicit vs. Implicit Solvent PROTEINS: Structure, Function, and Genetics 53:148 161 (2003) Free Energy Landscape of Protein Folding in Water: Explicit vs. Implicit Solvent Ruhong Zhou* IBM T.J. Watson Research Center, Yorktown Heights,

More information

arxiv: v1 [cond-mat.soft] 22 Oct 2007

arxiv: v1 [cond-mat.soft] 22 Oct 2007 Conformational Transitions of Heteropolymers arxiv:0710.4095v1 [cond-mat.soft] 22 Oct 2007 Michael Bachmann and Wolfhard Janke Institut für Theoretische Physik, Universität Leipzig, Augustusplatz 10/11,

More information

Abstract. Introduction

Abstract. Introduction In silico protein design: the implementation of Dead-End Elimination algorithm CS 273 Spring 2005: Project Report Tyrone Anderson 2, Yu Bai1 3, and Caroline E. Moore-Kochlacs 2 1 Biophysics program, 2

More information

Physiochemical Properties of Residues

Physiochemical Properties of Residues Physiochemical Properties of Residues Various Sources C N Cα R Slide 1 Conformational Propensities Conformational Propensity is the frequency in which a residue adopts a given conformation (in a polypeptide)

More information

TOUCHSTONE: A Unified Approach to Protein Structure Prediction

TOUCHSTONE: A Unified Approach to Protein Structure Prediction PROTEINS: Structure, Function, and Genetics 53:469 479 (2003) TOUCHSTONE: A Unified Approach to Protein Structure Prediction Jeffrey Skolnick, 1 * Yang Zhang, 1 Adrian K. Arakaki, 1 Andrzej Kolinski, 1,2

More information

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Homology Modeling. Roberto Lins EPFL - summer semester 2005 Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,

More information