Prediction of the structures of proteins with the UNRES force field, including dynamic formation and breaking of disulfide bonds


Protein Engineering, Design & Selection vol. 17 no. 1 pp. 29-36, 2004
DOI: /protein/gzh003

Cezary Czaplewski 1,2, Stanisław Ołdziej 1,2, Adam Liwo 1,2,3 and Harold A. Scheraga 1,4

1 Baker Laboratory of Chemistry, Cornell University, Ithaca, NY, USA, 2 Faculty of Chemistry, University of Gdańsk, ul. Sobieskiego 18, Gdańsk and 3 Academic Computer Center in Gdańsk TASK, ul. Narutowicza 11/12, Gdańsk, Poland

4 To whom correspondence should be addressed. E-mail: has5@cornell.edu

The presence of disulfide bonds is essential for maintaining the structure and function of many proteins. The disulfide bonds are usually formed dynamically during folding. This process is not accounted for in present algorithms for protein-structure prediction, which either deduce the possible positions of disulfide bonds only after the structure is formed or assume fixed disulfide bonds during the course of simulated folding. In this work, the conformational space annealing (CSA) method and the UNRES united-residue force field were extended to treat dynamic formation of disulfide bonds. A harmonic potential is imposed on the distance between disulfide-bonded cysteine side-chain centroids to describe the energetics of bond distortion and an energy gain of 5.5 kcal/mol is added for disulfide-bond formation. Formation, breaking and rearrangement of disulfide bonds are included in the CSA search by introducing appropriate operations; the search can also be carried out with a fixed disulfide-bond arrangement. The algorithm was applied to four proteins: 1EI0 (α), 1NKL (α), 1L1I (β-helix) and 1ED0 (α + β). For 1EI0, a low-energy structure with the correct fold was obtained in the runs both without and with disulfide bonds; however, it was obtained as the lowest in energy only with the native disulfide-bond arrangement.
For the other proteins studied, structures with the correct fold were obtained as the lowest (1NKL and 1L1I) or low-energy structures (1ED0) only in runs with disulfide bonds, although the final disulfide-bond arrangement was non-native. The results demonstrate that, by including the possibility of formation of disulfide bonds, the predictive power of the UNRES force field is enhanced, even though the disulfide-bond potential introduced here rarely produces disulfide bonds in native positions. To the best of our knowledge, this is the first algorithm for energy-based prediction of the structure of disulfide-bonded proteins without any assumption as to the positions of native disulfides or human intervention. Directions for improving the potentials and the search method are suggested.

Keywords: conformational-space annealing/disulfide bond/global optimization/protein structure prediction/united residue

Introduction

Disulfide bonds occur frequently in many proteins, especially in extracellular soluble globular proteins. These bonds provide stability to the native structure of a protein and may compensate for the absence of a significant hydrophobic core in small proteins (Betz, 1993; Petersen et al., 1999). It is still an unsettled question as to whether the disulfide bonds are formed before or after the secondary structure elements are formed (Welker et al., 2001). Some disulfide bonds are essential for maintaining the structure and function of the protein, while others can be broken without radically changing the properties of the protein. The addition of a single disulfide bond can cause cooperative, global folding of the entire protein. For example, bovine pancreatic ribonuclease A with two (26-84, 58-110) out of four of the native disulfide bonds has no conformational order, but species with one additional disulfide (65-72 or 40-95) have native-like structure (Lester et al., 1997; Wedemeyer et al., 2000).
Protein folding simulations show that inclusion of disulfide bonds as constraints reduces the conformational space that must be searched (Skolnick et al., 1997; Huang et al., 1999; Abkevich and Shakhnovich, 2000). Most protein structure prediction methods and protein folding simulations do not take into account dynamic disulfide-bond formation during folding. The particular arrangement of disulfide bonds is applied as a fixed set of constraints. Some studies examine the disruption of native disulfides and the incorporation of novel disulfides and their effects on stability, but use different simulations with different, fixed, arrangements of disulfide bonds during simulations (Rey and Skolnick, 1994). Computer modeling of disulfide bonds which can be introduced by protein engineering into various proteins of known structure to increase their stability relative to that of the wild type is fairly common (Zhou et al., 1993; Burton et al., 2000; Dani et al., 2003). A few studies have addressed the important problem of predicting the disulfide bonding state of cysteines in proteins; these include the use of statistical methods (Fiser et al., 1992), a specially optimized threading potential (Dombkowski and Crippen, 2000), neural networks and hidden Markov models (Muskal et al., 1990; Martelli et al., 2002) and methods that combine local context and global information about protein sequences (Fiser and Simon, 2000; Mucchielli-Giorgi et al., 2002).

Saitô and co-workers (Watanabe et al., 1991; Kobayashi et al., 1992) developed a protein folding simulation method with dynamic formation/breaking of disulfide bonds. Their procedure is based on an assumption that folding starts with the formation of secondary structures (α-helices and β-sheets) and then proceeds to assemble them into the tertiary structure. Consequently, the simulation starts from the conformation in which the secondary structure is already formed and other regions are extended.
The search for the conformation of minimum energy is carried out by changing the dihedral angles only in regions other than the secondary structures.

Protein Engineering, Design & Selection vol. 17 no. 1 © Oxford University Press 2004; all rights reserved

Packing of

the secondary structures of a polypeptide chain is guided by introduction of appropriate hydrophobic interactions which are responsible for the construction of short-distance local structure. A strict algebraic relation involving the geometry that must be satisfied for two cysteine residues to form a disulfide bond cannot be applied for distances too great to yield a disulfide bond. Consequently, Watanabe et al. used a geometrical graphic representation for the locus of the hydrogen atom of the SH group in the cysteine residue to draw the distributions of the cysteines at the folding stage in which they come close during simulation. They introduced a bonding potential (Equation 1 in Watanabe et al., 1991) between selected pairs of cysteines provided only that the circles representing the possible positions of the hydrogen atoms of the SH groups are face-to-face, thereby making it easy for them to intersect.

In this paper, we report an extension of our hierarchical procedure for protein-structure prediction (Liwo et al., 1999a; Pillardy et al., 2001a) to proteins containing disulfide bonds. This extended procedure allows for dynamic formation and breaking of disulfide bonds during the simulations. As in our previous work, an extensive search is carried out at the united-residue level with the UNRES force field and the use of the conformational space annealing (CSA) search method (Lee et al., 1997, 1999; Czaplewski et al., 2003). However, both the UNRES force field and the CSA procedure have been modified in order to treat the possible formation of disulfide bonds.
The method was applied in the following sequence to proteins with all the basic types of secondary structure: an α-helical hairpin stabilized by two disulfide bonds [a fragment of the p8MTCP1 protein; Protein Data Bank (PDB) code 1EI0, 38 residues], whose three-dimensional structure was determined by NMR spectroscopy (Barthe et al., 2000); NK-lysin (PDB code 1NKL, 78 residues), whose structure was also determined from NMR data (Liepinsh et al., 1997) and contains three disulfides and four α-helices; the thermal hysteresis protein isoform YL-1 (PDB code 1L1I, 84 residues), whose three-dimensional structure (in a β-helical form) was determined recently from NMR data (Daley et al., 2002) and contains eight disulfides; and Viscotoxin A3 (PDB code 1ED0, 46 residues), whose structure was determined from NMR data (Romagnoli et al., 2000) as an α/β type protein with three disulfide bonds, two α-helices and two short anti-parallel strands stabilized by one of the disulfide bonds.

Materials and methods

The UNRES force field

In the UNRES model (Liwo et al., 1997a,b, 2001, 2002; Lee et al., 2001; Pillardy et al., 2001b), a polypeptide chain is represented by a sequence of α-carbon ($\mathrm{C}^{\alpha}$) atoms linked by virtual bonds with attached united side chains (SC) and united peptide groups (p). Each united peptide group is located in the middle between two consecutive α-carbons, with peptide group $p_i$ being located between $\mathrm{C}^{\alpha}_i$ and $\mathrm{C}^{\alpha}_{i+1}$. Only these united peptide groups and the united side chains serve as interaction sites, the α-carbons serving only to define the chain geometry (see Figure 1 of Liwo et al., 1997a). All virtual bond lengths (i.e. $\mathrm{C}^{\alpha}$-$\mathrm{C}^{\alpha}$ and $\mathrm{C}^{\alpha}$-SC) are fixed; the distance between neighboring $\mathrm{C}^{\alpha}$s is 3.8 Å, corresponding to trans peptide groups, while the side-chain angles ($\alpha_{\mathrm{SC}}$ and $\beta_{\mathrm{SC}}$) and virtual-bond ($\theta$) and dihedral ($\gamma$) angles can vary.
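The chain geometry described above can be illustrated with a minimal Python sketch. It assumes that the α-carbon positions are available as a list of (x, y, z) tuples in Å; the function names `peptide_group_centers` and `check_virtual_bonds` are hypothetical and are not part of the UNRES code. The sketch only places each united peptide group halfway between consecutive α-carbons and verifies the fixed 3.8 Å virtual-bond length quoted in the text.

```python
import math

CA_CA_BOND = 3.8  # Å; fixed C-alpha to C-alpha virtual-bond length quoted in the text


def peptide_group_centers(ca_coords):
    """Place united peptide group p_i midway between CA_i and CA_{i+1}.

    ca_coords is a list of (x, y, z) tuples (an assumed input format).
    """
    centers = []
    for a, b in zip(ca_coords, ca_coords[1:]):
        centers.append(tuple((x + y) / 2.0 for x, y in zip(a, b)))
    return centers


def check_virtual_bonds(ca_coords, tol=1e-6):
    """Return True if every consecutive CA-CA distance equals 3.8 Å within tol."""
    return all(
        abs(math.dist(a, b) - CA_CA_BOND) <= tol
        for a, b in zip(ca_coords, ca_coords[1:])
    )


if __name__ == "__main__":
    # Three alpha-carbons forming a right-angle toy chain.
    chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
    print(peptide_group_centers(chain))
    print(check_virtual_bonds(chain))
```

A chain of N α-carbons yields N - 1 peptide-group sites; the united side-chain sites would additionally require the side-chain angles and the fixed $\mathrm{C}^{\alpha}$-SC bond lengths, which are not modeled here.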
The energy of the virtual-bond chain is expressed by the equation:

$$
U = \sum_{i<j} U_{\mathrm{SC}_i\mathrm{SC}_j} + w_{\mathrm{SCp}} \sum_{i \neq j} U_{\mathrm{SC}_i p_j} + w_{\mathrm{el}} \sum_{i<j-1} U_{p_i p_j} + w_{\mathrm{tor}} \sum_{i} U_{\mathrm{tor}}(\gamma_i) + w_{\mathrm{tord}} \sum_{i} U_{\mathrm{tord}}(\gamma_i, \gamma_{i+1}) + w_{b} \sum_{i} U_{b}(\theta_i) + w_{\mathrm{rot}} \sum_{i} U_{\mathrm{rot}}(\alpha_{\mathrm{SC}_i}, \beta_{\mathrm{SC}_i}) + \sum_{m=2}^{N_{\mathrm{corr}}} w^{(m)}_{\mathrm{corr}} U^{(m)}_{\mathrm{corr}} \qquad (1)
$$

The term $U_{\mathrm{SC}_i\mathrm{SC}_j}$ represents the mean free energy of the hydrophobic (hydrophilic) interactions between the side chains, which implicitly contains the contributions from the interactions of the side chain with the solvent (potential of mean force). The term $U_{\mathrm{SC}_i p_j}$ denotes the excluded-volume potential of the side chain-peptide group interactions. The peptide group interaction potential ($U_{p_i p_j}$) accounts mainly for the electrostatic interactions (i.e. the tendency to form backbone hydrogen bonds) between peptide groups $p_i$ and $p_j$. $U_{\mathrm{tor}}$, $U_{\mathrm{tord}}$, $U_{b}$ and $U_{\mathrm{rot}}$ represent the energies of virtual-dihedral angle torsions, double torsions, virtual-bond angle bending and side-chain rotamers, respectively; these terms account for the local propensities of the polypeptide chain. Details of the parameterization of all of these terms are provided in earlier publications (Liwo et al., 1997a,b). Finally, the terms $U^{(m)}_{\mathrm{corr}}$, $m = 1, 2, \ldots, N_{\mathrm{corr}}$, are the correlation or multibody contributions from a cumulant expansion (Liwo et al., 2001, 2003) of the restricted free energy (RFE) and the $w$s are the weights of the energy terms. The multibody terms are indispensable for reproduction of regular α-helical and β-sheet structures.

The UNRES force field has been derived as an RFE function of an all-atom polypeptide chain plus the surrounding solvent, where the all-atom energy function is averaged over the degrees of freedom that are lost when passing from the all-atom to the simplified system. This approach enabled us to derive the $U^{(m)}_{\mathrm{corr}}$, $m = 1, 2, \ldots, N_{\mathrm{corr}}$, multibody terms by a generalized cumulant expansion of the RFE developed by Kubo (1962).
The internal parameters of the individual $U$s were derived by fitting the analytical expressions to the RFE surfaces of model systems (Liwo et al., 2001) or by fitting the calculated distribution functions to those determined from the PDB (Liwo et al., 1997b). The $w$s (the weights of the energy terms), the internal parameters of the $U^{(m)}_{\mathrm{corr}}$ energy terms and the mean free energies of side-chain interactions of the $U_{\mathrm{SC}_i\mathrm{SC}_j}$ energy term were optimized by a hierarchical design of the potential-energy landscape (Liwo et al., 2002). The optimization method assumes a hierarchical structure of the energy landscape, which means that the energy decreases as the number of native-like elements in a structure increases, being lowest for structures from the native family and highest for structures with no native-like element. A level of the hierarchy is defined as a family of structures with the same number of native-like elements (or degree of native-likeness). Optimization of a potential-energy function is aimed at achieving such a hierarchical structure of the energy landscape by forcing appropriate free-energy gaps between hierarchy levels to place their energies in ascending order from the native to the most unfolded structure. This procedure is different from the method used earlier, in which the energy gap and/or the Z score between the native structure and all non-native structures were maximized, regardless of the degree of native-likeness of the non-native structures (Liwo et al., 1997b; Lee et al., 2001; Pillardy et al., 2001b). 1IGD, an (α+β)-type protein, was used as a training protein for optimization of the internal parameters

of the $U^{(m)}_{\mathrm{corr}}$ energy terms (A.Liwo et al., unpublished data), while the $w$s and the internal parameters of $U_{\mathrm{SC}_i\mathrm{SC}_j}$ were optimized by using a set of four proteins (PDB codes 1E0G, 1E0L, 1GAB and 1IGD) (S.Ołdziej et al., unpublished data). The UNRES force field is able to predict the structures of proteins containing both α-helical and β-sheet structures with a reasonable degree of accuracy, as assessed by tests on model proteins (Lee et al., 1999; Liwo et al., 1999b; Pillardy et al., 2001a) and also in the CASP3 (Lee et al., 1999, 2000; Orengo et al., 1999), CASP4 (Pillardy et al., 2001a) and CASP5 (Czaplewski et al., 2002) blind prediction experiments.

In order to describe the energetics of disulfide bonds, for the pair of half-cystines that forms a disulfide bond, we replace the $U_{\mathrm{SC}_i\mathrm{SC}_j}$ energy term of Equation 1 by the following function:

$$
E_{\mathrm{Cys}_i\mathrm{Cys}_j} = E_{\mathrm{Cys-Cys}} + 0.5\,k_{\mathrm{Cys-Cys}} \left(d_{\mathrm{Cys}_i\mathrm{Cys}_j} - d_{\mathrm{Cys-Cys}}\right)^2 \qquad (2)
$$

It should be noted that $d_{\mathrm{Cys}_i\mathrm{Cys}_j}$ is the distance between the centers of cysteine side chains and not the distance between the sulfur atoms of the bond. $E_{\mathrm{Cys-Cys}} = -5.5$ kcal/mol is the energy of formation of a non-strained disulfide bond from two half-cystine residues. This energy has been estimated on the basis of the energy of formation of a single disulfide bond in proteins, which has been measured experimentally to be -3.5 kcal/mol (Doig and Williams, 1991), and the energy of non-bonded interactions between cysteine side chains, estimated from the Miyazawa-Jernigan cysteine-cysteine contact energy (Miyazawa and Jernigan, 1985) on the basis of Equation 3 of Liwo et al. (1993) to be -2.0 kcal/mol (-3.5 - 2.0 = -5.5 kcal/mol).
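As an illustration of Equation 2, the following minimal Python sketch evaluates the disulfide-bond energy as a function of the distance between the cysteine side-chain centroids. The constants are the values quoted in this section ($E_{\mathrm{Cys-Cys}} = -5.5$ kcal/mol, $d_{\mathrm{Cys-Cys}} = 4.2$ Å, $k_{\mathrm{Cys-Cys}} = 6.6$ kcal/(mol Å$^2$)); the function name `disulfide_energy` is hypothetical and the sketch is not taken from the UNRES implementation.

```python
E_CYS_CYS = -5.5  # kcal/mol, energy of a non-strained disulfide bond (from the text)
D_CYS_CYS = 4.2   # Å, equilibrium distance between cysteine side-chain centroids
K_CYS_CYS = 6.6   # kcal/(mol Å^2), harmonic force constant


def disulfide_energy(d):
    """Equation 2: bond-formation energy plus a harmonic distortion penalty.

    d is the distance (Å) between the two cysteine side-chain centroids,
    not the sulfur-sulfur distance.
    """
    return E_CYS_CYS + 0.5 * K_CYS_CYS * (d - D_CYS_CYS) ** 2


if __name__ == "__main__":
    for d in (3.7, 4.2, 4.7, 5.2):
        print(f"d = {d:.1f} Å  E = {disulfide_energy(d):6.2f} kcal/mol")
```

At the equilibrium distance the harmonic term vanishes and the function returns the full -5.5 kcal/mol; a distortion of 1 Å costs 0.5 × 6.6 = 3.3 kcal/mol, which already offsets more than half of the bond-formation energy.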
The values of $d_{\mathrm{Cys-Cys}} = 4.2$ Å and $k_{\mathrm{Cys-Cys}} = 6.6$ kcal/(mol Å$^2$) were estimated on the basis of the average distance between cysteine side-chain centroids in disulfide bonds calculated from ECEPP/3 geometry (Némethy et al., 1992) and the ECEPP/3 torsional constants of the $\mathrm{C}^{\beta}$-$\mathrm{S}^{\gamma}$-$\mathrm{S}^{\gamma}$-$\mathrm{C}^{\beta}$ dihedral angle (Némethy et al., 1992).

The conformational space annealing method

CSA (Lee et al., 1997, 1999, 2000; Czaplewski et al., 2003) is a hybrid method which combines genetic algorithms, essential aspects of the build-up method and a local gradient-based minimization. The method is based on the idea of conformational space annealing: in the early stages, it enforces a broad conformational search and then gradually focuses the search into smaller regions with low energy. The CSA searching method allows one to focus on many different groups of low-energy protein structures, one of which is presumably the native structure. The CSA method begins with a randomly generated population of conformations which are energy minimized to generate the first bank of conformations. From the initial population, a number of conformations (called seeds) are selected as parents for the trial population. These 'seed' conformations are altered in a non-random fashion to create new trial conformations. As in any genetic algorithm, the trial population is generated by the use of genetic operators: mutations and crossovers. Attention is paid to ensure that all trial conformations are significantly different from each other and from the parent conformations. After generation, all trial conformations are energy minimized. The next step of the CSA algorithm is the update of the current population (the bank) without increasing its size. Each trial conformation is compared with each existing conformation of the bank. If the trial conformation is similar to an existing conformation of the bank, only the lower energy conformation of these two is preserved.
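The bank-update rule of this section (a similar bank member is replaced only by a lower-energy trial; a dissimilar trial replaces the highest-energy member only if its energy is lower) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CSA implementation: conformations are represented as (energy, angles) pairs, the `distance` callable stands in for Equation 9 of Lee et al. (2000), which is not reproduced here, and the choice of the nearest similar member when several are within the cutoff is an assumption of this sketch.

```python
def update_bank(bank, trial, distance, d_cut):
    """Update a fixed-size bank in place with one minimized trial conformation.

    bank     : list of (energy, conformation) pairs (hypothetical representation)
    trial    : (energy, conformation) pair
    distance : callable returning the distance between two conformations
    d_cut    : current similarity cutoff, reduced ("annealed") during the run
    Returns True if the bank was changed.
    """
    trial_energy, trial_conf = trial
    similar = [k for k, (_, conf) in enumerate(bank)
               if distance(conf, trial_conf) <= d_cut]
    if similar:
        # Assumption: compare with the nearest similar member of the bank.
        k = min(similar, key=lambda idx: distance(bank[idx][1], trial_conf))
        if trial_energy < bank[k][0]:
            bank[k] = trial  # keep only the lower-energy one of the two
            return True
        return False
    # The trial represents a new region: it may replace the highest-energy member.
    worst = max(range(len(bank)), key=lambda idx: bank[idx][0])
    if trial_energy < bank[worst][0]:
        bank[worst] = trial
        return True
    return False


def angle_distance(a, b):
    """Toy distance: sum of absolute angle differences (not Equation 9)."""
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    bank = [(-10.0, [0.0, 0.0]), (-5.0, [100.0, 100.0])]
    print(update_bank(bank, (-12.0, [1.0, 1.0]), angle_distance, 5.0), bank)
```

Because a member is only ever replaced, never appended, the bank size stays constant, as the text requires.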
If the trial conformation is not similar to any existing conformation in the bank, it represents a new distinct region of conformational space. It then replaces the highest energy conformation in the bank if its energy is lower than the highest energy in the bank; otherwise it is discarded. The distance between conformations $i$ and $j$ is defined in terms of the differences between their virtual-bond angles and virtual-bond dihedral angles (Equation 9 of Lee et al., 2000). If the distance, $D_{ij}$, is less than or equal to some predefined cutoff value, $D_{\mathrm{cut}}$, conformations $i$ and $j$ are considered similar; otherwise they are considered different. CSA achieves its efficiency by beginning with a large value of $D_{\mathrm{cut}}$, essentially to search all possible structures, and then gradually reducing ('annealing') $D_{\mathrm{cut}}$, thereby reducing the minimum distance between the conformations of the bank and focusing the search in low-energy regions of conformational space. After updating the current population, the seed conformations are selected from the set of conformations not selected as seeds previously.

Introducing new operators for dynamic disulfide-bond formation and breaking into CSA

The CSA run with dynamic disulfide-bond arrangement allows for changing the positions of disulfide bonds. The only information supplied to the procedure is the positions of the cysteines; the links between them are unknown. In other words, any two cysteines are allowed to be in a bonded state. The following new genetic operators have been introduced into CSA to treat the formation and breaking of disulfide links; these processes are well known to influence folding pathways considerably (Staley and Kim, 1992; Weissman and Kim, 1992).

1. Formation of new bonds. All 'seed' conformations are analyzed for the presence of pairs of non-bonded cysteines whose side-chain centers are closer than 7 Å and which are more than $l$ residues apart in the amino acid sequence. In this work $l = 1$, unless stated otherwise.
For each seed and each pair satisfying this criterion, a trial conformation is generated and, for the pair of cysteines designated as bonded, the hydrophobic-interaction potential $U_{\mathrm{SC}_i\mathrm{SC}_j}$ is replaced by the disulfide-bond potential given by Equation 2.

2. Breaking of existing bonds. For each seed conformation with disulfide-bonded cysteine pairs, a pair (a disulfide bond) is selected at random. Then a new trial conformation is generated in which the selected bond is broken. Consequently, for the selected pair of cysteines, the disulfide-bond energy given by Equation 2 is replaced by the respective hydrophobic-interaction energy $U_{\mathrm{SC}_i\mathrm{SC}_j}$ (Equation 1).

3. Exchange of links between disulfide bonds and free cysteines. If a free cysteine (of number $i$ in the sequence) in a 'seed' conformation is found close to a disulfide bond (formed by cysteines $j$ and $k$), i.e. if its distance from $\mathrm{Cys}_j$ or $\mathrm{Cys}_k$ is less than 7 Å, the trial conformations with the links $\mathrm{Cys}_i$-$\mathrm{Cys}_j$ and $\mathrm{Cys}_i$-$\mathrm{Cys}_k$, formed instead of $\mathrm{Cys}_j$-$\mathrm{Cys}_k$, are generated. An additional criterion on the distance in the amino acid sequence between bonded cysteines is applied as in point 1.

4. Exchange of links between pairs of disulfide bonds. If a disulfide bond $\mathrm{Cys}_i$-$\mathrm{Cys}_j$ is found close to disulfide bond $\mathrm{Cys}_k$-$\mathrm{Cys}_l$ in a 'seed' conformation (i.e. if one of the distances $\mathrm{Cys}_i \cdots \mathrm{Cys}_k$, $\mathrm{Cys}_i \cdots \mathrm{Cys}_l$, $\mathrm{Cys}_j \cdots \mathrm{Cys}_k$ or $\mathrm{Cys}_j \cdots \mathrm{Cys}_l$ is less than 7 Å), trial conformations with the exchange of the current disulfide links into two alternative possibilities ($\mathrm{Cys}_i$-$\mathrm{Cys}_k$,

$\mathrm{Cys}_j$-$\mathrm{Cys}_l$ or $\mathrm{Cys}_i$-$\mathrm{Cys}_l$, $\mathrm{Cys}_j$-$\mathrm{Cys}_k$) are generated. An additional criterion on the distance in the amino acid sequence between bonded cysteines is applied as in point 1.

Table I. Energies, r.m.s.d.s with respect to the native structure and number of disulfide bonds for representative structures found in the CSA simulations

Columns: Protein/run type; Relative energy (kcal/mol); R.m.s.d. (Å); No. of disulfide bonds; No. of native disulfide bonds; Figure

Run types and figure panels:
1EI0: Native (1A); No disulfides (F); Fixed, native (B); Dynamic (F, D, C); Dynamic, native start (F, E)
1EI0, dedicated force field: Native (2A); No disulfides; Fixed, native (D); Dynamic (C); Dynamic, native start (B)
1NKL: Native (3A); No disulfides; Fixed, native (C); Dynamic (D); Dynamic, native start (B)
1L1I: Native (4A); No disulfides; Fixed, native (B); Dynamic; Dynamic, native start (C); Dynamic, native start, $\mathrm{Cys}_i$-$\mathrm{Cys}_j$ ($j - i > 3$) (D)
1ED0: Native (5A); No disulfides (B); Fixed, native (D); Dynamic; Dynamic, native start (C)

Trial conformations generated with existing CSA operators inherit the arrangement of disulfide bonds from the parent 'seed' conformation, but only if the distance between the bonded cysteine side-chain centroids is smaller than 7 Å after the mutation or crossover operation. The CSA run with variable disulfide-bond arrangement can start with no disulfide bonds in the randomly generated first bank or with any fixed disulfide-bond arrangement for the first randomly generated conformations. In this paper, only runs with no disulfide bonds and runs with the native disulfide-bond arrangement in the first bank were used in CSA with variable disulfide-bond arrangement.

The CSA simulation with fixed disulfide-bond arrangement assumes that all the pairs of cysteines between which the bonds are to be formed have been specified. Each of these pairs is in a 'bonded' ('oxidized') state and the energy of interactions between the cysteine residues is expressed by Equation 2.
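The selection criterion of the bond-formation operator (point 1 of the list of new operators) can be sketched in Python as follows. The sketch assumes that the cysteine side-chain centroids of a seed conformation are available as a dictionary keyed by residue number and that existing disulfide links are stored as a set of `frozenset` pairs; these data structures and the name `formation_candidates` are hypothetical. Only the 7 Å cutoff and the sequence-separation threshold $l$ (1 by default) come from the text.

```python
import math
from itertools import combinations

FORMATION_CUTOFF = 7.0  # Å, centroid-centroid cutoff quoted in the text


def formation_candidates(cys_centroids, bonded_pairs, min_separation=1):
    """List pairs of free cysteines eligible for a new disulfide link.

    cys_centroids  : dict {residue_number: (x, y, z)} of side-chain centroids
    bonded_pairs   : set of frozenset({i, j}) for existing disulfide links
    min_separation : the parameter l; bonded residues must be MORE than
                     l residues apart in the sequence
    """
    bonded = {res for pair in bonded_pairs for res in pair}
    free = sorted(res for res in cys_centroids if res not in bonded)
    candidates = []
    for i, j in combinations(free, 2):
        close = math.dist(cys_centroids[i], cys_centroids[j]) < FORMATION_CUTOFF
        if j - i > min_separation and close:
            candidates.append((i, j))
    return candidates


if __name__ == "__main__":
    # Toy centroids on a line; residue numbers are illustrative only.
    centroids = {3: (0.0, 0.0, 0.0), 4: (1.0, 0.0, 0.0), 13: (5.0, 0.0, 0.0),
                 24: (20.0, 0.0, 0.0), 34: (6.0, 0.0, 0.0)}
    print(formation_candidates(centroids, set()))
```

In the procedure described above, each returned pair would give rise to one trial conformation in which the $U_{\mathrm{SC}_i\mathrm{SC}_j}$ term for that pair is replaced by Equation 2; raising `min_separation` reproduces the kind of loop-length restriction used in one of the 1L1I runs.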
New operators for dynamic disulfide-bond formation and breaking are not used in the CSA runs with fixed disulfide-bond arrangement.

For each protein, four types of simulations were carried out: (i) with dynamic disulfide-bond arrangement starting with a random bank of conformations with no disulfides; (ii) with fixed (native) disulfide-bond arrangement; (iii) with dynamic disulfide-bond arrangement starting with a random bank of conformations with native disulfide-bond arrangement; and (iv) assuming that no disulfide bonds can be formed. For smaller proteins (1EI0, 1ED0), minimizations per CSA run were used whereas for larger proteins (1NKL, 1L1I), more than one run of each kind with around minimizations was carried out.

Results

Energies, r.m.s.d. values from the experimental structures and the numbers of native disulfide bonds for all types of CSA runs for the proteins studied are summarized in Table I, while representative structures are shown in Figures 1-5.

Modifications of the UNRES force field and of the CSA global optimization procedure which allow formation of disulfide bonds were first tested on a small α-helical protein with a helix-turn-helix fold and two disulfides linking both helices together, identified in the PDB as 1EI0 (Barthe et al., 2000) (see Figure 1A). It should be stressed that 1EI0 was not used in the force-field optimization. The CSA global optimization runs, assuming that no disulfide bonds can be formed, led to a global-minimum (GM) structure in the form of a long

α-helix with an r.m.s.d. of 13.6 Å from the native structure (see Figure 1F) and also to a structure with the correct fold (not shown), with an r.m.s.d. for the $\mathrm{C}^{\alpha}$ atoms of 4.1 Å from the average NMR structure in the PDB and an energy only 0.7 kcal/mol higher than the GM. However, in the native-like structure (4.1 Å r.m.s.d.) found in this run, the distances between cysteine centroids are fairly large: 11.2 Å for Cys3-Cys34 and 12.6 Å for Cys13-Cys24. A CSA run with a fixed (native) disulfide-bond arrangement produced only native-like structures; the lowest energy was 8.4 kcal/mol higher than the GM from the previous run and the lowest energy structure has an r.m.s.d. of 3.9 Å (see Figure 1B) with both native disulfide bonds.

Fig. 1. Native structure of 1EI0 (A); the lowest energy structure from the run with fixed (native) disulfide-bond arrangement, an energy 8.4 kcal/mol higher than the GM and an r.m.s.d. of 3.9 Å with respect to native (B); the closest to native structure from the run with a dynamic disulfide-bond arrangement, starting with a random bank of conformations with no disulfides, an energy 13.7 kcal/mol higher than the GM and an r.m.s.d. of 2.2 Å (C); the structure from the run with dynamic disulfide-bond arrangement, starting with a random bank of conformations with no disulfides, with one native disulfide bond formed, an energy 2.9 kcal/mol higher than the GM and an r.m.s.d. of 3.4 Å (D); the structure with both native disulfide bonds formed in the run with a dynamic disulfide-bond arrangement starting with a random bank of conformations with the native disulfide-bond arrangement, an energy 36 kcal/mol higher than the GM and an r.m.s.d. of 4.4 Å (E); the GM is a straight α-helix (F).

A CSA run with a dynamic disulfide-bond arrangement, starting with a random bank of conformations with no disulfides, found the same GM as the run in which no disulfides were allowed (Figure 1F).
No low-energy structures with both native disulfides were present in the final population. A large number of native-like structures were found, but with only one native disulfide bond. The structure closest to native in this run has an r.m.s.d. of 2.2 Å and an energy 13.7 kcal/mol higher than the GM, and only the Cys13-Cys24 disulfide bond was formed (see Figure 1C). In the same run, structures with the correct fold with only the Cys3-Cys34 disulfide bond formed were found; the structure with an energy 2.9 kcal/mol higher than the GM and an r.m.s.d. of 3.4 Å is shown in Figure 1D. The CSA run with a dynamic disulfide-bond arrangement, starting with a random bank of conformations with the native disulfide-bond arrangement, gave similar results; the only difference is that a high-energy structure (36 kcal/mol higher than the GM) with both native disulfide bonds formed and an r.m.s.d. of 4.4 Å was present in the final population (see Figure 1E).

Fig. 2. Native structure of 1EI0 (A) and three structures from CSA runs with the re-optimized UNRES force field (including the 1EI0 protein in the set of training proteins): the lowest energy structure, the GM, with an r.m.s.d. of 3.5 Å with respect to native found in the run with a dynamic disulfide-bond arrangement, starting with a random bank of conformations with the native disulfide-bond arrangement (B); the structure with both native disulfides formed in the run with a dynamic disulfide-bond arrangement, starting with a random bank of conformations with no disulfides, an energy 17.1 kcal/mol higher than the GM and an r.m.s.d. of 2.5 Å (C); the structure with the lowest energy from the run with a fixed (native) disulfide-bond arrangement, an energy 10.5 kcal/mol higher than GM and an r.m.s.d. of 3.6 Å (D).
To check the influence of the force field used in the CSA runs, which allow dynamic formation of disulfide bonds, we re-optimized the UNRES force field by including the 1EI0 protein in the set of training proteins (PDB codes 1E0G, 1E0L, 1GAB, 1IGD and 1EI0). The disulfide-bond energy parameters of Equation 2 were not optimized; only the internal parameters of $U_{\mathrm{SC}_i\mathrm{SC}_j}$ and the weights of the energy terms were changed. In the CSA run with a dynamic disulfide-bond arrangement, starting with a random bank of conformations with the native disulfide-bond arrangement, the GM for 1EI0 was found with the new force field; it had the correct fold, one disulfide (Cys3-Cys34) formed and an r.m.s.d. of 3.5 Å with respect to the native (see Figure 2B). The lowest energy structure obtained by CSA, assuming that no disulfide bonds can be formed, has an r.m.s.d. of 3.6 Å, an energy 3.8 kcal/mol higher than the GM and inter-cysteine distances of 5.6 and 8.8 Å for Cys3-Cys34 and Cys13-Cys24, respectively. In the CSA run with a dynamic disulfide-bond arrangement starting with a random bank of conformations with no disulfides, the native-like structure with both disulfides was found as the structure with an energy 17.1 kcal/mol higher than the GM and an r.m.s.d. of 2.5 Å with respect to native (see Figure 2C). The lowest energy structure in the CSA run with fixed (native) disulfide-bond arrangement has an r.m.s.d. of 3.6 Å and was 10.5 kcal/mol higher than the GM (see Figure 2D). Even the force field optimized for the 1EI0 protein does not lead to a structure of 1EI0 with two disulfides formed as one of very low energy; the reason is the wrong inter-helical angle in the low-energy structures obtained with this version of the UNRES force field.

Fig. 3. Native structure of 1NKL (A); the lowest energy structure, GM, with an r.m.s.d. of 5.4 Å with respect to native and one non-native disulfide formed, found in the run with a dynamic disulfide-bond arrangement, starting with a random bank of conformations with the native disulfide-bond arrangement (B); the lowest energy structure from a run with a fixed (native) disulfide-bond arrangement, an energy 17.5 kcal/mol higher than the GM and an r.m.s.d. of 5.1 Å (C); the structure from the CSA run with a dynamic disulfide-bond arrangement, starting with a random bank of conformations with no disulfides, with one native disulfide formed, an energy 40.6 kcal/mol higher than GM and an r.m.s.d. of 4.9 Å (D).

The next test case was the four α-helix bundle 1NKL protein with three disulfides (Liepinsh et al., 1997) (see Figure 3A). Only the original force field, optimized on four training proteins (PDB codes 1E0G, 1E0L, 1GAB, 1IGD), was used. Assuming that no disulfide bonds can be formed, all CSA runs found structures with the correct fold but they were 49.6 kcal/mol higher in energy than the non-native three-helix cyclic-like structure which had the lowest energy in these runs (9.4 kcal/mol relative to the GM). The GM, which has the correct fold, was found by the CSA run with a dynamic disulfide-bond arrangement, starting with a random bank of conformations with the native disulfide-bond arrangement (Figure 3B). It has an r.m.s.d. of 5.4 Å with respect to the native structure and has 9.4 kcal/mol lower energy than the low-energy non-native structures found in the former CSA runs without disulfides. Only one disulfide bond is present and it is non-native. The lowest energy structure in the CSA run with a fixed (native) disulfide-bond arrangement has an r.m.s.d. of 5.1 Å and was 17.5 kcal/mol higher than the GM (see Figure 3C).
An example of a structure from the CSA run with a dynamic disulfide-bond arrangement, starting with a random bank of conformations with no disulfides, is shown in Figure 3D. This structure is 40.6 kcal/mol higher in energy than the GM and has an r.m.s.d. of 4.9 Å. Only one native disulfide bond, Cys35-Cys45, forms easily.

Fig. 4. Native structure of 1L1I (A); the lowest energy structure from the run with fixed (native) disulfide-bond arrangement, an energy 25.7 kcal/mol higher than the GM and an r.m.s.d. of 5.7 Å with respect to the native structure (B); the lowest energy structure, GM, from the run with a dynamic disulfide-bond arrangement, starting with a random bank of conformations with the native disulfide-bond arrangement, with an r.m.s.d. of 6.1 Å and six non-native disulfide bonds (C); the low-energy structure with four native disulfides found in the run with a dynamic disulfide-bond arrangement, starting with a random bank of conformations with the native disulfide-bond arrangement with the additional restriction of at least three residues in the loop between cysteines forming a disulfide bond, an energy 46.8 kcal/mol higher than GM and an r.m.s.d. of 7.7 Å (D).

Fig. 5. Native structure of 1ED0 (A) and the lowest energy, GM, non-native structure found in the run assuming that no disulfide bonds exist (B); the structure with the correct fold from the run with a dynamic disulfide-bond arrangement, starting with a random bank of conformations with the native disulfide-bond arrangement, an energy 14.3 kcal/mol higher than the GM and an r.m.s.d. of 4.8 Å with respect to native (C); the native-like structure from the run with the fixed (native) disulfide-bond arrangement, an energy 17.0 kcal/mol higher than GM and an r.m.s.d. of 4.9 Å (D).

It should be noted that, although the lowest energy structure of 1NKL does not have the native disulfide-bond arrangement, the very possibility of disulfide-bond formation resulted in location of the structure with the correct fold as the lowest energy structure, as opposed to runs without the possibility of disulfide-bond formation. From Figure 3B, it can be seen that, although the six cysteine residues do not form the native bonds, they are qualitatively positioned as in the native structure. This suggests that dynamic formation, breaking and rearrangement of disulfide bonds guided the CSA search to find the correct fold, although the disulfide-bond potential introduced in this work cannot predict the correct disulfide-bond arrangement. An analysis of the history of the CSA search indicated that a single native disulfide bond was formed in low-energy structures during the course of the run (data not shown), which explains why it helped to find the native fold. The population of intermediate structures contained all native disulfide bonds, but only one was present in a particular structure. However, because of the imperfection of the force

field, after the native-like fold was reached a lower energy was achieved with non-native disulfide bonds.

The β-helical protein with eight disulfides, identified in the PDB with code 1L1I, has been chosen to test the algorithm on proteins with many disulfide bonds (Daley et al., 2002) (see Figure 4A). The original force field, optimized on four training proteins (PDB codes 1E0G, 1E0L, 1GAB, 1IGD), was used. It should be stressed that no similar fold is present in any of the training proteins. Assuming that no disulfide bonds can be formed, all CSA runs found only non-native structures, the lowest energy structure of these being a flat six-stranded β-sheet built out of consecutive hairpins (not shown). However, the GM, which has the correct fold, was found by the CSA run with a dynamic disulfide-bond arrangement, starting with a random bank of conformations with the native disulfide-bond arrangement (see Figure 4C). It has an r.m.s.d. of 6.1 Å with respect to native and has 32.9 kcal/mol lower energy than the low-energy non-native structures found in the former CSA runs without disulfides. Six disulfide bonds are present but all of them are non-native. It should be noted that the energy gain due to the formation of a single disulfide bond is equal to 3.5 kcal/mol (see the section The UNRES force field, in Materials and methods). Therefore, the energy gain is greater than that due to the formation of six bonds. The CSA run with a dynamic disulfide-bond arrangement, starting with a random bank of conformations with a native disulfide-bond arrangement, was repeated with an additional restriction of at least three residues in the loop between cysteines forming a disulfide bond. Consequently, the Cys18–Cys21 bond was forbidden to form. The lowest energy structure found by the new run has the correct fold with an r.m.s.d. of 7.7 Å, four native disulfides and an energy 46.8 kcal/mol higher than the GM (see Figure 4D).
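The loop-length restriction used in the repeated run can be illustrated with a short sketch. This is not the authors' implementation: the function name `allowed_disulfide_pairs`, the parameter `min_loop` and every cysteine position other than 18 and 21 are hypothetical, chosen only to show how requiring at least three intervening residues excludes the Cys18–Cys21 pair.

```python
from itertools import combinations


def allowed_disulfide_pairs(cys_positions, min_loop=3):
    """Return cysteine pairs separated by at least `min_loop` loop residues.

    `cys_positions` holds the sequence numbers of the cysteine residues.
    A bond between residues i and j (i < j) closes a loop of j - i - 1
    intervening residues.
    """
    pairs = []
    for i, j in combinations(sorted(cys_positions), 2):
        if j - i - 1 >= min_loop:
            pairs.append((i, j))
    return pairs


# Hypothetical cysteine positions; only 18 and 21 are taken from the text.
cysteines = [2, 11, 18, 21, 27]
pairs = allowed_disulfide_pairs(cysteines)
print((18, 21) in pairs)  # False: only two residues (19, 20) lie in the loop
print((11, 18) in pairs)  # True: six residues lie in the loop
```

Filtering the candidate list before the search, rather than penalizing short loops in the energy, is simply the most direct way to express a hard restriction of this kind.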
The lowest energy structure in the CSA run with the fixed (native) disulfide-bond arrangement has an r.m.s.d. of 5.7 Å and is 25.7 kcal/mol higher in energy than the GM (see Figure 4B). As in the case of 1NKL, although the lowest energy structure does not have a native disulfide-bond arrangement, the structure has a largely correct fold and the cysteines are arranged as in the native structure. Unlike 1NKL, where the formation of disulfide bonds only guided the search and the lowest energy structure has only one (non-native) disulfide bond, here the native β-helical structure is entirely stabilized by disulfide bonds, although they are not native.

The next test case was a small α/β protein identified in the PDB with code 1ED0 (Romagnoli et al., 2000) (see Figure 5A). The original force field, optimized on four training proteins (PDB codes 1E0G, 1E0L, 1GAB, 1IGD), was used. Assuming that no disulfide bonds can be formed, the CSA run found only non-native structures. One of them is the GM, which is a flat five-stranded β-sheet built out of consecutive hairpins with a sixth strand packed against the surface of this β-sheet (see Figure 5B). The structures with the correct fold were found by CSA runs with a dynamic disulfide-bond arrangement. A representative structure found by the CSA run with a dynamic disulfide-bond arrangement, starting with a random bank of conformations with the native disulfide-bond arrangement, is shown in Figure 5C. It has an r.m.s.d. of 4.8 Å with respect to native and is 14.3 kcal/mol higher in energy than the low-energy non-native structures found in the former CSA runs without disulfides. Only one disulfide bond is present and it is non-native. The lowest energy structure in the CSA run with a fixed (native) disulfide-bond arrangement has an r.m.s.d. of 6.8 Å, an energy 13.2 kcal/mol higher than the GM and poor packing of the secondary-structure elements. The closest-to-native structure from this run has an r.m.s.d. of 4.9 Å and an energy 17.0 kcal/mol higher than the non-native GM (see Figure 5D).

Discussion

The results presented in this paper show that it is possible to extend the united-residue force field and search procedure (CSA) developed earlier to study proteins containing disulfides. To the best of our knowledge, this is the first algorithm for energy-based prediction of the structure of disulfide-bonded proteins without any assumption as to the positions of native disulfides or human intervention. The earlier work of Watanabe et al. (1991) concerned packing of fixed secondary-structure elements with a limited search of loop conformations. Moreover, formation of disulfide bonds was not guided by energy as in our approach, but the decision was arbitrarily based on a geometrical graphic representation, as described in the Introduction.

For 1NKL and 1L1I, the lowest energy structures obtained with inclusion of dynamic disulfide-bond formation had the correct fold, as opposed to runs in which the formation of disulfides was ignored. These structures did not have all native disulfide bonds; however, the positions of the cysteine residues were qualitatively the same as in the native structure. This suggests that the role of formation of disulfide bonds as a factor stabilizing the native structure or guiding the folding process towards the native structure is reasonably well accounted for by the modified CSA search procedure proposed in this work, even though the disulfide-bond potential is not perfect. As indicated in the Results section, the search is guided towards the native structure probably because some of the native disulfide bonds are formed during the run, although they can be broken or rearranged in the final structure.

It must be stressed that the examples studied in this work show that it is not possible to use only the hydrophobic side-chain potential in united-residue simulations on proteins with disulfide bonds.
Only in the case of 1EI0 is the structure with the correct fold low in energy without including disulfide-bond formation. If the formation of disulfide bonds is not assumed, the simulation results not only in incorrect packing, but also in incorrect secondary structures.

Based on the results presented in this paper, we propose the following future modifications. The energy term describing the formation of a disulfide bond should include an angular dependence in addition to the distance dependence used in the current version of the algorithm. The lack of an angular dependence led to overpopulation of the structures in which the cysteines are very close in the sequence (see Results). The formation of short-range disulfides is easy from an entropic point of view, which is represented by a distance dependence, but bonds (like disulfides) leading to short-range loops are more restricted by the stiffness of the peptide backbone and the chemical nature of the disulfide bond itself, and their formation can be described by an angular and a distance dependence.

The case of the 1L1I protein shows the deficiency in the current mutation operators for formation/breaking of disulfide bonds in CSA. Generally, to increase the probability of formation of a disulfide bond, we plan to introduce more complicated genetic operators. Based on our experience, the following new genetic operators are necessary: (i) copy the disulfide bond present in one conformation to another one, (ii) global perturbation of a conformation without disrupting the disulfide bond(s) that it already contains, (iii) small local perturbation of the backbone and side chains around cysteine moieties which are geometrically close to each other but still too far apart to be considered as bonded. Additionally, the possibility of formation of a disulfide bond between two free cysteines should be based not only on the distance between their centroids, but also on the orientation of the respective Cα–SC vectors. All these changes in the search procedure should speed up the search and increase the probability of generating structures with correct disulfide bonds.

Acknowledgements

We thank Jarosław Pillardy, Daniel Ripoll and Jorge Vila for helpful comments on this paper. This work was supported by grants from the National Institutes of Health (GM-14312), the National Science Foundation (MCB ), the Fogarty Foundation (TW1064) and grant BW/ from the Polish State Committee for Scientific Research (KBN). Support was also received from the National Foundation for Cancer Research. This research was conducted by using the resources of (a) the National Science Foundation Terascale Computing System at the Pittsburgh Supercomputer Center, (b) our 392-processor Beowulf cluster at the Baker Laboratory of Chemistry and Chemical Biology, Cornell University and (c) our 45-processor Beowulf cluster at the Faculty of Chemistry, University of Gdańsk.

References

Abkevich,V.I. and Shakhnovich,E.I. (2000) J. Mol. Biol., 300, 975–985.
Barthe,P., Rochette,S., Vita,C. and Roumestand,C. (2000) Protein Sci., 9, 942–955.
Betz,S.F. (1993) Protein Sci., 2, 1551–1558.
Burton,R.E., Hunt,J.A., Fierke,C.A. and Oas,T.G. (2000) Protein Sci., 9, 776–785.
Czaplewski,C. et al. (2002) In Fifth Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction. predictioncenter.llnl.gov/casp5/casp5.html
Czaplewski,C., Liwo,A., Pillardy,J., Ołdziej,S. and Scheraga,H.A. (2003) Polymer, in press.
Daley,M.E., Spyracopoulos,L., Jia,Z., Davies,P.L. and Sykes,B.D. (2002) Biochemistry, 41, 5515–5525.
Dani,V.S., Ramakrishnan,C. and Varadarajan,R. (2003) Protein Eng., 16, 187–193.
Doig,A.J. and Williams,D.H. (1991) J. Mol. Biol., 217, 389–398.
Dombkowski,A.A. and Crippen,G.M. (2000) Protein Eng., 13, 679–689.
Fiser,A. and Simon,I. (2000) Bioinformatics, 16, 251–256.
Fiser,A., Cserzö,M., Tüdös,E. and Simon,I. (1992) FEBS Lett., 302, 117–120.
Huang,E.S., Samudrala,R. and Ponder,J.W. (1999) J. Mol. Biol., 290, 267–281.
Kobayashi,Y., Sasabe,H., Akutsu,T. and Saitô,N. (1992) Biophys. Chem., 44, 113–127.
Kubo,R. (1962) J. Phys. Soc. Jpn., 17, 1100–1120.
Lee,J., Scheraga,H.A. and Rackovsky,S. (1997) J. Comput. Chem., 18, 1222–
Lee,J., Liwo,A. and Scheraga,H.A. (1999) Proc. Natl Acad. Sci. USA, 96, 2025–2030.
Lee,J., Liwo,A., Ripoll,D.R., Pillardy,J., Saunders,J.A., Gibson,K.D. and Scheraga,H.A. (2000) Int. J. Quantum Chem., 71, 90–117.
Lee,J., Ripoll,D.R., Czaplewski,C., Pillardy,J., Wedemeyer,W.J. and Scheraga,H.A. (2001) J. Phys. Chem. B, 105, 7291–7298.
Lester,C.C., u,., Laity,J.H., Shimotakahara,S. and Scheraga,H.A. (1997) Biochemistry, 36, 13068–
Liepinsh,E., Andersson,M., Ruysschaert,J.M. and Otting,G. (1997) Nat. Struct. Biol., 4, 793–795.
Liwo,A., Pincus,M.R., Wawak,R.J., Rackovsky,S. and Scheraga,H.A. (1993) Protein Sci., 2, 1715–1731.
Liwo,A., Ołdziej,S., Pincus,M.R., Wawak,R.J., Rackovsky,S. and Scheraga,H.A. (1997a) J. Comput. Chem., 18, 849–873.
Liwo,A., Pincus,M.R., Wawak,R.J., Rackovsky,S., Ołdziej,S. and Scheraga,H.A. (1997b) J. Comput. Chem., 18, 874–887.
Liwo,A., Lee,J., Ripoll,D.R., Pillardy,J. and Scheraga,H.A. (1999a) Proc. Natl Acad. Sci. USA, 96, 5482–5485.
Liwo,A., Pillardy,J., Kazmierkiewicz,R., Wawak,R.J., Groth,M., Czaplewski,C., Ołdziej,S. and Scheraga,H.A. (1999b) Theor. Chem. Acc., 101, 16–20.
Liwo,A., Czaplewski,C., Pillardy,J. and Scheraga,H.A. (2001) J. Chem. Phys., 115, 2323–2347.
Liwo,A., Arłukowicz,P., Czaplewski,C., Ołdziej,S., Pillardy,J. and Scheraga,H.A. (2002) Proc. Natl Acad. Sci. USA, 99, 1937–1942.
Liwo,A., Ołdziej,S., Czaplewski,C., Kozłowska,U. and Scheraga,H.A. (2003) J. Phys. Chem. B, in press.
Martelli,P.L., Fariselli,P., Malaguti,L. and Casadio,R. (2002) Protein Eng., 15, 951–953.
Miyazawa,S. and Jernigan,R.L. (1985) Macromolecules, 18, 534–552.
Mucchielli-Giorgi,M.H., Hazout,S. and Tuffery,P. (2002) Proteins: Struct. Funct. Genet., 46, 243–249.
Muskal,S.M., Holbrook,S.R. and Kim,S.H. (1990) Protein Eng., 3, 667–672.
Némethy,G., Gibson,K.D., Palmer,K.A., Yoon,C.N., Paterlini,G., Zagari,A., Rumsey,S. and Scheraga,H.A. (1992) J. Phys. Chem., 96, 6472–6484.
Orengo,C.A., Bray,J.E., Hubbard,T., LoConte,L. and Sillitoe,I. (1999) Proteins: Struct. Funct. Genet., Suppl. 3, 149–170.
Petersen,M.T.N., Jonson,P.H. and Petersen,S.B. (1999) Protein Eng., 12, 535–548.
Pillardy,J. et al. (2001a) Proc. Natl Acad. Sci. USA, 98, 2329–2333.
Pillardy,J., Czaplewski,C., Liwo,A., Wedemeyer,W.J., Lee,J., Ripoll,D.R., Arłukowicz,P., Ołdziej,S., Arnautova,Y.A. and Scheraga,H.A. (2001b) J. Phys. Chem. B, 105, 7299–7311.
Rey,A. and Skolnick,J. (1994) J. Chem. Phys., 100, 2267–2276.
Romagnoli,S., Ugolini,R., Fogolari,F., Schaller,G., Urech,K., Giannattasio,M., Ragona,L. and Molinari,H. (2000) Biochem. J., 350, 569–577.
Skolnick,J., Kolinski,A. and Ortiz,A.R. (1997) J. Mol. Biol., 265, 217–241.
Staley,J.P. and Kim,P.S. (1992) Proc. Natl Acad. Sci. USA, 89, 1519–1523.
Watanabe,K., Nakamura,A., Fukuda,Y. and Saitô,N. (1991) Biophys. Chem., 40, 293–301.
Wedemeyer,W.J., Welker,E., Narayan,M. and Scheraga,H.A. (2000) Biochemistry, 39, 4207–4216.
Weissman,J.S. and Kim,P.S. (1992) Science, 256, 112–114.
Welker,E., Wedemeyer,W.J., Narayan,M. and Scheraga,H.A. (2001) Biochemistry, 40, 9059–9064.
Zhou,N.E., Kay,C.M. and Hodges,R.S. (1993) Biochemistry, 32, 3178–3187.

Accepted October 16, 2003
Edited by Valerie Daggett
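The Discussion proposes that the possibility of forming a disulfide bond between two free cysteines should depend on the orientation of the Cα–SC vectors as well as on the centroid distance. One possible reading of that proposal is sketched below. It is an illustration under stated assumptions, not the authors' criterion: the function name `may_form_disulfide`, the cutoff values `max_dist` and `min_cos`, and the specific requirement that each Cα–SC vector point towards the partner centroid are all hypothetical.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    """Euclidean length of a 3D vector."""
    return math.sqrt(_dot(a, a))


def may_form_disulfide(ca1, sc1, ca2, sc2, max_dist=5.0, min_cos=0.0):
    """Distance-plus-orientation test for a candidate Cys-Cys bond.

    ca1, ca2 are C-alpha coordinates and sc1, sc2 are side-chain centroid
    coordinates (in Angstroms). max_dist and min_cos are placeholder
    cutoffs, not values taken from the paper.
    """
    d = _sub(sc2, sc1)   # vector from centroid 1 to centroid 2
    v1 = _sub(sc1, ca1)  # C-alpha -> SC vector of cysteine 1
    v2 = _sub(sc2, ca2)  # C-alpha -> SC vector of cysteine 2
    dist, n1, n2 = _norm(d), _norm(v1), _norm(v2)
    if dist == 0.0 or n1 == 0.0 or n2 == 0.0:
        return False     # orientation is undefined for zero-length vectors
    if dist > max_dist:
        return False     # centroids are too far apart
    # Cosine of the angle between each C-alpha -> SC vector and the
    # direction towards the partner centroid.
    cos1 = _dot(v1, d) / (n1 * dist)
    cos2 = -_dot(v2, d) / (n2 * dist)
    return cos1 >= min_cos and cos2 >= min_cos


# Side chains pointing towards each other, centroids 3 A apart.
print(may_form_disulfide((0, 0, 0), (1, 0, 0), (5, 0, 0), (4, 0, 0)))  # True
# Same centroid distance, but both side chains point away from each other.
print(may_form_disulfide((2, 0, 0), (1, 0, 0), (3, 0, 0), (4, 0, 0)))  # False
```

A test of this form would reject pairs whose centroids happen to be close while the side chains face in opposite directions, which a distance-only criterion accepts.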

Ab-initio protein structure prediction

Ab-initio protein structure prediction Ab-initio protein structure prediction Jaroslaw Pillardy Computational Biology Service Unit Cornell Theory Center, Cornell University Ithaca, NY USA Methods for predicting protein structure 1. Homology

More information

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University Department of Chemical Engineering Program of Applied and

More information

Folding of small proteins using a single continuous potential

Folding of small proteins using a single continuous potential JOURNAL OF CHEMICAL PHYSICS VOLUME 120, NUMBER 17 1 MAY 2004 Folding of small proteins using a single continuous potential Seung-Yeon Kim School of Computational Sciences, Korea Institute for Advanced

More information

JOOYOUNG LEE*, ADAM LIWO*, AND HAROLD A. SCHERAGA*

JOOYOUNG LEE*, ADAM LIWO*, AND HAROLD A. SCHERAGA* Proc. Natl. Acad. Sci. USA Vol. 96, pp. 2025 2030, March 1999 Biophysics Energy-based de novo protein folding by conformational space annealing and an off-lattice united-residue force field: Application

More information

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Introduction to Comparative Protein Modeling. Chapter 4 Part I Introduction to Comparative Protein Modeling Chapter 4 Part I 1 Information on Proteins Each modeling study depends on the quality of the known experimental data. Basis of the model Search in the literature

More information

Supporting Online Material for

Supporting Online Material for www.sciencemag.org/cgi/content/full/309/5742/1868/dc1 Supporting Online Material for Toward High-Resolution de Novo Structure Prediction for Small Proteins Philip Bradley, Kira M. S. Misura, David Baker*

More information

Design of a Novel Globular Protein Fold with Atomic-Level Accuracy

Design of a Novel Globular Protein Fold with Atomic-Level Accuracy Design of a Novel Globular Protein Fold with Atomic-Level Accuracy Brian Kuhlman, Gautam Dantas, Gregory C. Ireton, Gabriele Varani, Barry L. Stoddard, David Baker Presented by Kate Stafford 4 May 05 Protein

More information

Design of a Protein Potential Energy Landscape by Parameter Optimization

Design of a Protein Potential Energy Landscape by Parameter Optimization J. Phys. Chem. B 2004, 108, 4525-4534 4525 Design of a Protein Potential Energy Landscape by Parameter Optimization Julian Lee,,, Seung-Yeon Kim, and Jooyoung Lee*, Department of Bioinformatics and Life

More information

The protein folding problem consists of two parts:

The protein folding problem consists of two parts: Energetics and kinetics of protein folding The protein folding problem consists of two parts: 1)Creating a stable, well-defined structure that is significantly more stable than all other possible structures.

More information

Computer simulations of protein folding with a small number of distance restraints

Computer simulations of protein folding with a small number of distance restraints Vol. 49 No. 3/2002 683 692 QUARTERLY Computer simulations of protein folding with a small number of distance restraints Andrzej Sikorski 1, Andrzej Kolinski 1,2 and Jeffrey Skolnick 2 1 Department of Chemistry,

More information

Motif Prediction in Amino Acid Interaction Networks

Motif Prediction in Amino Acid Interaction Networks Motif Prediction in Amino Acid Interaction Networks Omar GACI and Stefan BALEV Abstract In this paper we represent a protein as a graph where the vertices are amino acids and the edges are interactions

More information

THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION

THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION AND CALIBRATION Calculation of turn and beta intrinsic propensities. A statistical analysis of a protein structure

More information

Protein Dynamics. The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron.

Protein Dynamics. The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron. Protein Dynamics The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron. Below is myoglobin hydrated with 350 water molecules. Only a small

More information

BIOINFORMATICS. Limited conformational space for early-stage protein folding simulation

BIOINFORMATICS. Limited conformational space for early-stage protein folding simulation BIOINFORMATICS Vol. 20 no. 2 2004, pages 199 205 DOI: 10.1093/bioinformatics/btg391 Limited conformational space for early-stage protein folding simulation M. Bryliński 1,3, W. Jurkowski 1,3, L. Konieczny

More information

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche The molecular structure of a protein can be broken down hierarchically. The primary structure of a protein is simply its

More information

Template Free Protein Structure Modeling Jianlin Cheng, PhD

Template Free Protein Structure Modeling Jianlin Cheng, PhD Template Free Protein Structure Modeling Jianlin Cheng, PhD Associate Professor Computer Science Department Informatics Institute University of Missouri, Columbia 2013 Protein Energy Landscape & Free Sampling

More information

Lecture 2 and 3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability

Lecture 2 and 3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability Lecture 2 and 3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability Part I. Review of forces Covalent bonds Non-covalent Interactions: Van der Waals Interactions

More information

Useful background reading

Useful background reading Overview of lecture * General comment on peptide bond * Discussion of backbone dihedral angles * Discussion of Ramachandran plots * Description of helix types. * Description of structures * NMR patterns

More information

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction CMPS 6630: Introduction to Computational Biology and Bioinformatics Tertiary Structure Prediction Tertiary Structure Prediction Why Should Tertiary Structure Prediction Be Possible? Molecules obey the

More information

CAP 5510 Lecture 3 Protein Structures

CAP 5510 Lecture 3 Protein Structures CAP 5510 Lecture 3 Protein Structures Su-Shing Chen Bioinformatics CISE 8/19/2005 Su-Shing Chen, CISE 1 Protein Conformation 8/19/2005 Su-Shing Chen, CISE 2 Protein Conformational Structures Hydrophobicity

More information

Template Free Protein Structure Modeling Jianlin Cheng, PhD

Template Free Protein Structure Modeling Jianlin Cheng, PhD Template Free Protein Structure Modeling Jianlin Cheng, PhD Professor Department of EECS Informatics Institute University of Missouri, Columbia 2018 Protein Energy Landscape & Free Sampling http://pubs.acs.org/subscribe/archive/mdd/v03/i09/html/willis.html

More information

Protein Folding Prof. Eugene Shakhnovich

Protein Folding Prof. Eugene Shakhnovich Protein Folding Eugene Shakhnovich Department of Chemistry and Chemical Biology Harvard University 1 Proteins are folded on various scales As of now we know hundreds of thousands of sequences (Swissprot)

More information

Protein Folding. I. Characteristics of proteins. C α

Protein Folding. I. Characteristics of proteins. C α I. Characteristics of proteins Protein Folding 1. Proteins are one of the most important molecules of life. They perform numerous functions, from storing oxygen in tissues or transporting it in a blood

More information

Molecular dynamics simulations of anti-aggregation effect of ibuprofen. Wenling E. Chang, Takako Takeda, E. Prabhu Raman, and Dmitri Klimov

Molecular dynamics simulations of anti-aggregation effect of ibuprofen. Wenling E. Chang, Takako Takeda, E. Prabhu Raman, and Dmitri Klimov Biophysical Journal, Volume 98 Supporting Material Molecular dynamics simulations of anti-aggregation effect of ibuprofen Wenling E. Chang, Takako Takeda, E. Prabhu Raman, and Dmitri Klimov Supplemental

More information

CMPS 3110: Bioinformatics. Tertiary Structure Prediction

CMPS 3110: Bioinformatics. Tertiary Structure Prediction CMPS 3110: Bioinformatics Tertiary Structure Prediction Tertiary Structure Prediction Why Should Tertiary Structure Prediction Be Possible? Molecules obey the laws of physics! Conformation space is finite

More information

Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence

Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence Naoto Morikawa (nmorika@genocript.com) October 7, 2006. Abstract A protein is a sequence

More information

NIH Public Access Author Manuscript Phys Rev Lett. Author manuscript; available in PMC 2013 April 16.

NIH Public Access Author Manuscript Phys Rev Lett. Author manuscript; available in PMC 2013 April 16. NIH Public Access Author Manuscript Published in final edited form as: Phys Rev Lett. 2013 March 1; 110(9): 098101. Mean-field interactions between nucleic-acid-base dipoles can drive the formation of

More information

Protein Folding & Stability. Lecture 11: Margaret A. Daugherty. Fall How do we go from an unfolded polypeptide chain to a

Protein Folding & Stability. Lecture 11: Margaret A. Daugherty. Fall How do we go from an unfolded polypeptide chain to a Lecture 11: Protein Folding & Stability Margaret A. Daugherty Fall 2004 How do we go from an unfolded polypeptide chain to a compact folded protein? (Folding of thioredoxin, F. Richards) Structure - Function

More information

Introduction to" Protein Structure

Introduction to Protein Structure Introduction to" Protein Structure Function, evolution & experimental methods Thomas Blicher, Center for Biological Sequence Analysis Learning Objectives Outline the basic levels of protein structure.

More information

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror Protein structure prediction CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror 1 Outline Why predict protein structure? Can we use (pure) physics-based methods? Knowledge-based methods Two major

More information

Can a continuum solvent model reproduce the free energy landscape of a β-hairpin folding in water?

Can a continuum solvent model reproduce the free energy landscape of a β-hairpin folding in water? Can a continuum solvent model reproduce the free energy landscape of a β-hairpin folding in water? Ruhong Zhou 1 and Bruce J. Berne 2 1 IBM Thomas J. Watson Research Center; and 2 Department of Chemistry,

More information

Protein Structure Prediction, Engineering & Design CHEM 430

Protein Structure Prediction, Engineering & Design CHEM 430 Protein Structure Prediction, Engineering & Design CHEM 430 Eero Saarinen The free energy surface of a protein Protein Structure Prediction & Design Full Protein Structure from Sequence - High Alignment

More information

arxiv: v1 [cond-mat.soft] 22 Oct 2007

arxiv: v1 [cond-mat.soft] 22 Oct 2007 Conformational Transitions of Heteropolymers arxiv:0710.4095v1 [cond-mat.soft] 22 Oct 2007 Michael Bachmann and Wolfhard Janke Institut für Theoretische Physik, Universität Leipzig, Augustusplatz 10/11,

More information

Many proteins spontaneously refold into native form in vitro with high fidelity and high speed.

Many proteins spontaneously refold into native form in vitro with high fidelity and high speed. Macromolecular Processes 20. Protein Folding Composed of 50 500 amino acids linked in 1D sequence by the polypeptide backbone The amino acid physical and chemical properties of the 20 amino acids dictate

More information

Identifying the Protein Folding Nucleus Using Molecular Dynamics

Identifying the Protein Folding Nucleus Using Molecular Dynamics doi:10.1006/jmbi.1999.3534 available online at http://www.idealibrary.com on J. Mol. Biol. (2000) 296, 1183±1188 COMMUNICATION Identifying the Protein Folding Nucleus Using Molecular Dynamics Nikolay V.

More information

arxiv:cond-mat/ v1 [cond-mat.soft] 19 Mar 2001

arxiv:cond-mat/ v1 [cond-mat.soft] 19 Mar 2001 Modeling two-state cooperativity in protein folding Ke Fan, Jun Wang, and Wei Wang arxiv:cond-mat/0103385v1 [cond-mat.soft] 19 Mar 2001 National Laboratory of Solid State Microstructure and Department

More information

Lecture 34 Protein Unfolding Thermodynamics

Lecture 34 Protein Unfolding Thermodynamics Physical Principles in Biology Biology 3550 Fall 2018 Lecture 34 Protein Unfolding Thermodynamics Wednesday, 21 November c David P. Goldenberg University of Utah goldenberg@biology.utah.edu Clicker Question

More information

Outline. The ensemble folding kinetics of protein G from an all-atom Monte Carlo simulation. Unfolded Folded. What is protein folding?

Outline. The ensemble folding kinetics of protein G from an all-atom Monte Carlo simulation. Unfolded Folded. What is protein folding? The ensemble folding kinetics of protein G from an all-atom Monte Carlo simulation By Jun Shimada and Eugine Shaknovich Bill Hawse Dr. Bahar Elisa Sandvik and Mehrdad Safavian Outline Background on protein

More information

Basics of protein structure

Basics of protein structure Today: 1. Projects a. Requirements: i. Critical review of one paper ii. At least one computational result b. Noon, Dec. 3 rd written report and oral presentation are due; submit via email to bphys101@fas.harvard.edu

More information

From Amino Acids to Proteins - in 4 Easy Steps

From Amino Acids to Proteins - in 4 Easy Steps From Amino Acids to Proteins - in 4 Easy Steps Although protein structure appears to be overwhelmingly complex, you can provide your students with a basic understanding of how proteins fold by focusing

More information

Lecture 11: Protein Folding & Stability

Lecture 11: Protein Folding & Stability Structure - Function Protein Folding: What we know Lecture 11: Protein Folding & Stability 1). Amino acid sequence dictates structure. 2). The native structure represents the lowest energy state for a

More information

Protein Folding & Stability. Lecture 11: Margaret A. Daugherty. Fall Protein Folding: What we know. Protein Folding

Protein Folding & Stability. Lecture 11: Margaret A. Daugherty. Fall Protein Folding: What we know. Protein Folding Lecture 11: Protein Folding & Stability Margaret A. Daugherty Fall 2003 Structure - Function Protein Folding: What we know 1). Amino acid sequence dictates structure. 2). The native structure represents

More information

Protein Structure Basics
