TetraBASE: A Side Chain-Independent Statistical Energy for Designing Realistically Packed Protein Backbones


This is an open access article published under an ACS AuthorChoice License, which permits copying and redistribution of the article or any adaptations for non-commercial purposes.

Cite This: pubs.acs.org/jcim

Huanyu Chu and Haiyan Liu*
School of Life Sciences, University of Science and Technology of China, Hefei, Anhui, China
Hefei National Laboratory for Physical Sciences at the Microscales, Hefei, Anhui, China
Collaborative Innovation Center of Chemistry for Life Sciences, Hefei, Anhui, China

Received: November 26, 2017
Published: January 9

ABSTRACT: Constructing backbone structures of high designability is a primary aspect of computational protein design. We report here a side chain-independent statistical energy that aims at realistic modeling of through-space packing of polypeptide backbones. To mitigate the lack of explicit amino acid side chains, the model treats the interbackbone site packing as being dependent on peptide local conformation. In addition, new variables suitable for statistical analysis, one for relative orientation and another for distance, have been introduced to represent the intersite geometry based on the asymmetrical tetrahedron organization of distinct chemical groups surrounding the Cα atoms. The resulting tetrahedron-based backbone statistical energy (tetraBASE) model has been used to optimize the tertiary organizations of secondary structure elements (SSEs) of designated types with Monte Carlo simulated annealing, starting from artificial initial configurations. The tetraBASE minimum energy structures can reproduce SSE packing frequently observed in native proteins with atomic root-mean-square deviations of 1–2 Å. The model has also been tested by examining the stability of native SSE arrangements under tetraBASE. The results suggest that the tetraBASE model can be used to effectively represent interbackbone packing when designing backbone structures without explicitly knowing side chain types.

INTRODUCTION

Recently, computational protein design has achieved remarkable successes in engineering de novo proteins of non-native backbones. To a large extent, these successes have relied on innovative ways to identify designable backbone structures that fulfill the respective design goals. So far, only a limited number of approaches have proven effective for defining designable backbones. They mostly considered geometric constraints on backbone structures, such as sequence-independent rules about preferred lengths of secondary structure elements and the loops linking them [3,4] or parametric equations describing relative geometries between helices or other structural units [5–8]. These constraints, usually in heuristic forms, were used to build blueprints, which were then used to guide the assembly of peptide fragments into complete backbones [12]. With this strategy, most secondary structure elements and local structural features in the resulting backbones are of ideal forms [3–6,9,10]. Apart from a few examples, it is still challenging to design de novo proteins that have the rich, nonideal structural features manifested by native proteins [11,13]. One type of approach to this end is to recombine substructures from different native protein structures [11]. Other than this, a general approach to designing peptide backbones of unrestricted diversity would be direct structure modeling of varied trial sequences. However, if complete side chains were to be modeled with atomistic models, the energy surface would be highly frustrated, making searching in the backbone conformational space computationally expensive. Because searching the sequence space is by itself already computationally very demanding [17], incorporating large backbone adjustments through simultaneous searching of both the sequence space and the backbone conformational space may remain impractical even with the rapid growth of computer power, at least for larger proteins without internal symmetries.

An alternative way of de novo backbone design is to develop a general sequence-independent energy function that can be used to generate realistic backbone structures with usual conformational sampling and optimization techniques. The main goal of such an energy function would be to make the decoupled search in the backbone conformational space and in the sequence space a better approximation than can be achieved with simple homopolypeptide or pseudoresidue models. Compared with a model that includes atomistic side chains, such a side chain-unspecialized energy function, like some of the coarse-grained side chain models [18], may also bring the benefits of a less-frustrated energy landscape, such as swift convergence of conformation optimization.

Previous studies have examined whether sampling or optimization using side chain-independent general energy functions could yield meaningful free energy minima of polypeptide backbones. As an earlier example, Hoang et al. considered a simple physical model of inter-residue interactions including steric interaction, hydrogen bonding, and hydrophobicity [19]. Results of conformational sampling led them to propose that these interactions place geometrical and energetic constraints that presculpt the free energy landscape of protein structures, giving rise to a limited number of broad free energy minima to which the actual protein folds belong. Using generic simplified or all-atom models, Zhang et al. generated hydrogen-bonded, secondary structure-containing, and compact structures of homopolypeptides [20]. They proposed that at certain resolutions, there were one-to-one correspondences between conformations in the generated set and folds of known single domain proteins. More recently, Cossio et al. explored the free energy landscape of the Val60 homopolypeptide with atomistic molecular dynamics simulations [21]. They reported that the sampled set of conformations contained observed protein folds of similar lengths, although the sampled and observed sets could not be considered equivalent. Kukic et al. also explored the conformational space of Val60 with molecular dynamics simulations, albeit with a coarse-grained model similar to that used by Hoang et al. [22]. They concluded that, just as with the fully atomistic homopolymer model, the free energy minima of this simplified CamTube model could recapitulate the diversity of protein folds observed in native protein structures.
Rather than aiming at producing realistic backbone structures amenable to sequence design, these previous studies aimed at coarsely contouring the free energy landscape of polypeptides. Thus, they focused on global topologies or overall folds of the computationally constructed structures. Being similar to real protein backbones at TM-score [23,24] levels of only 0.4 to , most of these structures were probably not realistic enough to serve as target backbones for current sequence design algorithms.

Aiming at generating de novo scaffolds for protein design, MacDonald et al. developed an α-carbon potential energy function in which realistic modeling of backbone local conformations (namely, the backbone conformation of a few consecutive residues) was emphasized [25,26]. Decoy backbones sampled with this energy function were shown to reproduce realistic protein backbone local conformations. In loop modeling experiments, the model was shown to be able to compete with methods that used known protein backbone fragments with similar sequences or used residue-specific φ,ψ maps to restrict the search space. Recently, this model was used to design de novo loops projecting from the scaffold core of synthetic beta-solenoid repeat proteins. In an experimentally solved structure containing a designed loop, the loop was found to closely match the designed structure [27]. These results suggest that a sequence-independent model could indeed be used to design atomistically realistic backbones. In the model of MacDonald et al., while the realistic modeling of peptide local conformations was emphasized, the through-space packing between different parts of a polypeptide backbone was described only with a simple soft steric repulsive term together with a pseudohydrogen bond term. To maintain packing between secondary structure elements, extra heuristic restraints were applied.
In the current paper, we report a statistical energy for the realistic modeling of through-space packing of polypeptide backbones. Statistical energies are derived from known sequence and structural data of native proteins and their complexes. They are computable models of molecular interactions and have been widely used in protein structure modeling and design. Most commonly, statistical energies have been defined to depend on one or two geometric parameters, such as a single interatomic distance or one or two (torsional) angles [30,32,33,36]. Recently, for the purpose of designing sequences for given backbone structures, we developed and experimentally verified a statistical energy function in which the effects of peptide local conformation, local structural environment, and inter-residue geometries on amino acid sequences are considered jointly [31,39].

In the current work, for the purpose of designing backbones without prespecified sequences, a statistical energy function that characterizes the packing between two backbone sites has been defined to depend on peptide local conformations, which mitigates the lack of explicit sequence information. In addition, a new representation of interbackbone site packing geometries is introduced, so that the resulting packing energy reflects not only distance but also orientation dependencies. This is achieved by using the asymmetrical tetrahedron configuration surrounding the α-carbon atoms to define reduced geometrical variables for statistical analysis. We refer to this model as tetraBASE (standing for tetrahedron-backbone-arrangement statistical energy) and test it with two types of calculation. In the first type, tertiary organizations of SSEs of designated types and lengths are optimized with simulated annealing, starting from artificial initial configurations. The resulting minimum energy structures are compared with existing backbone structures in the Protein Data Bank (PDB) using structure alignment.
In the second type, the spatial arrangements of secondary structure elements (SSEs) in native protein structures are examined for stability under tetraBASE. The results suggest that the tetraBASE minima correspond to atomistically realistic backbone structures.

RESULTS

tetraBASE Statistical Energy Describes Local Conformation, Orientation, and Distance-Dependent Packing. Each tetraBASE term describes the through-space packing between two backbone sites that are separated from each other in sequence. In the current work, we focus on the packing between secondary structure elements (SSEs) in a backbone architecture that contains a set of $M$ SSEs without any linking loops. Here, an SSE refers to an α-helix or a β-strand, with different strands in a single β-sheet considered as separate SSEs. For SSE $m$, its secondary structure is denoted $SS_m$ and its length $l_m$. The system's SSE composition is defined by the number of SSEs ($M$), the secondary structure types of the SSEs ($\{SS_m, m = 1, 2, \ldots, M\}$), and the lengths of the SSEs ($\{l_m, m = 1, 2, \ldots, M\}$). Each SSE is represented by all the backbone sites it contains. A backbone site $i$ is represented by the main chain non-hydrogen atoms in residue $i$, namely, the N, Cα, C, and O atoms. We collectively denote the coordinates of these atoms as $\mathbf{r}_i$. The configuration of the system, $\mathbf{r}^N$, is completely specified as

$$\mathbf{r}^N \equiv \Big\{ \underbrace{\{\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_{l_1}\}}_{\text{element } 1},\ \underbrace{\{\mathbf{r}_{l_1+1}, \mathbf{r}_{l_1+2}, \ldots, \mathbf{r}_{l_1+l_2}\}}_{\text{element } 2},\ \ldots,\ \underbrace{\{\mathbf{r}_{l_1+\cdots+l_{m-1}+1}, \ldots, \mathbf{r}_{l_1+\cdots+l_{m-1}+l_m}\}}_{\text{element } m},\ \ldots,\ \underbrace{\{\mathbf{r}_{l_1+\cdots+l_{M-1}+1}, \ldots, \mathbf{r}_{l_1+\cdots+l_{M-1}+l_M}\}}_{\text{last element}} \Big\}$$

Figure 1. Examples of the orientation categories and the corresponding distance-dependent energy curves. Backbone position pairs are contained in the following types of interacting secondary structure elements: (a) two antiparallel strands, (b) a helix and an antiparallel strand, and (c) two antiparallel helices. The positions are identified with integer numbers. The energy is in units of $\log_e e$. In (a), the tetrahedrons surrounding the central Cα positions of one strand and the interstrand hydrogen bonds are indicated. In this and all other figures in this paper, graphs of molecular structures have been prepared using the PyMol program [40].

In the current work, each SSE has been treated as a rigid body whose internal structure has been (randomly) taken from a native protein structure. This allowed us to consider only inter-SSE packing as the target for optimization. For this purpose, the total tetraBASE energy has been defined as the sum of pairwise inter-SSE contributions, which are in turn sums of pairwise terms between backbone sites, namely,

$$E(\mathbf{r}^N) = \sum_{m=1}^{M-1} \sum_{n=m+1}^{M} \sum_{i \in \mathrm{SSE}_m} \sum_{j \in \mathrm{SSE}_n} e^{SS_m SS_n}(\mathbf{r}_i, \mathbf{r}_j) \qquad (1)$$

As explained, eq 1 does not contain terms that describe intra-SSE interactions.

The individual terms $e^{SS_m SS_n}(\mathbf{r}_i, \mathbf{r}_j)$ in eq 1 have been defined to depend not only on the atomic coordinates of backbone sites $i$ and $j$ but also on their respective local conformation contexts (here the SS type combination $SS_m$ and $SS_n$). Given the SS type combination, $e^{SS_m SS_n}(\mathbf{r}_i, \mathbf{r}_j)$ is further simplified to depend first on a discrete category of the relative orientation and then on a distance variable.
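The structure of eq 1 can be illustrated with a minimal Python sketch. It only shows the nested summation over SSE pairs and over site pairs between them; the names `total_energy` and `pair_energy` are hypothetical, and the pairwise term is passed in as a caller-supplied function rather than being the actual tetraBASE term.

```python
from itertools import combinations
from typing import Any, Callable, Sequence

# Hypothetical signature for the pairwise site term e^{SS_m SS_n}(r_i, r_j):
# it receives the two SS types and the two backbone sites.
PairEnergy = Callable[[str, str, Any, Any], float]


def total_energy(
    sse_types: Sequence[str],
    sse_sites: Sequence[Sequence[Any]],
    pair_energy: PairEnergy,
) -> float:
    """Sum pairwise site terms over all pairs of distinct SSEs (eq 1).

    sse_types[m] is the SS type of SSE m (for example "H" or "E");
    sse_sites[m] lists the backbone sites belonging to SSE m.
    Pairs of sites within the same SSE are never visited, so no
    intra-SSE contribution enters the sum.
    """
    total = 0.0
    for m, n in combinations(range(len(sse_types)), 2):  # m < n
        for site_i in sse_sites[m]:
            for site_j in sse_sites[n]:
                total += pair_energy(sse_types[m], sse_types[n], site_i, site_j)
    return total
```

With a constant pairwise term, the function simply counts inter-SSE site pairs, which is a quick way to check that intra-SSE pairs are excluded.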
The definition of the orientation category has been inspired by the asymmetric tetrahedron arrangements of atoms or functional groups of different chemical nature surrounding the Cα atoms. More specifically, it is defined by the closest pair of vertex atoms combined with the furthest pair of vertex atoms between the two tetrahedrons. Given the orientation category, the distance variable is chosen to be the distance between the closest pair of vertex atoms ($r_{\min}$). The distributions of the orientation category and the distance variable have been estimated from backbone site pairs in a set of training proteins. Finally, $e^{SS_m SS_n}(\mathbf{r}_i, \mathbf{r}_j)$ for any relative geometry of two backbone sites is derived from these probabilities and distributions. More details of this process are given in Methods.

Figure 1 shows the orientation categories of some backbone position pairs in contacting SSEs of varied types and relative arrangements. Clearly, the orientation category for two backbone sites contains information about the overall relative arrangement between the SSEs that contain the two sites. In addition, it contains information about the relative location of one site in one SSE with respect to the other SSE. Because of this, different SS type combinations as well as different orientation categories lead to different dependences of the statistical energy on the distance variable. In Figure 1, some of the distance-dependent energy curves are shown. As expected, these energy curves vary greatly with SS type combination and orientation category. Thus, correlations between local structure types (SS types), relative orientations, and distances have been considered in the tetraBASE energy form, making the total energy sensitive to variations in both the overall spatial arrangements and the detailed packing between SSEs. By definition, the statistical energy is associated with negative logarithms of the respective distributions in native structures; thus, inter-SSE packing modes preferred in native protein structures should be associated with lower energies.

There is no special treatment of hydrogen bonds in the tetraBASE packing term. The quality of hydrogen bond geometries in the tetraBASE energy-minimized backbone structures can serve as an indicator of the ability of the energy to recapitulate backbone packing in atomic detail. In Figure 2, inter-β strand hydrogen bond geometries in tetraBASE energy-minimized β-sheets (obtained with Monte Carlo simulated annealing started from artificially defined structures, see below) are compared with those in native backbone structures.

Figure 2. Distributions of hydrogen bond geometries in the tetraBASE energy-minimized β-sheets and in native backbones. In total, 896 hydrogen bonds in the tetraBASE energy-minimized structures for the SSE compositions $H^{16}(E^{7})_3$ and $(H^{16})_2(E^{7})_4$ (red circles) and 8960 hydrogen bonds randomly extracted from native proteins (black diamonds) are shown.

Minima of tetraBASE Energy Reproduce Native Backbone Packing at Atomic Resolutions. For several test systems, minima on their tetraBASE energy surfaces defined in eq 1 have been searched with a Monte Carlo (MC) simulated annealing protocol as described in Methods. The SSE composition of each test system, namely, the number of SSEs and the SS type and length of each SSE, has been predefined and is listed in Table 1. For a given SSE composition, different coarse or general architectures can be defined based on the approximate directions of the SSEs relative to each other. For example, for the $(H^{16})_3$ composition (see footnotes of Table 1 for the meaning of the notation), there can be two general architectures, one with two helices running in approximately the same direction and the remaining helix running in approximately the opposite direction and the other with all three helices running in approximately the same direction. Similarly, six different architectures can be considered for the SSE composition $H^{16}(E^{7})_3$.

Table 1. Compositions and Directions of Secondary Structure Elements (SSEs) of Different Architectures
(a) Letters H and E indicate secondary structure types (H for helix and E for strand). Superscripts indicate lengths of SSEs. Subscripts indicate numbers of SSEs of given types and lengths.
(b) For each architecture, types and approximate directions of the SSEs are graphically represented. Each triangle represents a β-strand. Each circle represents an α-helix. Upward triangles or dots in circles indicate the outward direction. Downward triangles or crosses in circles indicate the inward direction.
There are far more possible architectures for the composition $(H^{16})_2(E^{7})_4$, and only three of them have been considered here as examples. For convenience, each architecture is given a name in Table 1, and the corresponding arrangements of SSEs are represented graphically in the same table. These SSE compositions and their associated architectures have been selected to cover different combinations of SS types as well as different relative orientations between SSEs. They comprise a reasonable set for testing the ability of tetraBASE to describe different types of inter-SSE packing.

For each architecture in Table 1, 50 independent MC simulated annealing runs have been carried out, all started from artificially constructed structures (see Methods). The lowest energy configuration found in each simulation was extracted. The configuration with the lowest energy among all 50 simulations has been taken as the configuration representing a minimum. To examine convergence of the simulated annealing simulations, the lowest energy configurations found in the individual MC runs are compared with the corresponding representative configurations. Figure 3 shows that for most architectures, either the resulting root-mean-square deviations (RMSDs) of Cα atom positions are small or the associated energies are much higher than the overall lowest energy. This indicates acceptable convergence of the MC simulated annealing protocol. One exception is the h3_1 architecture (Table 1), for which two structurally dissimilar representative configurations (mutual RMSD > 2.5 Å) have been found with similarly low energies (Figure 3). For this architecture, both configurations have been considered for further analysis.

To examine whether the representative configurations found above correspond to designable backbone structures, they were separately used as queries to search the PDB for similar structures.
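The convergence check described above (pick the lowest-energy run as the representative, then compare every run to it by energy gap and Cα RMSD) can be sketched as follows. This is an illustrative sketch under assumptions: the function names are hypothetical, the RMSD is computed after optimal superposition with the standard Kabsch procedure, and the Cα atoms are assumed to be listed in the same order in every configuration. The 2.5 Å cutoff is the similarity threshold quoted in the text.

```python
import numpy as np


def kabsch_rmsd(a, b):
    """RMSD between two (n, 3) coordinate sets after optimal superposition."""
    a = np.asarray(a, dtype=float) - np.mean(a, axis=0)
    b = np.asarray(b, dtype=float) - np.mean(b, axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)
    # Sign correction so that the optimal transform is a proper rotation.
    d = np.sign(np.linalg.det(u @ vt))
    overlap = s[0] + s[1] + d * s[2]
    msd = ((a ** 2).sum() + (b ** 2).sum() - 2.0 * overlap) / len(a)
    return float(np.sqrt(max(msd, 0.0)))


def summarize_runs(energies, ca_coords, rmsd_cutoff=2.5):
    """Return the representative run index and one row per run.

    Each row is (energy relative to the lowest energy,
                 Calpha RMSD to the representative configuration,
                 whether the RMSD is below rmsd_cutoff).
    """
    best = int(np.argmin(energies))
    rows = []
    for energy, xyz in zip(energies, ca_coords):
        rmsd = kabsch_rmsd(xyz, ca_coords[best])
        rows.append((energy - energies[best], rmsd, bool(rmsd < rmsd_cutoff)))
    return best, rows
```

Runs that are either flagged as similar or have a large relative energy are consistent with convergence in the sense used above; a dissimilar run with a small relative energy, as for h3_1, signals a second competing minimum.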
This has been carried out with the program Phyrestorm [41], which can rapidly and comprehensively compare a given protein structure to the entire PDB through structural alignments of backbones. The algorithm of Phyrestorm requires a specific ordering of the SSEs in the primary sequence. This ordering is not relevant in our minimized architectures. Thus, before a search, the SSEs are arbitrarily reordered to generate a query backbone. After searching with the query, nonredundant top hits with TM-scores above 0.6 were retained. If the accumulated number of retained hits was less than five, a new permutation of the SSE order was considered for another search. Otherwise, the search was stopped, and the remaining SSE orders were not considered further. During this process, hits returned by Phyrestorm were manually filtered to eliminate redundancy and to exclude PDB entries that correspond to NMR- or cryo-EM-determined models.

Figure 3. Energy variations of the lowest energy configurations in individual MC runs. The tetraBASE energy values are relative to the lowest energy configuration from all MC runs of an architecture. The red plus signs correspond to configurations similar to the overall lowest energy one (RMSD < 2.5 Å). The black x signs correspond to the remaining configurations.

In Figures 4–6, we list the PDB IDs and chain IDs of the top five hits for each representative configuration, together with the TM-scores, the number of aligned residues, and the RMSD of aligned Cα positions. Structures of the representative configurations aligned with the corresponding best hits are also shown. For all the representative configurations, five or more nonredundant PDB entries can be found to match the respective queries with TM-scores higher than 0.6. The TM-scores associated with the best hits are often above 0.7. These results suggest that tetraBASE minima can recapitulate preferred inter-SSE packing with atomic accuracy. Because of the small sizes of the tested SSE architectures, the tetraBASE minimum configurations are usually aligned only to a small part of a known protein structure. On the other hand, the alignments are able to cover % of backbone sites in the tetraBASE minimum configurations.
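The permutation-and-stop logic of this search protocol can be sketched in a few lines. This is not the Phyrestorm interface: the `search` argument is a hypothetical caller-supplied function that takes an ordered list of SSEs and returns `(hit_id, tm_score)` pairs, and de-duplicating by hit identifier is only a stand-in for the manual redundancy filtering described above. The thresholds (TM-score above 0.6, at least five hits) are those stated in the text.

```python
from itertools import permutations
from typing import Callable, Iterable, List, Sequence, Tuple

Hit = Tuple[str, float]  # (hit identifier, TM-score)


def collect_hits(
    sses: Sequence[object],
    search: Callable[[List[object]], Iterable[Hit]],
    min_tm: float = 0.6,
    min_hits: int = 5,
) -> List[Hit]:
    """Try SSE orderings one by one until enough distinct hits are retained."""
    retained = {}
    for order in permutations(range(len(sses))):
        query = [sses[k] for k in order]  # one arbitrary SSE ordering
        for hit_id, tm_score in search(query):
            if tm_score > min_tm:
                # Keep the best score seen for each distinct hit.
                retained[hit_id] = max(tm_score, retained.get(hit_id, 0.0))
        if len(retained) >= min_hits:
            break  # remaining SSE orders are not considered further
    return sorted(retained.items(), key=lambda kv: kv[1], reverse=True)
```

Passing the search routine in as a parameter keeps the stopping rule testable without any structure database at hand.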
In addition, the overall structures of the top hits for a given minimum configuration vary greatly. Such recurrent presence of the tetraBASE minimum configurations in proteins of different overall folds supports the idea that more favorable (lower) tetraBASE packing energies may be associated with higher designability.

Figure 4. Comparisons between the tetraBASE energy-minimized backbone configurations and backbone structures of native proteins. Panels (a) to (c) correspond to different representative configurations obtained for the SSE composition $(H^{16})_3$. For each representative configuration, the left part shows the architecture name, the PDB IDs, and chain IDs of the top five hits obtained by using Phyrestorm, together with the TM-score, the number of aligned residues, and the RMSD of aligned Cα positions for each hit; the middle part shows the structure of the tetraBASE energy-minimized configuration aligned with the best hit (gray) in ribbon form. The right part shows a more detailed stereoview of the aligned backbone structures, with the tetraBASE energy-minimized configuration in orange and the native backbone in light purple.

Journal of Chemical Information and Modeling

The amino acid sequences of the structurally aligned parts of the respective natural proteins are shown in Figure 7. They exhibit significant variations despite the highly similar backbone arrangements. This observation supports the main hypothesis underlying our approach, that is, polypeptide backbone conformational minima can be reconstructed at 1–2 Å resolution with models in which sequence specialization is not explicitly considered. This result is consistent with previous analyses of the non-one-to-one correspondence between protein structures and sequences, such as the analysis carried out by Rackovsky [42].

Stability of Native SSE Arrangements Can Be Maintained under tetraBASE. The native arrangements of SSEs in several natural proteins have been examined for their stability under the tetraBASE inter-SSE packing interaction. Again, the native structures have been chosen to cover different SSE compositions and relative arrangements. MC simulated annealing runs were carried out using initial SSE arrangements extracted from the corresponding native structures. In these initial arrangements, loops and side chains from the original PDB structures were simply removed, leaving only backbone segments that correspond to regular SSEs. From an initial structure, 5.5 cycles of MC simulated annealing were executed. In these cycles, the effective temperature for MC was changed periodically between an upper and a lower bound. The total tetraBASE energies and the RMSDs from the starting native configurations were monitored. The monitored values are shown in Figure 8. In addition, the lowest energy configurations encountered in the MC simulated annealing runs are compared with the native configurations in Figure 9.

Figure 5. Same as Figure 4, but for the SSE composition $H^{16}(E^{7})_3$. Panels (a) to (f) correspond to different representative configurations obtained for this composition.

Figure 6. Same as Figure 4, but for the SSE composition $(H^{16})_2(E^{7})_4$. Panels (a) to (c) correspond to different representative configurations obtained for this composition.

Figure 7. Amino acid sequences of the structurally aligned parts of natural proteins. These proteins have been found as top hits in searches with the respective tetraBASE minimum configurations as queries. The architecture names, the number of residues contained in each architecture, and the PDB IDs of the natural proteins are given. For the architecture h3_1, two representative configurations have been obtained from MC minimization.

Figure 8. Total tetraBASE energies and RMSDs during MC simulated annealing simulations. The simulations have been started from native SSE arrangements. The energy values are relative to the respective lowest energies encountered in the simulations. The RMSDs are from the initial configurations. The native SSE arrangements considered included the following: (a) the α helices extracted from PDB 1A36, (b) the α helices extracted from PDB 1VCT, (c) the β strands extracted from PDB 1B33, (d) the β strands extracted from PDB 1TRE, (e) the α helices and β strands extracted from PDB 1EW4, (f) the α helices and β strands extracted from PDB 1OBB, (g) the α helices and β strands extracted from PDB 1CY5, and (h) the α helices and β strands extracted from PDB 1J24. For panels (a) to (f), the y-axis is labeled on the left side of the panel. For 1CY5 and 1J24, results of both unrestrained (black) and RMSD-restrained MC simulated annealing runs (red) are shown. For these two systems, the RMSDs shown in panels (g) and (h) are labeled on the right side of the y-axis.

As shown in Figure 8a–f, for six of the eight examined natural proteins, the native SSE arrangements have been stably maintained during MC simulated annealing; the RMSD values of the low energy configurations ranged from 1 to 2 Å. Even though the configurations could drift further away from the native configurations during the high temperature excitation phases of the annealing cycles, they returned closer to the native configuration in subsequent low temperature relaxation phases. This suggests that stable minima on the tetraBASE energy surface exist close to the native configurations. To compare these minima with other possible minima of the same SSE compositions, MC simulated annealing runs started from artificially constructed initial structures (see Methods) were carried out on systems with the same SSE compositions as the examined natural proteins 1A36, 1VCT, 1B33, 1TRE, 1EW4, and 1OBB.
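The alternation of high temperature excitation and low temperature relaxation phases can be illustrated with a small generic annealing loop. Everything specific here is an assumption made for illustration, since the actual protocol is given in Methods: the cosine-shaped temperature schedule, the Metropolis acceptance rule, the temperature bounds, and the function names are all hypothetical. With a schedule that starts at the upper bound, 5.5 cycles end at the lower bound.

```python
import math
import random


def temperature(step, steps_per_cycle, t_low, t_high):
    """Periodic effective temperature, starting each cycle at t_high (assumed cosine shape)."""
    phase = 2.0 * math.pi * step / steps_per_cycle
    return t_low + 0.5 * (t_high - t_low) * (1.0 + math.cos(phase))


def metropolis_accept(delta_e, temp, rng):
    """Standard Metropolis rule: always accept downhill, sometimes accept uphill."""
    return delta_e <= 0.0 or rng.random() < math.exp(-delta_e / temp)


def anneal(state, energy, propose, n_cycles=5.5, steps_per_cycle=1000,
           t_low=0.1, t_high=2.0, seed=0):
    """Run cyclic simulated annealing and return the lowest-energy state seen.

    energy(state) returns a float; propose(state, rng) returns a trial state.
    Both are caller-supplied placeholders, for example rigid-body moves of SSEs.
    """
    rng = random.Random(seed)
    current_e = energy(state)
    best_state, best_e = state, current_e
    for step in range(int(n_cycles * steps_per_cycle)):
        temp = temperature(step, steps_per_cycle, t_low, t_high)
        trial = propose(state, rng)
        trial_e = energy(trial)
        if metropolis_accept(trial_e - current_e, temp, rng):
            state, current_e = trial, trial_e
            if current_e < best_e:
                best_state, best_e = state, current_e
    return best_state, best_e
```

Tracking the best state separately from the current state mirrors the way the lowest energy configuration encountered in each run is extracted for comparison with the native arrangement.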
Except for the 1EW4 composition, the lowest energy configurations in 10 independent MC runs of a composition included one or more configurations that are closely similar (RMSD < 2.5 Å) to the configuration optimized from the respective native SSE arrangement. In addition, the lowest energies obtained from the simulations started from the native arrangements fell within the ranges covered by the lowest energies obtained from the simulations started from artificially constructed structures (Figure 10).

For the two examined natural proteins 1CY5 and 1J24, low energy configurations visited during the MC simulated annealing were further away from the corresponding native configurations. For 1CY5, the RMSD was around 4 Å (Figure 8g). For 1J24, the RMSD was around 2.5 Å (Figure 8h). Despite this, closer inspection of the lowest energy configurations suggested that these configurations may still represent well-packed SSEs. In fact, in these relaxed configurations, SSEs, especially helices, tend to be packed more tightly with each other than in the respective native structures (Figure 9), which may explain their lower tetraBASE energies. Restrained MC simulated annealing runs on these two native SSE arrangements, in which RMSDs from the initial native configurations were restrained to be within 2 Å, were carried out. As expected, the restraints led to higher energies (Figure 8g and h). The energy increases, however, are comparable to the energy variations shown in Figure 10 once the larger number of residues in the SSEs of 1CY5 and 1J24, compared with the proteins shown in Figure 10, is taken into consideration.

Figure 9. Lowest energy configurations encountered in MC compared with the respective native SSE arrangements. The native configurations are shown in yellow, and the lowest energy configurations encountered in the MC simulated annealing runs are shown in green cyan. All configurations are shown in cross-eyed stereo. The corresponding native proteins are (a) PDB 1A36, with an SSE composition of $H^{32}H^{33}$; (b) PDB 1VCT, with an SSE composition of $H^{25}H^{28}H^{33}$; (c) PDB 1B33, with an SSE composition of $(E^{7})_3$; (d) PDB 1TRE, with an SSE composition of $(E^{5})_2E^{4}$; (e) PDB 1EW4, with an SSE composition of $H^{21}(E^{5})_2E^{7}E^{6}E^{8}E^{4}H^{13}$; (f) PDB 1OBB, with an SSE composition of $H^{13}H^{18}(E^{5})_2$; (g) PDB 1CY5, with an SSE composition of $H^{17}H^{11}H^{9}H^{13}H^{14}H^{7}$; and (h) PDB 1J24, with an SSE composition of $H^{9}H^{12}H^{14}H^{18}(E^{5})_2E^{2}(E^{7})_2E^{4}$.

Results in Figures 8–10 suggest that the SSE arrangements in native protein backbones are likely to be stable under tetraBASE, being atomistically similar to minima on the tetraBASE energy surface. On the other hand, given the statistical nature of tetraBASE, it is understandable that SSE packing in some natural proteins may show larger deviations from the tetraBASE minima. Even after taking this latter point into consideration, a reasonably low tetraBASE energy can probably still serve as a useful criterion for defining acceptable packing geometries in a backbone design protocol.

DISCUSSION

For de novo protein design, it is desirable to identify designable backbone structures with usual computational sampling and optimization schemes without the need to prespecify a sequence. For this purpose, one needs a general sequence-independent energy function. The tetraBASE energy introduced here is a new form of statistical energy that describes the through-space packing between polypeptide backbone sites. Side chain types are not explicitly considered.
Instead, the interaction has been made to explicitly depend on the local conformation (here, the secondary structure type). In addition, the packing energy depends not only on distances but also on relative orientations. The tetrahedron representation of the relative orientation has been designed as a statistically easy-to-estimate and chemically sensible way to capture the anisotropic nature of the coarse-grained packing interactions. Thus, the resulting total tetrabase energy contains correlations between local peptide conformation, relative orientation, and distance, making it sensitive to variations in spatial arrangements between SSEs. In packing models depending on simple intersite distance, such correlations would have been averaged out.

Figure 10. Energy variations of MC-minimized SSE arrangements with the same SSE compositions as natural proteins. The MC runs have been started either from the native SSE arrangements (circles) or from artificially constructed SSE arrangements (triangles and plus signs; results of 10 independent MC runs for each SSE composition are given). The energy values are relative to the averaged minimum energies of the 10 MC runs. Configurations similar to the respective native SSE arrangements (RMSD < 2.5 Å) are shown as plus signs, and the remaining configurations are shown as triangles.

Previously, we have reported the ABACUS (a backbone-based amino acid usage survey) model for sequence design with given backbones. 31,43 Using fixed native backbones as design targets, sequences have been designed successfully using ABACUS for different fold classes. The RMSDs between the actual structures and the respective design targets are around 2 Å. 31,39 Thus, it may be reasonable to aim at constructing designable backbones with about 2 Å RMSD accuracy for subsequent sequence design.

In previous studies, polypeptide backbones have also been generated de novo, with packing mostly modeled using simple steric interactions. The top TM-scores between the structures generated and actual protein structures were usually Although in some studies a few generated structures with higher TM-scores ( 0.6) have been reported, 20,21 such structures were selected a posteriori from a large number of generated structures using the TM-scores as criteria and not with an energetic criterion. The level of resemblance between the generated backbones and natural ones in these previous studies, although it may be recognized as statistically significant and as reflecting good correspondence between the overall topologies of generated and natural proteins, may not be high enough for the generated backbones to be used as viable targets for sequence design.
On the other hand, the structures generated by minimizing tetrabase energy have much higher TM-scores with natural proteins (Figures 4–6). The small RMSDs of 1–2 Å suggest that backbones optimized using tetrabase may be realistic enough to be used as input for sequence design programs such as ABACUS. 31

A number of previous theoretical studies with side chain-free modeling have already suggested that the interplay between a number of side chain type-independent factors, including backbone geometry, backbone hydrogen bonding, and (de)solvation, has strong presculpting effects on the free energy landscape of polypeptides. Here, we observe that even at atomistic resolution, the free energy minima of a statistical potential omitting residue types can still correspond to realistic backbone structures. This suggests that these backbone-associated factors can shape the free energy landscape of proteins at a resolution higher than crude or coarse contouring. This point may have implications not only for protein design but also for protein structure, function, and evolution in general, which may be worth exploring in future studies.

To further verify the contribution of the tetrabase packing term to reaching this accuracy, we have also performed control calculations in which the inter-SSE arrangements have been optimized with the tetrabase packing terms replaced by simple steric plus attractive interactions. In these calculations, the interaction between a pair of Cα atoms is considered to be repulsive (assigned a positive energy of 10) if their distance is less than 3.5 Å and attractive (assigned a negative energy of −0.5) if their distance is between 3.5 and 8 Å. Top TM-scores between configurations generated in these control calculations and native backbones are all below 0.6 (results not shown).
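The square-well control interaction described above is simple enough to write down directly. The following is a minimal sketch, not the authors' code: the function name `control_packing_energy`, the toy coordinates, and the treatment of the exact 3.5 Å and 8 Å boundaries (both counted as attractive here) are assumptions made for illustration.

```python
import numpy as np

def control_packing_energy(ca_a, ca_b, r_clash=3.5, r_contact=8.0,
                           e_clash=10.0, e_contact=-0.5):
    """Sum the square-well energy over all C-alpha pairs between two SSEs."""
    ca_a = np.asarray(ca_a, dtype=float)
    ca_b = np.asarray(ca_b, dtype=float)
    # Pairwise distance matrix with shape (len(ca_a), len(ca_b)).
    d = np.linalg.norm(ca_a[:, None, :] - ca_b[None, :, :], axis=-1)
    clashes = d < r_clash                            # repulsive pairs
    contacts = (d >= r_clash) & (d <= r_contact)     # attractive pairs
    return e_clash * clashes.sum() + e_contact * contacts.sum()

# Toy input: two straight three-residue fragments, 5 Å apart (hypothetical).
sse_a = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]]
sse_b = [[0.0, 5.0, 0.0], [3.8, 5.0, 0.0], [7.6, 5.0, 0.0]]
energy = control_packing_energy(sse_a, sse_b)
```

Because this energy depends only on the Cα–Cα distance, it carries no information about local conformation or relative orientation, which is the contrast the control calculation is meant to expose.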
Even though the tetrabase statistical energy does not explicitly depend on side chain types, it may still carry implicit sequence dependences that have been encoded in the backbone structure, including the local backbone conformations and the relative orientations between backbone sites. A side chain-indiscriminate energy function that simply depends on inter-Cα distances may not encode such implicit dependences. Our results support the view that a general approach to constructing realistic backbones at atomistic resolution without explicit differentiation of side chain types is possible.

In the current work, the local conformation types considered included only regular secondary structure types. This allowed us to search the SSE packing space effectively, so that the effectiveness of tetrabase could be examined by comparing results with SSE packing in native backbones. In our ongoing work, tetrabase is being extended to cover backbone sites in loops by using a discrete local structure alphabet (for example, the ProteinBlock model 44,45 ) to represent local conformational states of loops as well as regular SSEs. In addition, to reach a complete model that will allow side chain-unspecialized backbone design with atomistic authenticity and a full range of flexibility, the resulting tetrabase statistical energy that describes through-space packing between sequentially separated backbone sites, either in SSEs or in loops, is being integrated with statistical energy models developed to describe realistic peptide local conformations.

METHODS

tetrabase Packing Energy. The tetrabase packing term between two sites (generally indexed with numbers 1 and 2, respectively), $e^{SS_1SS_2}(r_1, r_2)$, can be determined from the conditional probability distribution $p(r_1, r_2 \mid SS_1, SS_2)$, namely,

$$e^{SS_1SS_2}(r_1, r_2) \propto -\ln p(r_1, r_2 \mid SS_1, SS_2) \tag{2}$$

As previously stated, $r_1$ and $r_2$ refer to coordinates of main chain heavy atoms, and $SS_1$ and $SS_2$ refer to secondary structure types.

The distribution $p(r_1, r_2 \mid SS_1, SS_2)$ in eq 2 needs to be estimated using known protein structures. If we ignore structure variations within each backbone site, it is a joint distribution of at least six dimensions. To overcome the problem brought about by the multidimensionality, we lump together the dimensions that define relative orientation and represent the relative orientation as discrete categories. To define the orientation category $O^{1,2}$ from the atomic coordinates $r_1$ and $r_2$, each of the two backbone sites 1 and 2 is considered as a tetrahedron with the respective N, C, C$^\beta$, and H$^\alpha$ atoms as its vertices (the coordinates of C$^\beta$ and H$^\alpha$ are determined using standard covalent internal coordinates). The orientation category is defined by the closest pair of vertices combined with the furthest pair of vertices between the two tetrahedrons. For example, if the closest pair of vertices are atom N of site 1 and atom C of site 2 (i.e., $r_{\min} = r_{NC}$) and the furthest pair are the C$^\beta$ atoms of both sites (i.e., $r_{\max} = r_{C^\beta C^\beta}$), $O^{1,2}$ would be assigned to category $O_{NC,C^\beta C^\beta}$. Similarly, another interbackbone site orientation with $r_{\min} = r_{C^\beta C^\beta}$ and $r_{\max} = r_{NC}$ would be assigned to category $O_{C^\beta C^\beta,NC}$, and so on. With this categorization scheme, possible relative orientations of two backbone sites of the same (different) secondary structure type(s) are divided into 78 (144) distinct categories.
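The category counts quoted above can be checked by direct enumeration. The sketch below is illustrative only and rests on one assumption that the text does not state explicitly: the furthest pair is taken to share no vertex with the closest pair. Under that assumption there are 16 × 9 = 144 labels, and merging labels related by swapping the two sites (appropriate when both sites have the same secondary structure type) leaves 78. The names `enumerate_categories` and `classify`, and the toy coordinates, are hypothetical.

```python
from itertools import product
import numpy as np

VERTICES = ("N", "C", "CB", "HA")  # tetrahedron vertices of one backbone site

def enumerate_categories(same_ss_type):
    """List labels ((a, b), (c, d)): closest pair a-b, furthest pair c-d."""
    labels = set()
    for a, b, c, d in product(VERTICES, repeat=4):
        # Assumption: the furthest pair shares no vertex with the closest pair.
        if c == a or d == b:
            continue
        label = ((a, b), (c, d))
        if same_ss_type:
            # Swapping the two sites gives the same category when SS types match.
            label = min(label, ((b, a), (d, c)))
        labels.add(label)
    return sorted(labels)

def classify(site1, site2):
    """Return (closest pair, furthest pair, r_min) for two vertex dictionaries."""
    dist = {(a, b): float(np.linalg.norm(np.subtract(site1[a], site2[b])))
            for a, b in product(VERTICES, repeat=2)}
    closest = min(dist, key=dist.get)
    # Same assumption as above: search the furthest pair among disjoint vertices.
    allowed = [p for p in dist if p[0] != closest[0] and p[1] != closest[1]]
    furthest = max(allowed, key=dist.get)
    return closest, furthest, dist[closest]

# Toy (not chemically realistic) coordinates reproducing the example in the text.
site1 = {"N": (0, 0, 0), "C": (-1, 0, 0), "CB": (-3, 0, 0), "HA": (-2, 1, 0)}
site2 = {"C": (2, 0, 0), "N": (3, 1, 0), "HA": (3, -1, 0), "CB": (5, 0, 0)}
closest, furthest, r_min = classify(site1, site2)
n_different = len(enumerate_categories(same_ss_type=False))
n_same = len(enumerate_categories(same_ss_type=True))
```

For the toy sites, the closest pair is N of site 1 with C of site 2 and the furthest pair is the two Cβ atoms, which corresponds to the category written as $O_{NC,C^\beta C^\beta}$ in the text. Note that `classify` does not canonicalize the label for same-type sites; that step is shown only in `enumerate_categories`.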

Within each orientation category, we choose $r_{\min}$, the distance between the closest vertices, as the variable to further describe the packing geometry. Formally, this treatment is equivalent to approximating the overall distribution, up to a normalization constant, with the following form:

$$p(r_1, r_2 \mid SS_1, SS_2) \propto p'(r_1, r_2 \mid SS_1, SS_2) = P(O^{1,2} \mid SS_1, SS_2)\,\frac{\rho(r_{\min}^{1,2} \mid O^{1,2}, SS_1, SS_2)}{\rho_{\min}^{\mathrm{ref}}(r_{\min}^{1,2})} \tag{3}$$

In eq 3, $\rho_{\min}^{\mathrm{ref}}$ is a reference distribution of $r_{\min}$ for two uniformly distributed, noninteracting tetrahedrons. It is needed because of the nonunitary nature of the transform from the complete coordinates $r_1$ and $r_2$ to the reduced representation $O^{1,2}$ and $r_{\min}$. In the current work, the probabilities of different orientation categories $P(O^{1,2} \mid SS_1, SS_2)$ and the distributions of $r_{\min}$ associated with different orientation categories $\rho(r_{\min}^{1,2} \mid O^{1,2}, SS_1, SS_2)$ have been estimated from backbone site pairs in training proteins whose Cα–Cα distances are within a cutoff of $r_{\mathrm{cut}}$ = 13 Å. The conditional distributions for $r_{\min}$ have been estimated by dividing $r_{\min}$ between 0 and 11.5 Å into bins 0.25 Å in width and then counting the frequency of occurrence in each bin. The reference distribution $\rho_{\min}^{\mathrm{ref}}$ has been estimated with the same resolution of bins, albeit with configurations computationally sampled for two uniformly distributed tetrahedrons within the same α-carbon distance cutoff. The set of training proteins is the same as that used in Xiong et al., comprising 12,465 nonredundant peptide chains (pairwise sequence identity between any two chains below 50%) with structures determined at resolutions of 2.5 Å or above by X-ray crystallography. 31,46 The secondary structure types have been assigned with the STRIDE program.
47 Finally, we define

$$e^{SS_1SS_2}(r_1, r_2) = -\ln p'(r_1, r_2 \mid SS_1, SS_2) + e_0^{SS_1SS_2} \tag{4}$$

In eq 4, the reference energy $e_0^{SS_1SS_2}$ has been introduced to control the relative strength of interactions between different secondary structure types. In the current work, the value of $e_0^{SS_1SS_2}$ has been chosen so that the averages of $e^{SS_1SS_2}(r_1, r_2)$ over the different orientation categories at the last bin of $r_{\min}$ (centered around Å) are zero. The energies as defined in eq 4 are stored as separate tables of numerical values at bin centers of $r_{\min}$. Energies at $r_{\min}$ values off the bin centers are obtained through linear interpolation.

Monte Carlo Simulated Annealing. Monte Carlo (MC) simulated annealing has been employed to minimize the total tetrabase energy for systems of given SSE compositions. In MC, two types of configurational moves have been considered. The first type is rigid body moves of a subset of SSEs relative to the remaining SSEs. The chosen SSEs are randomly translated and rotated by uniformly distributed amounts with maximum step sizes of 3 Å and 0.5, respectively. In the second type of MC move, one SSE in the current configuration is replaced with another SSE of the same type and length randomly taken from the training protein structures. New atomic coordinates of the substituting SSE are obtained by minimum RMSD fitting to the substituted SSE. In MC runs started from the native SSE arrangements, only the first type of MC move has been considered.

As different SSEs are not connected to each other, they may diffuse away from each other simply because of the effects of translational entropy. To avoid this, the SSEs are confined within a finite simulation region, which is simply a cubic box that is large enough to accommodate free rearrangements of the contained SSEs. MC moves are rejected if they lead any of the SSEs, either in part or as a whole, to move out of the simulation region.
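Returning to the tabulated packing energies of eq 4: the binning, the ratio to the reference distribution, and the linear interpolation between bin centers can be sketched as follows. This is a minimal illustration rather than the published implementation. The input samples are synthetic, the pseudocount used to keep empty bins finite is an assumption of this sketch, and `energy_table` and `lookup` are hypothetical names. With 0.25 Å bins spanning 0 to 11.5 Å there are 46 bins.

```python
import numpy as np

BIN_WIDTH = 0.25
R_MAX = 11.5
EDGES = np.arange(0.0, R_MAX + BIN_WIDTH / 2, BIN_WIDTH)   # 47 edges, 46 bins
CENTERS = 0.5 * (EDGES[:-1] + EDGES[1:])

def energy_table(r_obs, r_ref, p_orientation, e0=0.0, pseudocount=1.0):
    """Tabulate -ln[P(O) * rho(r_min) / rho_ref(r_min)] + e0 at the bin centers."""
    obs, _ = np.histogram(r_obs, bins=EDGES)
    ref, _ = np.histogram(r_ref, bins=EDGES)
    # Pseudocounts (an assumption of this sketch) keep empty bins finite.
    rho = (obs + pseudocount) / (obs + pseudocount).sum()
    rho_ref = (ref + pseudocount) / (ref + pseudocount).sum()
    return -np.log(p_orientation * rho / rho_ref) + e0

def lookup(table, r_min):
    """Linearly interpolate the tabulated energy at an arbitrary r_min."""
    return float(np.interp(r_min, CENTERS, table))

# Synthetic data standing in for one orientation category (hypothetical values).
rng = np.random.default_rng(0)
r_obs = rng.normal(5.0, 1.0, 5000).clip(0.0, 11.49)   # "observed" r_min samples
r_ref = rng.uniform(0.0, 11.5, 5000)                  # reference r_min samples
table = energy_table(r_obs, r_ref, p_orientation=0.1)
mid_value = lookup(table, 0.5 * (CENTERS[10] + CENTERS[11]))
```

In a full model there would be one such table per orientation category and per pair of secondary structure types, with `e0` chosen as described for eq 4; that bookkeeping is omitted here.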
At each MC step, an attempted move or structure change as described above is either accepted or rejected based on the energy change ΔE, the decision being made according to the Metropolis criterion, namely,

$$P_{\mathrm{accept}}(\Delta E) = \min\left(1, e^{-\beta \Delta E}\right) \tag{5}$$

where 1/β is an effective temperature. Step-dependent effective temperatures were used to search the configurational space for energy minima. In simulations started from the artificially constructed structures, the effective temperature takes the form

$$\frac{1}{\beta(k)} = 2.5\cos\left(\frac{2\pi k}{100{,}000}\right) + 2.5$$

where k is the step number. A maximum of 50,000 steps was carried out, during which 1/β dropped from a starting highest value of 5 to the lowest value of 0. If the lowest energy visited had not changed in the previous 5000 steps, the MC run was terminated. In simulations started from the native SSE arrangements, the effective temperature was changed according to

$$\frac{1}{\beta(k)} = 2.5\cos\left(\frac{2\pi k}{10{,}000}\right) + 2.5$$

A total of 55,000 MC steps were executed, during which the effective temperature periodically oscillated between 5 and 0 for five full cycles and then dropped from 5 to 0 in the final half cycle.

We did not try to fine-tune the parameters used in the Monte Carlo approach, although it may be possible to improve its efficiency by tuning parameters such as the step sizes and the temperature varying schemes. With the current scheme, the overall acceptance ratio is around 25%. In the high temperature phase (the first 10,000 steps of a simulated annealing cycle), the ratio is usually above 30%, while in the low temperature phase (the last 10,000 steps of a simulated annealing cycle) the ratio is usually lower than 20%.

Construct Artificial Initial Configurations. Two different schemes have been used to construct artificial arrangements of the SSEs as initial configurations for subsequent MC simulated annealing.
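Before turning to the two construction schemes, the Metropolis test of eq 5 and the two cosine temperature schedules of the annealing protocol can be sketched in a few lines. This is a sketch under stated assumptions: the schedule periods (a half cosine over 50,000 steps, and a 10,000-step cycle for the 55,000-step runs) are inferred from the step counts and the 5-to-0 temperature range given in the text, the handling of zero temperature is a choice made here, and all function names are hypothetical.

```python
import math
import random

def temperature_single(k, n_steps=50_000, t_max=5.0):
    """Half-cosine schedule: t_max at k = 0, falling to 0 at k = n_steps."""
    return 0.5 * t_max * (math.cos(math.pi * k / n_steps) + 1.0)

def temperature_cyclic(k, period=10_000, t_max=5.0):
    """Periodic schedule: t_max at the start of each cycle, 0 at mid-cycle."""
    return 0.5 * t_max * (math.cos(2.0 * math.pi * k / period) + 1.0)

def acceptance_probability(delta_e, temperature):
    """Metropolis rule min(1, exp(-delta_e / T)), with T = 1/beta."""
    if delta_e <= 0.0:
        return 1.0          # downhill or neutral moves are always accepted
    if temperature <= 0.0:
        return 0.0          # zero-temperature limit: no uphill moves
    return math.exp(-delta_e / temperature)

def metropolis_accept(delta_e, temperature, rng=random):
    """Draw a uniform random number and apply the acceptance probability."""
    return rng.random() < acceptance_probability(delta_e, temperature)
```

With these definitions, `temperature_cyclic` evaluated over 55,000 steps completes five full cycles and ends at zero after a final half cycle, matching the description of the runs started from native arrangements.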
The first scheme is to randomly choose SSEs of designated types and lengths from the training backbone structures and then insert them at random positions with random orientations into the simulation box. This scheme worked well for SSE compositions containing only α-helices, with similar sets of lowest energy configurations being found in the MC simulations starting from different initial structures. However, for SSE compositions containing β-strands, initial structures constructed with this scheme often did not lead to the formation of complete β-sheets in subsequent MC simulated annealing. Instead, the system frequently got trapped in configurations in which individual strands were packed against α-helices without forming β-sheets. This is purely a sampling problem, as such trapped configurations are always of significantly higher energies than configurations with well-formed β-sheets. The problem probably arose because the energy wells corresponding to well-formed β-sheets, despite being relatively deep, are entropically highly disfavored, spanning only narrow regions in the configurational space. When the system is outside of the narrow energy wells, the short-ranged favorable inter-β-strand packing terms are not effective in guiding the MC moves, making the moves essentially random. If the initial configuration of the system is far from the β-sheet forming wells, it becomes highly likely that the simulation gets trapped in the broader, entropically more favored metastable energy wells before the more stable but narrower minima can be reached.

To overcome this problem and to increase the efficiency of locating minima associated with β-sheets, a second scheme was introduced to construct initial structures. This scheme employs the idea of Taylor et al., in which SSEs are organized on lattice points in mutually parallel or antiparallel orientations. 48,49 For the SSE architectures considered in this work, at most two layers of lattice points in a 2-D plane were used as the (end) positions of SSEs. Both layers run horizontally, one of them containing lattice points for β-strands and the other containing points for α-helices. Within the β-strand layer, the horizontal distance between two neighboring lattice points or β-strands was set to 5 Å. Within the α-helix layer, this distance was set to 11 Å. The vertical distance between the two layers was set to 8 Å. The first lattice point of the α-helix layer was vertically aligned to the second lattice point of the β-strand layer, fixing the relative horizontal shift between the two layers. Each SSE of a given type and length was first randomly chosen from the training backbone structures. It was then rotated so that its end-to-end (from N to C) direction is perpendicular to the lattice plane, pointing in one of two opposite directions (one defined as positive and the other as negative). The remaining rotational degree of freedom, which is the rotation around the end-to-end axis, was left as random.
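The two-layer lattice just described is fully specified by three spacings and one alignment rule, so the in-plane lattice points can be generated as in the sketch below. The function name `lattice_points`, the choice of origin, and placing the helix layer at positive y are assumptions made for illustration; the text fixes only the spacings and the alignment of the first helix point with the second strand point.

```python
def lattice_points(n_strands, n_helices, strand_spacing=5.0,
                   helix_spacing=11.0, layer_gap=8.0):
    """Return in-plane (x, y) lattice positions for strands and helices, in Å."""
    strands = [(i * strand_spacing, 0.0) for i in range(n_strands)]
    # The first helix point shares its x coordinate with the second strand point.
    x0 = strand_spacing
    helices = [(x0 + j * helix_spacing, layer_gap) for j in range(n_helices)]
    return strands, helices

# Example: four strands and two helices.
strands, helices = lattice_points(n_strands=4, n_helices=2)
```

Each SSE axis would then be oriented perpendicular to this plane, so these points mark where one terminal Cα atom of each SSE is placed.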
SSEs pointing in the positive direction were translated to have their N-terminal Cα atoms at the corresponding lattice points, while SSEs pointing in the negative direction were translated to have their C-terminal Cα atoms at the corresponding lattice points.

Besides using initial configurations close to the intended β-sheet forming states, a coarse restraining energy was applied to restrict the β-strands from drifting too far away from the intended states during MC simulated annealing, especially in the higher temperature phase. This restraining function has the form

$$E = \sum_{\text{strand pair } A,B} \varepsilon\,(n_{AB} + n_{BA}) \tag{6}$$

where A and B each refer to one of two strands intended to become immediate neighbors in a β-sheet, ε is a positive constant, $n_{AB}$ is the number of backbone sites in strand A whose closest backbone site in strand B is within 7.5 Å (measured by the Cα–Cα distance), and $n_{BA}$ is similarly defined for strand B. Equation 6 has been chosen to restrict the intrastrand conformational space so that it can be more efficiently searched during the Monte Carlo simulation. As the restraint can be easily fulfilled by two strands spatially approaching each other to lead to zero restraining energies and forces, it does not affect the finer details of the relative arrangements between neighboring strands. The positive constant ε only needs to be large enough so that the system cannot move out of the restricted search region once it has moved into it. Here, we have used a large enough value of 10 for ε without trying to readjust it.

We note that both the initial lattice-based arrangements of the β-strands and the strand-pair restraining energy (eq 6) have been introduced to increase the efficiency of the MC simulated annealing by preventing it from being trapped in metastable configuration states. Neither of these treatments should have affected the atomistic details of interstrand packing in the finally identified stable states.
Given the definition of our model, these details should have been determined solely by the tetrabase energy function.

AUTHOR INFORMATION

Corresponding Author
* hyliu@ustc.edu.cn.

ORCID
Huanyu Chu:

Author Contributions
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

Funding
This work has been supported by the National Natural Science Foundation of China (Grants and ).

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENTS

This work has been supported by the National Natural Science Foundation of China (Grants and ).

REFERENCES

(1) Dahiyat, B. I.; Mayo, S. L. De Novo Protein Design: Fully Automated Sequence Selection. Science 1997, 278,
(2) Kuhlman, B.; Dantas, G.; Ireton, G. C.; Varani, G.; Stoddard, B. L.; Baker, D. Design of a Novel Globular Protein Fold with Atomic-Level Accuracy. Science 2003, 302,
(3) Koga, N.; Tatsumi-Koga, R.; Liu, G.; Xiao, R.; Acton, T. B.; Montelione, G. T.; Baker, D. Principles for Designing Ideal Protein Structures. Nature 2012, 491,
(4) Lin, Y. R.; Koga, N.; Tatsumi-Koga, R.; Liu, G.; Clouser, A. F.; Montelione, G. T.; Baker, D. Control Over Overall Shape and Size in de novo Designed Proteins. Proc. Natl. Acad. Sci. U. S. A. 2015, 112, E
(5) Grigoryan, G.; Degrado, W. F. Probing Designability via a Generalized Model of Helical Bundle Geometry. J. Mol. Biol. 2011, 405,
(6) Huang, P. S.; Oberdorfer, G.; Xu, C.; Pei, X. Y.; Nannenga, B. L.; Rogers, J. M.; DiMaio, F.; Gonen, T.; Luisi, B.; Baker, D. High Thermodynamic Stability of Parametrically Designed Helical Bundles. Science 2014, 346,
(7) Thomson, A. R.; Wood, C. W.; Burton, A. J.; Bartlett, G. J.; Sessions, R. B.; Brady, R. L.; Woolfson, D. N. Computational Design of Water-Soluble Alpha-Helical Barrels. Science 2014, 346,
(8) Marcos, E.; Basanta, B.; Chidyausiku, T. M.; Tang, Y.; Oberdorfer, G.; Liu, G.; Swapna, G. V.; Guan, R.; Silva, D. A.; Dou, J.; Pereira, J. H.; Xiao, R.; Sankaran, B.; Zwart, P. H.; Montelione, G. T.; Baker, D. Principles for Designing Proteins with Cavities formed by Curved Beta Sheets. Science 2017, 355,
(9) Brunette, T. J.; Parmeggiani, F.; Huang, P. S.; Bhabha, G.; Ekiert, D. C.; Tsutakawa, S. E.; Hura, G. L.; Tainer, J. A.; Baker, D. Exploring the Repeat Protein Universe through Computational Protein Design. Nature 2015, 528,
(10) Park, K.; Shen, B. W.; Parmeggiani, F.; Huang, P. S.; Stoddard, B. L.; Baker, D. Control of Repeat-Protein Curvature by Computational Protein Design. Nat. Struct. Mol. Biol. 2015, 22,
(11) Jacobs, T. M.; Williams, B.; Williams, T.; Xu, X.; Eletsky, A.; Federizon, J. F.; Szyperski, T.; Kuhlman, B. Design of Structurally Distinct Proteins using Strategies Inspired by Evolution. Science 2016, 352,
(12) Huang, P. S.; Boyken, S. E.; Baker, D. The Coming of Age of de novo Protein Design. Nature 2016, 537,


Lecture 2 and 3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability Lecture 2 and 3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability Part I. Review of forces Covalent bonds Non-covalent Interactions: Van der Waals Interactions

More information

FlexPepDock In a nutshell

FlexPepDock In a nutshell FlexPepDock In a nutshell All Tutorial files are located in http://bit.ly/mxtakv FlexPepdock refinement Step 1 Step 3 - Refinement Step 4 - Selection of models Measure of fit FlexPepdock Ab-initio Step

More information

SUPPLEMENTARY MATERIALS

SUPPLEMENTARY MATERIALS SUPPLEMENTARY MATERIALS Enhanced Recognition of Transmembrane Protein Domains with Prediction-based Structural Profiles Baoqiang Cao, Aleksey Porollo, Rafal Adamczak, Mark Jarrell and Jaroslaw Meller Contact:

More information

PROTEIN-PROTEIN DOCKING REFINEMENT USING RESTRAINT MOLECULAR DYNAMICS SIMULATIONS

PROTEIN-PROTEIN DOCKING REFINEMENT USING RESTRAINT MOLECULAR DYNAMICS SIMULATIONS TASKQUARTERLYvol.20,No4,2016,pp.353 360 PROTEIN-PROTEIN DOCKING REFINEMENT USING RESTRAINT MOLECULAR DYNAMICS SIMULATIONS MARTIN ZACHARIAS Physics Department T38, Technical University of Munich James-Franck-Str.

More information

Multi-Scale Hierarchical Structure Prediction of Helical Transmembrane Proteins

Multi-Scale Hierarchical Structure Prediction of Helical Transmembrane Proteins Multi-Scale Hierarchical Structure Prediction of Helical Transmembrane Proteins Zhong Chen Dept. of Biochemistry and Molecular Biology University of Georgia, Athens, GA 30602 Email: zc@csbl.bmb.uga.edu

More information

Folding of small proteins using a single continuous potential

Folding of small proteins using a single continuous potential JOURNAL OF CHEMICAL PHYSICS VOLUME 120, NUMBER 17 1 MAY 2004 Folding of small proteins using a single continuous potential Seung-Yeon Kim School of Computational Sciences, Korea Institute for Advanced

More information

Physiochemical Properties of Residues

Physiochemical Properties of Residues Physiochemical Properties of Residues Various Sources C N Cα R Slide 1 Conformational Propensities Conformational Propensity is the frequency in which a residue adopts a given conformation (in a polypeptide)

More information

Assignment 2 Atomic-Level Molecular Modeling

Assignment 2 Atomic-Level Molecular Modeling Assignment 2 Atomic-Level Molecular Modeling CS/BIOE/CME/BIOPHYS/BIOMEDIN 279 Due: November 3, 2016 at 3:00 PM The goal of this assignment is to understand the biological and computational aspects of macromolecular

More information

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison CMPS 6630: Introduction to Computational Biology and Bioinformatics Structure Comparison Protein Structure Comparison Motivation Understand sequence and structure variability Understand Domain architecture

More information

Ab-initio protein structure prediction

Ab-initio protein structure prediction Ab-initio protein structure prediction Jaroslaw Pillardy Computational Biology Service Unit Cornell Theory Center, Cornell University Ithaca, NY USA Methods for predicting protein structure 1. Homology

More information

Supplementary Figure 1 Crystal contacts in COP apo structure (PDB code 3S0R)

Supplementary Figure 1 Crystal contacts in COP apo structure (PDB code 3S0R) Supplementary Figure 1 Crystal contacts in COP apo structure (PDB code 3S0R) Shown in cyan and green are two adjacent tetramers from the crystallographic lattice of COP, forming the only unique inter-tetramer

More information

Importance of chirality and reduced flexibility of protein side chains: A study with square and tetrahedral lattice models

Importance of chirality and reduced flexibility of protein side chains: A study with square and tetrahedral lattice models JOURNAL OF CHEMICAL PHYSICS VOLUME 121, NUMBER 1 1 JULY 2004 Importance of chirality and reduced flexibility of protein side chains: A study with square and tetrahedral lattice models Jinfeng Zhang Department

More information

Protein Folding Prof. Eugene Shakhnovich

Protein Folding Prof. Eugene Shakhnovich Protein Folding Eugene Shakhnovich Department of Chemistry and Chemical Biology Harvard University 1 Proteins are folded on various scales As of now we know hundreds of thousands of sequences (Swissprot)

More information

Prediction and refinement of NMR structures from sparse experimental data

Prediction and refinement of NMR structures from sparse experimental data Prediction and refinement of NMR structures from sparse experimental data Jeff Skolnick Director Center for the Study of Systems Biology School of Biology Georgia Institute of Technology Overview of talk

More information

Novel Monte Carlo Methods for Protein Structure Modeling. Jinfeng Zhang Department of Statistics Harvard University

Novel Monte Carlo Methods for Protein Structure Modeling. Jinfeng Zhang Department of Statistics Harvard University Novel Monte Carlo Methods for Protein Structure Modeling Jinfeng Zhang Department of Statistics Harvard University Introduction Machines of life Proteins play crucial roles in virtually all biological

More information

arxiv:cond-mat/ v1 2 Feb 94

arxiv:cond-mat/ v1 2 Feb 94 cond-mat/9402010 Properties and Origins of Protein Secondary Structure Nicholas D. Socci (1), William S. Bialek (2), and José Nelson Onuchic (1) (1) Department of Physics, University of California at San

More information

Secondary Structure. Bioch/BIMS 503 Lecture 2. Structure and Function of Proteins. Further Reading. Φ, Ψ angles alone determine protein structure

Secondary Structure. Bioch/BIMS 503 Lecture 2. Structure and Function of Proteins. Further Reading. Φ, Ψ angles alone determine protein structure Bioch/BIMS 503 Lecture 2 Structure and Function of Proteins August 28, 2008 Robert Nakamoto rkn3c@virginia.edu 2-0279 Secondary Structure Φ Ψ angles determine protein structure Φ Ψ angles are restricted

More information

Many proteins spontaneously refold into native form in vitro with high fidelity and high speed.

Many proteins spontaneously refold into native form in vitro with high fidelity and high speed. Macromolecular Processes 20. Protein Folding Composed of 50 500 amino acids linked in 1D sequence by the polypeptide backbone The amino acid physical and chemical properties of the 20 amino acids dictate

More information

114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009

114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009 114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009 9 Protein tertiary structure Sources for this chapter, which are all recommended reading: D.W. Mount. Bioinformatics: Sequences and Genome

More information

Simulating Folding of Helical Proteins with Coarse Grained Models

Simulating Folding of Helical Proteins with Coarse Grained Models 366 Progress of Theoretical Physics Supplement No. 138, 2000 Simulating Folding of Helical Proteins with Coarse Grained Models Shoji Takada Department of Chemistry, Kobe University, Kobe 657-8501, Japan

More information

Integrated Math 1. Course Standards & Resource Guide

Integrated Math 1. Course Standards & Resource Guide Integrated Math 1 Course Standards & Resource Guide Integrated Math 1 Unit Overview Fall Spring Unit 1: Unit Conversion Unit 2: Creating and Solving Equations Unit 3: Creating and Solving Inequalities

More information

A new combination of replica exchange Monte Carlo and histogram analysis for protein folding and thermodynamics

A new combination of replica exchange Monte Carlo and histogram analysis for protein folding and thermodynamics JOURNAL OF CHEMICAL PHYSICS VOLUME 115, NUMBER 3 15 JULY 2001 A new combination of replica exchange Monte Carlo and histogram analysis for protein folding and thermodynamics Dominik Gront Department of

More information

Dublin City Schools Mathematics Graded Course of Study Algebra I Philosophy

Dublin City Schools Mathematics Graded Course of Study Algebra I Philosophy Philosophy The Dublin City Schools Mathematics Program is designed to set clear and consistent expectations in order to help support children with the development of mathematical understanding. We believe

More information

Computational Protein Design

Computational Protein Design 11 Computational Protein Design This chapter introduces the automated protein design and experimental validation of a novel designed sequence, as described in Dahiyat and Mayo [1]. 11.1 Introduction Given

More information

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics. Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics Iosif Vaisman Email: ivaisman@gmu.edu ----------------------------------------------------------------- Bond

More information

Model Mélange. Physical Models of Peptides and Proteins

Model Mélange. Physical Models of Peptides and Proteins Model Mélange Physical Models of Peptides and Proteins In the Model Mélange activity, you will visit four different stations each featuring a variety of different physical models of peptides or proteins.

More information

Docking. GBCB 5874: Problem Solving in GBCB

Docking. GBCB 5874: Problem Solving in GBCB Docking Benzamidine Docking to Trypsin Relationship to Drug Design Ligand-based design QSAR Pharmacophore modeling Can be done without 3-D structure of protein Receptor/Structure-based design Molecular

More information

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror Protein structure prediction CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror 1 Outline Why predict protein structure? Can we use (pure) physics-based methods? Knowledge-based methods Two major

More information

2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity.

2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity. A global picture of the protein universe will help us to understand

More information

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding

More information

Section II Understanding the Protein Data Bank

Section II Understanding the Protein Data Bank Section II Understanding the Protein Data Bank The focus of Section II of the MSOE Center for BioMolecular Modeling Jmol Training Guide is to learn about the Protein Data Bank, the worldwide repository

More information

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha Neural Networks for Protein Structure Prediction Brown, JMB 1999 CS 466 Saurabh Sinha Outline Goal is to predict secondary structure of a protein from its sequence Artificial Neural Network used for this

More information

Protein structure (and biomolecular structure more generally) CS/CME/BioE/Biophys/BMI 279 Sept. 28 and Oct. 3, 2017 Ron Dror

Protein structure (and biomolecular structure more generally) CS/CME/BioE/Biophys/BMI 279 Sept. 28 and Oct. 3, 2017 Ron Dror Protein structure (and biomolecular structure more generally) CS/CME/BioE/Biophys/BMI 279 Sept. 28 and Oct. 3, 2017 Ron Dror Please interrupt if you have questions, and especially if you re confused! Assignment

More information

Lecture 2-3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability

Lecture 2-3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability Lecture 2-3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability Part I. Review of forces Covalent bonds Non-covalent Interactions Van der Waals Interactions

More information

Protein Folding & Stability. Lecture 11: Margaret A. Daugherty. Fall How do we go from an unfolded polypeptide chain to a

Protein Folding & Stability. Lecture 11: Margaret A. Daugherty. Fall How do we go from an unfolded polypeptide chain to a Lecture 11: Protein Folding & Stability Margaret A. Daugherty Fall 2004 How do we go from an unfolded polypeptide chain to a compact folded protein? (Folding of thioredoxin, F. Richards) Structure - Function

More information

Course Notes: Topics in Computational. Structural Biology.

Course Notes: Topics in Computational. Structural Biology. Course Notes: Topics in Computational Structural Biology. Bruce R. Donald June, 2010 Copyright c 2012 Contents 11 Computational Protein Design 1 11.1 Introduction.........................................

More information

Orientational degeneracy in the presence of one alignment tensor.

Orientational degeneracy in the presence of one alignment tensor. Orientational degeneracy in the presence of one alignment tensor. Rotation about the x, y and z axes can be performed in the aligned mode of the program to examine the four degenerate orientations of two

More information

Protein structure analysis. Risto Laakso 10th January 2005

Protein structure analysis. Risto Laakso 10th January 2005 Protein structure analysis Risto Laakso risto.laakso@hut.fi 10th January 2005 1 1 Summary Various methods of protein structure analysis were examined. Two proteins, 1HLB (Sea cucumber hemoglobin) and 1HLM

More information

ALL LECTURES IN SB Introduction

ALL LECTURES IN SB Introduction 1. Introduction 2. Molecular Architecture I 3. Molecular Architecture II 4. Molecular Simulation I 5. Molecular Simulation II 6. Bioinformatics I 7. Bioinformatics II 8. Prediction I 9. Prediction II ALL

More information

Useful background reading

Useful background reading Overview of lecture * General comment on peptide bond * Discussion of backbone dihedral angles * Discussion of Ramachandran plots * Description of helix types. * Description of structures * NMR patterns

More information

Announcements. Primary (1 ) Structure. Lecture 7 & 8: PROTEIN ARCHITECTURE IV: Tertiary and Quaternary Structure

Announcements. Primary (1 ) Structure. Lecture 7 & 8: PROTEIN ARCHITECTURE IV: Tertiary and Quaternary Structure Announcements TA Office Hours: Brian Eckenroth Monday 3-4 pm Thursday 11 am-12 pm Lecture 7 & 8: PROTEIN ARCHITECTURE IV: Tertiary and Quaternary Structure Margaret Daugherty Fall 2003 Homework II posted

More information

Lipid Regulated Intramolecular Conformational Dynamics of SNARE-Protein Ykt6

Lipid Regulated Intramolecular Conformational Dynamics of SNARE-Protein Ykt6 Supplementary Information for: Lipid Regulated Intramolecular Conformational Dynamics of SNARE-Protein Ykt6 Yawei Dai 1, 2, Markus Seeger 3, Jingwei Weng 4, Song Song 1, 2, Wenning Wang 4, Yan-Wen 1, 2,

More information

CS273: Algorithms for Structure Handout # 2 and Motion in Biology Stanford University Thursday, 1 April 2004

CS273: Algorithms for Structure Handout # 2 and Motion in Biology Stanford University Thursday, 1 April 2004 CS273: Algorithms for Structure Handout # 2 and Motion in Biology Stanford University Thursday, 1 April 2004 Lecture #2: 1 April 2004 Topics: Kinematics : Concepts and Results Kinematics of Ligands and

More information

Universal Similarity Measure for Comparing Protein Structures

Universal Similarity Measure for Comparing Protein Structures Marcos R. Betancourt Jeffrey Skolnick Laboratory of Computational Genomics, The Donald Danforth Plant Science Center, 893. Warson Rd., Creve Coeur, MO 63141 Universal Similarity Measure for Comparing Protein

More information

Introduction to Computational Structural Biology

Introduction to Computational Structural Biology Introduction to Computational Structural Biology Part I 1. Introduction The disciplinary character of Computational Structural Biology The mathematical background required and the topics covered Bibliography

More information

Nature Structural and Molecular Biology: doi: /nsmb.2938

Nature Structural and Molecular Biology: doi: /nsmb.2938 Supplementary Figure 1 Characterization of designed leucine-rich-repeat proteins. (a) Water-mediate hydrogen-bond network is frequently visible in the convex region of LRR crystal structures. Examples

More information

F. Piazza Center for Molecular Biophysics and University of Orléans, France. Selected topic in Physical Biology. Lecture 1

F. Piazza Center for Molecular Biophysics and University of Orléans, France. Selected topic in Physical Biology. Lecture 1 Zhou Pei-Yuan Centre for Applied Mathematics, Tsinghua University November 2013 F. Piazza Center for Molecular Biophysics and University of Orléans, France Selected topic in Physical Biology Lecture 1

More information

Presentation Outline. Prediction of Protein Secondary Structure using Neural Networks at Better than 70% Accuracy

Presentation Outline. Prediction of Protein Secondary Structure using Neural Networks at Better than 70% Accuracy Prediction of Protein Secondary Structure using Neural Networks at Better than 70% Accuracy Burkhard Rost and Chris Sander By Kalyan C. Gopavarapu 1 Presentation Outline Major Terminology Problem Method

More information

Protein Structure Determination

Protein Structure Determination Protein Structure Determination Given a protein sequence, determine its 3D structure 1 MIKLGIVMDP IANINIKKDS SFAMLLEAQR RGYELHYMEM GDLYLINGEA 51 RAHTRTLNVK QNYEEWFSFV GEQDLPLADL DVILMRKDPP FDTEFIYATY 101

More information

From Amino Acids to Proteins - in 4 Easy Steps

From Amino Acids to Proteins - in 4 Easy Steps From Amino Acids to Proteins - in 4 Easy Steps Although protein structure appears to be overwhelmingly complex, you can provide your students with a basic understanding of how proteins fold by focusing

More information

Joana Pereira Lamzin Group EMBL Hamburg, Germany. Small molecules How to identify and build them (with ARP/wARP)

Joana Pereira Lamzin Group EMBL Hamburg, Germany. Small molecules How to identify and build them (with ARP/wARP) Joana Pereira Lamzin Group EMBL Hamburg, Germany Small molecules How to identify and build them (with ARP/wARP) The task at hand To find ligand density and build it! Fitting a ligand We have: electron

More information

Supersecondary Structures (structural motifs)

Supersecondary Structures (structural motifs) Supersecondary Structures (structural motifs) Various Sources Slide 1 Supersecondary Structures (Motifs) Supersecondary Structures (Motifs): : Combinations of secondary structures in specific geometric

More information

Copyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years.

Copyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years. Structure Determination and Sequence Analysis The vast majority of the experimentally determined three-dimensional protein structures have been solved by one of two methods: X-ray diffraction and Nuclear

More information

Lecture 11: Protein Folding & Stability

Lecture 11: Protein Folding & Stability Structure - Function Protein Folding: What we know Lecture 11: Protein Folding & Stability 1). Amino acid sequence dictates structure. 2). The native structure represents the lowest energy state for a

More information

Protein Folding & Stability. Lecture 11: Margaret A. Daugherty. Fall Protein Folding: What we know. Protein Folding

Protein Folding & Stability. Lecture 11: Margaret A. Daugherty. Fall Protein Folding: What we know. Protein Folding Lecture 11: Protein Folding & Stability Margaret A. Daugherty Fall 2003 Structure - Function Protein Folding: What we know 1). Amino acid sequence dictates structure. 2). The native structure represents

More information

Helix-coil and beta sheet-coil transitions in a simplified, yet realistic protein model

Helix-coil and beta sheet-coil transitions in a simplified, yet realistic protein model Macromol. Theory Simul. 2000, 9, 523 533 523 Full Paper: A reduced model of polypeptide chains and protein stochastic dynamics is employed in Monte Carlo studies of the coil-globule transition. The model

More information

Common Core State Standards for Mathematics - High School

Common Core State Standards for Mathematics - High School to the Common Core State Standards for - High School I Table of Contents Number and Quantity... 1 Algebra... 1 Functions... 3 Geometry... 6 Statistics and Probability... 8 Copyright 2013 Pearson Education,

More information

Effect of protein shape on multibody interactions between membrane inclusions

Effect of protein shape on multibody interactions between membrane inclusions PHYSICAL REVIEW E VOLUME 61, NUMBER 4 APRIL 000 Effect of protein shape on multibody interactions between membrane inclusions K. S. Kim, 1, * John Neu, and George Oster 3, 1 Department of Physics, Graduate

More information

Observations Homework Checkpoint quizzes Chapter assessments (Possibly Projects) Blocks of Algebra

Observations Homework Checkpoint quizzes Chapter assessments (Possibly Projects) Blocks of Algebra September The Building Blocks of Algebra Rates, Patterns and Problem Solving Variables and Expressions The Commutative and Associative Properties The Distributive Property Equivalent Expressions Seeing

More information

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics Jianlin Cheng, PhD Department of Computer Science University of Missouri, Columbia

More information

Structural Alignment of Proteins

Structural Alignment of Proteins Goal Align protein structures Structural Alignment of Proteins 1 2 3 4 5 6 7 8 9 10 11 12 13 14 PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE

More information

Protein Structure. Hierarchy of Protein Structure. Tertiary structure. independently stable structural unit. includes disulfide bonds

Protein Structure. Hierarchy of Protein Structure. Tertiary structure. independently stable structural unit. includes disulfide bonds Protein Structure Hierarchy of Protein Structure 2 3 Structural element Primary structure Secondary structure Super-secondary structure Domain Tertiary structure Quaternary structure Description amino

More information

Supporting Information

Supporting Information Supporting Information Ottmann et al. 10.1073/pnas.0907587106 Fig. S1. Primary structure alignment of SBT3 with C5 peptidase from Streptococcus pyogenes. The Matchmaker tool in UCSF Chimera (http:// www.cgl.ucsf.edu/chimera)

More information

A Method for the Improvement of Threading-Based Protein Models

A Method for the Improvement of Threading-Based Protein Models PROTEINS: Structure, Function, and Genetics 37:592 610 (1999) A Method for the Improvement of Threading-Based Protein Models Andrzej Kolinski, 1,2 * Piotr Rotkiewicz, 1,2 Bartosz Ilkowski, 1,2 and Jeffrey

More information

Molecular Mechanics. I. Quantum mechanical treatment of molecular systems

Molecular Mechanics. I. Quantum mechanical treatment of molecular systems Molecular Mechanics I. Quantum mechanical treatment of molecular systems The first principle approach for describing the properties of molecules, including proteins, involves quantum mechanics. For example,

More information

Biochemistry Prof. S. DasGupta Department of Chemistry Indian Institute of Technology Kharagpur. Lecture - 06 Protein Structure IV

Biochemistry Prof. S. DasGupta Department of Chemistry Indian Institute of Technology Kharagpur. Lecture - 06 Protein Structure IV Biochemistry Prof. S. DasGupta Department of Chemistry Indian Institute of Technology Kharagpur Lecture - 06 Protein Structure IV We complete our discussion on Protein Structures today. And just to recap

More information