proteins High Accuracy Assessment Assessment of CASP7 predictions in the high accuracy template-based modeling category


Proteins: Structure, Function, and Bioinformatics

Randy J. Read* and Gayatri Chavali
Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom

ABSTRACT

Models for target domains in the high accuracy template-based modeling category were assessed according to a number of criteria evaluating the quality of the main-chain prediction (GDT-HA), predicted sequence alignment (AL0), and side-chain rotameric state. A new criterion was introduced, the quality of the model for use in solving a crystal structure by molecular replacement. There is good evidence that modeling adds value to the template structures, particularly when multiple templates are available. However, when there is already a good template, few of the models are better for the purpose of molecular replacement. Proteins 2007; 69(Suppl 8). © 2007 Wiley-Liss, Inc.

Key words: HA/TBM; structure prediction; comparative modeling; molecular replacement.

INTRODUCTION

Though the Protein Databank [1] continues to grow exponentially, structural biology cannot keep up with the explosion of gene sequence information. On the other hand, to fully exploit the understanding of biochemistry and disease-associated mutations that can be deduced from sequence information, it is essential to build on the framework of structural information. Comparative modeling allows the gap between the sequence and structure databases to be spanned when a suitable template can be identified. However, the quality of the comparative model depends strongly on the quality of the template and, of course, on the quality of the modeling algorithms.

A perennial issue in comparative modeling is the question of added value: to what extent does the model add information beyond the statement that the target resembles the template?
In CASP6, it was concluded that for the easier targets, the difficulty was with refinement methods to improve on the template, and it was suggested that more attention should be paid to this issue to allow better evaluation of the impact of refinement in such cases [2]. For this reason, the high accuracy template-based modeling (HA/TBM) category was introduced for CASP7. Targets were assigned to this category after predictions were closed on the basis of two criteria. First, to ensure that there was a good template in the PDB at the time predictions were made, structural superpositions had to identify at least one template with an LGA-S score [3] of greater than 80. Second, to ensure that it was possible to construct a good model, it was required that at least one model must give a GDT-TS score [3] of greater than 80.

The authors state no conflict of interest.
Grant sponsor: Wellcome Trust (UK); Grant number:
*Correspondence to: Randy J. Read, Cambridge Institute for Medical Research, Wellcome Trust/MRC Building, Hills Road, Cambridge CB2 0XY, UK. rjr27@cam.ac.uk
Received 11 April 2007; Revised 29 May 2007; Accepted 10 June 2007. Published online 25 September 2007 in Wiley InterScience.

METHODS

Only the first model submitted by each group was evaluated in the group comparisons. As far as possible, we have based our assessments on criteria developed in earlier CASPs, with help from earlier assessors in implementing those criteria. Because the focus is on models of higher accuracy, we have emphasized the more stringent versions of assessment criteria, when there is a choice of stringency.

Criteria computed at Protein Structure Prediction Center

Raw scores for many of the possible criteria were computed at the Protein Structure Prediction Center in the Genome Center at the University of California (Davis), and these were maintained on a web page accessible to assessors. The scores used in this work were computed using results from the program LGA [3]: AL0 (alignment score based on the superposition obtained with LGA), LGA-S (sequence-independent superposition score), and the GDT scores (sequence-dependent superposition scores). The GDT scores come in different varieties depending on the stringency with which Cα atoms must be aligned; we computed Z-scores for the numerical evaluation with the high accuracy version (GDT-HA), which uses threshold distances half the size of those used for the standard version (GDT-TS), and is thus more stringent. Details of the statistics are available from the paper in this issue describing the facilities provided by the Protein Structure Prediction Center [4].

Rotamer prediction quality

The quality of side-chain prediction was evaluated by comparing torsion angles between the model and the target. Torsion angle differences were computed using the program LSQMAN [5]. Where side chains were missing from a model, the torsion angles were classified as incorrect. Four raw scores were evaluated: the fraction of residues with χ1 angles predicted within either 15° or 30° and the fraction of residues with both χ1 and χ2 predicted within either 15° or 30°. In this case, greater emphasis was placed on the less stringent criteria (30° tolerance) because of the uncertainty in the experimental values of the torsion angles.
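The rotamer scores described above can be illustrated with a minimal sketch. This is not the LSQMAN procedure itself; the data layout (one tuple of χ angles per residue, `None` for a missing side chain), the function names, and the inclusive tolerance are assumptions made for illustration, and the sketch ignores the symmetry of side chains such as Phe or Asp.

```python
from typing import Optional, Sequence, Tuple

Chis = Tuple[float, ...]  # (chi1,) or (chi1, chi2) in degrees; () if none


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two torsion angles (degrees)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def fraction_correct(
    model: Sequence[Optional[Chis]],
    target: Sequence[Chis],
    n_angles: int = 1,
    tol: float = 30.0,
) -> float:
    """Fraction of residues whose first n_angles chi angles are within tol.

    Residues whose target has fewer than n_angles torsions are skipped.
    A missing model side chain (None) is counted as incorrect.
    """
    total = correct = 0
    for m, t in zip(model, target):
        if len(t) < n_angles:
            continue  # e.g. Gly/Ala, or no chi2 when n_angles == 2
        total += 1
        if m is None or len(m) < n_angles:
            continue  # missing side chain: classified as incorrect
        if all(angle_diff(m[i], t[i]) <= tol for i in range(n_angles)):
            correct += 1
    return correct / total if total else 0.0


# Hypothetical four-residue example (values invented for illustration).
target = [(60.0, 180.0), (-65.0,), (175.0, 70.0), ()]
model = [(50.0, 150.0), None, (-170.0, 110.0), ()]

chi1_30 = fraction_correct(model, target, n_angles=1, tol=30.0)
chi12_30 = fraction_correct(model, target, n_angles=2, tol=30.0)
```

The periodic wrap in `angle_diff` matters: 175° and -170° differ by 15°, not 345°.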
Although there is greater uncertainty in the torsion angles of surface residues, all residues were included in the analysis to increase the number of observations.

Suitability for molecular replacement

As a new criterion, we have introduced a measure for how well a model can be used to solve target crystal structures by molecular replacement. The program Phaser [6] uses likelihood methods to solve crystal structures. For each potential solution it reports a log-likelihood-gain (LLG) score, which measures how well the model agrees with the data. Although Phaser can be used to solve structures with multiple components, the computing time rises significantly compared with structures with a single component, so only those targets that were determined by crystallography and have a single copy of a single-domain protein in the asymmetric unit were evaluated. The likelihood function requires, as a parameter, an estimate of the expected RMS deviation of the model from the target; for our tests, we used the value predicted from a correlation between sequence identity and main-chain coordinate error [7], using the sequence identity from the most closely-related template available in the PDB at the time of prediction.

Z-scores

For numerical evaluation, the raw scores were converted to Z-scores, as described by Tress et al. [2] The Z-scores were computed in two passes. In a first pass, the mean and standard deviation of the raw scores for the first models submitted by all the groups for a target were evaluated. In a second pass, models worse than two standard deviations below the mean were eliminated in computing a revised mean and standard deviation, which were then used as the basis for the final Z-score. All negative Z-scores were then set to zero.

RESULTS AND DISCUSSION

Table I presents a summary of the numerical ranking results for all the groups that submitted predictions for HA/TBM targets in CASP7.
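The two-pass Z-score conversion described under "Z-scores" in the Methods, which underlies these rankings, can be sketched as follows. This is a minimal illustration rather than the code used for the assessment; the function name is hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not specify which was used.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def two_pass_z_scores(raw: Sequence[float]) -> List[float]:
    """Convert raw scores for one target into Z-scores in two passes.

    Pass 1: mean and standard deviation over all first models.
    Pass 2: drop models more than two standard deviations below the
            pass-1 mean, then recompute the mean and standard deviation.
    Final Z-scores use the pass-2 statistics; negatives are set to zero.
    """
    m1, s1 = mean(raw), pstdev(raw)  # assumption: population std. dev.
    kept = [x for x in raw if x >= m1 - 2.0 * s1]
    m2, s2 = mean(kept), pstdev(kept)
    if s2 == 0.0:
        return [0.0 for _ in raw]  # all retained scores are identical
    return [max(0.0, (x - m2) / s2) for x in raw]


# Hypothetical GDT-HA raw scores for one target (invented values).
raw_scores = [70.0, 72.0, 74.0, 76.0, 78.0, 10.0]
z = two_pass_z_scores(raw_scores)
```

In this invented example the outlier (10.0) lies more than two standard deviations below the first-pass mean, so it no longer inflates the spread used to score the remaining models. Flooring at zero means a group is not penalized more heavily for a very poor model than for an average one.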
Targets

In CASP7, predictions for a total of 95 targets were evaluated by assessors (Clarke et al., this volume). A number of these had multiple domains, so there was a total of 123 target domains. Of these, 28 domains from 24 targets were assigned to the HA/TBM category. The structure of one of these (T0302) was withdrawn and replaced during the evaluation period, so it was omitted from the analysis presented here. Domain 2 of target T0303 was included in the analysis because it obeyed the strict criteria for entrance into the HA/TBM category, but it was omitted from the official HA/TBM list (Clarke et al., this volume) because only one model had a GDT-TS score above 80. All of the structures analyzed here were determined by X-ray crystallography.

Although most of the targets had potential templates with high levels of sequence identity, as one would expect given the criteria for entrance into the HA/TBM category, there was one domain (domain 2 of T0303) where the most closely-related template was only 13% identical in sequence. It is instructive to look at the effect of sequence identity on the probability that a template will show a high level of structural similarity (measured in this work by whether the LGA-S score was greater than 80). Figure 1 shows, as a function of sequence identity for the most closely-related template, the fraction of domains assigned to the HA/TBM category. As one would expect, there is a good correlation between sequence identity and the probability of a high LGA-S score. Strikingly, there is a big jump at about 30% sequence identity, which agrees well with anecdotal evidence that with a model 30% identical in sequence or better, there is an excellent chance of solving a crystal structure by molecular replacement. Similarly, it has been suggested that template-based modeling generally increases substantially in accuracy at the level of 30% sequence identity [8]. Nonetheless, there is a significant number of target domains with good templates at lower sequence identities, agreeing with the finding that molecular replacement will succeed in at least some cases with distantly-related models [9].

Table I. Detailed Results by Group (a). Columns: Group; n HA; Mean GDT-HA Z-score; Mean AL0 Z-score; Mean χ1 Z-score; Mean χ1/χ2 Z-score; n MR; Mean LLG Z-score; Sum.

(a) In the group name, AL indicates that only alignment predictions were submitted, whereas TS indicates that atomic coordinates were submitted. Only one group (706; TENETA) submitted predictions in both categories and hence appears twice in this list. The translation from group number to group name can be found on the Protein Structure Prediction Center web page. n HA is the number of predictions submitted for target domains assigned to the HA/TBM category, and n MR is the number of predictions submitted for targets tested for molecular replacement. The computation of Z-scores for GDT-HA, AL0, correct χ1, correct χ1/χ2 pairs, and molecular replacement LLG score is explained in the text. Groups are sorted by the sum of their mean Z-scores for GDT-HA, correct χ1/χ2 pairs and LLG.

Figure 1. Fraction of domains assigned to HA/TBM category, as a function of sequence identity for most closely-related template.

Alignment accuracy

Alignment accuracy correlates very strongly with the other measures of model quality (results not shown), which is not surprising as a good alignment is an essential prerequisite to building a good model. What is more notable is that the groups submitting alignment-only (AL) predictions did much more poorly in the AL0 alignment score than groups submitting full structural predictions. It is conceivable that this reflects use of less sophisticated methods by groups restricting their efforts to alignment. However, some of the difference will arise from incompleteness of the AL models, which by their nature lack residues arising from insertions relative to the template. Another explanation is that the attempt to build a plausible model provides a good test for the hypothesis that a particular sequence alignment is correct.

Quality of fold prediction

The evaluation of fold prediction concentrated on the GDT-HA score, a variant of GDT-TS with lowered thresholds that make it more sensitive to fine details. Judged by mean Z-values of GDT-HA scores, group 556 (LEE) had the best overall performance (Fig. 2).

Figure 2. Top 20 mean Z-scores for GDT-HA criterion.

Value added to fold prediction

One way to assess the value added to the template is to consider whether the fold prediction scores are better for the models than for the templates on which they were based. A complication is that the model is based on an explicit sequence alignment, whereas the sequence alignment for the template must be inferred, preferably from the structural alignment. We investigated two ways of dealing with this. First, we looked at the GDT-TS scores that would be assigned to the template given a perfect sequence alignment. The sequence alignment from the LGA structural superposition between the target and the template was used to construct a model by replacing the side chains on the template. A GDT-TS score was then computed for this model using LGA. Scores computed on this basis were kindly supplied by Michael Tress. Second, we looked at the LGA-S (sequence-independent) scores for the models. The second method is preferred, as it does not penalize models for the imperfection of alignment algorithms. Both methods penalize models for which the best possible template was not identified.

The data in Figure 3 demonstrate that the best model consistently improves on the best template. In general, a roughly constant fraction of the difference between the template and the target is removed in the best models.

Value could be added to the best single template in a number of ways. One would be through refinement methods, which would be difficult to assess from these data. A second would be by assembling the model from multiple templates. One indication that this is an important factor is that the targets with a single template (highlighted in Fig. 3) show less improvement in LGA-S score on average than those with more than one template. In addition, there is good evidence that the use of multiple templates had a significant impact in the construction of the best models for target T0315 (also highlighted in Fig. 3). The best overall template for this target is chain A of PDB entry 1J6O, although an examination of local Cα deviations shows that the region around residues is poorly conserved. (An analysis of deviations from templates and models can be viewed on the CASP7 web site at Casp7.html). Chains A and B of PDB entry 1YIX show much smaller deviations for residues 20-25, although they model residues more poorly. The best models for this target resemble 1J6O for residues and 1YIX for residues 20-25, and annotations indicate that both templates have indeed been used simultaneously in each of the models. Figure 4 compares the structures of the target, the two templates, and the model with the highest GDT-TS score, which was submitted by group 137 (3Dpro).

Figure 3. Measuring value added by fraction of potential improvement in LGA-S that was achieved. A perfect model would have an LGA-S score of 100, so the fraction of potential improvement is defined as (LGA-S_model − LGA-S_template)/(100 − LGA-S_template). This is plotted as a function of LGA-S for the most closely-related template. Points corresponding to targets with a single good template are highlighted with circles, while the point corresponding to target T0315 is highlighted with a diamond.

Figure 4. Comparison of the structures of the target T0315 (green), the model from group 137 (3Dpro; grey) and two possible templates, PDB entry 1J6O (cyan) and PDB entry 1YIX (magenta), in the region of residues

Model ranking

Successful structure prediction is a combination of two factors: model generation, in which the space of possible conformations is sampled, and model ranking, in which the possible conformations are scored to find the best solution. Since most participating groups submitted five ranked predictions for each target, it is possible to gain some insight into how well they do in ranking those submissions. In general, there is a good correlation between the rank assigned to a model and the quality of the model. Table IIa shows the mean Z-scores for GDT-HA as a function of model rank, for all groups who submitted five models for at least one target, and for the subsets of the 20 or 50 groups with the highest mean GDT-HA scores for their first model. However, there is still room for improvement in the rankings.
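The kind of summary reported in Table II (mean score by submitted rank, and the first model compared with the best submitted model) can be sketched as below. The function names and the score values are hypothetical and chosen only to show the bookkeeping; they are not taken from the CASP7 data.

```python
from typing import List, Sequence, Tuple


def mean_by_rank(per_target: Sequence[Sequence[float]]) -> List[float]:
    """Mean score at each submitted rank, averaged over targets.

    per_target[t][r] is the score of the model ranked r+1 for target t.
    Every target is assumed to have the same number of ranked models.
    """
    n_targets = len(per_target)
    n_ranks = len(per_target[0])
    return [sum(t[r] for t in per_target) / n_targets for r in range(n_ranks)]


def first_vs_best(per_target: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Mean score of model 1, mean score of the best model, and their ratio."""
    n_targets = len(per_target)
    first = sum(t[0] for t in per_target) / n_targets
    best = sum(max(t) for t in per_target) / n_targets
    ratio = first / best if best else float("nan")
    return first, best, ratio


# Invented Z-scores for two targets, five ranked models each.
scores = [
    [1.0, 0.8, 0.6, 0.4, 0.2],  # ranking agrees with quality
    [0.5, 1.5, 0.5, 0.5, 0.0],  # the best model was ranked second
]
rank_means = mean_by_rank(scores)
first, best, ratio = first_vs_best(scores)
```

A ratio of 1 would correspond to perfect ranking, in the sense that the model submitted first is always the best of the five; the second invented target pulls the ratio below 1.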
Table IIb compares the mean GDT-HA Z-scores for the first model and the best model, for the 10 groups with highest GDT-HA scores. Perfect ranking would have improved the results for all groups.

Table II. Correlation of GDT-HA Score with Model Rank
(a) Mean GDT-HA Z-scores by model rank for cases in which 5 models were submitted. Columns: Selection; Model 1; Model 2; Model 3; Model 4; Model 5. Rows: Top 20 groups; Top 50 groups; All groups.
(b) Mean GDT-HA Z-scores for model 1 and best submitted model for each target. Columns: Group; Model 1; Best Model; Ratio of 1 to best. Rows: TS556 (LEE); TS024 (Zhang); TS025 (Zhang-Server); TS136 (FOLDpro); TS137 (3Dpro); TS125 (TASSER); TS020 (Baker); TS675 (fams-ace); TS026 (SAMUDRALA); TS671 (fams-multi).

Torsion angle accuracy

As expected, the accuracy with which the side-chain rotamers can be predicted depends on the level of sequence identity with the available templates, as the rotamers for residues identical between template and target are very likely to be the same, particularly for closely-related structures. Figure 5 shows the correlation between the fraction of χ1 angles predicted within 30° for the best model for each target and the sequence identity for the most closely-related template. Extrapolating the trend to a sequence identity of zero suggests that over 60% of χ1 angles would still be predicted within 30°, probably because of a combination of the information from

rotamer preferences and the constraints introduced by the environment in the predicted fold.

Figure 5. Correlation between fraction of χ1 angles predicted within 30° for the model with best χ1 accuracy and sequence identity of most closely-related template.

Figure 6 shows the mean Z-scores for the prediction of χ1 alone or of both χ1 and χ2. Group 191 (Schomburg-group) has the best results for rotamer accuracy, but it should be noted that this group only submitted predictions for 6 of the 28 target domains (Table I).

Figure 6. (a) Top 20 mean Z-scores for fraction of residues for which χ1 angle is predicted within 30°. (b) Top 20 mean Z-scores for fraction of residues for which both χ1 and χ2 are predicted within

Predicting relative orientations of domains or monomers

For the purposes of assessment, the target proteins were split into domains or prediction units. Predictors were not expected to predict the conformations of N- or C-termini if their conformations appeared to be determined by crystal packing. Nor were they expected to predict the relative orientations of domains if the same relative orientation was not found in any of the available templates. Although they were told that the targets formed multimers when that information was available, evaluation of prediction accuracy focused on the monomers. Nonetheless, most of the models for multi-domain proteins included all the domains, and a small number of predictors submitted oligomeric predictions. We examined these visually, to determine whether any predictors had been able to predict the relative orientations of domains within a monomeric protein, or of monomers within a multimeric protein.

There were 7 HA/TBM targets for which oligomeric predictions were submitted. Seven groups submitted predictions for five of these, and nine groups submitted predictions for the other two. Unfortunately, the predictions were only correct when the quaternary structure was clear from the templates, which was true for two of the targets (T0332 and T0339).

There was a similar lack of success in predicting relative orientations of domains. Four targets were split into two domains that were each assigned to the HA/TBM category (T0292, T0295, T0303, and T0324). None of the models submitted for any of these targets improved on the relative domain orientation from the available templates.

Quality of models for molecular replacement

A large fraction of crystal structures deposited in the PDB is solved using the molecular replacement method. In molecular replacement, an atomic model is rotated and translated to place it in the unit cell of the crystal of the target protein, allowing the unmeasured phase information to be estimated by phases computed from the model. The quality of the atomic model influences success in two ways. First, better models give a stronger signal in the rotation and translation searches. Second, better models give more accurate phases, from which clearer electron density maps can be computed so that a final model can be obtained more easily.

There is a largely untapped potential for the use of comparative models in molecular replacement. In the past, anecdotal evidence suggested that, rather than adding value, modeling often reduced the value of homologous protein structures for molecular replacement. For this reason, most crystallographers have been conservative about modeling, restricting themselves to trimming out poorly-conserved loops or side chains. Such editing operations can indeed improve models significantly for use in molecular replacement [9]. However, there have been signs that modeling algorithms have improved to the point that they can now be useful for molecular replacement. For instance, the CaspR web server [10] generates a number of potential models by producing alternative sequence alignments that are then used as input to MODELLER [11], and it is often found that at least one of these models is better than the original template. Similarly, the Tramontano group [12] has reported that a number of models submitted to previous CASPs provide better molecular replacement models than the best single template.

We therefore examined the models submitted for a number of HA/TBM targets, to assess which were best for molecular replacement and whether or not they improved on the best available template. Models were tested in Phaser [6] and scored using LLG. Of the 24 targets contributing HA/TBM domains, 12 are single-domain proteins with single copies and have diffraction data available through the PDB. All models (for control experiments and templates) were trimmed to contain only the residues in the domain definitions used for assessment, to avoid flexible termini and loops.
It was disappointing to find that only 33 of 1588 models that were evaluated gave a higher LLG score than the best single template. For seven of the 12 targets, none of the models were better than the best single template. In contrast, the Tramontano group [12] found improvements on the template for five of seven selected targets from CASP5 and CASP6. The difference is likely to be in the selection criteria imposed for entry into the HA/TBM category, where it was required that there be a good template. This leaves less room for improvement in modeling. All 12 CASP7 targets tested here with molecular replacement calculations could be solved using template structures, and indeed, the PDB entries report that 15 of the 24 targets contributing HA/TBM domains were solved using molecular replacement. Since cases with poor templates but good models are excluded, the failure to see dramatic improvements in molecular replacement success is at least partly an artefact of the entry criteria for the HA/TBM category.

Figure 7. Top 20 mean Z-scores for LLG criterion, for groups predicting at least 10 of 12 targets used for molecular replacement tests.

The highest rate of success in improving on the best template was for group 249 (taylor), which provided a model better than the best template for 3 of the 4 targets for which they submitted models. Figure 7 shows that of the groups that submitted predictions for at least 10 of the 12 targets, the best overall performance was from group 338 (UCB-SHI), which also improved on the template for two cases.

For target T0290, only group 249 (taylor) submitted a prediction that gave better molecular replacement results than the best template (PDB entry 1ihg, which is about 62% identical in sequence). The results in Table III show that there is little correlation of LLG with the conventional scores for model accuracy.
Other groups submitted models with better GDT-HA (groups 556 LEE and 020 Baker) or LGA-S (groups 556, 020, and 536 Chen-Tan-Kihara) scores or even an equivalent RMS-ALL score (group 020), but their models gave worse results in molecular replacement than the template.

Table III. Comparison of Fold Quality Scores with Molecular Replacement LLG Scores for Best Models Submitted for Target T0290. Columns: Model; LLG; GDT-HA; RMS-ALL; LGA-S. Rows: 1ihg; TS249 (taylor); TS556 (LEE); TS020 (Baker); TS111 (panther); TS536 (Chen-Tan-Kihara).

This is probably because the effect of model error on molecular replacement success has a very different functional form than the common measures of model accuracy. The contribution of a structure factor to the LLG score depends on the complex correlation between the true structure factor and the one calculated from the model. We can simplify equation 34a of Read [13] by assuming that all atoms are equivalent, to obtain the following expression for structure factor correlation:

\[
\frac{\displaystyle\sum_{j} \exp\left\{ -\frac{2\pi^{2}\,\lvert \Delta r_{j} \rvert^{2}}{3 d^{2}} \right\}}{\sqrt{n_{\mathrm{total}}\; n_{\mathrm{model}}}}
\]

In this expression, Δr_j is the error in position of atom j, d is the Bragg spacing (resolution) of the structure factor, n_total is the total number of atoms in the true structure and n_model is the number of atoms in the model. From this equation we can see that the positional error should ideally be much smaller than the Bragg spacing and that including an atom with a large positional error is worse than leaving it out entirely. The GDT scores, for instance, will be entirely insensitive to errors smaller than the smallest threshold, whereas the structure factor correlation will only be optimal for very small errors. So modeling algorithms that allow many small errors to accumulate could reproduce the fold well, according to scores such as GDT-HA, but reproduce the structure factors more poorly. A visual comparison of the models for T0290 suggests that group 249 (taylor) may have been more conservative in introducing changes to the core of the template than the other groups, particularly in the rotamers of conserved or conservatively substituted side chains.

As discussed above, the general lack of improvement on the best templates can be blamed, at least in part, on the criteria for entry into the HA/TBM category. Nonetheless, it would be better if the modeling algorithms did no harm to good templates. The results suggest that the methods in general could afford to be more conservative in introducing changes to regions of high sequence identity.

Predictors were asked to use the B-factor column of the PDB files to provide error estimates for individual atoms, but few groups did so. However, a measure of relative confidence in different parts of the model would be extremely useful for molecular replacement calculations.
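The behaviour of the simplified structure factor correlation given above can be explored numerically. The sketch below evaluates the sum over model atoms of exp(−2π²|Δr_j|²/(3d²)) divided by the square root of n_total × n_model; the function name, atom counts, error values, and resolution are all invented for illustration and do not correspond to any CASP7 target.

```python
import math
from typing import Sequence


def sf_correlation(errors: Sequence[float], n_total: int, d: float) -> float:
    """Simplified structure factor correlation for equivalent atoms.

    errors  : positional error |dr_j| of each atom in the model
    n_total : number of atoms in the true structure
    d       : Bragg spacing (same length units as errors)
    """
    n_model = len(errors)
    if n_model == 0:
        return 0.0
    total = sum(
        math.exp(-2.0 * math.pi ** 2 * e * e / (3.0 * d * d)) for e in errors
    )
    return total / math.sqrt(n_total * n_model)


D = 2.5          # hypothetical Bragg spacing
N_TOTAL = 100    # hypothetical size of the true structure

# 90 well-placed atoms, with the 10 uncertain atoms left out of the model.
trimmed = sf_correlation([0.0] * 90, N_TOTAL, D)

# Same 90 atoms plus 10 atoms that are each 5 units from the true position.
untrimmed = sf_correlation([0.0] * 90 + [5.0] * 10, N_TOTAL, D)

# Every atom present, but each displaced by 0.4 units.
small_errors = sf_correlation([0.4] * 100, N_TOTAL, D)
```

With these invented numbers the trimmed model correlates better than the untrimmed one, because badly placed atoms add almost nothing to the numerator while still increasing n_model in the denominator. The third case shows that uniformly small displacements also lower the correlation at this Bragg spacing.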
The structure factor correlation is optimized if the atomic B-factors are increased by an amount equal to the expected RMS error squared times the factor 8π²/3 [13], which has the effect of smearing the atoms over their distribution of possible positions. If the errors in the models could be estimated reasonably well and used to adjust the atomic B-factors, molecular replacement would be significantly more successful.

CONCLUSION

A number of groups did well in the HA/TBM category. Group 556 (LEE) stood out as the only group that performed near the top according to all criteria investigated: fold quality (particularly GDT-HA), side-chain rotamer quality, and molecular replacement model quality.

There is good evidence that modeling adds value to the starting templates, at least for predicting the overall fold. The fold prediction scores (either GDT-TS or LGA-S) are almost always better for the best models than for the best single templates. A large part of this improvement appears to come from effective use of multiple templates. First, the improvement in main-chain prediction is systematically lower for targets with single available templates. Second, the best models for targets with multiple templates appear to contain pieces derived from the more closely-related parts of different templates.

There is less evidence that modeling adds value to templates for the purpose of molecular replacement, although this is partly an artefact of the selection criteria for entry into the HA/TBM category. There is real room for improvement in this application of comparative modeling, particularly if coordinate error estimates can be used to apply relative weights to different atoms in the model, through changes in the atomic B-factors.
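The B-factor adjustment referred to here (an increase of 8π²/3 times the squared expected RMS error) can be sketched as follows. The function names and the per-atom error estimates are hypothetical; how such estimates would be obtained is exactly the open problem the text describes, and is not addressed by this sketch.

```python
import math
from typing import List, Sequence


def b_factor_increment(rms_error: float) -> float:
    """B-factor increase for an expected RMS positional error.

    Implements delta_B = (8 * pi^2 / 3) * rms_error^2, with the error in
    angstroms and the result in square angstroms.
    """
    return (8.0 * math.pi ** 2 / 3.0) * rms_error ** 2


def inflate_b_factors(
    b_factors: Sequence[float], rms_errors: Sequence[float]
) -> List[float]:
    """Add the error-dependent increment to each atomic B-factor."""
    return [b + b_factor_increment(e) for b, e in zip(b_factors, rms_errors)]


# Hypothetical model: three atoms with a starting B-factor of 20 and
# estimated errors of 0, 0.5 and 1.0 angstroms (invented values).
adjusted = inflate_b_factors([20.0, 20.0, 20.0], [0.0, 0.5, 1.0])
```

Because the increment grows with the square of the estimated error, atoms with uncertain positions are strongly down-weighted at high resolution, while well-determined atoms are left essentially unchanged.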
The attempt to use models for molecular replacement highlights an area that could potentially be improved: the modeling algorithms allow atoms in the template to move more than one might expect for closely-related structures, so that core side chains in the template often superimpose better on the target than on a model derived from that template. Perhaps Bayesian reasoning should play a greater role in modeling; if the probability that the positions of common atoms would change were taken into account, the score functions would include a penalty term for changes in conformation or the rotameric state of conserved residues.

Traditionally, the CASP evaluation criteria focus on isolated monomers or even isolated domains. An examination of multimers and of proteins in which the domains differ in relative orientation from any available templates suggests that there is much room for improvement in the methods to pack domains and multimers. Some impetus for improvement might come from placing more weight on these aspects of the models in future CASPs.

Finally, we wish to suggest that the criteria for entry into the HA/TBM category should be modified, so that the only criterion is the quality of the best submitted models, not the quality of the best available template. We appreciate the desire to isolate those structures with good templates to see how refinement methods could improve the small details, but the effect has been to eliminate the potentially more impressive cases where highly accurate models could be generated from poorer templates. This is the most likely explanation for the failure to see the improvements in models for molecular replacement that have been found in other studies.¹⁰,¹²

We also wish to suggest that greater emphasis should be placed on the prediction of local and global model accuracy. It was possible to generate an excellent model when the sequence identity for the best template was as low as 13%, but it is not clear whether any of the predictors were aware that they had identified an exceptionally good template for that level of sequence identity.

ACKNOWLEDGMENTS

This work would not have been possible without the invaluable web-based facilities at the Protein Structure Prediction Center and the support provided by Andriy Kryshtafovych. Torsten Schwede and Anna Tramontano provided a gentle introduction to the philosophy behind CASP assessment. Iakes Ezkurdia and Michael Tress provided advice on evaluation of torsion angle differences. Michael Tress supplied GDT-TS scores for templates computed assuming a perfect sequence alignment.

Addendum. The authors note that Table III and the discussion thereof include only the subset of models with the highest LLG scores for molecular replacement trials on target T0290. There are other models with higher scores for the more conventional measures.

REFERENCES

1. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The protein data bank. Nucleic Acids Res 2000;28.
2. Tress M, Ezkurdia I, Graña O, López G, Valencia A. Assessment of predictions submitted for the CASP6 comparative modeling category. Proteins 2005;61(Suppl 7).
3. Zemla A. LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res 2003;31.
4. Kryshtafovych A, Prlic A, Dmytriv Z, Daniluk P, Milostan M, Eyrich V, Hubbard T, Fidelis K. New tools and expanded data analysis capabilities at the Protein Structure Prediction Center. Proteins 2007;69(Suppl 8).
5. Kleywegt GJ. Use of non-crystallographic symmetry in protein structure refinement. Acta Crystallogr Sect D 1996;52.
6. McCoy AJ, Grosse-Kunstleve RW, Storoni LC, Read RJ. Likelihood-enhanced fast translation functions. Acta Crystallogr Sect D 2005;61.
7. Chothia C, Lesk AM. The relation between the divergence of sequence and structure in proteins. EMBO J 1986;5.
8. Baker D, Sali A. Protein structure prediction and structural genomics. Science 2001;294.
9. Schwarzenbacher R, Godzik A, Grzechnik SK, Jaroszewski L. The importance of alignment accuracy for molecular replacement. Acta Crystallogr Sect D 2004;60.
10. Claude J-B, Suhre K, Notredame C, Claverie J-M, Abergel C. CaspR: a web server for automated molecular replacement using homology modelling. Nucleic Acids Res 2004;32:W606.
11. Sali A, Blundell TL. Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 1993;234.
12. Giorgetti A, Raimondo D, Miele AE, Tramontano A. Evaluating the usefulness of protein structure models for molecular replacement. Bioinformatics 2005;21:ii72.
13. Read RJ. Structure factor probabilities for related structures. Acta Crystallogr Sect A 1990;46.


More information

Modeling for 3D structure prediction

Modeling for 3D structure prediction Modeling for 3D structure prediction What is a predicted structure? A structure that is constructed using as the sole source of information data obtained from computer based data-mining. However, mixing

More information

Structural Bioinformatics

Structural Bioinformatics arxiv:1712.00425v1 [q-bio.bm] 1 Dec 2017 Structural Bioinformatics Sanne Abeln K. Anton Feenstra Centre for Integrative Bioinformatics (IBIVU), and Department of Computer Science, Vrije Universiteit, De

More information

Molecular Modeling Lecture 7. Homology modeling insertions/deletions manual realignment

Molecular Modeling Lecture 7. Homology modeling insertions/deletions manual realignment Molecular Modeling 2018-- Lecture 7 Homology modeling insertions/deletions manual realignment Homology modeling also called comparative modeling Sequences that have similar sequence have similar structure.

More information

Assessment of the model refinement category in CASP12

Assessment of the model refinement category in CASP12 Received: 19 June 2017 Revised: 3 October 2017 Accepted: 24 October 2017 DOI: 10.1002/prot.25409 RESEARCH ARTICLE Assessment of the model refinement category in CASP12 Ladislav Hovan 1 * Vladimiras Oleinikovas

More information

Ab initio molecular-replacement phasing for symmetric helical membrane proteins

Ab initio molecular-replacement phasing for symmetric helical membrane proteins Acta Crystallographica Section D Biological Crystallography ISSN 0907-4449 Editors: E. N. Baker and Z. Dauter Ab initio molecular-replacement phasing for symmetric helical membrane proteins Pavel Strop,

More information

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB) Protein structure databases; visualization; and classifications 1. Introduction to Protein Data Bank (PDB) 2. Free graphic software for 3D structure visualization 3. Hierarchical classification of protein

More information

Reconstruction of Protein Backbone with the α-carbon Coordinates *

Reconstruction of Protein Backbone with the α-carbon Coordinates * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 26, 1107-1119 (2010) Reconstruction of Protein Backbone with the α-carbon Coordinates * JEN-HUI WANG, CHANG-BIAU YANG + AND CHIOU-TING TSENG Department of

More information

MR model selection, preparation and assessing the solution

MR model selection, preparation and assessing the solution Ronan Keegan CCP4 Group MR model selection, preparation and assessing the solution DLS-CCP4 Data Collection and Structure Solution Workshop 2018 Overview Introduction Step-by-step guide to performing Molecular

More information

Basics of protein structure

Basics of protein structure Today: 1. Projects a. Requirements: i. Critical review of one paper ii. At least one computational result b. Noon, Dec. 3 rd written report and oral presentation are due; submit via email to bphys101@fas.harvard.edu

More information

Contact map guided ab initio structure prediction

Contact map guided ab initio structure prediction Contact map guided ab initio structure prediction S M Golam Mortuza Postdoctoral Research Fellow I-TASSER Workshop 2017 North Carolina A&T State University, Greensboro, NC Outline Ab initio structure prediction:

More information

SUPPLEMENTARY FIGURES. Structure of the cholera toxin secretion channel in its. closed state

SUPPLEMENTARY FIGURES. Structure of the cholera toxin secretion channel in its. closed state SUPPLEMENTARY FIGURES Structure of the cholera toxin secretion channel in its closed state Steve L. Reichow 1,3, Konstantin V. Korotkov 1,3, Wim G. J. Hol 1$ and Tamir Gonen 1,2$ 1, Department of Biochemistry

More information

VC 2009 Wiley-Liss, Inc. INTRODUCTION

VC 2009 Wiley-Liss, Inc. INTRODUCTION proteins STRUCTURE O FUNCTION O BIOINFORMATICS TEMPLATE BASED ASSESSMENT The other 90% of the protein: Assessment beyond the Cas for CASP8 template-based and high-accuracy models Daniel A. Keedy, 1 Christopher

More information

proteins Prediction Report Template-based modeling and free modeling by I-TASSER in CASP7 Yang Zhang* 108 PROTEINS VC 2007 WILEY-LISS, INC.

proteins Prediction Report Template-based modeling and free modeling by I-TASSER in CASP7 Yang Zhang* 108 PROTEINS VC 2007 WILEY-LISS, INC. proteins STRUCTURE O FUNCTION O BIOINFORMATICS Prediction Report Template-based modeling and free modeling by I-TASSER in CASP7 Yang Zhang* Center for Bioinformatics, Department of Molecular Biosciences,

More information

Supporting Information. Synthesis of Aspartame by Thermolysin : An X-ray Structural Study

Supporting Information. Synthesis of Aspartame by Thermolysin : An X-ray Structural Study Supporting Information Synthesis of Aspartame by Thermolysin : An X-ray Structural Study Gabriel Birrane, Balaji Bhyravbhatla, and Manuel A. Navia METHODS Crystallization. Thermolysin (TLN) from Calbiochem

More information

Protein Modeling. Generating, Evaluating and Refining Protein Homology Models

Protein Modeling. Generating, Evaluating and Refining Protein Homology Models Protein Modeling Generating, Evaluating and Refining Protein Homology Models Troy Wymore and Kristen Messinger Biomedical Initiatives Group Pittsburgh Supercomputing Center Homology Modeling of Proteins

More information

Improving the Physical Realism and Structural Accuracy of Protein Models by a Two-Step Atomic-Level Energy Minimization

Improving the Physical Realism and Structural Accuracy of Protein Models by a Two-Step Atomic-Level Energy Minimization Biophysical Journal Volume 101 November 2011 2525 2534 2525 Improving the Physical Realism and Structural Accuracy of Protein Models by a Two-Step Atomic-Level Energy Minimization Dong Xu and Yang Zhang

More information

NGF - twenty years a-growing

NGF - twenty years a-growing NGF - twenty years a-growing A molecule vital to brain growth It is twenty years since the structure of nerve growth factor (NGF) was determined [ref. 1]. This molecule is more than 'quite interesting'

More information