proteins High Accuracy Assessment Assessment of CASP7 predictions in the high accuracy template-based modeling category


proteins
STRUCTURE • FUNCTION • BIOINFORMATICS

High Accuracy Assessment

Assessment of CASP7 predictions in the high accuracy template-based modeling category

Randy J. Read* and Gayatri Chavali

Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom

ABSTRACT

Models for target domains in the high accuracy template-based modeling category were assessed according to a number of criteria evaluating the quality of the main-chain prediction (GDT-HA), predicted sequence alignment (AL0), and side-chain rotameric state. A new criterion was introduced: the quality of the model for use in solving a crystal structure by molecular replacement. There is good evidence that modeling adds value to the template structures, particularly when multiple templates are available. However, when there is already a good template, few of the models are better for the purpose of molecular replacement.

Proteins 2007; 69(Suppl 8): © 2007 Wiley-Liss, Inc.

Key words: HA/TBM; structure prediction; comparative modeling; molecular replacement.

INTRODUCTION

Though the Protein Databank¹ continues to grow exponentially, structural biology cannot keep up with the explosion of gene sequence information. On the other hand, to fully exploit the understanding of biochemistry and disease-associated mutations that can be deduced from sequence information, it is essential to build on the framework of structural information. Comparative modeling allows the gap between the sequence and structure databases to be spanned when a suitable template can be identified. However, the quality of the comparative model depends strongly on the quality of the template and, of course, on the quality of the modeling algorithms. A perennial issue in comparative modeling is the question of added value: to what extent does the model add information beyond the statement that the target resembles the template?
In CASP6, it was concluded that for the easier targets, the difficulty was with refinement methods to improve on the template, and it was suggested that more attention should be paid to this issue to allow better evaluation of the impact of refinement in such cases.² For this reason, the high accuracy template-based modeling (HA/TBM) category was introduced for CASP7. Targets were assigned to this category, after predictions were closed, on the basis of two criteria. First, to ensure that there was a good template in the PDB at the time predictions were made, structural superpositions had to identify at least one template with an LGA-S score³ of greater than 80. Second, to ensure that it was possible to construct a good model, at least one model was required to give a GDT-TS score³ of greater than 80.

The authors state no conflict of interest.
Grant sponsor: Wellcome Trust (UK); Grant number:
*Correspondence to: Randy J. Read, Cambridge Institute for Medical Research, Wellcome Trust/MRC Building, Hills Road, Cambridge CB2 0XY, UK. rjr27@cam.ac.uk
Received 11 April 2007; Revised 29 May 2007; Accepted 10 June 2007
Published online 25 September 2007 in Wiley InterScience ( DOI: /prot

METHODS

Only the first model submitted by each group was evaluated in the group comparisons. As far as possible, we have based our assessments on criteria developed in earlier CASPs, with help from earlier assessors in implementing those criteria. Because the focus is on models of higher accuracy, we have emphasized the more stringent versions of assessment criteria, when there is a choice of stringency.

Criteria computed at Protein Structure Prediction Center

Raw scores for many of the possible criteria were computed at the Protein Structure Prediction Center in the Genome Center at the University of California, Davis, and these were maintained on a web page accessible to assessors. The scores used in this work were computed using results from the program LGA³: AL0 (alignment score based on the superposition obtained with LGA), LGA-S (sequence-independent superposition score), and the GDT scores (sequence-dependent superposition scores). The GDT scores come in different varieties depending on the stringency with which Cα atoms must be aligned; we computed Z-scores for the numerical evaluation with the high accuracy version (GDT-HA), which uses threshold distances half the size of those used for the standard version (GDT-TS), and is thus more stringent. Details of the statistics are available from the paper in this issue describing the facilities provided by the Protein Structure Prediction Center.⁴

Rotamer prediction quality

The quality of side-chain prediction was evaluated by comparing torsion angles between the model and the target. Torsion angle differences were computed using the program LSQMAN.⁵ Where side chains were missing from a model, the torsion angles were classified as incorrect. Four raw scores were evaluated: the fraction of residues with χ₁ angles predicted within either 15° or 30°, and the fraction of residues with both χ₁ and χ₂ predicted within either 15° or 30°. In this case, greater emphasis was placed on the less stringent criteria (30° tolerance) because of the uncertainty in the experimental values of the torsion angles.
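The rotamer scores described above can be illustrated with a minimal Python sketch. This is not the LSQMAN calculation used in the assessment; the function names `angle_diff` and `fraction_within` are hypothetical, the tolerance is assumed to be inclusive, and symmetry-equivalent χ₂ values (for example in Phe, Tyr, or Asp) are not treated specially. A missing side chain is represented by `None` and counted as incorrect, as stated in the text.

```python
from typing import Optional, Sequence, Tuple

# Chi angles of one residue in degrees: (chi1,) or (chi1, chi2).
Chi = Tuple[float, ...]


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two torsion angles (degrees)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def fraction_within(model: Sequence[Optional[Chi]],
                    target: Sequence[Chi],
                    tol: float = 30.0) -> float:
    """Fraction of target residues whose chi angles are all within tol.

    A model entry of None (side chain missing) counts as incorrect.
    The inclusive comparison (<=) is an assumption of this sketch.
    """
    if len(model) != len(target) or not target:
        raise ValueError("model and target must be non-empty and equal in length")
    correct = 0
    for m, t in zip(model, target):
        if m is None or len(m) != len(t):
            continue  # missing or incomplete side chain: incorrect
        if all(angle_diff(mc, tc) <= tol for mc, tc in zip(m, t)):
            correct += 1
    return correct / len(target)


if __name__ == "__main__":
    # Made-up chi1 values for four residues, for illustration only.
    target = [(65.0,), (180.0,), (175.0,), (-60.0,)]
    model = [(60.0,), None, (-172.0,), (20.0,)]
    print(fraction_within(model, target, tol=30.0))
    print(fraction_within(model, target, tol=15.0))
```

Passing one-element tuples gives the χ₁ score and two-element tuples gives the combined χ₁/χ₂ score, so the four raw scores correspond to the two tuple lengths evaluated at the two tolerances.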
Although there is greater uncertainty in the torsion angles of surface residues, all residues were included in the analysis to increase the number of observations.

Suitability for molecular replacement

As a new criterion, we have introduced a measure of how well a model can be used to solve target crystal structures by molecular replacement. The program Phaser⁶ uses likelihood methods to solve crystal structures. For each potential solution it reports a log-likelihood-gain (LLG) score, which measures how well the model agrees with the data. Although Phaser can be used to solve structures with multiple components, the computing time rises significantly compared with structures with a single component, so only those targets that were determined by crystallography and have a single copy of a single-domain protein in the asymmetric unit were evaluated. The likelihood function requires, as a parameter, an estimate of the expected RMS deviation of the model from the target; for our tests, we used the value predicted from a correlation between sequence identity and main-chain coordinate error,⁷ using the sequence identity from the most closely related template available in the PDB at the time of prediction.

Z-scores

For numerical evaluation, the raw scores were converted to Z-scores, as described by Tress et al.² The Z-scores were computed in two passes. In a first pass, the mean and standard deviation of the raw scores for the first models submitted by all the groups for a target were evaluated. In a second pass, models worse than two standard deviations below the mean were eliminated in computing a revised mean and standard deviation, which were then used as the basis for the final Z-score. All negative Z-scores were then set to zero.

RESULTS AND DISCUSSION

Table I presents a summary of the numerical ranking results for all the groups that submitted predictions for HA/TBM targets in CASP7.
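The Z-scores summarized in Table I follow the two-pass procedure described under Methods. As a minimal sketch of that procedure (not the assessors' actual code), the Python function below takes the raw scores of all first models for one target and returns the floored Z-scores. The name `two_pass_z_scores` is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def two_pass_z_scores(raw: Sequence[float]) -> List[float]:
    """Two-pass Z-scores for the raw scores of one target.

    Pass 1: mean and standard deviation over all first models.
    Pass 2: drop models more than two standard deviations below the
            pass-1 mean, then recompute the mean and standard deviation.
    Final:  Z-scores from the pass-2 statistics; negatives set to zero.
    """
    if not raw:
        return []
    mu1, sd1 = mean(raw), pstdev(raw)  # population SD: an assumption
    kept = [s for s in raw if s >= mu1 - 2.0 * sd1]
    mu2, sd2 = mean(kept), pstdev(kept)
    if sd2 == 0.0:
        # All retained scores identical: no model stands out.
        return [0.0 for _ in raw]
    return [max(0.0, (s - mu2) / sd2) for s in raw]


if __name__ == "__main__":
    # Made-up GDT-HA-like scores for six groups; the last is an outlier.
    print(two_pass_z_scores([80, 82, 78, 81, 79, 20]))
```

Removing the poor outliers before the second pass keeps a few failed predictions from inflating the standard deviation, and flooring at zero means that a group is not penalized more for a very bad model than for an average one.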
Targets

In CASP7, predictions for a total of 95 targets were evaluated by assessors (Clarke et al., this volume). A number of these had multiple domains, so there was a total of 123 target domains. Of these, 28 domains from 24 targets were assigned to the HA/TBM category. The structure of one of these (T0302) was withdrawn and replaced during the evaluation period, so it was omitted from the analysis presented here. Domain 2 of target T0303 was included in the analysis because it obeyed the strict criteria for entrance into the HA/TBM category, but it was omitted from the official HA/TBM list (Clarke et al., this volume) because only one model had a GDT-TS score above 80. All of the structures analyzed here were determined by X-ray crystallography.

Although most of the targets had potential templates with high levels of sequence identity, as one would expect given the criteria for entrance into the HA/TBM category, there was one domain (domain 2 of T0303) where the most closely related template was only 13% identical in sequence. It is instructive to look at the effect of sequence identity on the probability that a template will show a high level of structural similarity (measured in this work by whether the LGA-S score was greater than 80). Figure 1 shows, as a function of sequence identity for the most closely related template, the fraction of domains assigned to the HA/TBM category. As one would expect, there is a good correlation between sequence identity and the probability of a high LGA-S score. Strikingly, there is a big jump at about 30% sequence identity, which agrees well with anecdotal evidence that with a model 30% identical in sequence or better, there is an excellent chance of solving a crystal structure by molecular replacement. Similarly, it has been suggested that template-based modeling generally increases substantially in accuracy at the level of 30% sequence identity.⁸ Nonetheless, there is a significant number of target domains with good templates at lower sequence identities, agreeing with the finding that molecular replacement will succeed in at least some cases with distantly related models.⁹

Table I. Detailed Results by Group [a]

Columns: Group; n_HA; Mean GDT-HA Z-score; Mean AL0 Z-score; Mean χ₁ Z-score; Mean χ₁/χ₂ Z-score; n_MR; Mean LLG Z-score; Sum

[a] In the group name, AL indicates that only alignment predictions were submitted, whereas TS indicates that atomic coordinates were submitted. Only one group (706; TENETA) submitted predictions in both categories and hence appears twice in this list. The translation from group number to group name can be found on the Protein Structure Prediction Center web page. n_HA is the number of predictions submitted for target domains assigned to the HA/TBM category, and n_MR is the number of predictions submitted for targets tested for molecular replacement. The computation of Z-scores for GDT-HA, AL0, correct χ₁, correct χ₁/χ₂ pairs, and molecular replacement LLG score is explained in the text. Groups are sorted by the sum of their mean Z-scores for GDT-HA, correct χ₁/χ₂ pairs, and LLG.


Figure 1. Fraction of domains assigned to HA/TBM category, as a function of sequence identity for most closely related template.

Alignment accuracy

Alignment accuracy correlates very strongly with the other measures of model quality (results not shown), which is not surprising as a good alignment is an essential prerequisite to building a good model. What is more notable is that the groups submitting alignment-only (AL) predictions did much more poorly in the AL0 alignment score than groups submitting full structural predictions. It is conceivable that this reflects use of less sophisticated methods by groups restricting their efforts to alignment. However, some of the difference will arise from incompleteness of the AL models, which by their nature lack residues arising from insertions relative to the template. Another explanation is that the attempt to build a plausible model provides a good test for the hypothesis that a particular sequence alignment is correct.

Quality of fold prediction

The evaluation of fold prediction concentrated on the GDT-HA score, a variant of GDT-TS with lowered thresholds that make it more sensitive to fine details. Judged by mean Z-values of GDT-HA scores, group 556 (LEE) had the best overall performance (Fig. 2).

Figure 2. Top 20 mean Z-scores for GDT-HA criterion.

Value added to fold prediction

One way to assess the value added to the template is to consider whether the fold prediction scores are better for the models than for the templates on which they were based. A complication is that the model is based on an explicit sequence alignment, whereas the sequence alignment for the template must be inferred, preferably from the structural alignment. We investigated two ways of dealing with this. First, we looked at the GDT-TS scores that would be assigned to the template given a perfect sequence alignment. The sequence alignment from the LGA structural superposition between the target and the template was used to construct a model by replacing the side chains on the template. A GDT-TS score was then computed for this model using LGA. Scores computed on this basis were kindly supplied by Michael Tress. Second, we looked at the LGA-S (sequence-independent) scores for the models. The second method is preferred, as it does not penalize models for the imperfection of alignment algorithms. Both methods penalize models for which the best possible template was not identified.

The data in Figure 3 demonstrate that the best model consistently improves on the best template. In general, a roughly constant fraction of the difference between the template and the target is removed in the best models. Value could be added to the best single template in a number of ways. One would be through refinement methods, which would be difficult to assess from these data. A second would be by assembling the model from multiple templates. One indication that this is an important factor is that the targets with a single template (highlighted in Fig. 3) show less improvement in LGA-S score on average than those with more than one template. In addition, there is good evidence that the use of multiple templates had a significant impact in the construction of the best models for target T0315 (also highlighted in Fig. 3). The best overall template for this target is chain A of PDB entry 1J6O, although an examination of local Cα deviations shows that the region around residues is poorly conserved. (An analysis of deviations from templates and models can be viewed on the CASP7 web site at Casp7.html). Chains A and B of PDB entry 1YIX show much smaller deviations for residues 20–25, although they model residues more poorly. The best models for this target resemble 1J6O for residues and 1YIX for residues 20–25, and annotations indicate that both templates have indeed been used simultaneously in each of the models. Figure 4 compares the structures of the target, the two templates, and the model with the highest GDT-TS score, which was submitted by group 137 (3Dpro).

Figure 3. Measuring value added by fraction of potential improvement in LGA-S that was achieved. A perfect model would have an LGA-S score of 100, so the fraction of potential improvement is defined as (LGA-S_model − LGA-S_template) / (100 − LGA-S_template). This is plotted as a function of LGA-S for the most closely related template. Points corresponding to targets with a single good template are highlighted with circles, while the point corresponding to target T0315 is highlighted with a diamond.

Figure 4. Comparison of the structures of the target T0315 (green), the model from group 137 (3Dpro; grey) and two possible templates, PDB entry 1J6O (cyan) and PDB entry 1YIX (magenta), in the region of residues

Model ranking

Successful structure prediction is a combination of two factors: model generation, in which the space of possible conformations is sampled, and model ranking, in which the possible conformations are scored to find the best solution. Since most participating groups submitted five ranked predictions for each target, it is possible to gain some insight into how well they do in ranking those submissions. In general, there is a good correlation between the rank assigned to a model and the quality of the model. Table IIa shows the mean Z-scores for GDT-HA as a function of model rank, for all groups who submitted five models for at least one target, and for the subsets of the 20 or 50 groups with the highest mean GDT-HA scores for their first model. However, there is still room for improvement in the rankings.
Table IIb compares the mean GDT-HA Z-scores for the first model and the best model, for the 10 groups with highest GDT-HA scores. Perfect ranking would have improved the results for all groups.

Table II. Correlation of GDT-HA Score with Model Rank

(a) Mean GDT-HA Z-scores by model rank for cases in which 5 models were submitted
Columns: Selection; Model 1; Model 2; Model 3; Model 4; Model 5
Rows: Top 20 groups; Top 50 groups; All groups

(b) Mean GDT-HA Z-scores for model 1 and best submitted model for each target
Columns: Group; Model 1; Best Model; Ratio of 1 to best
Rows: TS556 (LEE); TS024 (Zhang); TS025 (Zhang-Server); TS136 (FOLDpro); TS137 (3Dpro); TS125 (TASSER); TS020 (Baker); TS675 (fams-ace); TS026 (SAMUDRALA); TS671 (fams-multi)

Torsion angle accuracy

As expected, the accuracy with which the side-chain rotamers can be predicted depends on the level of sequence identity with the available templates, as the rotamers for residues identical between template and target are very likely to be the same, particularly for closely related structures. Figure 5 shows the correlation between the fraction of χ₁ angles predicted within 30° for the best model for each target and the sequence identity for the most closely related template. Extrapolating the trend to a sequence identity of zero suggests that over 60% of χ₁ angles would still be predicted within 30°, probably because of a combination of the information from rotamer preferences and the constraints introduced by the environment in the predicted fold. Figure 6 shows the mean Z-scores for the prediction of χ₁ alone or of both χ₁ and χ₂. Group 191 (Schomburggroup) has the best results for rotamer accuracy, but it should be noted that this group only submitted predictions for 6 of the 28 target domains (Table I).

Figure 5. Correlation between fraction of χ₁ angles predicted within 30° for the model with best χ₁ accuracy and sequence identity of most closely related template.

Figure 6. (a) Top 20 mean Z-scores for fraction of residues for which the χ₁ angle is predicted within 30°. (b) Top 20 mean Z-scores for fraction of residues for which both χ₁ and χ₂ are predicted within the tolerance.

Predicting relative orientations of domains or monomers

For the purposes of assessment, the target proteins were split into domains or prediction units. Predictors were not expected to predict the conformations of N- or C-termini if their conformations appeared to be determined by crystal packing. Nor were they expected to predict the relative orientations of domains if the same relative orientation was not found in any of the available templates. Although they were told that the targets formed multimers when that information was available, evaluation of prediction accuracy focused on the monomers. Nonetheless, most of the models for multi-domain proteins included all the domains, and a small number of predictors submitted oligomeric predictions. We examined these visually, to determine whether any predictors had been able to predict the relative orientations of domains within a monomeric protein, or of monomers within a multimeric protein.

There were 7 HA/TBM targets for which oligomeric predictions were submitted. Seven groups submitted predictions for five of these, and nine groups submitted predictions for the other two. Unfortunately, the predictions were only correct when the quaternary structure was clear from the templates, which was true for two of the targets (T0332 and T0339).

There was a similar lack of success in predicting relative orientations of domains. Four targets were split into two domains that were each assigned to the HA/TBM category (T0292, T0295, T0303, and T0324). None of the models submitted for any of these targets improved on the relative domain orientation from the available templates.

Quality of models for molecular replacement

A large fraction of crystal structures deposited in the PDB is solved using the molecular replacement method. In molecular replacement, an atomic model is rotated and translated to place it in the unit cell of the crystal of the target protein, allowing the unmeasured phase information to be estimated by phases computed from the model. The quality of the atomic model influences success in two ways. First, better models give a stronger signal in the rotation and translation searches. Second, better models give more accurate phases, from which clearer electron density maps can be computed so that a final model can be obtained more easily.

There is a largely untapped potential for the use of comparative models in molecular replacement. In the past, anecdotal evidence suggested that, rather than adding value, modeling often reduced the value of homologous protein structures for molecular replacement. For this reason, most crystallographers have been conservative about modeling, restricting themselves to trimming out poorly conserved loops or side chains. Such editing operations can indeed improve models significantly for use in molecular replacement.⁹ However, there have been signs that modeling algorithms have improved to the point that they can now be useful for molecular replacement. For instance, the CaspR web server¹⁰ generates a number of potential models by producing alternative sequence alignments that are then used as input to MODELLER,¹¹ and it is often found that at least one of these models is better than the original template. Similarly, the Tramontano group¹² has reported that a number of models submitted to previous CASPs provide better molecular replacement models than the best single template.

We therefore examined the models submitted for a number of HA/TBM targets, to assess which were best for molecular replacement and whether or not they improved on the best available template. Models were tested in Phaser⁶ and scored using LLG. Of the 24 targets contributing HA/TBM domains, 12 are single-domain proteins with single copies and have diffraction data available through the PDB. All models (for control experiments and templates) were trimmed to contain only the residues in the domain definitions used for assessment, to avoid flexible termini and loops.
It was disappointing to find that only 33 of the 1588 models evaluated gave a higher LLG score than the best single template. For seven of the 12 targets, none of the models were better than the best single template. In contrast, the Tramontano group¹² found improvements on the template for five of seven selected targets from CASP5 and CASP6. The difference is likely to be in the selection criteria imposed for entry into the HA/TBM category, where it was required that there be a good template. This leaves less room for improvement in modeling. All 12 CASP7 targets tested here with molecular replacement calculations could be solved using template structures, and indeed, the PDB entries report that 15 of the 24 targets contributing HA/TBM domains were solved using molecular replacement. Since cases with poor templates but good models are excluded, the failure to see dramatic improvements in molecular replacement success is at least partly an artefact of the entry criteria for the HA/TBM category.

Figure 7. Top 20 mean Z-scores for LLG criterion, for groups predicting at least 10 of 12 targets used for molecular replacement tests.

The highest rate of success in improving on the best template was for group 249 (taylor), which provided a model better than the best template for 3 of the 4 targets for which they submitted models. Figure 7 shows that of the groups that submitted predictions for at least 10 of the 12 targets, the best overall performance was from group 338 (UCB-SHI), which also improved on the template for two cases.

For target T0290, only group 249 (taylor) submitted a prediction that gave better molecular replacement results than the best template (PDB entry 1ihg, which is about 62% identical in sequence). The results in Table III show that there is little correlation of LLG with the conventional scores for model accuracy.
Other groups submitted models with better GDT-HA (groups 556 LEE and 020 Baker) or LGA-S (groups 556, 020, and 536 Chen-Tan-Kihara) scores or even an equivalent RMS-ALL score (group 020), but their models gave worse results in molecular replacement than the template. This is probably because the effect of model error on molecular replacement success has a very different functional form than the common measures of model accuracy.

Table III. Comparison of Fold Quality Scores with Molecular Replacement LLG Scores for Best Models Submitted for Target T0290
Columns: Model; LLG; GDT-HA; RMS-ALL; LGA-S
Rows: 1ihg; TS249 (taylor); TS556 (LEE); TS020 (Baker); TS111 (panther); TS536 (Chen-Tan-Kihara)

The contribution of a structure factor to the LLG score depends on the complex correlation between the true structure factor and the one calculated from the model. We can simplify equation 34a of Read¹³ by assuming that all atoms are equivalent, to obtain the following expression for structure factor correlation:

\[
\left[ \sum_{j} \exp\!\left( -\frac{2\pi^{2}\,|\Delta r_{j}|^{2}}{3d^{2}} \right) \right] \Bigg/ \sqrt{n_{\mathrm{total}}\, n_{\mathrm{model}}}
\]

In this expression, $\Delta r_{j}$ is the error in position of atom $j$, $d$ is the Bragg spacing (resolution) of the structure factor, $n_{\mathrm{total}}$ is the total number of atoms in the true structure, and $n_{\mathrm{model}}$ is the number of atoms in the model. From this equation we can see that the positional error should ideally be much smaller than the Bragg spacing and that including an atom with a large positional error is worse than leaving it out entirely. The GDT scores, for instance, will be entirely insensitive to errors smaller than the smallest threshold, whereas the structure factor correlation will only be optimal for very small errors. So modeling algorithms that allow many small errors to accumulate could reproduce the fold well, according to scores such as GDT-HA, but reproduce the structure factors more poorly. A visual comparison of the models for T0290 suggests that group 249 (taylor) may have been more conservative in introducing changes to the core of the template than the other groups, particularly in the rotamers of conserved or conservatively substituted side chains.

As discussed above, the general lack of improvement on the best templates can be blamed, at least in part, on the criteria for entry into the HA/TBM category. Nonetheless, it would be better if the modeling algorithms did no harm to good templates. The results suggest that the methods in general could afford to be more conservative in introducing changes to regions of high sequence identity.

Predictors were asked to use the B-factor column of the PDB files to provide error estimates for individual atoms, but few groups did so. However, a measure of relative confidence in different parts of the model would be extremely useful for molecular replacement calculations.
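To make the behaviour of the simplified structure factor correlation concrete, the following Python sketch evaluates the expression given above for a list of per-atom positional errors. It is an illustration under stated assumptions rather than anything taken from Phaser: the function name `structure_factor_correlation` is hypothetical, errors and the Bragg spacing are assumed to be in ångströms, and all atoms are treated as equivalent, as in the text.

```python
import math
from typing import Sequence


def structure_factor_correlation(errors: Sequence[float],
                                 d: float,
                                 n_total: int) -> float:
    """Simplified structure factor correlation at Bragg spacing d.

    errors  : positional error |dr_j| of each atom included in the model
    d       : Bragg spacing (resolution) of the structure factor
    n_total : number of atoms in the true structure

    Implements sum_j exp(-2*pi^2*|dr_j|^2 / (3*d^2)) / sqrt(n_total*n_model),
    where n_model is the number of atoms in the model.
    """
    n_model = len(errors)
    if n_model == 0:
        return 0.0
    total = sum(math.exp(-2.0 * math.pi ** 2 * e ** 2 / (3.0 * d ** 2))
                for e in errors)
    return total / math.sqrt(n_total * n_model)


if __name__ == "__main__":
    d = 2.5          # assumed Bragg spacing for the illustration
    n_total = 100    # assumed size of the true structure
    # 99 perfectly placed atoms plus one atom that is 5 units off ...
    with_bad_atom = structure_factor_correlation([0.0] * 99 + [5.0], d, n_total)
    # ... versus the same model with that atom deleted.
    without_bad_atom = structure_factor_correlation([0.0] * 99, d, n_total)
    print(with_bad_atom, without_bad_atom)
```

In this made-up case the badly placed atom contributes almost nothing to the sum while still increasing n_model in the denominator, so deleting it raises the correlation, which is the point made in the text about atoms with large positional errors.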
The structure factor correlation is optimized if the atomic B-factors are increased by an amount equal to the expected RMS error squared times the factor 8π²/3,¹³ which has the effect of smearing the atoms over their distribution of possible positions. If the errors in the models could be estimated reasonably well and used to adjust the atomic B-factors, molecular replacement would be significantly more successful.

CONCLUSION

A number of groups did well in the HA/TBM category. Group 556 (LEE) stood out as the only group that performed near the top according to all criteria investigated: fold quality (particularly GDT-HA), side-chain rotamer quality, and molecular replacement model quality.

There is good evidence that modeling adds value to the starting templates, at least for predicting the overall fold. The fold prediction scores (either GDT-TS or LGA-S) are almost always better for the best models than for the best single templates. A large part of this improvement appears to come from effective use of multiple templates. First, the improvement in main-chain prediction is systematically lower for targets with single available templates. Second, the best models for targets with multiple templates appear to contain pieces derived from the more closely related parts of different templates.

There is less evidence that modeling adds value to templates for the purpose of molecular replacement, although this is partly an artefact of the selection criteria for entry into the HA/TBM category. There is real room for improvement in this application of comparative modeling, particularly if coordinate error estimates can be used to apply relative weights to different atoms in the model, through changes in the atomic B-factors.
The attempt to use models for molecular replacement highlights an area that could potentially be improved: the modeling algorithms allow atoms in the template to move more than one might expect for closely related structures, so that core side chains in the template often superimpose better on the target than on a model derived from that template. Perhaps Bayesian reasoning should play a greater role in modeling; if the probability that the positions of common atoms would change were taken into account, the score functions would include a penalty term for changes in conformation or the rotameric state of conserved residues.

Traditionally, the CASP evaluation criteria focus on isolated monomers or even isolated domains. An examination of multimers and of proteins in which the domains differ in relative orientation from any available templates suggests that there is much room for improvement in the methods to pack domains and multimers. Some impetus for improvement might come from placing more weight on these aspects of the models in future CASPs.

Finally, we wish to suggest that the criteria for entry into the HA/TBM category should be modified, so that the only criterion is the quality of the best submitted models, not the quality of the best available template. We appreciate the desire to isolate those structures with good templates to see how refinement methods could improve the small details, but the effect has been to eliminate the potentially more impressive cases where highly accurate models could be generated from poorer templates. This is the most likely explanation for the failure to see the improvements in models for molecular replacement that have been found in other studies.¹⁰,¹² We also wish to suggest that greater emphasis should be placed on the prediction of local and global model accuracy. It was possible to generate an excellent model when the sequence identity for the best template was as low as 13%, but it is not clear whether any of the predictors were aware that they had identified an exceptionally good template for that level of sequence identity.

ACKNOWLEDGMENTS

This work would not have been possible without the invaluable web-based facilities at the Protein Structure Prediction Center and the support provided by Andriy Kryshtafovych. Torsten Schwede and Anna Tramontano provided a gentle introduction to the philosophy behind CASP assessment. Iakes Ezkurdia and Michael Tress provided advice on evaluation of torsion angle differences. Michael Tress supplied GDT-TS scores for templates computed assuming a perfect sequence alignment.

Addendum. The authors note that Table III and the discussion thereof include only the subset of models with the highest LLG scores for molecular replacement trials on target T0290. There are other models with higher scores for the more conventional measures.

REFERENCES

1. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The protein data bank. Nucleic Acids Res 2000;28:
2. Tress M, Ezkurdia I, Graña O, López G, Valencia A. Assessment of predictions submitted for the CASP6 comparative modeling category. Proteins 2005;61(Suppl 7):
3. Zemla A. LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res 2003;31:
4. Kryshtafovych A, Prlic A, Dmytriv Z, Daniluk P, Milostan M, Eyrich V, Hubbard T, Fidelis K. New tools and expanded data analysis capabilities at the Protein Structure Prediction Center. Proteins 2007;69(Suppl 8):
5. Kleywegt GJ. Use of non-crystallographic symmetry in protein structure refinement. Acta Crystallogr Sect D 1996;52:
6. McCoy AJ, Grosse-Kunstleve RW, Storoni LC, Read RJ. Likelihood-enhanced fast translation functions. Acta Crystallogr Sect D 2005;61:
7. Chothia C, Lesk AM. The relation between the divergence of sequence and structure in proteins. EMBO J 1986;5:
8. Baker D, Sali A. Protein structure prediction and structural genomics. Science 2001;294:
9. Schwarzenbacher R, Godzik A, Grzechnik SK, Jaroszewski L. The importance of alignment accuracy for molecular replacement. Acta Crystallogr Sect D 2004;60:
10. Claude J-B, Suhre K, Notredame C, Claverie J-M, Abergel C. CaspR: a web server for automated molecular replacement using homology modelling. Nucleic Acids Res 2004;32:W606–W
11. Sali A, Blundell TL. Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 1993;234:
12. Giorgetti A, Raimondo D, Miele AE, Tramontano A. Evaluating the usefulness of protein structure models for molecular replacement. Bioinformatics 2005;21:ii72–ii
13. Read RJ. Structure factor probabilities for related structures. Acta Crystallogr Sect A 1990;46:
