Genki Terashi, Mayuko Takeda-Shitaka, Kazuhiko Kanou, Mitsuo Iwadate, Daisuke Takaya, Akio Hosoi, Kazuhiro Ohta, and Hideaki Umeyama*

PROTEINS: Structure, Function, Bioinformatics

Prediction Report

fams-ace: A combined method to select the best model after remodeling all server models

Genki Terashi, Mayuko Takeda-Shitaka, Kazuhiko Kanou, Mitsuo Iwadate, Daisuke Takaya, Akio Hosoi, Kazuhiro Ohta, and Hideaki Umeyama*

School of Pharmacy, Kitasato University, Tokyo, Japan

ABSTRACT

During the Critical Assessment of Protein Structure Prediction (CASP7, Pacific Grove, CA, 2006), fams-ace was entered in the 3D coordinate prediction category as a human expert group. The procedure can be summarized by the following three steps. (1) All the server models were refined and rebuilt utilizing our homology modeling method. (2) Representative structures were selected from each server, according to a model quality evaluation, based on a 3D1D profile score (like Verify3D). (3) The top five models were selected and submitted in the order of the consensus-based score (like 3D-Jury). Fams-ace is a fully automated server and does not require human intervention. In this article, we introduce the methodology of fams-ace and discuss the successes and failures of this approach during CASP7. In addition, we discuss possible improvements for the next CASP. Proteins 2007; 69(Suppl 8). © 2007 Wiley-Liss, Inc.

Key words: template-based modeling (TBM); comparative modeling; homology modeling; protein structure prediction; CHIMERA; FAMS; SKE-CHIMERA; quality assessment; CASP.

INTRODUCTION

During the sixth round of the Critical Assessment of Protein Structure Prediction (CASP6), our SKE-CHIMERA team successfully predicted targets, particularly fold recognition/homologous (FR/H) targets. [1] At that time, we noted an obvious flaw in how the best model was selected from the many constructed models. In the SKE-CHIMERA method, human intervention based on visual inspection and use of biological information is the most important factor.
However, the performance of human intervention was inconsistent, and this was reflected in the CASP6 results. Therefore, a technique is needed that selects the models closest to the native structure from a decoy set without relying on human intervention.

Many scoring functions for evaluating protein structure are founded on knowledge-based potentials, [2] clustering methods, [3] structural energies using molecular mechanics force fields, [4] and structure-based sequence profiles (e.g., Verify3D, [5,6] Inbgu, [7] 3D-PSSM, [8] ProQ [9]). These scoring functions are used to assess model quality and ultimately to select the best model from a set of candidates. During CAFASP4, [10] the Model Quality Assessment Programs (MQAPs) category [11] assessed accuracy using predicted models produced by the participating CAFASP4 servers. Several MQAP scoring functions were able to distinguish correct from incorrect protein models and consistently selected the best models among the candidates. According to the report from Daniel Fischer, [12] the CAFASP4 organizer, the best-performing MQAPs were Verify3D, Victor/FRST, [13] and MODCHECK. [14]

Genki Terashi and Mayuko Takeda-Shitaka contributed equally to this work.
The authors state no conflict of interest.
*Correspondence to: Hideaki Umeyama, Department of Biomolecular Design, School of Pharmacy, Kitasato University, Shirokane, Minato-ku, Tokyo, Japan. E-mail: umeyamah@pharm.kitasato-u.ac.jp
Received 28 February 2007; Revised 13 August 2007; Accepted 16 August 2007
Published online 25 September 2007 in Wiley InterScience.
PROTEINS © 2007 Wiley-Liss, Inc.

Verify3D uses 18 (6 × 3) discrete environmental classes for each amino acid residue, based on the buried area, the fraction of polar area, and the secondary structure. The side-chain environments of the amino acid residues were grouped into six classes according to the buried area and fraction of polar area. The secondary structures were separated into three classes. A score for each class was calculated using a log-odds scoring function from the dataset of experimental protein structures. Since it is a simple but powerful program for predicting model accuracy, Verify3D is widely used by human predictors when selecting the best model from many candidate models. In the MQAPs category of CAFASP4, Verify3D performed well for both homology modeling and fold-recognition targets. Victor/FRST combines four knowledge-based potentials (pairwise, solvation, hydrogen bond, and torsion angle potentials) and performed consistently well, especially for comparative modeling (CM) targets. MODCHECK is based on threading potentials (pairwise and solvation potentials) and calculates the quality score by summing the pairwise and solvation Z-scores, which are obtained by extensive sequence shuffling trials.

The methods mentioned earlier consider only the quality of the target model. In contrast, other research groups use consensus methods (e.g., Pcons, [15] 3D-Jury [16]). Pcons, a neural-network-based consensus method, combines the confidence score reported by each server and the similarity between models or templates. The 3D-Jury technique is a fully automated protein structure metaprediction system that works well when model accuracy correlates strongly with the confidence score derived from the similarity between models. All server-generated models are compared by a similarity score. The similarity score of a model pair is equal to the number of C-alpha atom pairs that are within 3.5 Å after superimposition. The confidence score of a model is equal to the sum of the similarity scores for the considered model pairs divided by the number of considered pairs plus one. Moreover, 3D-Jury uses two modes to calculate confidence scores.
The best-model-mode uses a single model from each server, the one with the best similarity score, in the consensus score calculation. In contrast, the all-models-mode includes all the server models. Note that consensus methods require a variety of candidate models from the servers for comparison and for calculating the consensus value.

Thus, in the CASP7 contest, [17] we combined two different scoring functions as a metaselector team: (1) the model quality score based on classification of the side-chain environment for each residue and (2) the consensus score. The model quality score, based on an algorithm inspired by Verify3D, was used as a filter to select a representative from the models submitted by each server. In the fams-ace method, we used an evaluation program, CIRCLE, which is based on a knowledge-based potential of the side-chain packing. As mentioned earlier, the Verify3D algorithm uses 18 environmental classes for each amino acid residue. For the purpose of this study, the Verify3D classification was too coarse to analyze side-chain environments. Therefore, the classification of side-chain environments was refined by increasing the number of classes from 18 to 136,350 (5050 × 27), and the resulting discontinuities were smoothed with a Gaussian filter. The consensus score, corresponding to 3D-Jury (single-model-mode), was then used for the final selection of the best model. In order to exclude the influence of unstable models when calculating the consensus score, only the model with the best model quality score assigned by CIRCLE from each server was used. Since the consensus method uses only C-alpha coordinates, the quality of side-chain packing was not considered at this stage.

METHODS

The fams-ace method is composed of four steps as illustrated in Figure 1 and is discussed in the following sections.

Rebuilding and refinement of server models

3D models and alignments submitted by automatic servers were obtained from the CASP7 website.
[17] The obtained 3D models and alignments were rebuilt into full-atom three-dimensional models with our fully automatic modeling system (FAMS). Homology modeling was performed using each model as the reference protein (Fig. 1). Detailed information on the FAMS process can be found in a previously published article from our laboratory. [18] Short contacts were removed by optimizing the main-chain coordinates, using simulated annealing of the main chain while conserving the side-chain conformation of each residue. Side-chain atom coordinates were optimized by iterative cycles of side-chain generation and main-chain optimization. Missing regions and discontinuities of the main chain were constructed by FAMS, using a loop search process to obtain an energetically stable structure.

Thus, FAMS modeling is an essential step for evaluating models from the viewpoint of energy. If the coordinates of a side chain or a main chain have serious energy errors (e.g., short contacts, unnatural chiral centers, or unnatural torsion angles), the comparisons between models cannot be performed reasonably in the evaluation step. In the evaluation step, our scoring method uses the environment of the side chain, which is described by the fraction of buried area and the fraction of area covered by polar atoms. Consequently, even if the coordinates of the main chain are close to the native structure, a model with many short contacts in the side chains will be rejected from the selection. In addition, in the model-selection step (Selection of the Five Best Models by Using Consensus Methods section), the final models are selected according to the consensus value from a comparison of coordinates among models. Therefore, unnatural main-chain coordinates in a server model will cause an error when the consensus value is calculated.
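The notion of a short contact used above can be made concrete with a minimal sketch. It assumes the definition given in the Table I footnote (a pair of nonbonded atoms within the indicated distance). The coordinate list, the bonded-pair list, and the function name `count_short_contacts` are hypothetical illustrations and are not part of FAMS.

```python
import math
from itertools import combinations


def count_short_contacts(coords, bonded, cutoff):
    """Count nonbonded atom pairs whose distance is at most `cutoff` (in angstroms)."""
    bonded_pairs = {frozenset(pair) for pair in bonded}
    count = 0
    for i, j in combinations(range(len(coords)), 2):
        if frozenset((i, j)) in bonded_pairs:
            continue  # covalently bonded atoms are not counted as short contacts
        if math.dist(coords[i], coords[j]) <= cutoff:
            count += 1
    return count


# Hypothetical four-atom toy model: atoms 0-1 and 1-2 are bonded,
# and atom 3 sits 1.8 angstroms from atom 2 without being bonded to it.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 1.8, 0.0)]
bonded = [(0, 1), (1, 2)]

# Same distance thresholds as the short-contact rows of Table I.
contacts_by_cutoff = {
    cutoff: count_short_contacts(coords, bonded, cutoff)
    for cutoff in (2.0, 2.2, 2.4, 2.6)
}
```

The all-pairs loop is quadratic in the number of atoms, which is fine for a toy example; a real structure would normally use a spatial grid or neighbor list.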
Assessment of target difficulty

The next step evaluates the feasibility of constructing the model from experimentally determined template protein coordinates. fams-ace employs one of two scoring functions depending on the target difficulty. In order to predict the target difficulty, a support vector machine (SVM) program [19] was used. Classifications based on SVMs have been used for several applications in bioinformatics and computational biology. [20] The training dataset consisted of CASP6 targets classified as CM targets (positive) or not (fold recognition or new fold, negative). The score and homology (%) values of the best alignment produced by the SPARKS2 program [21] were used as vectors for SVM classification. SPARKS2 performs alignments using a knowledge-based energy score with sequence-profile and secondary structure information. For CM in CASP6, SPARKS2 performed well in recognizing the best or near-best template. Thus, we assumed that if SPARKS2 cannot find reasonable alignments, the target must be truly difficult. The sensitivity of the classification, defined as TP/(TP + FN), was 93.0% (40/(40 + 3)) for CASP6 targets. TP, FP, TN, and FN denote true positive, false positive, true negative, and false negative, respectively. The specificity, TN/(TN + FP), was 93.6% (44/(44 + 3)). The predicted classification of target difficulty was then used to select one of the two scoring functions described in the next section.

Figure 1. A flowchart illustrating the key steps of the fams-ace method: the original CASP7 server models (five models from each of servers 1 to M), the models refined by FAMS, model selection among the five models by CIRCLE after estimation of the target difficulty and the secondary structure correspondence, and consensus selection of the final five models by the 3D-Jury score.

Selection of a representative model from each server model by structure evaluation

The top five models submitted by each server were further selected by evaluating their free energy with standard techniques.
The purpose of the evaluation is to remove unstable models before the final model selection step. The best representative model for each server was selected by CIRCLE. CIRCLE considers two terms for model quality: (1) model quality calculated from the side-chain environment of each residue and (2) similarity between the secondary structure propensities predicted for an amino acid sequence by PSI-PRED [22] and the secondary structures of the three-dimensional model. The side-chain environments for each residue were determined from three parameters: (1) the fraction of the molecular surface area of the side chains exposed to water or covered by polar atoms, (2) the fraction of the side-chain area buried by any other atoms, and (3) the secondary structures. The values of (1) and (2) were each categorized into 100 classes. Moreover, as shown in Figure 2, the sum of (1) and (2) never exceeds 1.00 (100%). Therefore, the number of combinations of (1) and (2) was 5050 [(100 × 101)/2].

Figure 2. Matrix data used in the scoring function of CIRCLE. Sets (a1, b1) show the frequency of residues LYS and LEU observed in the PDB dataset, respectively, as described in Eq. (2). Sets (a2, b2) show the data converted from the a1 and b1 matrices by using the Gaussian weight as described in Eq. (3). Sets (a3, b3) show the scoring matrices of LYS and LEU according to the side-chain environments as described in Eq. (5).

The secondary structures were described using a sliding window around the residue in order to classify them in detail. Each residue in the window is assigned helix, sheet, or coil. The window size was three residues for CM targets and one residue for non-CM targets; the wider window was especially effective for CM targets. For example, the secondary structure classified as CCC represents a residue at the center of a coil region spanning three residues. Therefore, for CM targets (window size 3), 27 classes were used to classify the secondary structures. Finally, for CM and non-CM targets, the side-chain environments of amino acid residues were classified into 136,350 (5050 × 27) and 15,150 (5050 × 3) classes, respectively. These classifications describe side-chain environments in more detail than Verify3D (18 classes), but they also leave too few observations in each class. To solve this sparsity problem, a Gaussian filter was applied instead of using the classification directly. The score of model quality is calculated by the following functions.
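$$w(\mathrm{polar}, \mathrm{buried}, p, b) = \exp\left(-\frac{p^{2}}{2 \times \sigma_{\mathrm{polar}}^{2}}\right) \exp\left(-\frac{b^{2}}{2 \times \sigma_{\mathrm{buried}}^{2}}\right) \tag{1}$$

$$N(AA \mid ss, \mathrm{polar}, \mathrm{buried}) \tag{2}$$

$$N'(AA \mid ss, \mathrm{polar}, \mathrm{buried}) = \sum_{m} \sum_{n} w(\mathrm{polar}, \mathrm{buried}, m, n) \times N(AA \mid ss, \mathrm{polar} + m, \mathrm{buried} + n) \tag{3}$$

$$P(AA \mid ss, \mathrm{polar}, \mathrm{buried}) = \frac{N'(AA \mid ss, \mathrm{polar}, \mathrm{buried})}{\sum_{aa} N'(aa \mid ss, \mathrm{polar}, \mathrm{buried})} \tag{4}$$

$$\mathrm{SCORE}(AA \mid env) = \log \frac{P(AA \mid env)}{P(AA)} = \log \frac{P(AA \mid ss, \mathrm{polar}, \mathrm{buried})}{P(AA)} \tag{5}$$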
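As an illustration of how Eqs. (1) to (5) fit together, the following is a minimal sketch for a single secondary-structure class. It is not the CIRCLE implementation: the 4 × 4 toy grid, the residue counts, the standard deviations (given in bin units), the use of the natural logarithm, and all function names are assumptions made only for this example.

```python
import math


def gaussian_weight(p, b, sigma_polar, sigma_buried):
    """Eq. (1): weight for an offset (p, b) from the centre of an environment bin."""
    return (math.exp(-p ** 2 / (2 * sigma_polar ** 2))
            * math.exp(-b ** 2 / (2 * sigma_buried ** 2)))


def smooth_counts(counts, bins, sigma_polar, sigma_buried):
    """Eq. (3): Gaussian-weighted sum of the raw counts N around every bin."""
    smoothed = {}
    for polar, buried in bins:
        smoothed[(polar, buried)] = sum(
            gaussian_weight(pp - polar, bb - buried, sigma_polar, sigma_buried) * n
            for (pp, bb), n in counts.items()
        )
    return smoothed


def log_odds_scores(raw_counts, bins, sigma_polar, sigma_buried):
    """Eqs. (4) and (5): SCORE(AA|env) = log(P(AA|env) / P(AA))."""
    smoothed = {aa: smooth_counts(c, bins, sigma_polar, sigma_buried)
                for aa, c in raw_counts.items()}
    totals = {aa: sum(c.values()) for aa, c in raw_counts.items()}
    grand_total = sum(totals.values())
    scores = {}
    for aa in raw_counts:
        p_aa = totals[aa] / grand_total  # background frequency P(AA)
        scores[aa] = {}
        for env in bins:
            # Eq. (4): normalise the smoothed count over all residue types.
            p_env = smoothed[aa][env] / sum(s[env] for s in smoothed.values())
            scores[aa][env] = math.log(p_env / p_aa)
    return scores


# Toy grid of (polar, buried) bins whose sum stays within the allowed triangle.
bins = [(p, b) for p in range(4) for b in range(4) if p + b <= 3]

# Hypothetical raw counts (Eq. 2): LEU is mostly buried, LYS mostly polar/exposed.
raw_counts = {
    "LEU": {(0, 3): 50, (0, 2): 30, (1, 2): 20},
    "LYS": {(3, 0): 40, (2, 0): 30, (2, 1): 30},
}

scores = log_odds_scores(raw_counts, bins, sigma_polar=1.0, sigma_buried=1.0)
```

Because the Gaussian filter spreads each observation over neighboring bins, a bin with no raw observations, such as (1, 1) in this toy grid, still receives a finite score, which is the sparsity problem the filter is meant to address.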

Table I. Performance of Refinement by FAMS

Columns: Original (a) All | Original (a) Selected | FAMS model (b) All | FAMS model (b) Selected
Model length (c): 2,908,231 | 15,725 | 2,949,313 (+1.4) | 15,779 (+0.3)
Residues in most favored regions (d): 2,441,472 | 13,337 | 2,552,207 (+4.5) | 13,790 (+3.4)
Residues in additional allowed regions (d): 3,59, ,505 (−8.2) 1636 (−9.5)
Residues in generously allowed regions (d): 67, ,253 (−33.0) 190 (−45.7)
Residues in disallowed regions (d): 40, ,348 (−44.3) 163 (−29.4)
No. of unnatural chiral centers (e): 18, ,265 (−40.3) 102 (−51.9)
No. of short contacts (f) (0–2.0 Å): (−98.8) 2 (−99.8)
No. of short contacts (f) (0–2.2 Å): 411, (−98.9) 4 (−99.8)
No. of short contacts (f) (0–2.4 Å): 661, ,880 (−97.9) 73 (−98.1)
No. of short contacts (f) (0–2.6 Å): 1,559, ,346 (−90.3) 814 (−91.3)
Side-chain accuracy of χ1 (g): 801, ,797 (+5.3) 5217 (+1.6)
Side-chain accuracy of χ1+2 (h): 423, ,743 (+4.2) 2767 (+1.8)
GDT_TS (i): (−0.3) (−0.3)

Summary of model quality from PROCHECK, accuracy of side chains, and GDT_TS for all CASP7 targets. The "All" columns represent all server models. The "Selected" columns represent the models which were selected from the server models as the best model by fams-ace. The value in parentheses (in percent) is the rate of increase from the original to the FAMS model.
a Original models submitted to CASP7 by automatic servers within 48 h of the target sequence being given.
b Models refined by FAMS, the original modeling program.
c Total amino acid residues existing in models.
d Estimation of Ramachandran plot regions from PROCHECK. For model quality estimation, the number of residues whose main-chain torsion angles around the C-alpha atom fall in the most favored regions is a good indicator of the stereochemical quality of a protein structure. The most favored regions are those observed experimentally to have the best torsion angles; the additional and generously allowed regions are progressively less favorable, and the disallowed regions are the worst.
e Number of experimentally unnatural chiral centers surrounding the main-chain C-alpha atom.
f Number of short contacts, defined as the number of pairs of nonbonded atoms within the distance indicated.
g The number of residues which have both C-alpha atoms within 3.5 Å and side-chain χ1 angles within 40° of the native structure.
h The number of residues which have C-alpha atoms within 3.5 Å and both χ1 and χ2 within 40° of the native structure.
i GDT_TS values were calculated from 114 domains of native structures obtained from CASP7.

As shown in Eq. (5), our scoring function, which generates the score of each amino acid residue (AA) according to its environment (env), takes the general form of a statistical potential. The differences between CIRCLE and Verify3D are found in Eqs. (1)–(4). Here, w(polar, buried, p, b) represents the Gaussian weight in the particular environment (polar and buried). The letters b and p designate the buried-ratio and polar-ratio axes, respectively, and describe the distance from the center location (polar and buried) of each Gaussian weight function. The standard deviations (σ_polar, σ_buried) for the environment of the side chain (fraction of polar area, buried area) in Eq. (1) are calculated from the virtual mutation dataset. We analyzed the variation of environments by considering the mutated amino acid residues in a particular position. A total of 504,716 datasets were constructed by using homology modeling methods. N(AA|ss, polar, buried) is the number of residues AA observed in environment env in the PDB dataset [Fig. 2(a1,b1)]. As mentioned earlier, since the classification of environments is detailed, N(AA|ss, polar, buried) could not be used directly to calculate P(AA|env) in Eq. (5). Therefore, the Gaussian weight was used as a Gaussian filter in Eq. (3) to generate smoothed data from the raw data N(AA|ss, polar, buried). In Eq.
(5), P(AA|env) is the probability of the amino acid residue AA in the environment env, and aa is a variable running over the 20 amino acid residues. P(AA) is the probability of finding residue AA among all the amino acid residues. P(AA|env) contains the Gaussian weight function of Eq. (1) in order to account for the variability of the frequency in the side-chain environment. Examples of the scoring matrices of hydrophobic (leucine) and hydrophilic (lysine) residues are shown in Figure 2.

The score corresponding to the side-chain environments, Eq. (5), does not account for agreement with the predicted secondary structure, even though PSI-PRED predicts secondary structures with an average Q3 score of about 80%. Therefore, we added to CIRCLE a term for the secondary structure similarity between the model and the prediction. The measure of similarity in secondary structures is based on the following scoring function.

$$\mathrm{SSscore}(i, j) = \log \frac{P(i, j \mid conf)}{P_{\mathrm{pre}}(i \mid conf)\, P_{\mathrm{m}}(j \mid conf)} \tag{6}$$

Here, i represents the secondary structure of the target sequence predicted by PSI-PRED, j is the secondary structure observed in the model, and conf is one of the confidence values (0, 1, 2, ..., 9) calculated by PSI-PRED. P_pre(i|conf) is the probability that PSI-PRED predicts secondary structure i when the confidence value is conf. P_m(j|conf) is the probability of observing secondary structure j in the model when the confidence value is conf. P(i, j|conf) is the joint probability of the secondary structures i and j for the confidence value conf. This similarity of secondary structures is a useful measure especially for difficult targets, that is, when no near-native structures exist; in such cases the secondary structure similarity can still identify good local structures even when the overall fold is entirely different. According to the target difficulty rating predicted by SVMs, the total score is calculated as

$$\mathrm{TotalScore} = \begin{cases} \sum_{n}^{\mathrm{length}} \left(0.35 \times \mathrm{SSscore} + \mathrm{3D1Dscore}_{\mathrm{CM}}\right)_{n} & \text{CM} \\ \sum_{n}^{\mathrm{length}} \left(0.75 \times \mathrm{SSscore} + \mathrm{3D1Dscore}_{\mathrm{FRNF}}\right)_{n} & \text{FR or NF} \end{cases} \tag{7}$$

The coefficients for the measure of similarity of secondary structures (SSscore) were optimized on CASP6 targets. The similarity of the secondary structures is weighted more heavily for difficult targets.

Selection of the five best models by using consensus methods

All models evaluated as the best for each server were compared using the consensus method of the 3D-Jury system. We modified the confidence score of 3D-Jury as

$$\mathrm{consensus}(M_a) = \frac{\sum_{i,\, a \neq i}^{N} \mathrm{sim}(M_a, M_i)}{1 + N} \tag{8}$$

where M_a is the representative model of server a, N is the number of servers, and sim(M_a, M_i) is the similarity score from MAXSUB [23] between models M_a and M_i, which equals the number of C-alpha atom pairs that are within 3.5 Å. If sim(M_a, M_i) is below 40, it is set to zero (according to the 3D-Jury protocol). The best five models, according to the consensus value, were selected for submission to CASP7. No human intervention occurred during the procedure, and the fams-ace process did not consider server name or server performance. A server with exceptional performance was not given any prior advantage in our model selection process.

RESULTS AND DISCUSSION

Do modifications on server models improve model quality?

Table I displays the performance of FAMS refinement in view of (1) stereochemical accuracy, (2) the accuracy of side chains (χ1 and χ1+2), and (3) overall similarity of C-alpha atom positions (GDT_TS).
[24] The side-chain accuracy of χ1 equals the number of residues which have both C-alpha atoms within 3.5 Å and side chains with χ1 angles within 40° of the native structure. χ1+2 equals the number of residues which have C-alpha atoms within 3.5 Å and both χ1 and χ2 within 40° of the native structure.

Table II. Top 10 of Input Servers According to the Contribution

No. fams-ace | No. fams-ace (improved)
22 Zhang-Server | 20 Zhang-Server
15 MetaTasser | 13 ROBETTA
5 beautshot | 9 Pmodeller6
4 SP3 | 5 Pcons6
3 keasar-server | 5 MetaTasser
3 PROTINFO | 4 PROTINFO-AB
3 HHpred3 | 3 PROTINFO
3 HHpred2 | 3 HHpred2
3 FOLDpro | 3 FOLDpro
3 BayesHH | 3 FAMSD
2 shub | 3 BayesHH
2 beautshotbase | 2 SP4
2 UNI-EID_expm | 2 FAMS
2 SPARKS2 | 2 CIRCLE
2 SP4 | 2 ABIpro
2 Pcons6 | 1 keasar-server
2 FAMSD | 1 forecast-s
2 FAMS | 1 UNI-EID_sfst
2 3Dpro | 1 UNI-EID_expm
1 nfold | 1 SP3
1 mgen-3d | 1 SAM_T06_server
1 karypis.srv.2 | 1 RAPTOR-ACE
1 forecast-s | 1 RAPTOR
1 SAM-T99 | 1 Phyre-2
1 ROKKY | 1 Ma-OPUS-server
1 ROBETTA | 1 HHpred3
1 RAPTOR-ACE | 1 GeneSilicoMetaServer
1 RAPTOR | 1 FORTE1
1 NN_PUT_lab | 1 CaspIta-FOX
1 Ma-OPUS-server2 | 1 Bilab-ENABLE
1 Huber-Torda-Server | 1 3Dpro
1 FUGUE |

The number of models selected as the best by fams-ace and fams-ace (improved), respectively, for each CASP7 target. For example, Zhang-Server obtained the best model 22 and 20 times in the fams-ace and fams-ace (improved) team, respectively. Moreover, MetaTasser and ROBETTA obtained the best model 15 and 13 times in the fams-ace and fams-ace (improved) team, respectively.

For the 25,615 server models obtained from the CASP7 website (represented as "all") and the TS1 models of fams-ace (represented as "selected"), we compared the models rebuilt by FAMS and the original models. PROCHECK [25,26] was used to assess the performance of the models from a stereochemical and geometric point of view. In almost all categories of PROCHECK, the server models which were rebuilt by FAMS had improved model quality.
In summary of all server models, Table I shows that 41,000 amino acid residues were newly constructed, and residues in the favored and disallowed regions of the Ramachandran plot improved by about 110,000 and 18,000 residues, respectively. In addition, the number of wrong short contacts within 2.6 Å decreased by 1.41 million. In side-chain accuracy, χ1 and χ1+2 increased by 5.25% and 4.16%, respectively. Moreover, improvements of the stereochemical accuracy and the accuracy of side chains were also observed in the selected models. Thus, these results show that FAMS remodeling was effective in improving model quality on a stereochemical and geometric basis and in the accuracy of side chains.

Figure 3. (a) Results of fams-ace (solid and broken horizontal lines for averaged GDT_TS and Z-score, respectively) in TBM targets showing the exclusion of one server in CASP7. The horizontal axis represents the excluded server. As several native structures are unpublished, GDT_TS of the original models from the CASP7 website were used. (b) Relative proportions of GDT_TS to best GDT_TS for each target. (c) Comparison of GDT_TS values between fams-ace and fams-ace (improved). High accuracy template-based modeling (HA-TBM) targets, template-based modeling (TBM) targets except for HA, and FM targets are described by circle, square, and cross, respectively. (d) Comparison of the GDT_TS values between fams-ace and fams-ace (improved). HA, TBM except for HA, and FM categories are described by circle, square, and cross, respectively.

On the other hand, improvements of GDT_TS were not observed in the rebuilt models despite the advance in the quality of the side chains and of the stereochemical and geometric properties, although the decrease in GDT_TS was only 0.3%. These results indicate that (1) improvement of the side chains does not directly correspond to an improvement of GDT_TS and (2) FAMS can improve side-chain accuracy while keeping the fold of the model intact. The major purposes of the remodeling step are to normalize and improve the model quality for selecting representative models by CIRCLE. When comparing representative models and the first ranked models (TS1 models) for each server, the summed χ1, χ1+2, and GDT_TS of the representative models were 1.48%, 1.32%, and 1.14% better than those of the first ranked models, respectively. Therefore, FAMS achieved the purpose of the remodeling step. In the future, FAMS should be improved to obtain better side-chain and GDT_TS accuracies. Nevertheless, improvement of the three-dimensional model quality is very useful in selecting the best model.
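The percentages in parentheses in Table I are rates of increase from the original to the FAMS-refined models. As a small worked check, the sketch below recomputes two of them from the counts reported in Table I (model length and residues in the most favored regions, all server models). The function name `rate_of_increase` is a label chosen for this example only.

```python
def rate_of_increase(original, refined):
    """Percentage change from the original models to the FAMS-refined models."""
    return 100.0 * (refined - original) / original


# Counts for all server models, taken from Table I.
model_length = rate_of_increase(2_908_231, 2_949_313)   # total residues in models
most_favored = rate_of_increase(2_441_472, 2_552_207)   # Ramachandran most favored

newly_built = 2_949_313 - 2_908_231      # residues newly constructed by FAMS
favored_gain = 2_552_207 - 2_441_472     # additional residues in favored regions

print(f"{model_length:+.1f}%  {most_favored:+.1f}%")  # prints "+1.4%  +4.5%"
```

The two differences, 41,082 and 110,735 residues, correspond to the "41,000" and "about 110,000" figures quoted in the text.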
The contribution of the individual input servers

In the fams-ace protocol, about 250 submitted models for each target were remodeled by FAMS, and then a representative from the five models of each server was selected according to CIRCLE. In the selection of representative models, TS1 and AL1 models comprised 45.7% of all selected models. This indicates that nearly half of the representative models were the same as the models ranked first by the server itself. As mentioned previously, the representative models were superior to the first ranked models. The contributions of each server are shown in Table II. About 23.2% of the final best models chosen by fams-ace were refined Zhang-Server models, which indicates that refined Zhang-Server models have a large consensus value compared with other representative models. Additionally, 15.8% of the best models selected by fams-ace were refined MetaTasser models.

Table III. Total Values of Z-score, GDT_TS, and the Accuracy of Side Chains

Columns: fams-ace (a) | fams-ace (improved) (b) | 3D-Jury (c) | CIRCLE-FAMS (d)
Z-score
HA-TBM: (−6.0) (+2.3) (−14.0)
TBM: (+4.3) (−7.1) (−3.3)
FM: (+225.2) 2.16 (−67.6) (+226.4)
ALL: (+19.8) (−11.2) (+13.0)
GDT_TS
HA-TBM: (−1.0) (+0.3) (−1.3)
TBM: (+0.4) (−0.7) (−1.1)
FM: (+15.3) (−5.4) (+13.6)
ALL: (+1.2) (−1.0) (−0.2)
χ1
HA-TBM: (+5.2) 2307 (+4.2) 2456 (+11.0)
TBM: (+5.5) 5154 (−1.1) 5679 (+9.0)
FM: (+28.2) 174 (−7.4) 231 (+22.9)
ALL: (+6.2) 5256 (−1.1) 5821 (+9.5)
χ1+2
HA-TBM: (+4.1) 1292 (+5.2) 1417 (+15.4)
TBM: (+9.6) 2751 (−1.1) 3150 (+13.2)
FM: (+20.6) 75 (−22.7) 113 (+16.5)
ALL: (+7.3) 2792 (−1.7) 3216 (+13.2)

a CASP7 team using fams-ace.
b Virtual team fams-ace (improved) mentioned in Results and Discussion.
c Virtual team 3D-Jury (consensus method) using only TS1 models.
d CASP7 team using CIRCLE for final model selection.
HA-TBM (28 domains), TBM (108 domains), and FM (19 domains) denote high accuracy template-based modeling targets, template-based modeling targets including HA, and free modeling targets, respectively. ALL (123 domains) means the total score for all the CASP7 targets. GDT_TS and the native structures for calculating χ1 and χ1+2 were obtained from the CASP7 website. The value in parentheses (in percent) is the rate of increase from fams-ace.

Furthermore, in order to discuss the contribution of each server from the standpoint of model accuracy and influence when calculating the consensus value, we determined the average GDT_TS of fams-ace for TBM targets (total of 108 domains) when a particular server was excluded [Fig. 3(a)].
The contribution of MetaTasser and Zhang-Server is larger than that of all other servers, as shown by the large decrease of the average GDT_TS value when either is excluded. This indicates that if MetaTasser and Zhang-Server had not participated in CASP7, fams-ace could not have achieved comparable results. Thus, models from servers with exceptional performance (Zhang-Server and MetaTasser) are essential for a good performance from the fams-ace method. In addition, performance would have increased if fams-ace had thinned out particular servers (e.g., SP3, beautshot) whose exclusion gave a larger average GDT_TS than fams-ace (represented as the horizontal line) in Figure 3(a). In other words, if these servers had not participated in CASP7, the GDT_TS of fams-ace would have improved. However, the average GDT_TS of fams-ace was largely insensitive to the exclusion of specific servers and could not equal that of the Zhang-Server (67.86), although it came close.

Example T0371_D2 (template-based modeling target)

T0371_D2 is the second domain of the target T0371 (now PDB code: 2HX1). In this target, fams-ace submitted the best model in GDT_TS (72.1) and AL0 (82.64) amongst all of the CASP7 teams. The model presented by fams-ace was constructed from a Zhang-Server TS4 model (GDT_TS: 70.66, AL0: 80.17). There were few differences between the fams-ace TS1 model and the Zhang-Server TS4 model. In addition, there were no differences in side-chain accuracies (χ1 and χ1+2). This example indicates that fams-ace could select a three-dimensional structure similar to the native structure. For two domains (T0371_D2 and T0321_D2), fams-ace submitted better models than the best-ranked server models. In its final step, fams-ace considered only the consensus of the representative models rather than the quality of each model. Consequently, fams-ace did not select outliers, whether exceptionally good or exceptionally bad.

What went right? What went wrong?
The fams-ace method selected good models which were of relatively high quality from the standpoint of overall similarity with the native structure (GDT_TS). The success of fams-ace can be explained by three factors. (1) fams-ace can exploit the situation in which several servers, especially Zhang-Server and MetaTasser, provide good models [see Fig. 3(a)]. (2) Since the consensus method was used for the final model selection, fams-ace avoided outliers, selecting neither exceptionally good nor exceptionally bad models. For 72.2% (78/108) of TBM targets, fams-ace could select good models which have a GDT_TS within 90% of the highest GDT_TS among all server models [Fig. 3(b)]. (3) Representative models with improved GDT_TS were selected. The representative models from each server were selected according to the evaluation of the side-chain environments. Table III shows a comparison summary of fams-ace and the virtual server (described as "3D-Jury"), which uses the 3D-Jury method in the final model selection from the TS1 models of servers instead of representative models. Except for high accuracy template-based modeling targets (HA-TBM), fams-ace submitted better models than 3D-Jury.

Figure 4. Two failed examples of the refinement step by FAMS. (a, b) Models of T0356. (c, d) Models of T0283. (a) and (c) are original server models of T0356 and T0283, respectively. (b) and (d) are the models refined by FAMS. The regions which were constructed by FAMS are circled. There exists extremely high tension in the main-chain structures of these regions.

However, several problems in the fams-ace method were noted. For example, in the calculation of the consensus value, the similarity scores from MAXSUB are set to zero if the similarity score is below 40. As a result, many models that had weak but real similarities were judged to have no significant similarity. In this case, the consensus method fails during the final model selection step. Therefore, fams-ace could not consistently select the near-best quality models for difficult targets [see Fig. 3(c)].
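The consensus value of Eq. (8), together with the cutoff at 40 discussed above, can be sketched as follows. This is a minimal illustration rather than the fams-ace code: the similarity matrices are invented numbers standing in for MAXSUB scores, and the function names are hypothetical.

```python
def consensus_scores(similarity, cutoff=40):
    """Eq. (8): sum of pairwise similarities to the other models, divided by 1 + N.

    `similarity[a][i]` stands in for the MAXSUB score between the representative
    models of servers a and i; scores below `cutoff` are set to zero.
    """
    n = len(similarity)
    scores = []
    for a in range(n):
        total = 0.0
        for i in range(n):
            if i != a and similarity[a][i] >= cutoff:
                total += similarity[a][i]
        scores.append(total / (1 + n))
    return scores


def top_models(similarity, how_many=5, cutoff=40):
    """Indices of the models with the highest consensus value, best first."""
    scores = consensus_scores(similarity, cutoff)
    return sorted(range(len(scores)), key=lambda k: -scores[k])[:how_many]


# Hypothetical easy target: three mutually similar models and one outlier.
similarity = [
    [0, 80, 75, 20],
    [80, 0, 70, 15],
    [75, 70, 0, 10],
    [20, 15, 10, 0],
]
easy = consensus_scores(similarity)

# Hypothetical difficult target: every pairwise similarity is weak (below 40).
weak = [
    [0, 30, 35],
    [30, 0, 25],
    [35, 25, 0],
]
hard = consensus_scores(weak)
```

In the second case every consensus value collapses to zero, so the ranking carries no information. This is the failure mode described above for difficult targets.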
During the refinement of server models, multidomain problems caused errors. An example is T0356, for which one server submitted a model that was divided into domains. FAMS interpreted the division as a deletion region in the sequence and coordinates. Therefore, FAMS constructed new regions to connect the separated domains. The main-chain structures of the connected regions broke during this difficult reconstruction [Fig. 4(a,b)]. Although unnatural models were usually rejected in the evaluation step by CIRCLE, in a few cases these unnatural models were selected as the best model by fams-ace. This tendency appeared to be associated with difficult targets. In the worst example, T0283, the GDT_TS of fams-ace was only 25.52, in contrast to the highest GDT_TS among the server models [see Fig. 4(c,d)].

Lastly, a major problem with fams-ace is its inability to present models with accurate side-chain conformations, even when GDT_TS is relatively high. Because fams-ace selects the best models according to the consensus value, this problem is unavoidable.

Simple improvement of fams-ace

For the reasons mentioned in the preceding section, we investigated the possibility of using CIRCLE instead of the consensus method. In the final model selection step, CIRCLE performs well because it selects models according to their side-chain environments. There are three steps in the new fams-ace process. (1) Rebuild and refine server models. (2) Select models according to the Z-score of the consensus score. For targets predicted by the SVM to be difficult (FR and NF), the similarity score from MAXSUB was left unchanged. The thresholds of the model selection were optimized on CASP6 targets using the category classification (CM or non-CM) reported by the CASP6 organizers. (3) Select final models using CIRCLE. Hereinafter, this collection of steps is denoted as fams-ace (improved). The greatest difference between fams-ace and fams-ace (improved) is the final model selection step: fams-ace uses a consensus method based on a modified 3D-Jury, whereas fams-ace (improved) uses model evaluation by CIRCLE. The goal of fams-ace (improved) is to obtain models of high quality in both GDT_TS and side-chain accuracy. The results of fams-ace (improved) are shown in Table III and Figure 3(b,d). Except for HA-TBM targets, both the Z-score of GDT_TS and GDT_TS itself improved in comparison with fams-ace.
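Steps (2) and (3) of the improved procedure can be sketched as a two-stage filter: keep the models whose consensus Z-score passes a threshold, then rank the survivors by a model quality score. The sketch below is only an illustration under stated assumptions: `quality` stands in for a CIRCLE-like score (assumed here to be higher for better models), the default `z_threshold` of 0.0 is a placeholder rather than the threshold optimized on CASP6 targets, and the fallback to all models when none pass is also an assumption.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def z_scores(values: Sequence[float]) -> List[float]:
    """Standardize values; return zeros when all values are identical."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def select_models(consensus: Sequence[float],
                  quality: Sequence[float],
                  z_threshold: float = 0.0,
                  top_n: int = 5) -> List[int]:
    """Return indices of up to `top_n` models.

    Stage 1: keep models whose consensus Z-score is >= z_threshold
             (placeholder threshold, not the published one).
    Stage 2: rank the kept models by `quality`, a stand-in for a
             CIRCLE-like score where higher is assumed to be better.
    """
    if len(consensus) != len(quality):
        raise ValueError("consensus and quality must have the same length")
    z = z_scores(consensus)
    candidates = [i for i, zi in enumerate(z) if zi >= z_threshold]
    if not candidates:
        # Assumption: fall back to all models rather than return nothing.
        candidates = list(range(len(consensus)))
    candidates.sort(key=lambda i: quality[i], reverse=True)
    return candidates[:top_n]


# Hypothetical scores for four refined models of one target.
consensus = [77.5, 70.0, 67.5, 10.0]
quality = [0.2, 0.9, 0.5, 1.5]

chosen = select_models(consensus, quality, z_threshold=0.0, top_n=2)
```

In this example the fourth model has the best quality score but is removed by the consensus filter, so the final ranking is decided among the three models that agree with one another. The default `top_n` of five follows the five submitted models mentioned in the abstract.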
There was an obvious improvement in side chains in all categories, especially for free modeling (FM) targets and difficult TBM targets (highest GDT_TS < 50). These results suggest that this improvement provides a method for obtaining models that are good in both GDT_TS and side-chain quality. However, the problem of multidomain targets remains. We plan to improve the refinement process and the assignment of domains in the future. Moreover, we propose developing a new system based on the improved fams-ace method to generate results superior to the best server models.

ACKNOWLEDGMENTS

We thank the CASP7 organizers and assessors, and the experimentalists who supplied targets for CASP7. We particularly thank all server teams in CASP7.
