Genki Terashi, Mayuko Takeda-Shitaka, Kazuhiko Kanou, Mitsuo Iwadate, Daisuke Takaya, Akio Hosoi, Kazuhiro Ohta, and Hideaki Umeyama*


PROTEINS: Structure, Function, and Bioinformatics
Prediction Report

fams-ace: A combined method to select the best model after remodeling all server models

Genki Terashi, Mayuko Takeda-Shitaka, Kazuhiko Kanou, Mitsuo Iwadate, Daisuke Takaya, Akio Hosoi, Kazuhiro Ohta, and Hideaki Umeyama*

School of Pharmacy, Kitasato University, Tokyo, Japan

ABSTRACT

During the Critical Assessment of Protein Structure Prediction (CASP7, Pacific Grove, CA, 2006), fams-ace was entered in the 3D coordinate prediction category as a human expert group. The procedure can be summarized by the following three steps. (1) All the server models were refined and rebuilt utilizing our homology modeling method. (2) Representative structures were selected from each server, according to a model quality evaluation, based on a 3D1D profile score (like Verify3D). (3) The top five models were selected and submitted in the order of the consensus-based score (like 3D-Jury). Fams-ace is a fully automated server and does not require human intervention. In this article, we introduce the methodology of fams-ace and discuss the successes and failures of this approach during CASP7. In addition, we discuss possible improvements for the next CASP.

Proteins 2007; 69(Suppl 8): © 2007 Wiley-Liss, Inc.

Key words: template-based modeling (TBM); comparative modeling; homology modeling; protein structure prediction; CHIMERA; FAMS; SKE-CHIMERA; quality assessment; CASP.

INTRODUCTION

During the sixth round of the Critical Assessment of Protein Structure Prediction (CASP6), our SKE-CHIMERA team successfully predicted targets, particularly fold recognition/homologous (FR/H) targets.[1] At that time, however, we noted an obvious flaw in how the best model was selected among the many constructed models. In the SKE-CHIMERA method, human intervention based on visual inspection and the use of biological information is the most important factor.
However, the performance level of human intervention was inconsistent, and this was reflected in the CASP6 results. Therefore, a technique is needed that selects the models closest to the native structure from decoy sets without relying on human intervention.

Many scoring functions for evaluating protein structure are founded on knowledge-based potentials,[2] clustering methods,[3] structural energies using molecular mechanics force fields,[4] and structure-based sequence profiles (e.g., Verify3D,[5,6] Inbgu,[7] 3D-PSSM,[8] ProQ[9]). These scoring functions are used to assess model quality and ultimately to select the best model from a set of models. During CAFASP4,[10] the Model Quality Assessment Programs (MQAPs) category[11] assessed the accuracy of such programs using the predicted models produced by the participating CAFASP4 servers. Several MQAP scoring functions were able to distinguish correct from incorrect protein models and consistently selected the best models among the candidates. According to the report from Daniel Fischer,[12] the CAFASP4 organizer, the best-performing MQAPs were Verify3D, Victor/FRST,[13] and MODCHECK.[14]

Verify3D uses 18 (6 × 3) discrete environmental classes for each amino acid residue and utilizes the buried area, the fraction of polar area, and secondary structure for assessment. The side-chain environments of the amino acid residues were grouped into six classes according to the buried area and fraction of polar area. The secondary structures were separated into three classes. A score for each class was calculated using a log-odds scoring function from a dataset of experimental protein structures. Since it is a simple but powerful program for predicting model accuracy, Verify3D is widely used by human predictors when selecting the best model from many candidate models. In the MQAPs category of CAFASP4, Verify3D performed well for homology modeling and fold-recognition targets. Victor/FRST combines four knowledge-based potentials (pairwise, solvation, hydrogen bond, and torsion angle potentials) and performed consistently well, especially for comparative modeling (CM) targets. MODCHECK is based on threading potentials (pairwise and solvation potentials) and calculates the quality score by summing the pairwise and solvation Z-scores, which are obtained by extensive sequence shuffling trials.

Genki Terashi and Mayuko Takeda-Shitaka contributed equally to this work.
The authors state no conflict of interest.
*Correspondence to: Hideaki Umeyama, Department of Biomolecular Design, School of Pharmacy, Kitasato University, Shirokane, Minato-ku, Tokyo, Japan. E-mail: umeyamah@pharm.kitasato-u.ac.jp
Received 28 February 2007; Revised 13 August 2007; Accepted 16 August 2007
Published online 25 September 2007 in Wiley InterScience.

The methods mentioned above consider only the quality of the target model. In contrast, other research groups use consensus methods (e.g., Pcons,[15] 3D-Jury[16]). Pcons, a neural-network-based consensus method, combines the confidence score reported by each server and the similarity between models or templates. The 3D-Jury technique is a fully automated protein structure metaprediction system that works well when the accuracy of a model correlates strongly with a confidence score based on the similarity between models. All server-generated models are compared by a similarity score. The similarity score of a model pair is equal to the number of C-alpha atom pairs that are within 3.5 Å after superimposition. The confidence score of a model is equal to the sum of the similarity scores for the considered model pairs divided by the number of considered pairs plus one. Moreover, 3D-Jury uses two modes to calculate confidence scores.
In best-model-mode, only the single model with the best similarity score from each server enters the consensus score calculation. In contrast, all-models-mode includes all the server models. Note that consensus methods require a variety of candidate models from the servers for comparison and for calculating the consensus value.

Thus, in the CASP7 contest,[17] we combined two different scoring functions as a metaselector team: (1) the model quality score based on classification of the side-chain environment for each residue and (2) the consensus score. The model quality score, based on an algorithm inspired by Verify3D, was used as a filter to select a representative from the models submitted by each server. In the fams-ace method, we used an evaluation program, CIRCLE, which is based on a knowledge-based potential of side-chain packing. As mentioned earlier, the Verify3D algorithm uses 18 environmental classes for each amino acid residue. For the purpose of this study, the Verify3D classification was too coarse to analyze side-chain environments. Therefore, we refined the classification of side-chain environments by increasing the number of classes from 18 to 136,350 (5050 × 27) and by applying a Gaussian filter. The consensus score, corresponding to 3D-Jury (single-model-mode, i.e., the best-model-mode described above), was then used for the final selection of the best model. In order to exclude the influence of unstable models when calculating the consensus score, only the model with the best model quality score assigned by CIRCLE from each server was used. Since the consensus method uses only C-alpha coordinates, it does not consider the quality of side-chain packing.

METHODS

The fams-ace method is composed of four steps, as illustrated in Figure 1 and discussed in the following sections.

Rebuilding and refinement of server models

3D models and alignments submitted by automatic servers were obtained from the CASP7 website.
[17] The obtained 3D models and alignments were rebuilt into full-atom three-dimensional models with our fully automatic modeling system (FAMS). Homology modeling was performed using each model as the reference protein (Fig. 1). Detailed information on the FAMS process can be found in a previously published article from our laboratory.[18] Short contacts were removed by optimizing the main-chain coordinates, using simulated annealing of the main chain while conserving the side-chain conformation of each residue. Side-chain atom coordinates were optimized by iterative cycles of side-chain generation and main-chain optimization. Missing regions and discontinuities of the main chain were constructed by FAMS, using a loop search process to obtain an energetically stable structure.

Thus, the FAMS modeling described above is an essential step for evaluating models from the viewpoint of energy. If the coordinates of a side chain or a main chain have serious energy errors (e.g., short contacts, unnatural chiral centers, or unnatural torsion angles), the models cannot be compared reasonably in the evaluation step. In the evaluation step, our scoring method uses the environment of the side chain, which is described by the fraction of buried area and the fraction of area covered by polar atoms. Consequently, even if the coordinates of the main chain are close to the native, a model with many short contacts in the side chains will be rejected from the selection. In addition, in the model-selection step (Selection of the Five Best Models by Using Consensus Methods section), the final models are selected according to the consensus value obtained by comparing coordinates among models. Therefore, unnatural main-chain coordinates in the models of any server will cause an error when the consensus value is calculated.
Assessment of target difficulty

The next step evaluates the feasibility of constructing the model with template protein coordinates from the experiments. fams-ace employs one of two scoring functions depending on the target difficulty. In order to predict the target difficulty, a support vector machine (SVM) program[19] was used. Classifications based on SVMs have been used for several applications in bioinformatics and computational biology.[20] The training dataset consisted of CASP6 targets classified as CM targets (positive) or not (fold recognition or new fold, negative). The score and homology (%) values of the best alignment produced by the SPARKS2 program[21] were used as feature vectors for SVM classification. SPARKS2 performs alignments using a knowledge-based energy score with sequence-profile and secondary structure information. For CM in CASP6, SPARKS2 performed well in recognizing the best or near-best template. Thus, we assumed that if SPARKS2 cannot find reasonable alignments, the target must be truly difficult. The sensitivity of the classification, defined as TP/(TP + FN), was 93.0% (40/(40 + 3)) for CASP6 targets. TP, FP, TN, and FN denote true positive, false positive, true negative, and false negative, respectively. The specificity, TN/(TN + FP), was 93.6% (44/(44 + 3)). The predicted classification of target difficulty was then used to select one of the two scoring functions described in the next section.

Figure 1. A flowchart illustrating the key steps of the fams-ace method: the original CASP7 server models (five models from each of servers 1 to M), the models refined by FAMS, model selection among the five models by CIRCLE after estimation of the target difficulty and the secondary structure correspondence, and consensus selection of the final five models by the 3D-Jury score.

Selection of a representative model from each server model by structure evaluation

The top five models submitted by each server were further selected by evaluating their free energy with standard techniques.
The purpose of the evaluation is to remove unstable models before the final model selection step. The best representative model for each server was selected by CIRCLE. CIRCLE considers two terms for the model quality: (1) the model quality calculated from the side-chain environment of each residue and (2) the similarity between the secondary structure propensities predicted for the amino acid sequence by PSI-PRED[22] and the secondary structures of the three-dimensional model. The side-chain environment of each residue was determined from three parameters: (1) the fraction of the molecular surface area of the side chain exposed to water or covered by polar atoms, (2) the fraction of the side-chain area buried by any other atoms, and (3) the secondary structure. The values of (1) and (2) were each categorized into 100 classes. Moreover, as shown in Figure 2, the sum of (1) and (2) is always 1.00 (100%) or more. Therefore, the number of combinations of (1) and (2) was 5050 [(100 × 101)/2].

Figure 2. Matrix data used in the scoring function of CIRCLE. Sets (a1, b1) show the frequency of residues LYS and LEU, respectively, observed in the PDB dataset, as described in Eq. (2). Sets (a2, b2) show the data converted from the a1 and b1 matrices by using the Gaussian weight, as described in Eq. (3). Sets (a3, b3) show the scoring matrices of LYS and LEU according to the side-chain environments, as described in Eq. (5).

The secondary structures were described using a sliding window around the residue in order to classify them in detail. Each residue in the window is assigned as helix, sheet, or coil, and the window sizes were three residues for CM targets and one residue for non-CM targets; the wider window was especially effective for CM targets. For example, the secondary structure classified as CCC represents a residue at the center of a coil region spanning three residues. Therefore, for CM targets (window size 3), 27 classes were used to classify the secondary structures. Finally, for CM and non-CM targets, the side-chain environments of amino acid residues were classified into 136,350 (5050 × 27) and 15,150 (5050 × 3) classes, respectively. These classifications can describe side-chain environments in more detail than Verify3D (18 classes) but also cause a shortage of observations in each class. To solve this shortage problem, a Gaussian filter was applied instead of using the classification directly. The score of model quality is calculated by the following functions.
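$$w(\mathrm{polar}, \mathrm{buried}, p, b) = \exp\left(-\frac{p^{2}}{2 \times \sigma_{\mathrm{polar}}^{2}}\right) \exp\left(-\frac{b^{2}}{2 \times \sigma_{\mathrm{buried}}^{2}}\right) \tag{1}$$

$$N(AA \mid ss, \mathrm{polar}, \mathrm{buried}) \tag{2}$$

$$N'(AA \mid ss, \mathrm{polar}, \mathrm{buried}) = \sum_{m} \sum_{n} w(\mathrm{polar}, \mathrm{buried}, m, n) \times N(AA \mid ss, \mathrm{polar} + m, \mathrm{buried} + n) \tag{3}$$

$$P(AA \mid ss, \mathrm{polar}, \mathrm{buried}) = \frac{N'(AA \mid ss, \mathrm{polar}, \mathrm{buried})}{\sum_{aa} N'(aa \mid ss, \mathrm{polar}, \mathrm{buried})} \tag{4}$$

$$\mathrm{SCORE}(AA \mid env) = \log \frac{P(AA \mid env)}{P(AA)} = \log \frac{P(AA \mid ss, \mathrm{polar}, \mathrm{buried})}{P(AA)} \tag{5}$$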
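The smoothing and log-odds steps of Eqs. (1), (3), (4), and (5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the CIRCLE implementation: the function names, the array layout (amino acid × polar bin × buried bin for a single secondary-structure class), the finite smoothing `radius`, and the small `pseudocount` that avoids taking the logarithm of zero are all hypothetical choices, and the sketch ignores the constraint that only some (polar, buried) combinations can occur.

```python
import numpy as np

def gaussian_weight(p, b, sigma_polar, sigma_buried):
    """Eq. (1): Gaussian weight for an offset (p, b) from the bin centre."""
    return float(np.exp(-p**2 / (2.0 * sigma_polar**2))
                 * np.exp(-b**2 / (2.0 * sigma_buried**2)))

def smooth_counts(counts, sigma_polar, sigma_buried, radius):
    """Eq. (3): N'(polar, buried) = sum_{m,n} w(m, n) * N(polar + m, buried + n).

    counts has shape (n_aa, n_polar, n_buried) for one secondary-structure
    class. Offsets that fall outside the grid contribute nothing.
    The truncation at `radius` bins is an assumption of this sketch.
    """
    counts = np.asarray(counts, dtype=float)
    _, n_pol, n_bur = counts.shape
    smoothed = np.zeros_like(counts)
    for m in range(-radius, radius + 1):
        for n in range(-radius, radius + 1):
            # Destination bins whose source bin (polar + m, buried + n) exists.
            p_lo, p_hi = max(0, -m), min(n_pol, n_pol - m)
            b_lo, b_hi = max(0, -n), min(n_bur, n_bur - n)
            if p_hi <= p_lo or b_hi <= b_lo:
                continue
            w = gaussian_weight(m, n, sigma_polar, sigma_buried)
            smoothed[:, p_lo:p_hi, b_lo:b_hi] += (
                w * counts[:, p_lo + m:p_hi + m, b_lo + n:b_hi + n])
    return smoothed

def log_odds_scores(counts, smoothed, pseudocount=1e-9):
    """Eqs. (4)-(5): SCORE = log(P(AA|env) / P(AA)).

    P(AA) is taken here as the overall frequency of each residue type in
    the raw counts; the pseudocount is an assumption of this sketch.
    """
    counts = np.asarray(counts, dtype=float)
    padded = smoothed + pseudocount
    p_aa_env = padded / padded.sum(axis=0, keepdims=True)   # Eq. (4)
    p_aa = counts.sum(axis=(1, 2)) / counts.sum()           # background P(AA)
    return np.log(p_aa_env / p_aa[:, None, None])           # Eq. (5)
```

With `radius=0` the smoothing reduces to the raw counts, which corresponds to using the classification directly; larger radii borrow observations from neighbouring environment bins, which is the purpose of the Gaussian filter described above.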

Table I. Performance of Refinement by FAMS

                                             Original(a)              FAMS model(b)
                                             All         Selected     All                  Selected
Model length(c)                              2,908,231   15,725       2,949,313 (+1.4)     15,779 (+0.3)
Residues in most favored regions(d)          2,441,472   13,337       2,552,207 (+4.5)     13,790 (+3.4)
Residues in additional allowed regions(d)    3,59,                    ,505 (-8.2)          1636 (-9.5)
Residues in generously allowed regions(d)    67,                      ,253 (-33.0)         190 (-45.7)
Residues in disallowed regions(d)            40,                      ,348 (-44.3)         163 (-29.4)
No. of unnatural chiral centers(e)           18,                      ,265 (-40.3)         102 (-51.9)
No. of short contacts(f) (0–2.0 Å)                                    (-98.8)              2 (-99.8)
No. of short contacts(f) (0–2.2 Å)           411,                     (-98.9)              4 (-99.8)
No. of short contacts(f) (0–2.4 Å)           661,                     ,880 (-97.9)         73 (-98.1)
No. of short contacts(f) (0–2.6 Å)           1,559,                   ,346 (-90.3)         814 (-91.3)
Side-chain accuracy of χ1(g)                 801,                     ,797 (+5.3)          5217 (+1.6)
Side-chain accuracy of χ1+2(h)               423,                     ,743 (+4.2)          2767 (+1.8)
GDT_TS(i)                                                             (-0.3)               (-0.3)

Summary of model quality from PROCHECK, accuracy of side chains, and GDT_TS for all CASP7 targets. The "All" columns represent all server models. The "Selected" columns represent the models that fams-ace selected from the server models as the best model. The value in parentheses is the percentage change from the original to the FAMS model.
(a) Original models submitted to CASP7 by automatic servers within 48 h of the target sequence being given.
(b) Models refined by FAMS, the original modeling program.
(c) Total amino acid residues existing in the models.
(d) Estimation of Ramachandran plot regions from PROCHECK. For model quality estimation, the number of residues in the most favored regions of the torsion angles around the main-chain C-alpha is a good indicator of the stereochemical quality of a protein structure. If the most favored regions are taken as those with the experimentally best torsion angles, the additional allowed and generously allowed regions are correspondingly good and acceptable, and the disallowed regions are worse.
(e) Number of experimentally unnatural chiral centers at the main-chain C-alpha atom.
(f) The number of short contacts is defined as the number of pairs of nonbonded atoms within the distance indicated.
(g) The number of residues that have both the C-alpha atom within 3.5 Å and the side-chain χ1 angle within 40° of the native structure.
(h) The number of residues that have the C-alpha atom within 3.5 Å and both χ1 and χ2 within 40° of the native structure.
(i) GDT_TS values were calculated from 114 domains of native structures obtained from CASP7.

As shown in Eq. (5), our scoring function follows the general approach of a statistical potential: it generates the score of each amino acid residue (AA) according to its environment (env). The differences between CIRCLE and Verify3D are found in Eqs. (1)–(4). Here, w(polar, buried, p, b) represents the Gaussian weight in a particular environment (polar and buried). The letters p and b designate the polar-ratio and buried-ratio axes, respectively, and describe the distance from the center location (polar and buried) of each Gaussian weight function. The standard deviations (σ_polar, σ_buried) for the environment of the side chain (fraction of polar area, buried area) in Eq. (1) are calculated from the virtual mutation dataset. We analyzed the variation of environments by considering the mutated amino acid residues at a particular position. A total of 504,716 datasets were constructed by using homology modeling methods. N(AA|ss, polar, buried) is the number of residues AA observed in environment env in the PDB dataset [Fig. 2(a1,b1)]. As mentioned earlier, because the classification of environments is so detailed, N(AA|ss, polar, buried) could not be used directly to calculate P(AA|env) in Eq. (5). Therefore, the Gaussian weight was used as a Gaussian filter in Eq. (3) to generate smoothed data from the raw data N(AA|ss, polar, buried). In Eq.
(5), P(AA|env) is the probability of the amino acid residue AA in the environment env, and aa is a variable that runs over the 20 amino acid residues. P(AA) is the probability of finding residue AA among all the amino acid residues. P(AA|env) contains the Gaussian weight function of Eq. (1) in order to account for the variability of the frequency in the side-chain environment. Examples of the scoring matrices of a hydrophobic residue (leucine) and a hydrophilic residue (lysine) are shown in Figure 2.

The score corresponding to the side-chain environments, Eq. (5), cannot consider the secondary structure similarity, although PSI-PRED, which predicts secondary structures, achieves an average Q3 score of about 80%. Therefore, we added a term for the secondary structure similarity between the model and the prediction to CIRCLE. The measure of similarity in secondary structures is based on the following scoring function.

$$\mathrm{SSscore}(i, j) = \log \frac{P(i, j \mid conf)}{P_{pre}(i \mid conf)\, P_{m}(j \mid conf)} \tag{6}$$

Here, i represents the secondary structure of the target sequence predicted by PSI-PRED, j is the secondary structure observed in the model, and conf is one of the confidence values (0, 1, 2, ..., 9) calculated by PSI-PRED. P_pre(i|conf) is the probability that PSI-PRED predicts the secondary structure i when the confidence value is conf. P_m(j|conf) is the probability of the secondary structure j being observed in the model when the confidence value is conf. P(i, j|conf) is the joint probability of the secondary structures i and j for the confidence value conf. This similarity of secondary structures is a useful measure especially for difficult targets, that is, when near-native structures do not exist; in such cases the similarity of secondary structures can still find good local structures even though the overall fold is entirely different. According to the target difficulty predicted by the SVM, the total score is calculated as

$$\mathrm{TotalScore} = \begin{cases} \sum_{n}^{\mathrm{length}} \left(0.35 \times \mathrm{SSscore} + \mathrm{3D1Dscore}_{\mathrm{CM}}\right)_{n} & \text{CM} \\ \sum_{n}^{\mathrm{length}} \left(0.75 \times \mathrm{SSscore} + \mathrm{3D1Dscore}_{\mathrm{FRNF}}\right)_{n} & \text{FR or NF} \end{cases} \tag{7}$$

The coefficients for the measure of similarity of secondary structures (SSscore) were optimized on CASP6 targets. The similarity of the secondary structures is emphasized for difficult targets.

Selection of the five best models by using consensus methods

All models evaluated as the best for each server were compared using the consensus method of the 3D-Jury system. We modified the confidence score of 3D-Jury as

$$\mathrm{consensus}(M_a) = \frac{\sum_{i,\, a \neq i}^{N} \mathrm{sim}(M_a, M_i)}{1 + N} \tag{8}$$

where M_a is the representative model of server a and N is the number of servers. sim(M_a, M_i) is the similarity score from MAXSUB[23] between models M_a and M_i, which equals the number of C-alpha atom pairs that are within 3.5 Å. If sim(M_a, M_i) is below 40, it is set to zero (according to the 3D-Jury protocol). The best five models, according to the consensus value, were selected for submission to CASP7. No human intervention occurred during the procedure, and the fams-ace process did not consider server name or server performance. A server with exceptional performance was not rated any higher before commencement of our model selection process.

RESULTS AND DISCUSSION

Do modifications on server models improve model quality?

Table I displays the performance of FAMS refinement in view of (1) stereochemical accuracy, (2) the accuracy of side chains (χ1 and χ1+2), and (3) overall similarity of C-alpha atom positions (GDT_TS).
[24] The side-chain accuracy of χ1 equals the number of residues that have both the C-alpha atom within 3.5 Å and the side-chain χ1 angle within 40° of the native structure. χ1+2 equals the number of residues that have the C-alpha atom within 3.5 Å and both χ1 and χ2 within 40° of the native structure.

Table II. Top 10 of Input Servers According to the Contribution

No.   fams-ace              No.   fams-ace (improved)
22    Zhang-Server          20    Zhang-Server
15    MetaTasser            13    ROBETTA
5     beautshot             9     Pmodeller6
4     SP3                   5     Pcons6
3     keasar-server         5     MetaTasser
3     PROTINFO              4     PROTINFO-AB
3     HHpred3               3     PROTINFO
3     HHpred2               3     HHpred2
3     FOLDpro               3     FOLDpro
3     BayesHH               3     FAMSD
2     shub                  3     BayesHH
2     beautshotbase         2     SP4
2     UNI-EID_expm          2     FAMS
2     SPARKS2               2     CIRCLE
2     SP4                   2     ABIpro
2     Pcons6                1     keasar-server
2     FAMSD                 1     forecast-s
2     FAMS                  1     UNI-EID_sfst
2     3Dpro                 1     UNI-EID_expm
1     nfold                 1     SP3
1     mgen-3d               1     SAM_T06_server
1     karypis.srv.2         1     RAPTOR-ACE
1     forecast-s            1     RAPTOR
1     SAM-T99               1     Phyre-2
1     ROKKY                 1     Ma-OPUS-server
1     ROBETTA               1     HHpred3
1     RAPTOR-ACE            1     GeneSilicoMetaServer
1     RAPTOR                1     FORTE1
1     NN_PUT_lab            1     CaspIta-FOX
1     Ma-OPUS-server2       1     Bilab-ENABLE
1     Huber-Torda-Server    1     3Dpro
1     FUGUE

The number of models selected as the best by fams-ace and fams-ace (improved), respectively, for each CASP7 target. For example, Zhang-Server provided the best model 22 and 20 times for the fams-ace and fams-ace (improved) teams, respectively. Moreover, MetaTasser and ROBETTA provided the best model 15 and 13 times for the fams-ace and fams-ace (improved) teams, respectively.

For the 25,615 server models obtained from the CASP7 website (represented as "all") and the TS1 models of fams-ace (represented as "selected"), we compared the models rebuilt by FAMS with the original models. PROCHECK[25,26] was used to assess the models from a stereochemical and geometric point of view. In almost all PROCHECK categories, the server models rebuilt by FAMS had improved model quality.
In summary, for all server models, Table I shows that about 41,000 amino acid residues were newly constructed, and that the numbers of residues in the most favored and disallowed regions of the Ramachandran plot improved by about 110,000 and 18,000 residues, respectively. In addition, the number of short contacts within 2.6 Å decreased by 1.41 million. In side-chain accuracy, χ1 and χ1+2 increased by 5.25% and 4.16%, respectively. Moreover, improvements in stereochemical accuracy and side-chain accuracy were also observed in the selected models. Thus, these results show that FAMS remodeling was effective in improving model quality on a stereochemical and geometric basis and in the accuracy of side chains. On the other hand, no improvement in GDT_TS was observed in the rebuilt models, despite the advance in side-chain quality and in stereochemical and geometric quality. However, the decrease in GDT_TS was only 0.3%. These results indicate that (1) improvement of the side chains does not directly correspond to an improvement of GDT_TS and (2) FAMS can improve side-chain accuracy while keeping the fold of the model intact.

Figure 3. (a) Results of fams-ace (solid and broken horizontal lines for averaged GDT_TS and Z-score, respectively) for TBM targets when one server is excluded in CASP7. The horizontal axis represents the excluded server. As several native structures are unpublished, the GDT_TS values of the original models from the CASP7 website were used. (b) Relative proportions of GDT_TS to the best GDT_TS for each target. (c) Comparison of GDT_TS values between fams-ace and fams-ace (improved). High accuracy template-based modeling (HA-TBM) targets, template-based modeling (TBM) targets other than HA, and FM targets are shown as circles, squares, and crosses, respectively. (d) Comparison of the GDT_TS values between fams-ace and fams-ace (improved). The HA, TBM other than HA, and FM categories are shown as circles, squares, and crosses, respectively.

The major purposes of the remodeling step are to normalize and improve the model quality so that CIRCLE can select representative models. When the representative models were compared with the first-ranked models (TS1 models) of each server, the summed χ1, χ1+2, and GDT_TS of the representative models were 1.48%, 1.32%, and 1.14% better than those of the first-ranked models, respectively. Therefore, FAMS achieved the purpose of the remodeling step. In the future, FAMS should be improved to obtain better side-chain and GDT_TS accuracies. Nevertheless, improvement of the three-dimensional model quality is very useful in selecting the best model.
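The side-chain accuracy counts used in this section (χ1 and χ1+2, as defined for Table I) can be illustrated with a short sketch. The data layout is hypothetical: each residue is assumed to be a dictionary holding the C-alpha distance to the native structure after superposition and (model, native) angle pairs in degrees, with `None` for residues that lack the angle. How the superposition and the angles are obtained is outside the scope of the sketch.

```python
def angle_diff_deg(a, b):
    """Smallest absolute difference between two angles in degrees,
    taking the 360-degree wrap-around into account."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def side_chain_accuracy(residues, ca_cutoff=3.5, chi_cutoff=40.0):
    """Return (n_chi1, n_chi12) for a list of residue records.

    A residue counts toward chi1 when its C-alpha is within ca_cutoff (Å)
    of the native position and chi1 is within chi_cutoff (degrees) of the
    native value; it counts toward chi1+2 when chi2 also satisfies the
    angle criterion.
    """
    n_chi1 = 0
    n_chi12 = 0
    for res in residues:
        if res["ca_dist"] > ca_cutoff:
            continue
        chi1 = res.get("chi1")
        if chi1 is None or angle_diff_deg(chi1[0], chi1[1]) > chi_cutoff:
            continue
        n_chi1 += 1
        chi2 = res.get("chi2")
        if chi2 is not None and angle_diff_deg(chi2[0], chi2[1]) <= chi_cutoff:
            n_chi12 += 1
    return n_chi1, n_chi12
```

The wrap-around handling matters for rotamers near ±180°: a model angle of 170° and a native angle of −170° differ by 20°, not 340°.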
Table III. Total Values of Z-score, GDT_TS, and the Accuracy of Side-Chain

            fams-ace(a)   fams-ace (improved)(b)   3D-Jury(c)       CIRCLE-FAMS(d)
Z score
  HA-TBM                  (-6.0)                   (+2.3)           (-14.0)
  TBM                     (+4.3)                   (-7.1)           (-3.3)
  FM                      (+225.2)                 2.16 (-67.6)     (+226.4)
  ALL                     (+19.8)                  (-11.2)          (+13.0)
GDT_TS
  HA-TBM                  (-1.0)                   (+0.3)           (-1.3)
  TBM                     (+0.4)                   (-0.7)           (-1.1)
  FM                      (+15.3)                  (-5.4)           (+13.6)
  ALL                     (+1.2)                   (-1.0)           (-0.2)
χ1
  HA-TBM                  (+5.2)                   2307 (+4.2)      2456 (+11.0)
  TBM                     (+5.5)                   5154 (-1.1)      5679 (+9.0)
  FM                      (+28.2)                  174 (-7.4)       231 (+22.9)
  ALL                     (+6.2)                   5256 (-1.1)      5821 (+9.5)
χ1+2
  HA-TBM                  (+4.1)                   1292 (+5.2)      1417 (+15.4)
  TBM                     (+9.6)                   2751 (-1.1)      3150 (+13.2)
  FM                      (+20.6)                  75 (-22.7)       113 (+16.5)
  ALL                     (+7.3)                   2792 (-1.7)      3216 (+13.2)

(a) CASP7 team using fams-ace.
(b) Virtual team fams-ace (improved) mentioned in Results and Discussion.
(c) Virtual team 3D-Jury (consensus method) using only TS1 models.
(d) CASP7 team using CIRCLE for final model selection.
HA-TBM (28 domains), TBM (108 domains), and FM (19 domains) denote high accuracy template-based modeling targets, template-based modeling targets including HA, and free modeling targets, respectively. ALL (123 domains) means the total score for all the CASP7 targets. GDT_TS and the native structures for calculating χ1 and χ1+2 were obtained from the CASP7 website. The value in parentheses is the percentage change relative to fams-ace.

The contribution of the individual input servers

In the fams-ace protocol, about 250 submitted models for each target were remodeled by FAMS, and then a representative was selected from the five models of each server according to CIRCLE. In the selection of representative models, TS1 and AL1 models comprised 45.7% of all selected models. This indicates that nearly half of the representative models were the same as the models ranked first by the server itself. As mentioned previously, the representative models were superior to the first-ranked models. The contributions of each server are shown in Table II. About 23.2% of the final best models chosen by fams-ace were refined Zhang-Server models, which indicates that refined Zhang-Server models have a large consensus value compared with other representative models. Additionally, 15.8% of the best models selected by fams-ace were refined MetaTasser models. Furthermore, in order to discuss the contribution of each server from the standpoint of model accuracy and of its influence on the consensus value, we determined the average GDT_TS of fams-ace for TBM targets (total of 108 domains) when a particular server was excluded [Fig. 3(a)].
The contribution of MetaTasser and Zhang-Server is larger than that of all other servers, as shown by the large decrease in the average GDT_TS value when they are excluded. This indicates that if MetaTasser and Zhang-Server had not participated in CASP7, fams-ace could not have achieved its successful results. Thus, models from servers with exceptional performance (Zhang-Server and MetaTasser) are essential for a good performance of the fams-ace method. In addition, performance would increase if fams-ace left out particular servers (e.g., SP3, beautshot), for which Figure 3(a) shows a larger average GDT_TS than that of fams-ace (represented as the horizontal line). In other words, if these servers had not participated in CASP7, the GDT_TS of fams-ace would have improved. However, the average GDT_TS of fams-ace was largely insensitive to the exclusion of specific servers and could not equal that of Zhang-Server (67.86), although it came close.

Example T0371_D2 (template-based modeling target)

T0371_D2 is the second domain of the target T0371 (now PDB code: 2HX1). For this target, fams-ace submitted the best model in GDT_TS (72.1) and AL0 (82.64) among all of the CASP7 teams. The model presented by fams-ace was constructed from a Zhang-Server TS4 model (GDT_TS: 70.66, AL0: 80.17). There were few differences between the fams-ace TS1 model and the Zhang-Server TS4 model. In addition, there were no differences in side-chain accuracies (χ1 and χ1+2). This example indicates that fams-ace could select a three-dimensional structure similar to the native structure. For two domains (T0371_D2 and T0321_D2), fams-ace submitted better models than the best-ranked server models. fams-ace considered only the consensus of the representative models rather than the quality of each model. Consequently, fams-ace did not select outlying models, whether exceptionally good or exceptionally bad.

What went right? What went wrong?
The fams-ace method selected good models of relatively high quality from the standpoint of overall similarity with the native structure (GDT_TS). The success of fams-ace can be explained by three factors. (1) fams-ace can take advantage of the several good models available from server teams, especially from Zhang-Server and MetaTasser [see Fig. 3(a)]. (2) Since the consensus method was used for the final model selection, fams-ace did not select outlying models, in either the positive or the negative sense. For 72.2% (78/108) of TBM targets, fams-ace could select good models with a GDT_TS within 90% of the highest GDT_TS among all server models [Fig. 3(b)]. (3) Representative models with improved GDT_TS were selected. The representative models from each server were selected according to the evaluation of the side-chain environments. Table III shows a comparison summary of fams-ace and the virtual server (described as "3D-Jury"), which applies the 3D-Jury method in the final model selection to the TS1 models of the servers instead of the representative models. Except for high accuracy template-based modeling targets (HA-TBM), fams-ace submitted better models than 3D-Jury.

Figure 4. Two failed examples of the refinement step by FAMS. (a, b) Models of T0356. (c, d) Models of T0283. (a) and (c) are the original server models of T0356 and T0283, respectively. (b) and (d) are the models refined by FAMS. The regions that were constructed by FAMS are circled. The main-chain structures of these regions are extremely strained.

However, several problems in the fams-ace method were noted. For example, in the calculation of the consensus value, the similarity scores from MAXSUB are set to zero if they are below 40. As a result, many models that did have weak similarities were judged to have no significant similarity. In this case, the consensus method fails during the final model selection step. Therefore, fams-ace could not consistently select the near-best quality models for difficult targets [see Fig. 3(c)].
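The effect of the similarity threshold on the consensus value of Eq. (8) can be seen in a small sketch. The function name, the matrix layout, and all similarity values below are made up for illustration; in the method itself the pairwise values come from MAXSUB.

```python
def consensus_scores(sim, threshold=40):
    """Eq. (8): consensus(M_a) = sum_{i != a} sim(M_a, M_i) / (1 + N).

    sim is a symmetric N x N list of pairwise similarity scores between the
    representative models of N servers. Scores below `threshold` are set to
    zero before summing, as described in the text.
    """
    n = len(sim)
    scores = []
    for a in range(n):
        total = sum(sim[a][i] if sim[a][i] >= threshold else 0
                    for i in range(n) if i != a)
        scores.append(total / (1 + n))
    return scores

# Hypothetical easy target: the models share many superimposable residues.
easy = [[0, 120, 110],
        [120, 0, 60],
        [110, 60, 0]]

# Hypothetical difficult target: every pairwise similarity is below 40.
hard = [[0, 35, 30],
        [35, 0, 38],
        [30, 38, 0]]

easy_scores = consensus_scores(easy)
hard_scores = consensus_scores(hard)                 # all models tie at zero
hard_unfiltered = consensus_scores(hard, threshold=0)
```

For the difficult target every thresholded score collapses to zero, so the ranking carries no information, whereas the unfiltered scores still separate the models. This is the failure mode described above, shown here only on invented numbers.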
During the refinement of server models, multidomain targets caused errors. An example is T0356, for which a certain server presented a model that was divided into domains. FAMS interpreted the division as a deletion in the sequence and coordinates, and therefore constructed connecting regions to join the separated domains. The main-chain structures of these connecting regions broke during the difficult reconstruction [Fig. 4(a,b)]. Although unnatural models were usually rejected in the evaluation step by CIRCLE, in a few cases these unnatural models were selected as the best model by fams-ace. This tendency appeared to be associated with difficult targets. In the worst example, T0283, the GDT_TS of fams-ace was only 25.52, in contrast to the highest GDT_TS among the server models [see Fig. 4(c,d)].

Lastly, a major problem with fams-ace is its inability to present models with accurate side-chain conformations, even when GDT_TS is relatively high. Because fams-ace selects the best models according to the consensus value, this problem is unavoidable.

Simple improvement of fams-ace

For the reasons given in the preceding section, we investigated the possibility of using CIRCLE instead of the consensus method. CIRCLE performs well when models are selected according to their side-chain environments in the final model selection step. There are three steps in the new fams-ace process. (1) Rebuild and refine server models. (2) Select models according to the Z-score of the consensus score. For targets predicted by the SVM to be difficult (FR and NF), the similarity score from MAXSUB was left unchanged. The model selection thresholds were optimized on CASP6 targets using the category classification (CM or non-CM) reported by the CASP6 organizers. (3) Select final models using CIRCLE. Hereinafter, this collection of steps is denoted as fams-ace (improved).

The greatest difference between fams-ace and fams-ace (improved) is the final model selection step: fams-ace uses a consensus method based on a modified 3D-Jury, whereas fams-ace (improved) uses model evaluation by CIRCLE. The goal of fams-ace (improved) is to obtain models of high quality in both GDT_TS and side-chain accuracy. The results of fams-ace (improved) are shown in Table III and Figure 3(b,d). Except for HA-TBM targets, both GDT_TS and the Z-score of GDT_TS improved in comparison with fams-ace.
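Steps (2) and (3) of the improved procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the Z-score is computed over the consensus scores of all candidate models, candidates at or above a hypothetical threshold `z_min` are kept, and the final model is the kept candidate with the highest quality score. The `quality` field stands in for a CIRCLE score, which is not computed here; the names, the threshold value, and the fallback to all candidates when none passes are illustrative choices, not the authors' settings.

```python
from statistics import mean, pstdev
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    consensus: float  # consensus score (3D-Jury-like), assumed precomputed
    quality: float    # placeholder for a CIRCLE-style quality score


def z_scores(values: List[float]) -> List[float]:
    """Standardize values; return zeros if they are all identical."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0.0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def select_final(models: List[Model], z_min: float = 0.0) -> Model:
    """Filter by consensus Z-score, then pick the best quality score."""
    zs = z_scores([m.consensus for m in models])
    kept = [m for m, z in zip(models, zs) if z >= z_min]
    if not kept:
        kept = list(models)  # assumed fallback: no candidate passed the filter
    return max(kept, key=lambda m: m.quality)


# Hypothetical candidates; the numbers are made up for illustration.
candidates = [
    Model("server_A", consensus=70.0, quality=0.40),
    Model("server_B", consensus=65.0, quality=0.55),
    Model("server_C", consensus=30.0, quality=0.90),
    Model("server_D", consensus=35.0, quality=0.20),
]
best = select_final(candidates)
print(best.name)
```

In this toy example the consensus filter removes `server_C` despite its high quality score, so the quality score only decides among models that the consensus already supports. Lowering `z_min` shifts more of the decision to the quality score.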
There was an obvious improvement in side-chain accuracy in all categories, especially for free modeling (FM) targets and difficult TBM targets (highest GDT_TS < 50). These results suggest that the improved method yields models that are good in both GDT_TS and side-chain quality. However, the problem of multidomain targets remains. We plan to improve the refinement process and the assignment of domains in the future. Moreover, we propose developing a new system based on the improved fams-ace method to generate results superior to the best server models.

ACKNOWLEDGMENTS

We thank the CASP7 organizers and assessors, and the experimentalists who supplied targets for CASP7. We particularly thank all the server teams in CASP7.

REFERENCES

1. Takeda-Shitaka M, Terashi G, Takaya D, Kanou K, Iwadate M, Umeyama H. Protein structure prediction in CASP6 using CHIMERA and FAMS. Proteins 2005;61(Suppl 7).
2. Sippl MJ. Knowledge-based potentials for proteins. Curr Opin Struct Biol 1995;5.
3. Zhang Y, Skolnick J. SPICKER: a clustering approach to identify near-native protein folds. J Comput Chem 2004;25.
4. Lee MR, Tsai J, Baker D, Kollman PA. Molecular dynamics in the endgame of protein structure prediction. J Mol Biol 2001;313.
5. Luthy R, Bowie JU, Eisenberg D. Assessment of protein models with three-dimensional profiles. Nature 1992;356.
6. Eisenberg D, Luthy R, Bowie JU. VERIFY3D: assessment of protein models with three-dimensional profiles. Methods Enzymol 1997;277.
7. Fischer D. Hybrid fold recognition: combining sequence derived properties with evolutionary information. Pac Symp Biocomp 2000;5.
8. Kelley LA, MacCallum RM, Sternberg MJ. Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol 2000;299.
9. Wallner B, Elofsson A. Can correct protein models be identified? Protein Sci 2003;12.
10. Fischer D. Servers for protein structure prediction. Curr Opin Struct Biol 2006;16.
11. Tosatto SC. The Victor/FRST function for model quality estimation. J Comput Biol 2005;12.
12. Pettitt CS, McGuffin LJ, Jones DT. Improving sequence-based fold recognition by using 3D model quality assessment. Bioinformatics 2005;21.
13. Lundstrom J, Rychlewski L, Bujnicki J, Elofsson A. Pcons: a neural-network-based consensus predictor that improves fold recognition. Protein Sci 2001;10.
14. Ginalski K, Elofsson A, Fischer D, Rychlewski L. 3D-Jury: a simple approach to improve protein structure predictions. Bioinformatics 2003;19.
15. Ogata K, Umeyama H. An automatic homology modeling method consisting of database searches and simulated annealing. J Mol Graph Model 2000;18.
16. Anguita D, Boni A, Ridella S, Rivieccio F, Sterpi D. Theoretical and practical model selection methods for support vector classifiers. In: Wang L, editor. Support vector machines: theory and applications, Vol. 177. Berlin: Springer-Verlag.
17. Park KJ, Gromiha MM, Horton P, Suwa M. Discrimination of outer membrane proteins using support vector machines. Bioinformatics 2005;21.
18. Zhou H, Zhou Y. Single-body residue-level knowledge-based energy score combined with sequence-profile and secondary structure information for fold recognition. Proteins 2004;55.
19. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999;292.
20. Siew N, Elofsson A, Rychlewski L, Fischer D. MaxSub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics 2000;16.
21. Zemla A, Venclovas C, Moult J, Fidelis K. Processing and analysis of CASP3 protein structure predictions. Proteins 1999;37(Suppl 3).
22. Laskowski RA, MacArthur MW, Moss DS, Thornton JM. PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Cryst 1993;26.
23. Morris AL, MacArthur MW, Hutchinson EG, Thornton JM. Stereochemical quality of protein structure coordinates. Proteins 1992;12.


More information

Homology Modeling I. Growth of the Protein Data Bank PDB. Basel, September 30, EMBnet course: Introduction to Protein Structure Bioinformatics

Homology Modeling I. Growth of the Protein Data Bank PDB. Basel, September 30, EMBnet course: Introduction to Protein Structure Bioinformatics Swiss Institute of Bioinformatics EMBnet course: Introduction to Protein Structure Bioinformatics Homology Modeling I Basel, September 30, 2004 Torsten Schwede Biozentrum - Universität Basel Swiss Institute

More information

Course Notes: Topics in Computational. Structural Biology.

Course Notes: Topics in Computational. Structural Biology. Course Notes: Topics in Computational Structural Biology. Bruce R. Donald June, 2010 Copyright c 2012 Contents 11 Computational Protein Design 1 11.1 Introduction.........................................

More information

Francisco Melo, Damien Devos, Eric Depiereux and Ernest Feytmans

Francisco Melo, Damien Devos, Eric Depiereux and Ernest Feytmans From: ISMB-97 Proceedings. Copyright 1997, AAAI (www.aaai.org). All rights reserved. ANOLEA: A www Server to Assess Protein Structures Francisco Melo, Damien Devos, Eric Depiereux and Ernest Feytmans Facultés

More information

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH Hoang Trang 1, Tran Hoang Loc 1 1 Ho Chi Minh City University of Technology-VNU HCM, Ho Chi

More information

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding

More information

Protein Structure Prediction using String Kernels. Technical Report

Protein Structure Prediction using String Kernels. Technical Report Protein Structure Prediction using String Kernels Technical Report Department of Computer Science and Engineering University of Minnesota 4-192 EECS Building 200 Union Street SE Minneapolis, MN 55455-0159

More information

proteins Effect of using suboptimal alignments in template-based protein structure prediction Hao Chen 1 and Daisuke Kihara 1,2,3 * INTRODUCTION

proteins Effect of using suboptimal alignments in template-based protein structure prediction Hao Chen 1 and Daisuke Kihara 1,2,3 * INTRODUCTION proteins STRUCTURE O FUNCTION O BIOINFORMATICS Effect of using suboptimal alignments in template-based protein structure prediction Hao Chen 1 and Daisuke Kihara 1,2,3 * 1 Department of Biological Sciences,

More information

Supersecondary Structures (structural motifs)

Supersecondary Structures (structural motifs) Supersecondary Structures (structural motifs) Various Sources Slide 1 Supersecondary Structures (Motifs) Supersecondary Structures (Motifs): : Combinations of secondary structures in specific geometric

More information

Protein Modeling Methods. Knowledge. Protein Modeling Methods. Fold Recognition. Knowledge-based methods. Introduction to Bioinformatics

Protein Modeling Methods. Knowledge. Protein Modeling Methods. Fold Recognition. Knowledge-based methods. Introduction to Bioinformatics Protein Modeling Methods Introduction to Bioinformatics Iosif Vaisman Ab initio methods Energy-based methods Knowledge-based methods Email: ivaisman@gmu.edu Protein Modeling Methods Ab initio methods:

More information

Better Bond Angles in the Protein Data Bank

Better Bond Angles in the Protein Data Bank Better Bond Angles in the Protein Data Bank C.J. Robinson and D.B. Skillicorn School of Computing Queen s University {robinson,skill}@cs.queensu.ca Abstract The Protein Data Bank (PDB) contains, at least

More information

Acta Cryst. (2017). D73, doi: /s

Acta Cryst. (2017). D73, doi: /s Acta Cryst. (2017). D73, doi:10.1107/s2059798317010932 Supporting information Volume 73 (2017) Supporting information for article: Designing better diffracting crystals of biotin carboxyl carrier protein

More information

Protein Structure Determination from Pseudocontact Shifts Using ROSETTA

Protein Structure Determination from Pseudocontact Shifts Using ROSETTA Supporting Information Protein Structure Determination from Pseudocontact Shifts Using ROSETTA Christophe Schmitz, Robert Vernon, Gottfried Otting, David Baker and Thomas Huber Table S0. Biological Magnetic

More information

Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007

Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007 Molecular Modeling Prediction of Protein 3D Structure from Sequence Vimalkumar Velayudhan Jain Institute of Vocational and Advanced Studies May 21, 2007 Vimalkumar Velayudhan Molecular Modeling 1/23 Outline

More information

Grouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids

Grouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids Science in China Series C: Life Sciences 2007 Science in China Press Springer-Verlag Grouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids

More information

Improving the Physical Realism and Structural Accuracy of Protein Models by a Two-Step Atomic-Level Energy Minimization

Improving the Physical Realism and Structural Accuracy of Protein Models by a Two-Step Atomic-Level Energy Minimization Biophysical Journal Volume 101 November 2011 2525 2534 2525 Improving the Physical Realism and Structural Accuracy of Protein Models by a Two-Step Atomic-Level Energy Minimization Dong Xu and Yang Zhang

More information

Protein Structure Prediction Using Multiple Artificial Neural Network Classifier *

Protein Structure Prediction Using Multiple Artificial Neural Network Classifier * Protein Structure Prediction Using Multiple Artificial Neural Network Classifier * Hemashree Bordoloi and Kandarpa Kumar Sarma Abstract. Protein secondary structure prediction is the method of extracting

More information

PREDICTION OF PROTEIN BINDING SITES BY COMBINING SEVERAL METHODS

PREDICTION OF PROTEIN BINDING SITES BY COMBINING SEVERAL METHODS PREDICTION OF PROTEIN BINDING SITES BY COMBINING SEVERAL METHODS T. Z. SEN, A. KLOCZKOWSKI, R. L. JERNIGAN L.H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University Ames, IA

More information

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror Protein structure prediction CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror 1 Outline Why predict protein structure? Can we use (pure) physics-based methods? Knowledge-based methods Two major

More information

proteins SHORT COMMUNICATION MALIDUP: A database of manually constructed structure alignments for duplicated domain pairs

proteins SHORT COMMUNICATION MALIDUP: A database of manually constructed structure alignments for duplicated domain pairs J_ID: Z7E Customer A_ID: 21783 Cadmus Art: PROT21783 Date: 25-SEPTEMBER-07 Stage: I Page: 1 proteins STRUCTURE O FUNCTION O BIOINFORMATICS SHORT COMMUNICATION MALIDUP: A database of manually constructed

More information

Prediction and Classif ication of Human G-protein Coupled Receptors Based on Support Vector Machines

Prediction and Classif ication of Human G-protein Coupled Receptors Based on Support Vector Machines Article Prediction and Classif ication of Human G-protein Coupled Receptors Based on Support Vector Machines Yun-Fei Wang, Huan Chen, and Yan-Hong Zhou* Hubei Bioinformatics and Molecular Imaging Key Laboratory,

More information

Protein Structure Determination Using NMR Restraints BCMB/CHEM 8190

Protein Structure Determination Using NMR Restraints BCMB/CHEM 8190 Protein Structure Determination Using NMR Restraints BCMB/CHEM 8190 Programs for NMR Based Structure Determination CNS - Brünger, A. T.; Adams, P. D.; Clore, G. M.; DeLano, W. L.; Gros, P.; Grosse-Kunstleve,

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION doi:10.1038/nature11054 Supplementary Fig. 1 Sequence alignment of Na v Rh with NaChBac, Na v Ab, and eukaryotic Na v and Ca v homologs. Secondary structural elements of Na v Rh are indicated above the

More information

Useful background reading

Useful background reading Overview of lecture * General comment on peptide bond * Discussion of backbone dihedral angles * Discussion of Ramachandran plots * Description of helix types. * Description of structures * NMR patterns

More information

clustq: Efficient Protein Decoy Clustering Using Superposition-free Weighted Internal Distance Comparisons

clustq: Efficient Protein Decoy Clustering Using Superposition-free Weighted Internal Distance Comparisons clustq: Efficient Protein Decoy Clustering Using Superposition-free Weighted Internal Distance Comparisons Debswapna Auburn University ACM-BCB August 31, 2018 What is protein decoy clustering? Clustering

More information

Algorithm for Rapid Reconstruction of Protein Backbone from Alpha Carbon Coordinates

Algorithm for Rapid Reconstruction of Protein Backbone from Alpha Carbon Coordinates Algorithm for Rapid Reconstruction of Protein Backbone from Alpha Carbon Coordinates MARIUSZ MILIK, 1 *, ANDRZEJ KOLINSKI, 1, 2 and JEFFREY SKOLNICK 1 1 The Scripps Research Institute, Department of Molecular

More information