Genki Terashi, Mayuko Takeda-Shitaka, Kazuhiko Kanou, Mitsuo Iwadate, Daisuke Takaya, Akio Hosoi, Kazuhiro Ohta, and Hideaki Umeyama*
PROTEINS: Structure, Function, Bioinformatics
Prediction Report

fams-ace: A combined method to select the best model after remodeling all server models

Genki Terashi, Mayuko Takeda-Shitaka, Kazuhiko Kanou, Mitsuo Iwadate, Daisuke Takaya, Akio Hosoi, Kazuhiro Ohta, and Hideaki Umeyama*
School of Pharmacy, Kitasato University, Tokyo, Japan

ABSTRACT

During the Critical Assessment of Protein Structure Prediction (CASP7, Pacific Grove, CA, 2006), fams-ace was entered in the 3D coordinate prediction category as a human expert group. The procedure can be summarized in the following three steps. (1) All the server models were refined and rebuilt using our homology modeling method. (2) Representative structures were selected from each server according to a model quality evaluation based on a 3D1D profile score (like Verify3D). (3) The top five models were selected and submitted in the order of a consensus-based score (like 3D-Jury). The fams-ace server is fully automated and does not require human intervention. In this article, we introduce the methodology of fams-ace and discuss the successes and failures of this approach during CASP7. In addition, we discuss possible improvements for the next CASP. Proteins 2007; 69(Suppl 8). © 2007 Wiley-Liss, Inc.

Key words: TBM (template-based modeling); comparative modeling; homology modeling; protein structure prediction; CHIMERA; FAMS; SKE-CHIMERA; quality assessment; CASP.

INTRODUCTION

During the sixth round of the Critical Assessment of Protein Structure Prediction (CASP6), our SKE-CHIMERA team successfully predicted targets, particularly fold recognition/homologous (FR/H) targets [1]. At that time, we noted an obvious flaw in how the best model was selected from among the many constructed models. In the SKE-CHIMERA method, human intervention based on visual inspection and the use of biological information is the most important factor.
However, the performance of human intervention was inconsistent, and this was reflected in the CASP6 results. Therefore, there is a need for a technique that does not rely upon human intervention to select the models closest to the native structure from among the decoy sets.

Genki Terashi and Mayuko Takeda-Shitaka contributed equally to this work.
The authors state no conflict of interest.
*Correspondence to: Hideaki Umeyama, Department of Biomolecular Design, School of Pharmacy, Kitasato University, Shirokane, Minato-ku, Tokyo, Japan. E-mail: umeyamah@pharm.kitasato-u.ac.jp
Received 28 February 2007; Revised 13 August 2007; Accepted 16 August 2007. Published online 25 September 2007 in Wiley InterScience.

Many scoring functions for evaluating protein structure are founded on knowledge-based potentials [2], clustering methods [3], structural energies using molecular mechanics force fields [4], and structure-based sequence profiles (e.g., Verify3D [5,6], Inbgu [7], 3D-PSSM [8], ProQ [9]). These scoring functions are used to assess model quality and ultimately to select the best model from a set of models. During CAFASP4 [10], the Model Quality Assessment Programs (MQAPs) category [11] assessed accuracy using predicted models produced by the participating CAFASP4 servers. Several MQAP scoring functions were able to distinguish correct from incorrect protein models and consistently selected the best models among the candidates. According to the report from Daniel Fischer [12], the CAFASP4 organizer, the best-performing MQAPs were Verify3D, Victor/FRST [13], and MODCHECK [14].

The Verify3D methodology uses 18 (6 × 3) discrete environmental classes for each amino acid residue and utilizes the buried area, the fraction of polar area, and the secondary structure for assessment. The side-chain environments of the amino acid residues are grouped into six classes according to the buried area and the fraction of polar area. The secondary structures are separated into three classes. A score for each class is calculated using a log-odds scoring function derived from a dataset of experimental protein structures. Because it is a simple but powerful program for predicting model accuracy, Verify3D is widely used by human predictors when selecting the best model from many candidates. In the MQAPs category of CAFASP4, Verify3D performed well for homology modeling and fold-recognition targets. Victor/FRST combines four knowledge-based potentials (pairwise, solvation, hydrogen bond, and torsion angle potentials) and performed consistently well, especially for comparative modeling (CM) targets. MODCHECK is based on threading potentials (pairwise and solvation potentials) and calculates the quality score by summing the pairwise and solvation Z-scores, which are obtained by extensive sequence shuffling trials.

The methods mentioned above consider only the quality of the target model. In contrast, other research groups use consensus methods (e.g., Pcons [15], 3D-Jury [16]). Pcons, a neural-network-based consensus method, combines the confidence score reported by each server with the similarity between models or templates. The 3D-Jury technique is a fully automated protein structure metaprediction system that is efficient when the accuracy of a model correlates strongly with a confidence score based on the similarity between models. All server-generated models are compared using a similarity score. The similarity score of a model pair is equal to the number of C-alpha atom pairs that are within 3.5 Å after superimposition. The confidence score of a model is equal to the sum of the similarity scores for the considered model pairs divided by the number of considered pairs plus one. Moreover, 3D-Jury uses two modes to calculate confidence scores.
The best-model-mode uses a single model from each server, the one with the best similarity score, in the consensus score calculation. In contrast, the all-models-mode includes all the server models. Note that consensus methods require a variety of candidate models from the servers for comparison and for calculating the consensus value.

Thus, in the CASP7 contest [17], we combined two different scoring functions as a metaselector team: (1) a model quality score based on classification of the side-chain environment of each residue and (2) a consensus score. The model quality score, based on an algorithm inspired by Verify3D, was used as a filter to select a representative from the models submitted by each server. In the fams-ace method, we used an evaluation program, CIRCLE, which is based on a knowledge-based potential of side-chain packing. As mentioned earlier, the Verify3D algorithm uses 18 environmental classes for each amino acid residue. For the purpose of this study, the Verify3D classification was too coarse to analyze side-chain environments. Therefore, the classification of side-chain environments was refined by increasing the number of classes from 18 to 136,350 (5050 × 27), and the discontinuities between classes were reduced by using a Gaussian filter. The consensus score, corresponding to 3D-Jury (single-model-mode), was then used for the final selection of the best model. To exclude the influence of unstable models when calculating the consensus score, only the model with the best CIRCLE model quality score from each server was used. Because the consensus method uses only C-alpha coordinates, the quality of side-chain packing is not considered at that stage.

METHODS

The fams-ace method is composed of four steps, as illustrated in Figure 1 and discussed in the following sections.

Rebuilding and refinement of server models

3D models and alignments submitted by automatic servers were obtained from the CASP7 website.
The obtained 3D models and alignments [17] were rebuilt into full-atom three-dimensional models with our fully automatic modeling system (FAMS). Homology modeling was performed using each server model as the reference protein (Fig. 1). Detailed information on the FAMS process can be found in a previously published article from our laboratory [18]. Short contacts were removed by optimizing the main-chain coordinates, using simulated annealing of the main chain while conserving the side-chain conformation of each residue. Side-chain atom coordinates were optimized by iterative cycles of side-chain generation and main-chain optimization. Missing regions and discontinuities of the main chain were constructed by FAMS, using a loop search process to obtain an energetically stable structure.

This FAMS modeling is an essential step for evaluating models from the viewpoint of energy. If the coordinates of a side chain or a main chain contain serious energy errors (e.g., short contacts, unnatural chiral centers, or unnatural torsion angles), the models cannot be compared reasonably in the evaluation step. In the evaluation step, our scoring method uses the environment of the side chain, described by the fraction of buried area and the fraction of area covered by polar atoms. Consequently, even if the main-chain coordinates are close to the native structure, a model with many short contacts in its side chains will be rejected from the selection. In addition, in the model-selection step (see the section "Selection of the five best models by using consensus methods"), the final models are selected according to a consensus value obtained by comparing coordinates among models. Therefore, unnatural main-chain coordinates in the models from any server will cause errors when the consensus value is calculated.
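To make the notion of a short contact concrete, the following is a minimal sketch that counts non-bonded atom pairs lying within a distance cutoff, following the definition given in the footnote to Table I. It is not the FAMS implementation; the function name, the input layout (a coordinate list plus a set of bonded index pairs), and the toy coordinates are assumptions made for illustration.

```python
import math
from itertools import combinations

def count_short_contacts(coords, bonded_pairs, cutoff):
    """Count non-bonded atom pairs whose distance is within `cutoff` (angstroms).

    coords: list of (x, y, z) tuples, one per atom (hypothetical input layout).
    bonded_pairs: set of frozenset({i, j}) atom-index pairs that are covalently
                  bonded and must therefore be skipped.
    """
    count = 0
    for i, j in combinations(range(len(coords)), 2):
        if frozenset((i, j)) in bonded_pairs:
            continue  # bonded atoms are close by construction
        if math.dist(coords[i], coords[j]) <= cutoff:
            count += 1
    return count

# Toy example: atoms 0 and 1 are bonded; atoms 0 and 2 clash at 1.8 angstroms.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.8, 0.0), (6.0, 6.0, 6.0)]
bonds = {frozenset((0, 1))}
print(count_short_contacts(atoms, bonds, cutoff=2.2))  # prints 1
```

The all-pairs loop is quadratic in the number of atoms, which is adequate for a sketch; a production tool would normally use a spatial grid or neighbor list.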
Assessment of target difficulty

The next step evaluates the feasibility of constructing the model with template protein coordinates from experiments. fams-ace employs one of two scoring functions depending on the target difficulty. To predict the target difficulty, a support vector machine (SVM) program [19] was used. Classifications based on SVMs have been used for several applications in bioinformatics and computational biology [20]. The training dataset consisted of CASP6 targets labeled as CM targets (positive) or not (fold recognition or new fold; negative). The score and homology (%) values of the best alignment produced by the SPARKS2 program [21] were used as the feature vector for SVM classification. SPARKS2 performs alignments using a knowledge-based energy score with sequence-profile and secondary structure information. For CM targets in CASP6, SPARKS2 performed well in recognizing the best or near-best template. Thus, we assumed that if SPARKS2 cannot find a reasonable alignment, the target must be truly difficult. On the CASP6 targets, the sensitivity of the classification, defined as TP/(TP + FN), was 93.0% (40/(40 + 3)), and the specificity, TN/(TN + FP), was 93.6% (44/(44 + 3)), where TP, FP, TN, and FN denote true positive, false positive, true negative, and false negative, respectively. The predicted classification of target difficulty was then used to select one of the two scoring functions described in the next section.

Figure 1. A flowchart illustrating the key steps of the fams-ace method: the original CASP7 server models (five models from each of servers 1 to M), the models refined by FAMS, the selection of one model among the five by CIRCLE after estimation of the target difficulty and the secondary structure correspondence, and the consensus selection of the final five models by the 3D-Jury score.

Selection of a representative model from each server model by structure evaluation

The top five models submitted by each server were further selected by evaluating their free energy with standard techniques.
The purpose of the evaluation is to remove unstable models before the final model selection step. The best representative model for each server was selected by CIRCLE. CIRCLE considers two terms for model quality: (1) the model quality calculated from the side-chain environment of each residue and (2) the similarity between the secondary structure propensities predicted for the amino acid sequence by PSI-PRED [22] and the secondary structures of the three-dimensional model.

The side-chain environment of each residue was determined from three parameters: (1) the fraction of the molecular surface area of the side chain that is exposed to water or covered by polar atoms, (2) the fraction of the side-chain area that is buried by any other atoms, and (3) the secondary structure. The values of (1) and (2) were each categorized into 100 classes. Moreover, as shown in Figure 2, the sum of (1) and (2) is always above 1.00 (100%). Therefore, the number of combinations of (1) and (2) was 5050 [(100 × 101)/2]. The secondary structures were described using a sliding window around the residue in order to classify them in detail. The window sizes were three residues for CM targets and one residue for non-CM targets, with each residue assigned as helix, sheet, or coil; the wider window was especially effective for CM targets. For example, the secondary structure classified as CCC represents a residue at the center of a coil region spanning three residues. Therefore, for CM targets (window size 3), 27 classes were used to classify the secondary structures. In total, the side-chain environments of amino acid residues were classified into 136,350 (5050 × 27) classes for CM targets and 15,150 (5050 × 3) classes for non-CM targets. These classifications describe side-chain environments in more detail than Verify3D (18 classes) but also leave too few observations in each class. To solve this shortage problem, a Gaussian filter was applied instead of using the classification directly.

Figure 2. Matrix data used in the scoring function of CIRCLE. Sets (a1, b1) show the frequencies of residues LYS and LEU, respectively, observed in the PDB dataset, as described in Eq. (2). Sets (a2, b2) show the data converted from the a1 and b1 matrices by using the Gaussian weight, as described in Eq. (3). Sets (a3, b3) show the scoring matrices of LYS and LEU according to the side-chain environments, as described in Eq. (5).

The score of model quality is calculated by the following functions.
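$$ w(\mathrm{polar}, \mathrm{buried}, p, b) = \exp\left(-\frac{p^{2}}{2\,\sigma_{\mathrm{polar}}^{2}}\right) \exp\left(-\frac{b^{2}}{2\,\sigma_{\mathrm{buried}}^{2}}\right) \tag{1} $$

$$ N(\mathrm{AA} \mid ss, \mathrm{polar}, \mathrm{buried}) \tag{2} $$

$$ N'(\mathrm{AA} \mid ss, \mathrm{polar}, \mathrm{buried}) = \sum_{m} \sum_{n} w(\mathrm{polar}, \mathrm{buried}, m, n) \times N(\mathrm{AA} \mid ss, \mathrm{polar} + m, \mathrm{buried} + n) \tag{3} $$

$$ P(\mathrm{AA} \mid ss, \mathrm{polar}, \mathrm{buried}) = \frac{N'(\mathrm{AA} \mid ss, \mathrm{polar}, \mathrm{buried})}{\sum_{aa} N'(aa \mid ss, \mathrm{polar}, \mathrm{buried})} \tag{4} $$

$$ \mathrm{SCORE}(\mathrm{AA} \mid \mathrm{env}) = \log \frac{P(\mathrm{AA} \mid \mathrm{env})}{P(\mathrm{AA})} = \log \frac{P(\mathrm{AA} \mid ss, \mathrm{polar}, \mathrm{buried})}{P(\mathrm{AA})} \tag{5} $$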
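The chain from raw counts to log-odds scores in Eqs. (1)–(5) can be sketched in a few lines. The code below is an illustration under simplifying assumptions, not the CIRCLE program: it handles a single secondary-structure class, uses a 3 × 3 grid of (polar, buried) bins instead of the real classification, applies one constant pair of standard deviations (in bin units) rather than environment-dependent ones, and uses invented toy counts for two residue types.

```python
import math

def gaussian_weight(p, b, sigma_polar, sigma_buried):
    """Eq. (1): weight for an offset of p polar bins and b buried bins."""
    return math.exp(-p * p / (2 * sigma_polar ** 2)) * math.exp(-b * b / (2 * sigma_buried ** 2))

def smooth(counts, sigma_polar, sigma_buried):
    """Eq. (3): Gaussian-weighted sum of raw counts around each (polar, buried) bin."""
    n_polar, n_buried = len(counts), len(counts[0])
    return [
        [
            sum(
                gaussian_weight(pp - polar, bb - buried, sigma_polar, sigma_buried) * counts[pp][bb]
                for pp in range(n_polar)
                for bb in range(n_buried)
            )
            for buried in range(n_buried)
        ]
        for polar in range(n_polar)
    ]

def score_tables(raw, sigma_polar=1.0, sigma_buried=1.0):
    """Eqs. (4)-(5): log-odds score per residue type and environment bin.

    raw: {residue name: grid of raw counts N indexed [polar][buried]} (Eq. 2).
    Assumes every residue type has at least one observation.
    """
    total = sum(sum(map(sum, grid)) for grid in raw.values())
    background = {aa: sum(map(sum, grid)) / total for aa, grid in raw.items()}  # P(AA)
    smoothed = {aa: smooth(grid, sigma_polar, sigma_buried) for aa, grid in raw.items()}
    first = next(iter(raw.values()))
    scores = {aa: [[0.0] * len(first[0]) for _ in first] for aa in raw}
    for polar in range(len(first)):
        for buried in range(len(first[0])):
            denom = sum(smoothed[aa][polar][buried] for aa in raw)
            for aa in raw:
                p_env = smoothed[aa][polar][buried] / denom  # Eq. (4)
                scores[aa][polar][buried] = math.log(p_env / background[aa])  # Eq. (5)
    return scores

# Invented toy counts: LEU favors buried, non-polar bins; LYS the opposite.
raw_counts = {
    "LEU": [[1, 3, 8], [0, 1, 3], [0, 0, 1]],
    "LYS": [[1, 0, 0], [3, 1, 0], [8, 3, 1]],
}
scores = score_tables(raw_counts)
print(scores["LEU"][0][2] > 0, scores["LEU"][2][0] < 0)
```

Because the Gaussian weights are strictly positive, every smoothed bin receives some contribution from its neighbors, which is how the filter compensates for sparsely populated classes.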
As shown in Eq. (5), the general approach of our scoring function, which generates the score of each amino acid residue (AA) according to its environment (env), is the use of a statistical potential. The differences between CIRCLE and Verify3D are found in Eqs. (1)–(4). Here, w(polar, buried, p, b) represents the Gaussian weight in a particular environment (polar and buried). The letters p and b designate the polar ratio and buried ratio axes, respectively, and describe the distance from the center location (polar and buried) of each Gaussian weight function. The standard deviations (σ_polar, σ_buried) for the environment of the side chain (fraction of polar area, buried area) in Eq. (1) were calculated from a virtual mutation dataset. We analyzed the variation of environments by considering mutated amino acid residues at a particular position. A total of 504,716 datasets were constructed by using homology modeling methods. N(AA|ss, polar, buried) is the number of residues AA observed in environment env in the PDB dataset [Fig. 2(a1,b1)]. As mentioned earlier, because the classification of environments is so detailed, N(AA|ss, polar, buried) could not be used directly to calculate P(AA|env) in Eq. (5). Therefore, the Gaussian weight was used as a Gaussian filter in Eq. (3) to generate smoothed data from the raw data N(AA|ss, polar, buried). In Eq. (5), P(AA|env) is the probability of the amino acid residue AA in the environment env, aa is a variable running over the 20 amino acid residues, and P(AA) is the probability of finding residue AA among all the amino acid residues. P(AA|env) contains the Gaussian weight function of Eq. (1) in order to account for the variability of the frequency in the side-chain environment. Examples of the scoring matrices for a hydrophobic residue (leucine) and a hydrophilic residue (lysine) are shown in Figure 2.

The score corresponding to the side-chain environments, Eq. (5), cannot consider secondary structure similarity, although PSI-PRED, which predicts secondary structures, achieves an average Q3 score of about 80%. Therefore, we added to CIRCLE a term for the secondary structure similarity between the model and the prediction. The measure of similarity in secondary structures is based on the following scoring function.

$$ \mathrm{SSscore}(i, j) = \log \frac{P(i, j \mid \mathrm{conf})}{P_{\mathrm{pre}}(i \mid \mathrm{conf})\, P_{m}(j \mid \mathrm{conf})} \tag{6} $$

Here, i represents the secondary structure of the target sequence predicted by PSI-PRED, j is the secondary structure observed in the model, and conf is one of the confidence values (0, 1, 2, ..., 9) calculated by PSI-PRED. P_pre(i|conf) is the probability of the secondary structure i being predicted by PSI-PRED when the confidence value is conf. P_m(j|conf) is the probability of the secondary structure j being observed in the model when the confidence value is conf. P(i, j|conf) is the joint probability of the secondary structures i and j for the confidence value conf. This similarity of secondary structures is a useful measure especially for difficult targets, that is, when near-native structures do not exist; in such cases the similarity of secondary structures can still find good local structures even though the overall fold is entirely different. According to the target difficulty predicted by the SVM, the total score is calculated as

$$ \mathrm{TotalScore} = \begin{cases} \sum_{n}^{\mathrm{length}} \left( 0.35 \times \mathrm{SSscore} + \mathrm{3D1Dscore}_{\mathrm{CM}} \right)_{n} & \text{CM} \\ \sum_{n}^{\mathrm{length}} \left( 0.75 \times \mathrm{SSscore} + \mathrm{3D1Dscore}_{\mathrm{FRNF}} \right)_{n} & \text{FR or NF} \end{cases} \tag{7} $$

The coefficients for the measure of similarity of secondary structures (SSscore) were optimized on CASP6 targets. The similarity of the secondary structures is emphasized for difficult targets.

Selection of the five best models by using consensus methods

All models evaluated as the best for each server were compared using the consensus method of the 3D-Jury system. We modified the confidence score of 3D-Jury as

$$ \mathrm{consensus}(M_{a}) = \frac{\sum_{i,\, a \neq i}^{N} \mathrm{sim}(M_{a}, M_{i})}{1 + N} \tag{8} $$

where M_a is the representative model of server a and N is the number of servers. sim(M_a, M_i) is the similarity score from MAXSUB [23] between models M_a and M_i, which equals the number of C-alpha atom pairs that are within 3.5 Å. If sim(M_a, M_i) is below 40, it is set to zero (according to the 3D-Jury protocol). The best five models according to the consensus value were selected for submission to CASP7. No human intervention occurred during the procedure, and the fams-ace process did not consider server names or server performance. A server with exceptional performance was not rated any higher before our model selection process began.

RESULTS AND DISCUSSION

Do modifications on server models improve model quality?

Table I. Performance of Refinement by FAMS

| Measure | Original (a), all | Original (a), selected | FAMS model (b), all | FAMS model (b), selected |
|---|---|---|---|---|
| Model length (c) | 2,908,231 | 15,725 | 2,949,313 (+1.4) | 15,779 (+0.3) |
| Residues in most favored regions (d) | 2,441,472 | 13,337 | 2,552,207 (+4.5) | 13,790 (+3.4) |
| Residues in additional allowed regions (d) | 3,59, | | ,505 (-8.2) | 1636 (-9.5) |
| Residues in generously allowed regions (d) | 67, | | ,253 (-33.0) | 190 (-45.7) |
| Residues in disallowed regions (d) | 40, | | ,348 (-44.3) | 163 (-29.4) |
| No. of unnatural chiral centers (e) | 18, | | ,265 (-40.3) | 102 (-51.9) |
| No. of short contacts (f) (0–2.0 Å) | | | (-98.8) | 2 (-99.8) |
| No. of short contacts (f) (0–2.2 Å) | 411, | | (-98.9) | 4 (-99.8) |
| No. of short contacts (f) (0–2.4 Å) | 661, | | ,880 (-97.9) | 73 (-98.1) |
| No. of short contacts (f) (0–2.6 Å) | 1,559, | | ,346 (-90.3) | 814 (-91.3) |
| Side-chain accuracy of χ1 (g) | 801, | | ,797 (+5.3) | 5217 (+1.6) |
| Side-chain accuracy of χ1+2 (h) | 423, | | ,743 (+4.2) | 2767 (+1.8) |
| GDT_TS (i) | | | (-0.3) | (-0.3) |

Summary of model quality from PROCHECK, side-chain accuracy, and GDT_TS for all CASP7 targets. The "all" columns represent all server models. The "selected" columns represent the models that were selected from the server models as the best model by fams-ace. The value in parentheses (in percent) is the rate of increase from the original to the FAMS model.
(a) Original models submitted to CASP7 by automatic servers within 48 h of the target sequence being released.
(b) Models refined by FAMS, our original modeling program.
(c) Total number of amino acid residues present in the models.
(d) Estimation of Ramachandran plot regions from PROCHECK. For model quality estimation, the number of residues whose main-chain torsion angles around the C-alpha atom fall in the most favored regions is a good indicator of the stereochemical quality of a protein structure. The most favored regions are those with the best torsion angles observed experimentally; the additional and generously allowed regions are progressively less favorable, and the disallowed regions are the worst.
(e) Number of experimentally unnatural chiral centers at the main-chain C-alpha atom.
(f) The number of short contacts is defined as the number of pairs of nonbonded atoms within the distance indicated.
(g) The number of residues which have both C-alpha atoms within 3.5 Å and side-chain χ1 angles within 40° of the native structure.
(h) The number of residues which have C-alpha atoms within 3.5 Å and both χ1 and χ2 within 40° of the native structure.
(i) GDT_TS values were calculated from 114 domains of native structures obtained from CASP7.

Table I displays the performance of FAMS refinement in terms of (1) stereochemical accuracy, (2) the accuracy of side chains (χ1 and χ1+2), and (3) the overall similarity of C-alpha atom positions (GDT_TS).
(See ref. 24 for GDT_TS.) The side-chain accuracy of χ1 equals the number of residues which have both C-alpha atoms within 3.5 Å and side-chain χ1 angles within 40° of the native structure. The accuracy of χ1+2 equals the number of residues which have C-alpha atoms within 3.5 Å and both χ1 and χ2 within 40° of the native structure. For the 25,615 server models obtained from the CASP7 website (represented as "all") and the TS1 models of fams-ace (represented as "selected"), we compared the models rebuilt by FAMS with the original models. PROCHECK [25,26] was used to assess the models from a stereochemical and geometric point of view. In almost all PROCHECK categories, the server models rebuilt by FAMS had improved model quality.

Table II. Top 10 of Input Servers According to the Contribution

| No. | fams-ace | No. | fams-ace (improved) |
|---|---|---|---|
| 22 | Zhang-Server | 20 | Zhang-Server |
| 15 | MetaTasser | 13 | ROBETTA |
| 5 | beautshot | 9 | Pmodeller6 |
| 4 | SP3 | 5 | Pcons6 |
| 3 | keasar-server | 5 | MetaTasser |
| 3 | PROTINFO | 4 | PROTINFO-AB |
| 3 | HHpred3 | 3 | PROTINFO |
| 3 | HHpred2 | 3 | HHpred2 |
| 3 | FOLDpro | 3 | FOLDpro |
| 3 | BayesHH | 3 | FAMSD |
| 2 | shub | 3 | BayesHH |
| 2 | beautshotbase | 2 | SP4 |
| 2 | UNI-EID_expm | 2 | FAMS |
| 2 | SPARKS2 | 2 | CIRCLE |
| 2 | SP4 | 2 | ABIpro |
| 2 | Pcons6 | 1 | keasar-server |
| 2 | FAMSD | 1 | forecast-s |
| 2 | FAMS | 1 | UNI-EID_sfst |
| 2 | 3Dpro | 1 | UNI-EID_expm |
| 1 | nfold | 1 | SP3 |
| 1 | mgen-3d | 1 | SAM_T06_server |
| 1 | karypis.srv.2 | 1 | RAPTOR-ACE |
| 1 | forecast-s | 1 | RAPTOR |
| 1 | SAM-T99 | 1 | Phyre-2 |
| 1 | ROKKY | 1 | Ma-OPUS-server |
| 1 | ROBETTA | 1 | HHpred3 |
| 1 | RAPTOR-ACE | 1 | GeneSilicoMetaServer |
| 1 | RAPTOR | 1 | FORTE1 |
| 1 | NN_PUT_lab | 1 | CaspIta-FOX |
| 1 | Ma-OPUS-server2 | 1 | Bilab-ENABLE |
| 1 | Huber-Torda-Server | 1 | 3Dpro |
| 1 | FUGUE | | |

The number of models selected as the best by fams-ace and fams-ace (improved), respectively, for each CASP7 target. For example, Zhang-Server supplied the best model 22 and 20 times for the fams-ace and fams-ace (improved) teams, respectively. Moreover, MetaTasser and ROBETTA supplied the best model 15 and 13 times for the fams-ace and fams-ace (improved) teams, respectively.
Summarizing over all server models, Table I shows that about 41,000 amino acid residues were newly constructed, and the numbers of residues in the favored and disallowed regions of the Ramachandran plot improved by about 110,000 and 18,000, respectively. In addition, the number of short contacts within 2.6 Å was reduced by 1.41 million. In side-chain accuracy, χ1 and χ1+2 increased by 5.25% and 4.16%, respectively. Improvements in stereochemical accuracy and side-chain accuracy were also observed in the selected models. Thus, these results show that FAMS remodeling was effective in improving model quality on a stereochemical and geometric basis and in side-chain accuracy.

Figure 3. (a) Results of fams-ace (solid and broken horizontal lines for averaged GDT_TS and Z-score, respectively) for TBM targets when one server is excluded in CASP7. The horizontal axis represents the excluded server. As several native structures are unpublished, the GDT_TS values of the original models from the CASP7 website were used. (b) Relative proportions of GDT_TS to the best GDT_TS for each target. (c) Comparison of GDT_TS values between fams-ace and fams-ace (improved). High accuracy template-based modeling (HA-TBM) targets, template-based modeling (TBM) targets other than HA, and FM targets are shown as circles, squares, and crosses, respectively. (d) Comparison of the GDT_TS values between fams-ace and fams-ace (improved). HA, TBM other than HA, and FM categories are shown as circles, squares, and crosses, respectively.

On the other hand, no improvement in GDT_TS was observed in the rebuilt models, despite the improved quality of the side chains and of the stereochemistry and geometry. However, the decrease in GDT_TS was only 0.3%. These results indicate that (1) improvement of the side chains does not directly correspond to an improvement in GDT_TS and (2) FAMS can improve side-chain accuracy while keeping the fold of the model intact. The major purposes of the remodeling step are to normalize and improve the model quality so that representative models can be selected by CIRCLE. When the representative models were compared with the first-ranked models (TS1 models) of each server, the summed χ1, χ1+2, and GDT_TS of the representative models were 1.48%, 1.32%, and 1.14% better than those of the first-ranked models, respectively. Therefore, FAMS achieved the purpose of the remodeling step. In the future, FAMS should be improved to obtain better side-chain and GDT_TS accuracies. Nevertheless, improving the quality of the three-dimensional model is very useful for selecting the best model.
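The side-chain accuracy measure used in this section (C-alpha within 3.5 Å and χ angles within 40° of native) can be sketched as follows. The input layout is hypothetical: each residue is a tuple of the C-alpha distance after superposition and (model, native) angle pairs in degrees. How residues without a χ2 angle are treated is not stated in the text; the sketch simply skips them for χ1+2, which is an assumption.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (handles wrap-around)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def side_chain_accuracy(residues, ca_cutoff=3.5, chi_tol=40.0, require_chi2=False):
    """Count residues with C-alpha within `ca_cutoff` and chi angles within `chi_tol`.

    residues: list of (ca_distance, (chi1_model, chi1_native), chi2_pair_or_None).
    require_chi2=False gives the chi1 measure; True gives the chi1+2 measure.
    """
    count = 0
    for ca_dist, chi1, chi2 in residues:
        if ca_dist > ca_cutoff:
            continue
        if angle_diff(*chi1) > chi_tol:
            continue
        if require_chi2 and (chi2 is None or angle_diff(*chi2) > chi_tol):
            continue  # assumption: residues lacking chi2 are not counted for chi1+2
        count += 1
    return count

# Invented toy residues.
toy = [
    (1.2, (-60.0, -65.0), (180.0, 170.0)),  # chi1 and chi2 both close to native
    (2.0, (175.0, -178.0), (60.0, 120.0)),  # chi1 close (7 degrees across the wrap), chi2 off
    (4.1, (-60.0, -60.0), (180.0, 180.0)),  # C-alpha too far from native
    (0.8, (60.0, 180.0), None),             # chi1 wrong
]
print(side_chain_accuracy(toy), side_chain_accuracy(toy, require_chi2=True))  # prints 2 1
```

The wrap-around handling matters because χ angles near ±180° differ by only a few degrees even though their raw difference is close to 360°.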
The contribution of the individual input servers

In the fams-ace protocol, about 250 submitted models for each target were remodeled by FAMS, and then a representative was selected by CIRCLE from the five models of each server. Among the selected representative models, TS1 and AL1 models comprised 45.7%. This indicates that nearly half of the representative models were the same as the model ranked first by the server itself. As mentioned previously, the representative models were superior to the first-ranked models. The contributions of each server are shown in Table II. About 23.2% of the final best models chosen by fams-ace were refined Zhang-Server models, which indicates that refined Zhang-Server models have a large consensus value compared with other representative models. Additionally, 15.8% of the best models selected by fams-ace were refined MetaTasser models.

Table III. Total Values of Z-score, GDT_TS, and the Accuracy of Side-Chain

| Measure | Category | fams-ace (a) | fams-ace (improved) (b) | 3D-Jury (c) | CIRCLE-FAMS (d) |
|---|---|---|---|---|---|
| Z-score | HA-TBM | | (-6.0) | (+2.3) | (-14.0) |
| Z-score | TBM | | (+4.3) | (-7.1) | (-3.3) |
| Z-score | FM | | (+225.2) | 2.16 (-67.6) | (+226.4) |
| Z-score | ALL | | (+19.8) | (-11.2) | (+13.0) |
| GDT_TS | HA-TBM | | (-1.0) | (+0.3) | (-1.3) |
| GDT_TS | TBM | | (+0.4) | (-0.7) | (-1.1) |
| GDT_TS | FM | | (+15.3) | (-5.4) | (+13.6) |
| GDT_TS | ALL | | (+1.2) | (-1.0) | (-0.2) |
| χ1 | HA-TBM | | (+5.2) | 2307 (+4.2) | 2456 (+11.0) |
| χ1 | TBM | | (+5.5) | 5154 (-1.1) | 5679 (+9.0) |
| χ1 | FM | | (+28.2) | 174 (-7.4) | 231 (+22.9) |
| χ1 | ALL | | (+6.2) | 5256 (-1.1) | 5821 (+9.5) |
| χ1+2 | HA-TBM | | (+4.1) | 1292 (+5.2) | 1417 (+15.4) |
| χ1+2 | TBM | | (+9.6) | 2751 (-1.1) | 3150 (+13.2) |
| χ1+2 | FM | | (+20.6) | 75 (-22.7) | 113 (+16.5) |
| χ1+2 | ALL | | (+7.3) | 2792 (-1.7) | 3216 (+13.2) |

(a) CASP7 team using fams-ace.
(b) Virtual team fams-ace (improved), described in Results and Discussion.
(c) Virtual team 3D-Jury (consensus method) using only TS1 models.
(d) CASP7 team using CIRCLE for the final model selection.
HA-TBM (28 domains), TBM (108 domains), and FM (19 domains) denote high accuracy template-based modeling targets, template-based modeling targets including HA, and free modeling targets, respectively. ALL (123 domains) means the total score for all the CASP7 targets. GDT_TS and the native structures for calculating χ1 and χ1+2 were obtained from the CASP7 web site. The value in parentheses (in percent) is the rate of increase from fams-ace.

Furthermore, in order to discuss the contribution of each server from the standpoint of model accuracy and of its influence on the consensus value, we determined the average GDT_TS of fams-ace for TBM targets (108 domains in total) when a particular server was excluded [Fig. 3(a)].
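The two contribution percentages quoted in this section can be reproduced from the fams-ace column of Table II. The short check below assumes that each percentage is the server's count divided by the sum of that column; the text does not state the denominator explicitly, so this is an interpretation that happens to match the reported figures.

```python
# Counts from the "fams-ace" column of Table II, in table order:
# 22, 15, 5, 4, six servers with 3, nine with 2, and thirteen with 1.
fams_ace_counts = [22, 15, 5, 4] + [3] * 6 + [2] * 9 + [1] * 13

def share(count, counts):
    """Percentage of all selections that `count` represents, to one decimal place."""
    return round(100.0 * count / sum(counts), 1)

zhang_share = share(22, fams_ace_counts)       # Zhang-Server
metatasser_share = share(15, fams_ace_counts)  # MetaTasser
print(sum(fams_ace_counts), zhang_share, metatasser_share)  # prints 95 23.2 15.8
```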
The contribution of MetaTasser and Zhang-Server is larger than that of all other servers, as shown by the large decrease in the average GDT_TS value when either is excluded. This indicates that if MetaTasser and Zhang-Server had not participated in CASP7, fams-ace could not have achieved its successful results. Thus, models from servers with exceptional performance (Zhang-Server and MetaTasser) are essential for good performance of the fams-ace method. In addition, performance would have increased if fams-ace had thinned out particular servers (e.g., SP3, beautshot), for which the average GDT_TS after exclusion was larger than that of fams-ace (represented as the horizontal line) in Figure 3(a). In other words, if these servers had not participated in CASP7, the GDT_TS of fams-ace would have improved. However, the average GDT_TS of fams-ace was largely insensitive to the exclusion of specific servers; it could not equal that of Zhang-Server (67.86) but came close.

Example T0371_D2 (template-based modeling target)

T0371_D2 is the second domain of the target T0371 (now PDB code: 2HX1). For this target, fams-ace submitted the best model in terms of GDT_TS (72.1) and AL0 (82.64) among all of the CASP7 teams. The model presented by fams-ace was constructed from a Zhang-Server TS4 model (GDT_TS: 70.66, AL0: 80.17). There were few differences between the fams-ace TS1 model and the Zhang-Server TS4 model. In addition, there were no differences in side-chain accuracies (χ1 and χ1+2). This example indicates that fams-ace could select a three-dimensional structure similar to the native structure. For two domains (T0371_D2 and T0321_D2), fams-ace submitted better models than the best-ranked server models. fams-ace considered only the consensus among the representative models rather than the quality of each model. Consequently, fams-ace tended not to select outlier models, whether outstandingly good or outstandingly poor.

What went right? What went wrong?
The fams-ace method selected good models of relatively high quality from the standpoint of overall similarity to the native structure (GDT_TS). The success of fams-ace can be explained by three factors. (1) fams-ace can take advantage of the situation in which several good models are available from server teams, especially from Zhang-Server and MetaTasser [see Fig. 3(a)]. (2) Because the consensus method was used for the final model selection, fams-ace avoided outlier models, for better and for worse. For 72.2% (78/108) of TBM targets, fams-ace selected good models with a GDT_TS within 90% of the highest GDT_TS among all server models [Fig. 3(b)]. (3) Representative models with improved GDT_TS were selected. The representative models from each server were selected according to the evaluation of the side-chain environments. Table III shows a summary comparison of fams-ace and a virtual server (described as "3D-Jury"), which applies the 3D-Jury method in the final model selection to the TS1 models of the servers instead of the representative models. Except for high accuracy template-based modeling targets (HA-TBM), fams-ace submitted better models than 3D-Jury.

Figure 4. Two failed examples of the refinement step by FAMS. (a, b) Models of T0356. (c, d) Models of T0283. (a) and (c) are the original server models of T0356 and T0283, respectively. (b) and (d) are the models refined by FAMS. The regions constructed by FAMS are circled. The main-chain structures of these regions are under extremely high strain.

However, several problems in the fams-ace method were noted. For example, in the calculation of the consensus value, the similarity scores from MAXSUB are set to zero if they are below 40. For difficult targets, many models that had weak similarities were therefore judged to have no significant similarity at all. In this case, the consensus method fails during the final model selection step. Therefore, fams-ace could not consistently select models of near-best quality for difficult targets [see Fig. 3(c)].
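The effect of the similarity cutoff on the consensus value of Eq. (8) can be illustrated with a small sketch. The nested-dictionary input and the similarity values are invented for illustration; in the actual protocol the pairwise similarities come from MAXSUB, which is not reproduced here.

```python
def consensus_scores(sim, threshold=40):
    """Eq. (8): sum of thresholded similarities to all other servers, divided by 1 + N.

    sim: {server: {other_server: number of C-alpha pairs within 3.5 angstroms}}
         (hypothetical layout). Similarities below `threshold` count as zero.
    """
    servers = list(sim)
    n = len(servers)
    scores = {}
    for a in servers:
        total = 0
        for i in servers:
            if i == a:
                continue
            s = sim[a][i]
            total += s if s >= threshold else 0
        scores[a] = total / (1 + n)
    return scores

# Easy target: servers A, B, and C agree; D is only weakly similar to the rest.
easy = {
    "A": {"B": 120, "C": 110, "D": 35},
    "B": {"A": 120, "C": 100, "D": 35},
    "C": {"A": 110, "B": 100, "D": 35},
    "D": {"A": 35, "B": 35, "C": 35},
}
# Difficult target: every pair shares only a weak similarity of 30.
hard = {a: {b: 30 for b in "ABCD" if b != a} for a in "ABCD"}

print(consensus_scores(easy))  # A: 46.0, B: 44.0, C: 42.0, D: 0.0
print(consensus_scores(hard))  # every score is 0.0, so the models cannot be ranked
```

In the difficult case every model receives the same score of zero, which is the failure mode described above: weak but real similarities are discarded, and the consensus value carries no ranking information.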
During the refinement of server models, we encountered errors caused by multidomain targets. An example is T0356, for which a certain server presented a model that was divided into domains. FAMS interpreted the division as a deletion in the sequence and coordinates, and therefore constructed connecting regions to join the separated domains. The main-chain structures of the connected regions broke during this difficult reconstruction [Fig. 4(a,b)]. Although unnatural models were usually rejected in the evaluation step by CIRCLE, in a few cases these unnatural models were selected as the best model by fams-ace. This tendency appeared to be associated with difficult targets. In the worst example, T0283, the GDT_TS of fams-ace was only 25.52, in contrast to the highest GDT_TS among the server models [see Fig. 4(c,d)]. Lastly, a major problem with fams-ace is its inability to present models with accurate side-chain conformations, even when GDT_TS is relatively high. Because fams-ace selects the best models according to the consensus value, this problem is unavoidable.

Simple improvement of fams-ace

For the reasons mentioned in the previous section, we investigated the possibility of using CIRCLE instead of the consensus method. CIRCLE performs well in the final model selection step because it selects models according to their side-chain environments. There are three steps in the new fams-ace process. (1) Rebuild and refine server models. (2) Select models according to the Z-score of the consensus score. For targets predicted by the SVM to be difficult (FR and NF), the similarity score from MAXSUB was left unchanged. The thresholds for model selection were optimized on CASP6 targets using the category classification (CM or non-CM) reported by the CASP6 organizers. (3) Select final models using CIRCLE. Hereinafter, this collection of steps is denoted as fams-ace (improved). The greatest difference between fams-ace and fams-ace (improved) is the final model selection step: fams-ace uses a consensus method based on a modified 3D-Jury, whereas fams-ace (improved) uses model evaluation by CIRCLE. The goal of fams-ace (improved) is to obtain models of high quality in both GDT_TS and side-chain accuracy. The results of fams-ace (improved) are shown in Table III and Figure 3(b,d). Except for HA-TBM targets, both GDT_TS and the Z-score of GDT_TS improved in comparison with fams-ace.
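The two-stage selection of fams-ace (improved) can be sketched as follows, under stated assumptions. The sketch filters models by the Z-score of a consensus score and then ranks the survivors by a per-model quality score standing in for CIRCLE. The field names, the Z-score threshold of 0.0, the convention that a higher quality score is better, and the fallback when no model passes the filter are all hypothetical; the optimized thresholds and the CIRCLE scoring function are not given in the text.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values; return all zeros when there is no spread."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def select_final_models(models, z_threshold=0.0, top_n=5):
    """Pick up to top_n model names in two stages.

    models: list of dicts with keys
        'name'      - model identifier
        'consensus' - consensus score (e.g. from a 3D-Jury-like method)
        'quality'   - stand-in for a CIRCLE score; higher is assumed better
    Stage 1 keeps models whose consensus Z-score is >= z_threshold.
    Stage 2 ranks the remaining models by the quality score.
    """
    zs = z_scores([m["consensus"] for m in models])
    candidates = [m for m, z in zip(models, zs) if z >= z_threshold]
    if not candidates:
        # Assumed fallback: if the filter rejects everything, rank all models.
        candidates = list(models)
    ranked = sorted(candidates, key=lambda m: m["quality"], reverse=True)
    return [m["name"] for m in ranked[:top_n]]


if __name__ == "__main__":
    example = [
        {"name": "A", "consensus": 70, "quality": 0.2},
        {"name": "B", "consensus": 65, "quality": 0.9},
        {"name": "C", "consensus": 20, "quality": 1.5},  # consensus outlier
        {"name": "D", "consensus": 60, "quality": 0.5},
    ]
    print(select_final_models(example, top_n=2))
```

In this toy input, model C has the best quality score but is removed by the consensus filter, so the final ranking is decided among models that the consensus already supports.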
There was also an obvious improvement in side chains in all categories, especially for free modeling (FM) targets and difficult TBM targets (highest GDT_TS < 50). Thus, these results suggest that this improvement provides a method for obtaining models that are good in both GDT_TS and side-chain quality. However, the problem of multidomain targets still remains. We plan to improve the refinement process and the assignment of domains in the future. Moreover, we propose developing a new system based on the improved fams-ace method to generate results superior to the best server models.

ACKNOWLEDGMENTS

We thank the CASP7 organizers and assessors, and the experimentalists who supplied targets for CASP7. We particularly thank all the server teams in CASP7.

REFERENCES

1. Takeda-Shitaka M, Terashi G, Takaya D, Kanou K, Iwadate M, Umeyama H. Protein structure prediction in CASP6 using CHIMERA and FAMS. Proteins 2005;61(Suppl 7):
2. Sippl MJ. Knowledge-based potentials for proteins. Curr Opin Struct Biol 1995;5:
3. Zhang Y, Skolnick J. SPICKER: a clustering approach to identify near-native protein folds. J Comput Chem 2004;25:
4. Lee MR, Tsai J, Baker D, Kollman PA. Molecular dynamics in the endgame of protein structure prediction. J Mol Biol 2001;313:
5. Luthy R, Bowie JU, Eisenberg D. Assessment of protein models with three-dimensional profiles. Nature 1992;356:
6. Eisenberg D, Luthy R, Bowie JU. VERIFY3D: assessment of protein models with three-dimensional profiles. Methods Enzymol 1997;277:
7. Fischer D. Hybrid fold recognition: combining sequence derived properties with evolutionary information. Pac Symp Biocomp 2000;5:
8. Kelley LA, MacCallum RM, Sternberg MJ. Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol 2000;299:
9. Wallner B, Elofsson A. Can correct protein models be identified? Protein Sci 2003;12:
10. Fischer D. Servers for protein structure prediction. Curr Opin Struct Biol 2006;16:
11. Tosatto SC. The Victor/FRST function for model quality estimation. J Comput Biol 2005;12:
12. Pettitt CS, McGuffin LJ, Jones DT. Improving sequence-based fold recognition by using 3D model quality assessment. Bioinformatics 2005;21:
13. Lundstrom J, Rychlewski L, Bujnicki J, Elofsson A. Pcons: a neural-network-based consensus predictor that improves fold recognition. Protein Sci 2001;10:
14. Ginalski K, Elofsson A, Fischer D, Rychlewski L. 3D-Jury: a simple approach to improve protein structure predictions. Bioinformatics 2003;19:
15. Ogata K, Umeyama H. An automatic homology modeling method consisting of database searches and simulated annealing. J Mol Graph Model 2000;18:
16. Anguita D, Boni A, Ridella S, Rivieccio F, Sterpi D. Theoretical and practical model selection methods for support vector classifiers. In: Wang L, editor. Support vector machines: theory and applications, Vol. 177; Berlin: Springer-Verlag; pp
17. Park KJ, Gromiha MM, Horton P, Suwa M. Discrimination of outer membrane proteins using support vector machines. Bioinformatics 2005;21:
18. Zhou H, Zhou Y. Single-body residue-level knowledge-based energy score combined with sequence-profile and secondary structure information for fold recognition. Proteins 2004;55:
19. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999;292:
20. Siew N, Elofsson A, Rychlewski L, Fischer D. MaxSub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics 2000;16:
21. Zemla A, Venclovas C, Moult J, Fidelis K. Processing and analysis of CASP3 protein structure predictions. Proteins 1999;37(Suppl 3):
22. Laskowski RA, MacArthur MW, Moss DS, Thornton JM. PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Cryst 1993;26:
23. Morris AL, MacArthur MW, Hutchinson EG, Thornton JM. Stereochemical quality of protein structure coordinates. Proteins 1992;12:
J_ID: Z7E Customer A_ID: 21783 Cadmus Art: PROT21783 Date: 25-SEPTEMBER-07 Stage: I Page: 1 proteins STRUCTURE O FUNCTION O BIOINFORMATICS SHORT COMMUNICATION MALIDUP: A database of manually constructed
More informationPrediction and Classif ication of Human G-protein Coupled Receptors Based on Support Vector Machines
Article Prediction and Classif ication of Human G-protein Coupled Receptors Based on Support Vector Machines Yun-Fei Wang, Huan Chen, and Yan-Hong Zhou* Hubei Bioinformatics and Molecular Imaging Key Laboratory,
More informationProtein Structure Determination Using NMR Restraints BCMB/CHEM 8190
Protein Structure Determination Using NMR Restraints BCMB/CHEM 8190 Programs for NMR Based Structure Determination CNS - Brünger, A. T.; Adams, P. D.; Clore, G. M.; DeLano, W. L.; Gros, P.; Grosse-Kunstleve,
More informationSUPPLEMENTARY INFORMATION
doi:10.1038/nature11054 Supplementary Fig. 1 Sequence alignment of Na v Rh with NaChBac, Na v Ab, and eukaryotic Na v and Ca v homologs. Secondary structural elements of Na v Rh are indicated above the
More informationUseful background reading
Overview of lecture * General comment on peptide bond * Discussion of backbone dihedral angles * Discussion of Ramachandran plots * Description of helix types. * Description of structures * NMR patterns
More informationclustq: Efficient Protein Decoy Clustering Using Superposition-free Weighted Internal Distance Comparisons
clustq: Efficient Protein Decoy Clustering Using Superposition-free Weighted Internal Distance Comparisons Debswapna Auburn University ACM-BCB August 31, 2018 What is protein decoy clustering? Clustering
More informationAlgorithm for Rapid Reconstruction of Protein Backbone from Alpha Carbon Coordinates
Algorithm for Rapid Reconstruction of Protein Backbone from Alpha Carbon Coordinates MARIUSZ MILIK, 1 *, ANDRZEJ KOLINSKI, 1, 2 and JEFFREY SKOLNICK 1 1 The Scripps Research Institute, Department of Molecular
More information