BIOINFORMATICS ORIGINAL PAPER


Vol. 21 no , pages doi: /bioinformatics/bti522

Structural bioinformatics

Prediction of protein-protein interactions using distant conservation of sequence patterns and structure relationships

Jordi Espadaler 1, Oriol Romero-Isart 2, Richard M. Jackson 2 and Baldo Oliva 1

1 Grup de Bioinformàtica Estructural (GRIB-IMIM), Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona 08003, Catalonia, Spain and 2 School of Biochemistry and Microbiology, University of Leeds, Leeds LS2 9JT, UK

Received on May 16, 2005; accepted on June 13, 2005
Advance Access publication June 16, 2005

ABSTRACT

Motivation: Given that association and dissociation of protein molecules are crucial in most biological processes, several in silico methods have recently been developed to predict protein-protein interactions. Structural evidence has shown that interacting pairs of close homologs (interologs) usually interact physically in the same way. Moreover, conservation of an interaction depends on the conservation of the interface between interacting partners. In this article we make use of both structural similarities among domains of known interacting proteins found in the Database of Interacting Proteins (DIP) and conservation of pairs of sequence patches involved in protein-protein interfaces to predict putative protein interaction pairs.

Results: We have obtained a large number of putative protein-protein interactions ( ). The list is independent of other techniques, both experimental and theoretical. We separated the list of predictions into three sets according to their relationship with known interacting proteins found in DIP. For each set, only a small fraction of the predicted protein pairs could be independently validated by cross-checking with the Human Protein Reference Database (HPRD). The fraction of validated protein pairs was always larger than that expected by using random protein pairs.
Furthermore, a correlation map of interacting protein pairs was calculated with respect to molecular function, as defined in the Gene Ontology database. It shows good consistency of the predicted interactions with data in the HPRD database. The intersection between the lists of interactions of other methods and ours produces a network of potentially high-confidence interactions.

Contact: boliva@imim.es

Supplementary information: BioinformaticsO5_1/Supplementary_material.pdf

To whom correspondence should be addressed.
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Present address: Institut de Física d'Altes Energies, Universitat Autònoma de Barcelona, Bellaterra (Barcelona) 08193, Catalonia, Spain

INTRODUCTION

On the importance of protein-protein interactions

While the amount of genome sequence information increases exponentially, the annotation of protein sequences appears to be lagging behind, both in terms of quality and quantity. Multi-pronged, high-throughput functional genomics approaches are needed to bridge the gap between raw sequence information and the relevant biochemical and medical information. A wide repertoire of techniques must be used, from proteomics, bioinformatics, nucleotide chemistry and cell biology to model organisms, as well as to find targets for modern drug discovery.

Most, if not all, biological processes are regulated through association and dissociation of protein molecules. Moreover, functional units of cells are often complex assemblies of several macromolecules, where proteins play a pivotal role. Clearly, at the molecular level, the function of a protein is determined by the set of molecules it interacts with and the result of the interaction (e.g. chemical reaction, signal transduction, etc.). Therefore, protein-protein interaction networks play an outstanding role in the organization of life (Bornberg-Bauer et al., 2005).
The lower bound on binary protein-protein interactions and functional links in yeast has been estimated to be in the range of (Von Mering et al., 2002), which corresponds to about nine partners per protein. However, it has been estimated that most protein-protein interactions in nature conform to one of about types (Aloy and Russell, 2004).

Current methods to find interactions

A major goal of functional genomics is to determine protein interaction networks for whole organisms. Experimental methods that can globally tackle the problem have been developed, such as the yeast two-hybrid system (Uetz et al., 2000) and affinity purification followed by mass spectrometry (Gavin et al., 2002). These high-throughput methods have led to the creation of databases containing large sets of protein interactions, such as the Database of Interacting Proteins (DIP) (Salwinski et al., 2004), MIPS (Mewes et al., 2004) and the Human Protein Reference Database (HPRD) (Peri et al., 2004). In addition, several in silico methods have been developed to predict protein-protein interactions based on features such as gene context. These include gene fusion (Marcotte et al., 1999), gene neighborhood (Dandekar et al., 1998) and phylogenetic profiles (Pellegrini et al., 1999).

Although all of these methods can be used to predict interactions, their goals are different. Yeast two-hybrid aims to detect direct binary physical binding, while affinity purification aims to detect physical binding in the form of protein complexes. On the other hand, many in silico methods seek to predict functional association, which often implies but is not restricted to physical binding.

© The Author. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org

Prediction of protein interactions by structure

Use of protein structure to predict interactions

An emerging approach in the protein interactions field is to take advantage of structural information to predict physical binding (Aloy and Russell, 2003; Lu et al., 2002). Although the total number of complexes of known structure is relatively small, it is possible to expand this set by considering homologous proteins. It has been shown that in the majority of cases close homologs (>30% sequence identity) physically interact in the same way with each other (Aloy et al., 2003). However, conservation of a particular interaction depends on the conservation of the interface between interacting partners. Studies indicate that the compositions of contacting residues are unique, and that incorporating evolutionary and predicted structural information improves the prediction of protein-protein interactions (Keskin et al., 2004). In general, it has been shown that residues located at the interface tend to be structurally conserved (Ma et al., 2003). A number of studies on a few protein-protein interfaces have addressed the question of which residues are critical at protein-binding sites and which types of sequence motif should be used in protein-protein interaction predictions (Li et al., 2004b).

Working hypothesis

Based on the availability of complexes of known structure, short stretches of contiguous residues (hereafter called residue patches) involved in the interface can be easily determined by analysis of residue contacts. These patches can be converted into profiles by allowing amino acid substitutions based on conservation of chemical properties. The probability of a short stretch of residues in a protein sequence matching a patch profile by chance can be greatly reduced when requiring a simultaneous match of two or more patches.
Therefore, pairs of patches co-occurring in the same protein interface can be used to search for proteins which could display similar interactions. We further combine this local sequence similarity-based method with a domain structure similarity approach to narrow the list of putative new interacting proteins (Nye et al., 2005). The interactions in this list therefore share both a common domain structure and a common interface sequence pattern with the original interaction pairs. The predicted interface is thus suggestive of a structural model for the putative protein-protein interface.

METHODS

We have used seven databases for our analysis: (1) the Swiss-Prot database for protein sequences (release 45.1) (Apweiler et al., 2004); (2) the SCOP database for the classification of protein structures (release 1.65) (Andreeva et al., 2004); (3) the DIP database for experimentally identified protein-protein interactions (release of January 2004); (4) the SPIN-PP database of protein complexes with known three-dimensional structure, with 855 entries of interfaces with <80% sequence identity; (5) the STRING database, only for predicted protein-protein relationships/interactions (Mering et al., 2003); (6) the HPRD database, with human protein-protein interactions manually curated by a critical reading of the published literature by expert biologists; and (7) the Gene Ontology (GO) database of protein functions (Hill et al., 2002).

Predicting new interactions by Sequence Search of Interface Patterns (SSIP)

A set of pairs of non-identical interacting proteins (or peptides longer than 20 residues) was extracted from the SPIN-PP database. This is hereafter called the seeding set of protein complexes.
The protein complexes of the seeding set were grouped as follows: (1) electron transfer; (2) hydrolases; (3) immune system; (4) isomerases; (5) kinases and phosphorylases; (6) lectins; (7) ligases; (8) lyases; (9) membrane spanning; (10) oxido-reductases; (11) oxygen transport; (12) proteases; (13) toxins; (14) transferases; (15) viral derived interactions; and (16) unclassified. Protein complexes were allowed to belong to more than one group.

For a protein complex, we defined the distance between a pair of residues, one from each interacting protein, as the minimum distance between the pairs of atoms of the two residues. This distance defines the contact region or interface between two proteins in a complex which is composed of at least two chains (Tsai et al., 1996). The interface of the interaction of two proteins under a specific cut-off was defined as the set of residues of the two proteins with a distance below that cut-off. This necessarily yields unordered fragments, as well as isolated residues.

We defined a patch of residues of one protein in an interaction as a set of more than five contiguous residues, most belonging to the interface of the interaction. Clearly, a patch of residues depends on the cut-off used to define the interface. Some residues may not belong to the interface and still be sequentially surrounded by residues that belong to the interface. One or two residues (located at i and i + 1 in the protein sequence) belong to a patch if they are surrounded by two residues (located at i - 1 and i + 1 or i + 2 in the protein sequence) that belong to the same interface and the total length of the patch is larger than five residues. A characteristic interface for each complex of the seeding set was obtained by choosing the minimum atom-atom cut-off distance (between 2 and 5 Å) that produced at least two separated patches of residues in each interacting protein.
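The interface and patch definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the data layout (a chain as a dictionary from sequence position to a list of atom coordinates), the function names, and the 0.5 Å step used to scan cut-offs between 2 and 5 Å are all assumptions. The gap-bridging rule guarantees that gaps of one or two residues are flanked by interface residues; it does not separately enforce that "most" residues of a patch belong to the interface.

```python
import math
from itertools import product


def residue_distance(atoms_a, atoms_b):
    """Minimum atom-atom distance between two residues (lists of xyz tuples)."""
    return min(math.dist(p, q) for p, q in product(atoms_a, atoms_b))


def interface_positions(chain, partner, cutoff):
    """Sorted positions of `chain` lying closer than `cutoff` to any partner residue."""
    return sorted(
        pos for pos, atoms in chain.items()
        if any(residue_distance(atoms, other) < cutoff for other in partner.values())
    )


def patches(interface, max_gap=2, min_len=6):
    """Group sorted interface positions into (start, end) patches.

    Gaps of up to `max_gap` non-interface residues are bridged, and only
    stretches of more than five residues (min_len=6) are kept.
    """
    found = []
    start = prev = None
    for pos in interface:
        if start is None:
            start = prev = pos
        elif pos - prev - 1 <= max_gap:
            prev = pos
        else:
            if prev - start + 1 >= min_len:
                found.append((start, prev))
            start = prev = pos
    if start is not None and prev - start + 1 >= min_len:
        found.append((start, prev))
    return found


def characteristic_interface(chain_a, chain_b,
                             cutoffs=(2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0)):
    """Smallest cut-off giving at least two patches in each chain (step is assumed)."""
    for cutoff in cutoffs:
        patches_a = patches(interface_positions(chain_a, chain_b, cutoff))
        patches_b = patches(interface_positions(chain_b, chain_a, cutoff))
        if len(patches_a) >= 2 and len(patches_b) >= 2:
            return cutoff, patches_a, patches_b
    return None
```

For example, interface positions 3, 4, 5, 7, 8 and 9 form a single patch spanning residues 3 to 9, because the non-interface residue at position 6 is flanked by interface residues on both sides.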
For each patch of the characteristic interface, a profile was built by multiple alignment of 100 artificial sequences obtained by random sequence substitutions. These substitutions were not constrained by substitution matrix weights, but by rules based on the chemical properties of the residues' side chains. The groups of amino acids interchangeable for substitutions were as follows: (1) negative charge: Glu and Asp; (2) positive charge: Lys, Arg and His; (3) polar (hydrogen bonding): Ser, Thr, Asn and Gln; (4) non-polar (aliphatic): Ala, Val, Leu, Ile and Met; and (5) non-polar (aromatic): Phe, Trp and Tyr. Hidden Markov models (HMMs), hereafter referred to as HMM-patches, were built from the artificial multiple sequence alignment of each patch. The program HMMER was used to build the HMM-patches (Eddy, 1998).

The set of HMM-patches of a protein in one complex (i.e. formed by proteins A and B) was used to search sequences from Swiss-Prot. Sequences were required to match at least two HMM-patches of a protein under a threshold P-value of . Therefore, the P-value to find a particular protein with at least two patches was < . The pairs formed by proteins A and B, where A and B were found using the HMM-patches of A and B, respectively, constituted a set of potentially interacting protein pairs derived from the seeding complex formed by A and B. In order to further refine the prediction, only those biologically relevant pairs formed by proteins from the same species were considered, and the rest of the pairs were disregarded.

For each new potential interaction, the sequences matching the alignment with the HMM-patches were used to generate new artificial multiple alignments. These were used to build new HMM-patches and to repeat the search. The procedure was iterated until no new pairs of proteins were added to the set of potentially interacting protein pairs derived from a seeding couple, AB.
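The construction of the artificial alignment for one patch can be sketched as follows. This is only an illustration under stated assumptions: substitutes are drawn uniformly within each chemical group, residues that appear in none of the five groups listed above (Gly, Pro and Cys) are left unchanged, and the function name and `seed` parameter are hypothetical. The text does not specify how the random substitutions were sampled.

```python
import random

# The five substitution groups listed in the text, in one-letter code.
GROUPS = ["ED", "KRH", "STNQ", "AVLIM", "FWY"]
SUBSTITUTES = {residue: group for group in GROUPS for residue in group}


def artificial_sequences(patch, n=100, seed=0):
    """Return n variants of `patch`, substituting each residue within its group.

    Residues outside every group are copied unchanged (an assumption).
    """
    rng = random.Random(seed)
    return [
        "".join(rng.choice(SUBSTITUTES.get(residue, residue)) for residue in patch)
        for _ in range(n)
    ]


variants = artificial_sequences("DKSAFG")
```

Because every variant has the same length as the original patch, the 100 sequences already form a gap-free multiple alignment that can be passed to a profile-building program.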
Predicting new interactions by Structure Relationship (SR)

We use the hypothesis that homologous sequences share similar interactions and, therefore, the set of interacting partners of a given protein is enriched by its homologs (Espadaler et al., 2005). A protein interaction network can be represented by a graph with nodes as proteins and edges as protein interactions. In such a graph, a set of proteins connected to protein X (i.e. physically interacting with X) is named partners of X. Further, we define successive levels of partnership: the set of partners of X is named partners of X at level 1 and the set of partners of the partners of X at level 1 forms the set of partners at level 2, and so on. Given the commutative relation of the interactions (i.e. if B is found in the set of partners of A, then A is found in the set of

Table 1. Total number of predicted interactions by SSIP

Columns: Functional group; Seed-pairs; SPIN-PP pairs; Homodimers (Total, Homolog, No homolog), each given as Entries and DIP; Heteromers (Total, Homolog, No homolog), each given as Entries, STRING and DIP.

Rows: Electron transfer; Hydrolases; Immune; Isomerases; Kinases phosphorylases; Lectins; Ligases; Lyases; Membrane spanning; Oxido reductases; Oxygen transport; Proteases; Toxins; Transferase; Unclassified; Viral derived.

[Table values not recoverable.]

The results are shown in each row for the groups from SPIN-PP. The number of interactions used as seeds is indicated in the first column. The total numbers of predictions obtained by SSIP are indicated in the columns labeled Entries. For homomers and heteromers the total number of coincidences with DIP interactions is indicated, while only for heteromers is it possible to corroborate the prediction in STRING.

partners of B), protein X should be in the set of partners of itself at level 2 (see Supplementary Material). Therefore, given that homologous proteins perform similar functions associated with similar interaction partners, the sets of partners of protein X at even levels contain more sequences homologous to protein X than a randomly selected set of sequences of the same size. Similarly, the set of partners of protein X at odd levels should contain proteins that would potentially interact with X.

Pairs of potential orthologs of known interacting protein partners from a given organism are identified as potentially conserved interactions, or interologs, in a second organism (Matthews et al., 2001). We extended this assumption by considering all possible relatives of two proteins, where we defined as relatives those proteins that share similar fold and function. We used the SCOP database to assign a fold for as many sequences as possible in DIP. Fold, superfamily and family domain codes of SCOP were assigned to a total of 4324 proteins in DIP that could be matched by BLAST to a protein in SCOP, covering one-sixth of all proteins in DIP (i.e. group DIP-SCOP). More precisely, one or more domain codes were assigned to a protein sequence in DIP when the alignment between the two sequences had an E-value < 10^-8 over at least 75% of the residues in the SCOP domain. All proteins sharing at least one domain family-code with another protein X are defined as homologs of X.

The algorithm to obtain potential new interactions on the basis of structure involves four steps. First, we search all the relatives (using SCOP codes for family) of a pair of interacting proteins, named A and B, extracted from DIP. Second, we consider that the pairs of proteins A' and B', relatives of A and B, respectively, are potential interactions.
Third, we increase the number of potential interactions by considering all protein pairs formed by the combination of the relatives of A (including A') with all partners of A at odd levels (including B) and their relatives (including B'). Similarly, the partners of B at odd levels (including A), and all their relatives (including A'), can potentially interact with any of the relatives of B (including B'). And fourth, redundant couples (independent of the order of proteins forming the pair) were removed. In order to avoid a high number of false positives we only analyzed the odd levels 1 and 3 of structure-based partnership.

Combined potential interactions

The intersection of the two sets of potential interactions, those predicted by SSIP and those predicted by SR, was separated into three lists of pairs. These are designated as follows: (I1) pairs formed by two interacting proteins from DIP; (I2) pairs formed by two proteins, A and B, each one sharing at least one domain of the same family with one of the proteins from a pair of interacting proteins in DIP, C and D (i.e. A and C, and also B and D, have a domain of the same family, respectively); and (I3) pairs formed by proteins that cannot be related with a pair of interacting proteins in DIP through a domain of the same family. The first set (I1) corresponds to the most probable interactions. The second set (I2) corresponds to potential interactions found by using the first structure-based partnership level, and the third set (I3) to potential interactions that could only be found through the third structure-based partnership level. All sets were found by SSIP.

Validation of the interactions by HPRD database and prediction of interologs

To further evaluate the method we analyzed the sets of interacting pairs formed by two human proteins of groups I1, I2 and I3. These sets were compared with the database of manually confirmed interactions of human proteins from the HPRD.
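The partnership levels and the family-based expansion used by the SR method (see "Predicting new interactions by Structure Relationship") can be sketched as below. This is a simplified illustration, not the authors' code: the function names are hypothetical, `families` is assumed to map each protein to its set of SCOP family codes, and only the expansion around one protein of a pair is shown. Levels are computed as "partners of partners", so a protein reappears in its own level-2 set, as noted in the Methods.

```python
from collections import defaultdict


def build_graph(pairs):
    """Undirected interaction graph: protein -> set of direct partners."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def partners_at_level(graph, x, level):
    """Partners of x at a given level (level 1 = direct partners)."""
    current = {x}
    for _ in range(level):
        current = set().union(*(graph.get(p, set()) for p in current))
    return current


def relatives(families, x):
    """Proteins sharing at least one family code with x, including x itself."""
    codes = families.get(x, set())
    return {p for p, other in families.items() if other & codes} | {x}


def sr_candidates(graph, families, a, levels=(1, 3)):
    """Order-independent candidate pairs between relatives of `a` and the
    relatives of its partners at the given odd levels."""
    candidates = set()
    relatives_a = relatives(families, a)
    for level in levels:
        for partner in partners_at_level(graph, a, level):
            for a_rel in relatives_a:
                for b_rel in relatives(families, partner):
                    candidates.add(tuple(sorted((a_rel, b_rel))))
    return candidates
```

Storing each pair as a sorted tuple removes redundant couples regardless of the order of the two proteins, which corresponds to the fourth step of the algorithm.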
Finally, the successfully predicted interactions in human proteins can be expanded to the rest of orthologous genes in other species, increasing the number of predictions by putative interologs. It has to be noted, however, that we cannot calculate the accuracy of the prediction in this evaluation, as a large number of predicted interactions has not been tested and should not be considered false. Therefore, we have compared the molecular function of the pairs of interacting proteins in the HPRD database and in the predicted sets as a measure of accuracy (Von Mering et al., 2003). The function of proteins was defined at level two of the GO molecular function ontology.

RESULTS

Protein-protein interactions predicted by sequence search

A total of putative interactions were obtained with sequences after searching with HMM-patches using the SSIP method described above. Of the sequences, a total of 8552 are also defined as nodes in the DIP database. Figure 1 shows the normalized number of heteromer and homodimer interactions predicted for each functional group defined in SPIN-PP. The hydrolases, proteases and transferases show a larger normalized number of non-homologous heteromer interactions with respect to the other groups. This is not necessarily related to the number of SPIN-PP or seed pairs (cf. immune, oxido-reductases, toxins and viral derived). The predictions obtained from the immune system group formed the largest set corroborated by DIP (11 out of 748 predictions). Also, a total of 1603 out of 7939 predicted non-homologous heteromer interactions from the group of transferases were independently predicted by STRING.

Table 1 shows the distribution of predicted pairs according to each group of the seeding set. The SSIP method predicts 46 interactions that are also described in the DIP.
Most of these interactions are found between pairs of proteins homologous to an interacting couple in the seeding set (39 out of 46), showing that the method found mostly putative interologs, which are pairs of potential orthologs of known interacting protein partners with conserved interactions. The method predicts 3885 interactions that are also found in the STRING database (Mering et al., 2003); however, in this case 3760 belonged to pairs without homology to any of the pairs in the seeding set. These protein-protein interactions can be considered to have been arrived at independently of the gene context methods used by STRING.

It is not possible to perform the iterative method of prediction (SSIP) for all protein pairs from SPIN-PP, because the dataset of protein sequences is limited, and the method did not necessarily find new putative patterns for all binding sites. Even though dimers are removed from the seeding set, it is still possible to predict homodimers. This implies that the sequences of the proteins of a predicted dimer were matched by remote similarity to the HMM-patches of a pair of interacting proteins in the seeding set that were heteromers.

Protein-protein interactions predicted by structure

For the method of predicting new interactions by Structure Relationships (see Methods section), SCOP codes could be assigned to proteins in DIP. This DIP-SCOP subset covers one-sixth of all proteins in DIP. Each pair of proteins from DIP was expanded to interactions, based on the known structure of at least one of the proteins of the pair. If the structure of one of the proteins from the interacting pair was not known, its expansion was not performed. Consequently, we obtain interacting pairs formed by: (i) the relation between two sets of proteins (i.e. using two SCOP family codes, with N and M proteins, respectively) that produced N × M putative interacting pairs; (ii) between one single protein and one set of proteins (i.e.
with N proteins of the same family) that produced N putative interacting pairs; or (iii) between two single proteins (i.e. the expansion could not be performed for any of the proteins that

form the pair). This produced a total of 1220 putative interacting protein-sets (formed by pairs of protein families, or one protein family and a single protein, or two single proteins), with a confirmed protein-protein interaction in DIP.

The intersection of protein pairs predicted by the SR and SSIP methods was performed, and separated into three sets (see Methods section): (I1) 41 pairs formed by two interacting proteins from DIP (see Supplementary Material); (I2) 1220 protein-sets (given above) that yield protein pairs formed by two proteins, each one sharing at least one domain of the same family with one of the proteins from a pair of interacting proteins in DIP; and (I3) protein-sets yielding protein pairs formed by proteins that cannot be related with a pair of interacting proteins in DIP through a domain of the same family. The interactions of set I1 are interactions corroborated experimentally, with a known family relationship in SCOP and sequence patterns that match the requirements of a seeding interaction from SPIN-PP. The predicted interactions set I2 contains a larger number of interactions than I1; however, few have been confirmed experimentally. Although set I1 does not make any new prediction, it can be used as a reference for the method. More precisely, we can check the examples by modeling the protein complex to confirm their validity.

Fig. 1. Distribution of predicted interactions of heteromers (a) and homomers (b) by SSIP. The total number of interactions is divided by the number of original interactions from the SPIN-PP database. The normalized number of predictions is shown in bars for each group of seeding pairs from SPIN-PP, for interacting pairs of proteins homologous to the seeding pair (black bar) and for predictions of non-homologous interactions (gray bar).
An example of a predicted interaction

The predicted complex between Actin-2 and Profilin in Drosophila (in set I1) can be modeled by that of the structurally characterized orthologs in Bos taurus. Of the three binding sites of B. taurus, two are recognized in the sequences of Actin-2 and Profilin in Drosophila (Fig. 2). The third binding site is half lost with a gap; however, our sequence search was able to detect the location of the binding sites. The putative structure of Actin-2 and Profilin was predicted by means of homology using PSI-BLAST (Altschul et al., 1997), with the corresponding coincidence of fold and family. Also, the interaction was experimentally found by yeast two-hybrid in Drosophila melanogaster. Therefore, this interaction is found in set I1 with our method.

Fig. 2. Structure of the interaction between Actin-2 ( ) and Profilin (PROF_DROME) from Drosophila. Sequence alignment of Actin-2 (a) and Profilin (b) and the HMM profiles from chains A and P of 2btf (β-actin-profilin complex from Bos taurus), respectively. The aligned sequences of the binding sites of Actin-2 and Profilin are indicated in background colors according to the interaction: cyan, green and yellow. (c) Superposition of the X-ray structure of the β-actin-profilin complex (actin in orange, profilin in cyan) and the model of the complex between actin (green) and profilin (red) from Drosophila, obtained with the program MODELLER (Eswar et al., 2003). The superposition was obtained with the program STAMP (Russell and Barton, 1992).
Owing to the large evolutionary distance between bovine and fly, the sequences of the binding sites have undergone significant changes. Still, we can use our method to corroborate this interaction, as we have used loose restrictions on the E-value and the presence of at least two binding sites. Consequently, the structure of the complex of Actin-2 and Profilin from Drosophila is modeled using the structure of the complex in B. taurus (Fig. 2). The final complex structure is obtained by superimposition of the two models on their respective template structures in the complex (Fig. 2c). Although further analysis is required to corroborate the binding energy, we can readily accept the interaction and the model, as these have been experimentally confirmed.

The number of interactions in sets I2 and I3 is too large to be similarly treated. Therefore, we need other methods to validate the putative interactions. Here, we compare the results with additional databases and check the consistency of the predictions.

Validation of the interactions by HPRD database and prediction of interologs

In order to validate the sets of predicted interactions, the results were corroborated with the set of non-dimer protein-protein interactions of the HPRD database. The percentages of validated interactions found in I1, I2 and I3 are indicated in Table 2. Each percentage is compared with the result of: (i) using only sequence patterns (SSIP) for the prediction, (ii) pairs predicted by SSIP that are also found in the DIP database (SSIP-DIP) and (iii) pairs predicted by SSIP that are also found in the STRING database (SSIP-STRING). We found 2636 human proteins for which interactions were predicted by SSIP. A total of 3044 interactions in HPRD involve a pair of proteins from this set. Only 0.17% or 127 out of interactions were validated in HPRD, with <5% coverage.
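The random-pair baseline for these counts can be reproduced with a short calculation. This is a sketch under one assumption: the number of possible pairs is taken as all unordered pairs among the 2636 human proteins, i.e. n(n - 1)/2. Under that assumption the result agrees with the 0.09% quoted for random pairs.

```python
from math import comb

n_proteins = 2636        # human proteins with SSIP predictions (from the text)
hprd_interactions = 3044  # HPRD interactions among those proteins (from the text)
ssip_validated_pct = 0.17  # percentage of SSIP pairs validated in HPRD (from the text)

# Assumed definition of "possible pairs": all unordered pairs of the proteins.
possible_pairs = comb(n_proteins, 2)

# Chance that a randomly chosen pair is a known HPRD interaction, in percent.
random_pct = 100 * hprd_interactions / possible_pairs

# How much more often an SSIP pair is validated than a random pair.
enrichment = ssip_validated_pct / random_pct

print(f"possible pairs: {possible_pairs}")
print(f"random baseline: {random_pct:.2f}%")
print(f"enrichment over random: {enrichment:.2f}x")
```

On these figures the SSIP predictions are validated roughly twice as often as randomly chosen pairs.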
However, the probability of finding the 3044 interactions of HPRD in possible pairs (i.e /2) is 0.09%, while the probability by SSIP was increased to 0.17%. These results show the improvement with respect to a random selection of interactions. Nevertheless, the best-validated result is obtained with the combination I1, which gives the worst coverage. An intermediate solution is obtained with set I2 (the percentage of validated interactions is 0.53% and the coverage is still 1.8%).

Since there is no definitive measure for validating predictions, as in previous experimental and theoretical studies of protein interactions, we used the tendency for interacting proteins to belong to the same GO functional class as a measure of reliability (Von Mering et al., 2003). Our results were compared to the distribution of pairs in the HPRD database of manually confirmed interactions of human proteins. The correlation map of interacting pairs was calculated with respect to molecular function as defined in GO for each protein. The distributions in Figure 3 show the consistency of the prediction (in sets I2, I3 and SSIP-STRING) with the data in the HPRD database. The number of pairs containing proteins with enzymatic activity is large, because the seeding set used to obtain the prediction contained a large number of enzymes. Nevertheless, the proportion of pairs containing at least one protein with catalytic activity is similarly distributed in the HPRD database, with

few exceptions (i.e. the interactions between enzymes and structural proteins and signal transduction activity, having a larger number of predicted interactions than the number of known interactions in the HPRD database).

In addition, the set of interactions correctly predicted among human protein-protein interactions is expanded to other species.

Table 2. Comparative results of predictions of interactions

Columns: Percentage of validated in HPRD; Percentage coverage of HPRD; Interactions found in HPRD; Predicted interologs; Total of interactions; Total of interactions of human proteins; Total sequences; Total human sequences of HPRD.

Rows: I1; I2; I3; SSIP-STRING; SSIP-DIP; SSIP.

[Table values not recoverable.]

The percentage of interactions validated by the database of known human interactions HPRD and its coverage are indicated in the first two columns. This is calculated with the corresponding number of pairs of interacting human proteins of the human sequences from HPRD found in each set.

Fig. 3. Density of interactions according to molecular function. Density of protein interactions in the HPRD database (a), SSIP-STRING set of interactions (b), I2 (c) and I3 (d). The distribution of protein-protein interactions is calculated as the ratio of interaction pairs in the square over the total number of protein pairs possibly formed by combination of the proteins in the square. Each square compares sets of proteins with molecular functions defined as in GO: motor activity (M), catalytic activity (C), signal-transduction (ST), structural molecule (S), transporter (T), enzyme-regulator (ER), transcription (TC), translation (TL) and unknown activity (U). The scale of grays shown on the left indicates the intervals of protein-protein interactions over possible pairs. The total number of interactions in each set is shown in Table 2, while the HPRD database contains protein-protein interactions.
As only minimal progress has been made in mapping the human proteome using high-throughput screens, the transfer of interaction information within and across species has become increasingly important. This transfer is obtained by assigning interactions to pairs of orthologous genes: if two human proteins (A and B) interact, the products of their orthologous genes in another species (A' and B') may also interact (this is known as an interolog). According to this hypothesis, we search for all protein pairs in sets I1, I2 and I3 formed by interologs of a pair of interacting proteins in HPRD. This produces 8, 184 and 134 interactions in I1, I2 and I3, respectively (Table 2). These pairs are annotated as predicted interologs: they are predicted both by our method and by the interolog approach. The total number of interactions found in HPRD and predicted in I1, I2 and I3 is 108, while the total predicted in SSIP-DIP and SSIP-STRING and found in HPRD is only 10. The coverage of HPRD obtained by joining the sets I1, I2 and I3 is 10 times larger than that obtained by joining SSIP-DIP and SSIP-STRING (Table 2). Also, the number of interologs predicted using sets I1, I2 and I3 is almost 10 times larger than that predicted when using SSIP-DIP and SSIP-STRING.

DISCUSSION

A widely adopted methodology is to use knowledge of the location of binding sites to discover protein-protein interactions. Recent studies have been devoted to the characterization and extraction of sequence motifs involved in binding sites (Li et al., 2004a) or of physical properties involved in the area of interaction (Li et al., 2004b). In the present work, we have extracted sequence motifs involved in the interaction of known complexes. We have transformed these motifs into HMM-patches, according to the conservation of the hydrophobic/hydrophilic relationships between residues. We have used these profiles to search for sequences with remote homology that contain two features: (1) more than one interface sequence motif and (2) a degree of structural similarity to the original proteins involved in the interaction. In the present work, we have obtained a large number of putative protein-protein interactions. The lists obtained are independent of other techniques, both experimental and theoretical. Consequently, the intersection between the lists of interactions of these methods and ours produces a network of high-confidence interactions. We cannot independently corroborate whether these predictions are correct, except in some specific cases, such as those predicted for set I1 or those for which independent experimental confirmation of the interaction exists in the literature.
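The comparison of a predicted list with an independent reference list such as HPRD (percentage of predictions validated, coverage of the reference, and the rate expected for random pairs) can be sketched as follows. This is an illustrative Python sketch, not the procedure used for Table 2: the function name and toy identifiers are hypothetical, pairs are treated as unordered and heteromeric, and the validated percentage is computed only over predictions whose two partners both occur in the reference, which is our reading of the Table 2 legend.

```python
def validation_stats(predicted, reference):
    """Compare predicted pairs with a reference set of known pairs.

    Both arguments are sets of frozenset({a, b}) unordered protein pairs.
    """
    # Proteins that appear in at least one reference interaction.
    ref_proteins = set().union(*reference)
    # Only predictions between reference proteins can be validated at all.
    testable = {pair for pair in predicted if pair <= ref_proteins}
    hits = testable & reference
    n = len(ref_proteins)
    possible = n * (n - 1) // 2  # all unordered pairs of reference proteins
    return {
        "validated": len(hits) / len(testable) if testable else 0.0,
        "coverage": len(hits) / len(reference) if reference else 0.0,
        "random": len(reference) / possible if possible else 0.0,
    }

def pair(a, b):
    return frozenset((a, b))

# Toy data: four reference proteins with three known interactions.
toy_reference = {pair("A", "B"), pair("B", "C"), pair("C", "D")}
toy_predicted = {pair("A", "B"), pair("A", "C"), pair("A", "E"), pair("C", "D")}
toy_stats = validation_stats(toy_predicted, toy_reference)
```

Here the pair A-E is ignored because E is absent from the reference, two of the three remaining predictions are known interactions, and the random expectation is the fraction of all possible reference pairs that are known to interact. A prediction set is informative when its validated fraction exceeds that random rate.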
As the sets of interactions are predicted according to a seeding complex of known structure, we can further develop a test based on the comparative modeling of the proteins involved in the interaction and the construction of the putative interface. This was done with an example for which the interaction has been experimentally proven. The result helps in understanding not only the difficulties but also the advantages of this final corroboration. Unfortunately, this cannot be done for the sets I2 and I3 with more than predictions, unless an automatic procedure for construction and verification is performed. Therefore, we adopted a different approach to test the quality of the predictions and to increase the confidence in some of the predicted interactions: we analyzed the full set of predictions by comparison with other databases of protein-protein interactions. This comparison was performed with respect to the HPRD database. Our predictions are derived from the DIP database, which differs from HPRD. Therefore, the number of common interactions in our predicted sets and in HPRD was understandably small. On the other hand, most interactions are expected to occur between proteins with similar function and/or location, as many topological studies of the interactome graph have shown (Bader et al., 2004; Deng et al., 2004). Indeed, we obtained a similar correlation map for protein molecular function, as described in GO, with the interactions of HPRD and with our predicted interactions. Yet another method of prediction is based on the transfer of the interaction to other species by homology (interologs). We expect that the rate of false interactions produced in the transfer from human to another species will be reduced if the interaction is also predicted in sets I1, I2 or I3. We obtained a total of 326 protein-protein interactions that we expect, with high confidence, to be true.
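The interolog filter described above (transfer a human interaction to its orthologs, then keep only transferred pairs that also appear in a predicted set) can be illustrated with a short Python sketch. The function names, the ortholog table layout and the toy identifiers are hypothetical, and the sketch assumes a single target species; it is not the authors' implementation.

```python
def transfer_interologs(human_interactions, orthologs):
    """Transfer interactions A-B to every ortholog pair A'-B'.

    human_interactions: iterable of (protein_a, protein_b).
    orthologs: dict mapping a human protein to a set of orthologs in one
               target species (an assumption of this sketch).
    """
    transferred = set()
    for a, b in human_interactions:
        for a_orth in orthologs.get(a, ()):
            for b_orth in orthologs.get(b, ()):
                if a_orth != b_orth:  # skip self-pairs
                    transferred.add(frozenset((a_orth, b_orth)))
    return transferred

def supported_interologs(human_interactions, orthologs, predicted):
    """Keep transferred pairs that are also present in a predicted set."""
    return transfer_interologs(human_interactions, orthologs) & predicted

# Toy data: D has no known ortholog, so C-D cannot be transferred.
toy_human = [("A", "B"), ("C", "D")]
toy_orthologs = {"A": {"A_orth"}, "B": {"B_orth"}, "C": {"C_orth"}}
toy_predicted = {frozenset(("A_orth", "B_orth")), frozenset(("X", "Y"))}
toy_supported = supported_interologs(toy_human, toy_orthologs, toy_predicted)
```

Only the A_orth-B_orth pair survives in the toy data, because it is supported both by orthology to a known human interaction and by the independent prediction. Applied to the counts reported in the text, the same filter gives 8 + 184 + 134 = 326 pairs over sets I1, I2 and I3.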
The predicted protein-protein interactions described here have to be corroborated either by experiments or by additional prediction methods for protein-protein interactions. In conclusion, further analysis of the structure of these complexes needs to be performed in order to validate some of these interactions. Work is in progress towards an automatic procedure that could discriminate true interactions from false ones.

ACKNOWLEDGEMENTS

O.R.I. acknowledges funding from the Strategic Research Fund, Faculty of Biological Sciences, University of Leeds. J.E. acknowledges student fellowships of Departament d'Universitats, Recerca i Societat de la Informació de la Generalitat de Catalunya (DURSI). This work was supported by grants from Fundación Ramón Areces and Spanish Ministerio de Ciencia y Tecnología (McyT, BIO ).

Conflict of Interest: none declared.

REFERENCES

Aloy,P. and Russell,R.B. (2003) InterPreTS: protein interaction prediction through tertiary structure. Bioinformatics, 19,
Aloy,P. and Russell,R.B. (2004) Ten thousand interactions for the molecular biologist. Nat. Biotechnol., 22,
Aloy,P. et al. (2003) The relationship between sequence and interaction divergence in proteins. J. Mol. Biol., 332,
Altschul,S.F. et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25,
Andreeva,A. et al. (2004) SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res., 32, D226-D229.
Apweiler,R. et al. (2004) UniProt: the Universal Protein knowledgebase. Nucleic Acids Res., 32 (Database issue), D115-D119.
Bader,J.S. et al. (2004) Gaining confidence in high-throughput protein interaction networks. Nat. Biotechnol., 22,
Bornberg-Bauer,E. et al. (2005) The evolution of domain arrangements in proteins and interaction networks. Cell Mol. Life Sci., 62,
Dandekar,T. et al. (1998) Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem. Sci., 23,
Deng,M. et al. (2004) Mapping Gene Ontology to proteins based on protein-protein interaction data. Bioinformatics, 20,
Eddy,S.R. (1998) Profile hidden Markov models. Bioinformatics, 14,
Espadaler,J. et al. (2005) Detecting remotely related proteins by their interactions and sequence similarity. Proc. Natl Acad. Sci. USA, 102,
Eswar,N. et al. (2003) Tools for comparative protein structure modeling and analysis. Nucleic Acids Res., 31,
Gavin,A.C. et al. (2002) Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 415,
Hill,D.P. et al. (2002) Extension and integration of the Gene Ontology (GO): combining GO vocabularies with external vocabularies. Genome Res., 12,
Keskin,O. et al. (2004) A new, structurally nonredundant, diverse data set of protein-protein interfaces and its implications. Protein Sci., 13,
Li,H. et al. (2004a) Discovery of binding motif pairs from protein complex structural data and protein interaction sequence data. Pac. Symp. Biocomput.,
Li,X. et al. (2004b) Protein-protein interactions: hot spots and structurally conserved residues often locate in complemented pockets that pre-organized in the unbound states: implications for docking. J. Mol. Biol., 344,
Lu,L. et al. (2002) Multiprospector: an algorithm for the prediction of protein-protein interactions by multimeric threading. Proteins, 49,
Ma,B. et al. (2003) Protein-protein interactions: structurally conserved residues distinguish between binding sites and exposed protein surfaces. Proc. Natl Acad. Sci. USA, 100,
Marcotte,E. et al. (1999) Detecting protein function and protein-protein interactions from genome sequences. Science, 285,
Matthews,L. et al. (2001) Identification of potential interaction networks using sequence-based searches for conserved protein-protein interactions or interologs. Genome Res., 11,
Mering,C.V. et al. (2003) STRING: a database of predicted functional associations between proteins. Nucleic Acids Res., 31,
Mewes,H.W. et al. (2004) MIPS: analysis and annotation of proteins from whole genomes. Nucleic Acids Res., 32 (Database issue), D41-D44.
Nye,T.M. et al. (2005) Statistical analysis of domains in interacting protein pairs. Bioinformatics, 21,
Pellegrini,M. et al. (1999) Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc. Natl Acad. Sci. USA, 96,
Peri,S. et al. (2004) Human protein reference database as a discovery resource for proteomics. Nucleic Acids Res., 32 (Database issue), D497-D501.
Russell,R. and Barton,G. (1992) Multiple protein sequence alignment from tertiary structure comparison: assignment of global and residue confidence levels. Proteins, 14,
Salwinski,L. et al. (2004) The Database of Interacting Proteins: 2004 update. Nucleic Acids Res., 32 (Database issue), D449-D451.
Tsai,C.J. et al. (1996) A dataset of protein-protein interfaces generated with a sequence-order-independent comparison technique. J. Mol. Biol., 260,
Uetz,P. et al. (2000) A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature, 403,
Von Mering,C. et al. (2002) Comparative assessment of large-scale data sets of protein-protein interactions. Nature, 417,
Von Mering,C. et al. (2003) Genome evolution reveals biochemical networks and functional modules. Proc. Natl Acad. Sci. USA, 100,


More information

Sequential resonance assignments in (small) proteins: homonuclear method 2º structure determination

Sequential resonance assignments in (small) proteins: homonuclear method 2º structure determination Lecture 9 M230 Feigon Sequential resonance assignments in (small) proteins: homonuclear method 2º structure determination Reading resources v Roberts NMR of Macromolecules, Chap 4 by Christina Redfield

More information

Sequence Based Bioinformatics

Sequence Based Bioinformatics Structural and Functional Analysis of Inosine Monophosphate Dehydrogenase using Sequence-Based Bioinformatics Barry Sexton 1,2 and Troy Wymore 3 1 Bioengineering and Bioinformatics Summer Institute, Department

More information

networks in molecular biology Wolfgang Huber

networks in molecular biology Wolfgang Huber networks in molecular biology Wolfgang Huber networks in molecular biology Regulatory networks: components = gene products interactions = regulation of transcription, translation, phosphorylation... Metabolic

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S3 (box) Methods Methods Genome weighting The currently available collection of archaeal and bacterial genomes has a highly biased distribution of isolates across taxa. For example,

More information

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University COMP 598 Advanced Computational Biology Methods & Research Introduction Jérôme Waldispühl School of Computer Science McGill University General informations (1) Office hours: by appointment Office: TR3018

More information

Improving domain-based protein interaction prediction using biologically-significant negative dataset

Improving domain-based protein interaction prediction using biologically-significant negative dataset Int. J. Data Mining and Bioinformatics, Vol. x, No. x, xxxx 1 Improving domain-based protein interaction prediction using biologically-significant negative dataset Xiao-Li Li*, Soon-Heng Tan and See-Kiong

More information

GRAPH-THEORETICAL COMPARISON REVEALS STRUCTURAL DIVERGENCE OF HUMAN PROTEIN INTERACTION NETWORKS

GRAPH-THEORETICAL COMPARISON REVEALS STRUCTURAL DIVERGENCE OF HUMAN PROTEIN INTERACTION NETWORKS 141 GRAPH-THEORETICAL COMPARISON REVEALS STRUCTURAL DIVERGENCE OF HUMAN PROTEIN INTERACTION NETWORKS MATTHIAS E. FUTSCHIK 1 ANNA TSCHAUT 2 m.futschik@staff.hu-berlin.de tschaut@zedat.fu-berlin.de GAUTAM

More information

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Homology Modeling. Roberto Lins EPFL - summer semester 2005 Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,

More information

Exploring Evolution & Bioinformatics

Exploring Evolution & Bioinformatics Chapter 6 Exploring Evolution & Bioinformatics Jane Goodall The human sequence (red) differs from the chimpanzee sequence (blue) in only one amino acid in a protein chain of 153 residues for myoglobin

More information

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche The molecular structure of a protein can be broken down hierarchically. The primary structure of a protein is simply its

More information

INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA

INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA XIUFENG WAN xw6@cs.msstate.edu Department of Computer Science Box 9637 JOHN A. BOYLE jab@ra.msstate.edu Department of Biochemistry and Molecular Biology

More information

Cell biology traditionally identifies proteins based on their individual actions as catalysts, signaling

Cell biology traditionally identifies proteins based on their individual actions as catalysts, signaling Lethality and centrality in protein networks Cell biology traditionally identifies proteins based on their individual actions as catalysts, signaling molecules, or building blocks of cells and microorganisms.

More information

Homology. and. Information Gathering and Domain Annotation for Proteins

Homology. and. Information Gathering and Domain Annotation for Proteins Homology and Information Gathering and Domain Annotation for Proteins Outline WHAT IS HOMOLOGY? HOW TO GATHER KNOWN PROTEIN INFORMATION? HOW TO ANNOTATE PROTEIN DOMAINS? EXAMPLES AND EXERCISES Homology

More information

Modelling of Possible Binding Modes of Caffeic Acid Derivatives to JAK3 Kinase

Modelling of Possible Binding Modes of Caffeic Acid Derivatives to JAK3 Kinase John von Neumann Institute for Computing Modelling of Possible Binding Modes of Caffeic Acid Derivatives to JAK3 Kinase J. Kuska, P. Setny, B. Lesyng published in From Computational Biophysics to Systems

More information

Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence

Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence Naoto Morikawa (nmorika@genocript.com) October 7, 2006. Abstract A protein is a sequence

More information

Lecture 15: Realities of Genome Assembly Protein Sequencing

Lecture 15: Realities of Genome Assembly Protein Sequencing Lecture 15: Realities of Genome Assembly Protein Sequencing Study Chapter 8.10-8.15 1 Euler s Theorems A graph is balanced if for every vertex the number of incoming edges equals to the number of outgoing

More information

Supplementary information

Supplementary information Supplementary information The structural basis of modularity in ECF-type ABC transporters Guus B. Erkens 1,2, Ronnie P-A. Berntsson 1,2, Faizah Fulyani 1,2, Maria Majsnerowska 1,2, Andreja Vujičić-Žagar

More information

Programme Last week s quiz results + Summary Fold recognition Break Exercise: Modelling remote homologues

Programme Last week s quiz results + Summary Fold recognition Break Exercise: Modelling remote homologues Programme 8.00-8.20 Last week s quiz results + Summary 8.20-9.00 Fold recognition 9.00-9.15 Break 9.15-11.20 Exercise: Modelling remote homologues 11.20-11.40 Summary & discussion 11.40-12.00 Quiz 1 Feedback

More information

hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference

hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference CS 229 Project Report (TR# MSB2010) Submitted 12/10/2010 hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference Muhammad Shoaib Sehgal Computer Science

More information

What makes a good graphene-binding peptide? Adsorption of amino acids and peptides at aqueous graphene interfaces: Electronic Supplementary

What makes a good graphene-binding peptide? Adsorption of amino acids and peptides at aqueous graphene interfaces: Electronic Supplementary Electronic Supplementary Material (ESI) for Journal of Materials Chemistry B. This journal is The Royal Society of Chemistry 21 What makes a good graphene-binding peptide? Adsorption of amino acids and

More information

Study of Mining Protein Structural Properties and its Application

Study of Mining Protein Structural Properties and its Application Study of Mining Protein Structural Properties and its Application A Dissertation Proposal Presented to the Department of Computer Science and Information Engineering College of Electrical Engineering and

More information

The Structure and Functions of Proteins

The Structure and Functions of Proteins Wright State University CORE Scholar Computer Science and Engineering Faculty Publications Computer Science and Engineering 2003 The Structure and Functions of Proteins Dan E. Krane Wright State University

More information

Computational approaches for functional genomics

Computational approaches for functional genomics Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding

More information

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding

More information

Reconstructing Amino Acid Interaction Networks by an Ant Colony Approach

Reconstructing Amino Acid Interaction Networks by an Ant Colony Approach Author manuscript, published in "Journal of Computational Intelligence in Bioinformatics 2, 2 (2009) 131-146" Reconstructing Amino Acid Interaction Networks by an Ant Colony Approach Omar GACI and Stefan

More information

Course Notes: Topics in Computational. Structural Biology.

Course Notes: Topics in Computational. Structural Biology. Course Notes: Topics in Computational Structural Biology. Bruce R. Donald June, 2010 Copyright c 2012 Contents 11 Computational Protein Design 1 11.1 Introduction.........................................

More information

An Efficient Algorithm for Protein-Protein Interaction Network Analysis to Discover Overlapping Functional Modules

An Efficient Algorithm for Protein-Protein Interaction Network Analysis to Discover Overlapping Functional Modules An Efficient Algorithm for Protein-Protein Interaction Network Analysis to Discover Overlapping Functional Modules Ying Liu 1 Department of Computer Science, Mathematics and Science, College of Professional

More information

Intro Secondary structure Transmembrane proteins Function End. Last time. Domains Hidden Markov Models

Intro Secondary structure Transmembrane proteins Function End. Last time. Domains Hidden Markov Models Last time Domains Hidden Markov Models Today Secondary structure Transmembrane proteins Structure prediction NAD-specific glutamate dehydrogenase Hard Easy >P24295 DHE2_CLOSY MSKYVDRVIAEVEKKYADEPEFVQTVEEVL

More information

Today. Last time. Secondary structure Transmembrane proteins. Domains Hidden Markov Models. Structure prediction. Secondary structure

Today. Last time. Secondary structure Transmembrane proteins. Domains Hidden Markov Models. Structure prediction. Secondary structure Last time Today Domains Hidden Markov Models Structure prediction NAD-specific glutamate dehydrogenase Hard Easy >P24295 DHE2_CLOSY MSKYVDRVIAEVEKKYADEPEFVQTVEEVL SSLGPVVDAHPEYEEVALLERMVIPERVIE FRVPWEDDNGKVHVNTGYRVQFNGAIGPYK

More information

ProtoNet 4.0: A hierarchical classification of one million protein sequences

ProtoNet 4.0: A hierarchical classification of one million protein sequences ProtoNet 4.0: A hierarchical classification of one million protein sequences Noam Kaplan 1*, Ori Sasson 2, Uri Inbar 2, Moriah Friedlich 2, Menachem Fromer 2, Hillel Fleischer 2, Elon Portugaly 2, Nathan

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)

More information

Evaluation of the relative contribution of each STRING feature in the overall accuracy operon classification

Evaluation of the relative contribution of each STRING feature in the overall accuracy operon classification Evaluation of the relative contribution of each STRING feature in the overall accuracy operon classification B. Taboada *, E. Merino 2, C. Verde 3 blanca.taboada@ccadet.unam.mx Centro de Ciencias Aplicadas

More information

Hands-On Nine The PAX6 Gene and Protein

Hands-On Nine The PAX6 Gene and Protein Hands-On Nine The PAX6 Gene and Protein Main Purpose of Hands-On Activity: Using bioinformatics tools to examine the sequences, homology, and disease relevance of the Pax6: a master gene of eye formation.

More information

Protein Secondary Structure Prediction using Feed-Forward Neural Network

Protein Secondary Structure Prediction using Feed-Forward Neural Network COPYRIGHT 2010 JCIT, ISSN 2078-5828 (PRINT), ISSN 2218-5224 (ONLINE), VOLUME 01, ISSUE 01, MANUSCRIPT CODE: 100713 Protein Secondary Structure Prediction using Feed-Forward Neural Network M. A. Mottalib,

More information

Protein Structure Prediction Using Neural Networks

Protein Structure Prediction Using Neural Networks Protein Structure Prediction Using Neural Networks Martha Mercaldi Kasia Wilamowska Literature Review December 16, 2003 The Protein Folding Problem Evolution of Neural Networks Neural networks originally

More information