Protein-protein Interaction Prediction using Desolvation Energies and Interface Properties

2010 IEEE International Conference on Bioinformatics and Biomedicine

Luis Rueda, Sridip Banerjee, Md. Mominul Aziz, Mohammad Raza
School of Computer Science, University of Windsor, 401 Sunset Ave., Windsor, ON, N9B 3P4, Canada

Abstract

An important aspect in understanding and classifying protein-protein interactions (PPI) is to analyze their interfaces in order to distinguish between transient and obligate complexes. We propose a classification approach to discriminate between these two types of complexes. Our approach has two important aspects. First, we use the desolvation energies, at the amino acid and atom type levels, of the residues present in the interface as the input features of the classifiers. Principal components of the data are found, and classification is then performed via linear dimensionality reduction (LDR) methods. Second, we investigate various interface properties of these interactions. From the analysis of protein quaternary structures, physicochemical properties are treated as the input features of the classifiers. Various features are extracted from each complex, and classification is performed via different LDR methods. The results on standard benchmarks of transient and obligate protein complexes show that (i) desolvation energies are better discriminants than solvent accessibility and conservation properties, among others, and (ii) the proposed approach outperforms previous approaches that are based on solvent accessible area and use support vector machines.

Keywords: protein-protein interaction; classification; linear dimensionality reduction; desolvation energy; interface properties

I. INTRODUCTION

In the field of proteomics, one of the current goals is to map the protein interaction networks of different organisms [1].
In the complex web of interacting proteins, defining a protein by its position requires protein-protein interaction information. Knowledge of this information greatly helps biological research and makes the discovery of novel drug targets much easier. Traditionally, the detection of protein-protein interactions was limited to labor-intensive experimental techniques such as co-immunoprecipitation or affinity chromatography. However, these methods may not be generally applicable to all proteins in all organisms, and may also be prone to systematic errors. Recently, various complementary computational approaches have been developed for the large-scale prediction of protein-protein interactions, based on protein sequence, structure and evolutionary relationships in complete genomes.

Some of the studies in PPI consider the characterization of the geometry [2], physicochemical properties [3], the preference of residues to appear on the surface [4], and the role of hydrogen bonds and salt bridges, and of hydrophobic and polar interactions, on the protein surfaces [5]. Other studies include the analysis of the loss of solvent-accessible surface as a result of the interaction [6], and the analysis of the conservation of residues in the interaction surface [7]. At a higher level, the amino acid composition of protein-protein interfaces has been studied to infer the composition of the residues at the interface, which is generally different from that of the rest of the surface. In [8], six types of interfaces were studied: intra and inter domains, homo and hetero-oligomers, and obligate and transient complexes. That study concluded that the amino acid compositions of these surfaces are different, as there is only 1.5% similarity between the internal and external surfaces, and 0.2% similarity between hetero surfaces belonging to obligate homo complexes and transient homo complexes.
To study the behavior of transient and obligate interactions, a classification of these two types of interactions was proposed in [9], where interactions are classified based on the lifetime of the complex. Obligate interactions are usually more stable, while transient interactions are less stable and hence more difficult to discriminate and understand, due to their short life [10]. Protomers from obligate complexes do not exist as stable structures in vivo, whereas protomers of non-obligate complexes may dissociate from each other and remain stable as functional units. For these reasons, it is of prime importance in proteomics to distinguish between obligate and transient complexes. Additionally, in [11], it was proposed that interfaces in obligate complexes are inherently hydrophobic. In [12], three different types of interactions were studied, namely crystal packing, obligate and non-obligate interactions. That study is based on using solvent accessible surface area, conservation scores, and shapes of the interfaces.

The interfaces of some transient complexes were also found to contain clusters of hydrophobic residues [13]. Moreover, transient complexes are rich in aromatic residues and arginine but depleted in other charged residues [14]. However, hydrophobicity at the interfaces of transient complexes is not as distinguishable from the remainder of the surface as hydrophobicity at the interfaces of obligate complexes [14]. As a result, it is difficult to make an accurate prediction of the interfaces of transient complexes using a single parameter of residue interface propensity. In [15], a study of protein-protein interactions was conducted in which each interaction is analyzed in terms of physical interaction, co-complex relationship and co-membership in a pathway (i.e. enzymes involved in the same enzymatic or metabolic pathways). Although interfaces have been the main subject of study to predict protein-protein interactions, an accuracy of only about 70% has been independently achieved by several different groups [16]-[19]. These approaches have been carried out by analyzing a wide range of parameters, including desolvation energies, amino acid composition, conservation, electrostatic energies, and hydrophobicity. In a recent work, prediction of four different PPI types has been performed, including transient enzyme inhibitor/non enzyme inhibitor and permanent homo/hetero obligate complexes [20]. That work uses association rules to understand and characterize the diverse kinds of interactions, and carries out experiments on 47 pre-classified complexes, a smaller set than the one used in [21], which is also used here.

In this paper, we propose a classification approach that uses desolvation energies and interface properties to discriminate between transient and obligate interactions, achieving 88.68% and 80.27% accuracy for the datasets of [12] and [21], respectively.

II. THE PREDICTION METHODS

In order to classify complexes on the basis of desolvation energies (400 features for amino acid type and 324 features for atom type), we first use principal component analysis (PCA) as a pre-processing step. PCA, though an unsupervised method, is applied to avoid the ill-conditioned matrices that would otherwise be involved in the linear dimensionality reduction (LDR) techniques. To select the principal components, we tried different threshold values and selected the threshold that leads to the highest accuracy. After obtaining those principal components, we classify complexes with LDR methods, including the well-known Fisher's discriminant analysis and two heteroscedastic approaches. For classification on the basis of different numbers of physicochemical interface properties, we also compare the LDR methods with the well-known support vector machine (SVM).

The basic idea of LDR is to represent an object of dimension $n$ as a lower-dimensional vector of dimension $d$ by performing a linear transformation. We consider two classes, $\omega_1$ and $\omega_2$, represented by two normally distributed random vectors $x_1 \sim N(m_1, S_1)$ and $x_2 \sim N(m_2, S_2)$, respectively, with $p_1$ and $p_2$ the a priori probabilities. After the LDR is applied, two new random vectors $y_1 = A x_1$ and $y_2 = A x_2$ are obtained, where $y_1 \sim N(A m_1; A S_1 A^t)$ and $y_2 \sim N(A m_2; A S_2 A^t)$, with $m_i$ and $S_i$ being the mean vectors and covariance matrices in the original space, respectively. The aim of LDR is to find a linear transformation matrix $A$ in such a way that the new classes ($y_i = A x_i$) are as separable as possible. Let $S_W = p_1 S_1 + p_2 S_2$ and $S_E = (m_1 - m_2)(m_1 - m_2)^t$ be the within-class and between-class scatter matrices, respectively. Various criteria have been proposed to measure this separability [22]. We consider three LDR methods:

(a) The well-known Fisher's discriminant analysis (FDA) [23], [24], where the criterion to optimize is

$$J_{FDA}(A) = \mathrm{tr}\left\{ (A S_W A^t)^{-1} (A S_E A^t) \right\}. \tag{1}$$

The matrix $A$ is found by considering the eigenvector corresponding to the largest eigenvalue of $S_{FDA} = S_W^{-1} S_E$.

(b) The heteroscedastic discriminant analysis (HDA) approach [25], which aims to obtain the matrix $A$ that maximizes the function

$$J_{HDA}(A) = \mathrm{tr}\left\{ (A S_W A^t)^{-1} \left[ A S_E A^t - A S_W^{1/2} \, \frac{p_1 \log(S_W^{-1/2} S_1 S_W^{-1/2}) + p_2 \log(S_W^{-1/2} S_2 S_W^{-1/2})}{p_1 p_2} \, S_W^{1/2} A^t \right] \right\}. \tag{2}$$

This criterion is maximized by obtaining the eigenvectors, corresponding to the largest eigenvalues, of the matrix

$$S_{HDA} = S_W^{-1} \left[ S_E - S_W^{1/2} \, \frac{p_1 \log(S_W^{-1/2} S_1 S_W^{-1/2}) + p_2 \log(S_W^{-1/2} S_2 S_W^{-1/2})}{p_1 p_2} \, S_W^{1/2} \right]. \tag{3}$$

(c) The Chernoff discriminant analysis (CDA) approach [22], which aims to maximize the following function:

$$J_{CDA}(A) = \mathrm{tr}\left\{ p_1 p_2 \, A S_E A^t (A S_W A^t)^{-1} + \log(A S_W A^t) - p_1 \log(A S_1 A^t) - p_2 \log(A S_2 A^t) \right\}. \tag{4}$$

In [22], a gradient-based algorithm was proposed, which maximizes the function in an iterative way. For this gradient algorithm, a learning rate, $\alpha_k$, needs to be computed. In order to ensure that the gradient algorithm converges, $\alpha_k$ is maximized by the secant method. One of the keys in this algorithm is the initialization of the matrix $A$; in this work, we have performed ten different initializations and then chosen the solution for $A$ that gives the maximum Chernoff distance.
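As an illustration of the simplest of the three criteria, the following is a minimal NumPy sketch of the two-class FDA step: it builds the within-class and between-class scatter matrices defined in Section II and takes the eigenvector of the largest eigenvalue of the matrix S_FDA. The function name `fda_direction`, the use of sample covariances, and the estimation of the a priori probabilities from class counts are assumptions made for this example, not details given in the paper.

```python
import numpy as np

def fda_direction(X1, X2):
    """Return a 1 x n projection matrix A for two-class FDA.

    X1, X2: arrays of shape (N1, n) and (N2, n), one row per complex.
    Priors are estimated from class counts (an assumption of this sketch).
    """
    n1, n2 = len(X1), len(X2)
    p1, p2 = n1 / (n1 + n2), n2 / (n1 + n2)

    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)

    S_W = p1 * S1 + p2 * S2            # within-class scatter
    diff = (m1 - m2)[:, None]
    S_E = diff @ diff.T                # between-class scatter

    # Eigen-decomposition of S_W^{-1} S_E without forming the inverse explicitly.
    evals, evecs = np.linalg.eig(np.linalg.solve(S_W, S_E))
    a = np.real(evecs[:, np.argmax(np.real(evals))])
    return (a / np.linalg.norm(a))[None, :]
```

A projected sample is then obtained as `y = A @ x`. If S_W is ill-conditioned, `np.linalg.solve` will be unreliable, which is the situation the PCA pre-processing step is meant to avoid.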

III. THE FEATURES

In our approach, we introduce the use of desolvation energies as physicochemical properties to discriminate between transient and obligate complexes. We also use other interface and non-interface properties that include solvent accessibility, among others.

A. Desolvation Energies

Different approaches have been developed to group different types of proteins based on their properties. Among them, desolvation energies are very efficient for classification, as shown later in the paper. The desolvation free energy is a knowledge-based contact potential that accounts for hydrophobic interactions, the self-energy change upon desolvation of charged and polar atom groups, and side-chain entropy loss. In [26], the binding free energy, $G_{bind}$, is defined by the following equation:

$$G_{bind} = E_{elec} + G_{des}, \tag{5}$$

where $E_{elec}$ is the total electrostatic energy and $G_{des}$ is the total desolvation energy, which for a protein is defined as follows:

$$G_{des} = g(r) \sum_{i} \sum_{j} e_{ij}. \tag{6}$$

If we consider the interaction between the $i$th atom of a ligand and the $j$th atom of a receptor, then $e_{ij}$ is the atomic contact potential (ACP) [27] between them and $g(r)$ is a smooth function of their distance. For simplicity, we consider the smooth function to be linear. We also require that, for an interaction to be counted, the atoms be within 7 Å of each other. Between 5 and 7 Å, the value of $g(r)$ varies between 1 and 0 according to the smooth function, and $g(r)$ is 1 for atoms that are less than 5 Å apart [26].

To create the datasets for classification, two pre-classified datasets of protein complexes were obtained from the studies of [12], [21]. The first set of proteins, the Mintseris et al. dataset, contains complexes of two classes: 209 transient complexes and 115 obligate complexes. The second dataset, the Zhu et al. dataset, contains 62 transient complexes and 75 obligate complexes.
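The distance weighting g(r) and the double sum of Eq. (6) from the desolvation energy definition can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the weight is applied per atom pair using that pair's distance, the linear ramp between 5 and 7 Å is written as (7 - r)/2, and the `acp` matrix passed in is a hypothetical table of atomic contact potentials indexed by atom type (real ACP values would come from [27]). The function names are invented for this example.

```python
import numpy as np

def g(r):
    """Linear distance weight: 1 below 5 A, falling linearly to 0 at 7 A, 0 beyond."""
    return np.clip((7.0 - np.asarray(r, dtype=float)) / 2.0, 0.0, 1.0)

def desolvation_energy(lig_xyz, lig_types, rec_xyz, rec_types, acp):
    """Sum g(r_ij) * e_ij over ligand atoms i and receptor atoms j.

    lig_xyz, rec_xyz: (N, 3) coordinate arrays in angstroms.
    lig_types, rec_types: integer atom-type indices into `acp`.
    acp: square matrix of contact potentials (hypothetical values here).
    """
    # Pairwise distances between every ligand atom and every receptor atom.
    d = np.linalg.norm(lig_xyz[:, None, :] - rec_xyz[None, :, :], axis=-1)
    # Contact potential e_ij looked up by the atom types of each pair.
    e = acp[np.asarray(lig_types)[:, None], np.asarray(rec_types)[None, :]]
    return float(np.sum(g(d) * e))
```

Accumulating the same weighted terms separately for each ordered pair of atom types, rather than into one scalar, would give one value per type pair, which is the kind of per-pair feature the text describes.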
We collected the structural information about the protein complexes from the Protein Data Bank (PDB) [28]. From [27], we obtained 18 different atom types. For each pair of atom types, we obtained the cumulative sum of desolvation energies computed using Eq. (6), obtaining $18^2$ different values for each complex, and hence 324 features. We also considered pairs of amino acids, and for this, we computed $20^2$ values, one for each pair, using Eq. (6), obtaining 400 different features. We then created two data subsets from each of the datasets of Mintseris et al. [21] and Zhu et al. [12]. Additionally, we considered the solvent accessible surface area (SASA), computed with the NACCESS program [29], and weighted the four prepared data subsets with the SASA values to include the effective surface taking part in the interactions. In the Mintseris et al. dataset, many proteins have multiple chains, and hence we calculate the desolvation energy value based either on the pairs of chains or between multiple chains; we call these the all against all and one against one comparisons, respectively. Finally, we obtained 12 datasets to test our classification methods. In all of these datasets, some feature vectors contain zeros in most of their values; these were filtered out by applying PCA.

B. Interface Properties

We have also considered other properties, mainly for those atoms and amino acids in the interface. A residue is defined as being part of the interface if its SASA decreases by more than 1 Å$^2$ upon the formation of the complex. SASA values for the residues were calculated using NACCESS [29] with a probe sphere of radius 1.4 Å. Other derived features, (a) interface area and (b) interface area ratio, which can be derived from these SASA values, were calculated in the same way as in the NOXclass method [12]. We have considered 40 features, as opposed to NOXclass, which considers six features. These features are the number-based amino acid composition and the area-based amino acid composition, as described below.
(a) Number-based amino acid composition: The number-based amino acid composition, $v_n$, is defined as the frequency of each of the 20 amino acid types in the protein-protein interface. After determining which residues are in the interface, we obtained the frequency of each of the 20 standard amino acids among those residues.

(b) Area-based amino acid composition: By weighting each residue with its SASA, the area-based amino acid composition, $v_a$, is computed as

$$v_{a,i} = \frac{1}{2 \cdot \text{Interface Area}} \sum_{r:\, \mathrm{type}(r) = i} \mathrm{SASA}(r), \qquad i = 1, \ldots, 20. \tag{7}$$

(c) Amino acid composition of the interface: This feature was computed as in [12].

(d) Correlation between amino acid composition of interface and protein surface: These two features, (c) and (d), were calculated as per the method described in [12].

(e) Gap volume index: This feature was computed with the SURFNET program [30], as in [12].

(f) Conservation scores for residues in the interface: This feature was computed, as in [12], by the ConSurf method [31].

We describe the datasets used in terms of the features included, where $n$ is the number of features. We first classified using the first four features (Table III, $n = 4$). These features are (a) interface area, (b) interface area ratio, (c) amino acid composition of the interface and (d) correlation between amino acid composition of interface and protein surface. We then added two more features: (e) gap volume index and (f) conservation score of the interface (Table III, $n = 6$). For the analysis, we used a larger dataset, obtained by adding another feature, (g) area-based amino acid composition (Table III, $n = 26$). Finally, we added another feature: (h) number-based amino acid composition (Table III, $n = 46$). We classified with all of these feature dimensions ($n = 4$, $n = 6$, $n = 26$, $n = 46$; a total of eight properties) and compared the significance and importance of these properties and features.

IV. CLASSIFICATION

In order to classify each complex, first a linear algebraic operation $y = Ax$ is applied to the $n$-dimensional vector, obtaining $y$, a $d$-dimensional vector, where $d$ is ideally much smaller than $n$. The linear transformation matrix $A$ corresponds to the one obtained by one of the LDR methods, namely FDA, HDA or CDA. The resulting vector $y$ is then passed through a quadratic Bayesian (QB) classifier [23], which is the optimal classifier for normal distributions. For additional tests, a linear Bayesian (LB) classifier is considered, obtained by deriving a Bayesian classifier with a common covariance matrix, $S = S_1 + S_2$.

To study the performance of the classifiers, a 10-fold cross validation procedure was carried out, and then the average accuracy was computed, where the accuracy for each individual fold was computed as $acc = (TP + TN)/N_f$, where $TP$ and $TN$ are the true positive (obligate) and true negative (transient) counts, respectively, and $N_f$ is the total number of complexes in the test set of the corresponding fold.

For the LDR schemes, classifiers were implemented and evaluated by combining each of the three LDR criteria, FDA, HDA and CDA, with a QB or LB classifier. For each of these classifiers, reductions to dimensions $d = 1, \ldots, 20$ were performed, followed by QB and LB. In the subsequent tables, each column reports the highest average accuracy among all possible reduced dimensions. Since the classification problem is two-class, FDA always leads to reducing to dimension one.
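The classification step on the projected vectors can be illustrated with a short NumPy sketch of a quadratic Bayesian classifier for two Gaussian classes, together with the per-fold accuracy (TP + TN)/N_f. The function names, the labelling of the obligate class as 1 and the transient class as 2, and the estimation of priors from training counts are assumptions of this example; the splitting of the data into 10 folds is left out.

```python
import numpy as np

def fit_gaussian(Y):
    """Estimate the mean and covariance of projected samples Y with shape (N, d)."""
    m = Y.mean(axis=0)
    S = np.atleast_2d(np.cov(Y, rowvar=False))  # keeps a (1, 1) matrix when d == 1
    return m, S

def log_discriminant(Y, m, S, prior):
    """Gaussian log-discriminant, up to a constant shared by both classes."""
    diff = Y - m
    maha = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(S), diff)
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * maha - 0.5 * logdet + np.log(prior)

def qb_predict(Y_train1, Y_train2, Y_test):
    """Return 1 (obligate) or 2 (transient) for each row of Y_test."""
    n1, n2 = len(Y_train1), len(Y_train2)
    p1, p2 = n1 / (n1 + n2), n2 / (n1 + n2)   # priors from counts (assumption)
    g1 = log_discriminant(Y_test, *fit_gaussian(Y_train1), p1)
    g2 = log_discriminant(Y_test, *fit_gaussian(Y_train2), p2)
    return np.where(g1 >= g2, 1, 2)

def fold_accuracy(pred, truth):
    """acc = (TP + TN) / N_f, i.e. the fraction of correctly labelled test complexes."""
    return float(np.mean(np.asarray(pred) == np.asarray(truth)))
```

A linear Bayesian variant would replace the two class covariances with a single common matrix, so that the quadratic terms cancel and the decision boundary becomes linear.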
The best accuracy for each method for each dataset is underlined to indicate the classifier that performed best of all for that dataset. For comparison purposes, we have also trained and tested a support vector machine (SVM) with a radial basis function kernel, and optimized the parameters by performing a grid search.

V. EXPERIMENTS AND DISCUSSION

To present and discuss the results, the following acronyms are used when referring to the different datasets:

MAS = Mintseris et al. dataset, all against all, with SASA
MA = Mintseris et al. dataset, all against all, without SASA
MOS = Mintseris et al. dataset, one against one, with SASA
MO = Mintseris et al. dataset, one against one, without SASA
ZS = Zhu et al. dataset, with SASA
Z = Zhu et al. dataset, without SASA

Table I. Classification results for desolvation properties, atom type. Columns: FDA, HDA and CDA, each with the quadratic and the linear classifier. Rows: MAS, MA, MOS and MO (Mintseris et al.); ZS and Z (Zhu et al.). (Numeric entries are not available in this transcription.)

Table II. Classification results for desolvation properties, amino acid type. Columns: FDA, HDA and CDA, each with the quadratic and the linear classifier. Rows: MAS, MA, MOS and MO (Mintseris et al.); ZS and Z (Zhu et al.). (Numeric entries are not available in this transcription.)

The results for the atom type properties are given in Table I, while the results for the amino acid type properties are shown in Table II. For the Mintseris et al. dataset, the best performance was achieved when using atom type features. Among all the atom type features, the one against one dataset weighted with SASA values performed best with LDR methods combined with the QB classifier. Among the LDR criteria, HDA achieves the best performance, with an accuracy of 80.27%. Among all the feature sets for the Mintseris et al. dataset, desolvation energies for atom type features lead to the best classification accuracies, followed by interface properties and desolvation energies for amino acid type features.
This suggests that desolvation energies are more important at the atom type level in classifying transient and obligate complexes. Additionally, classification on the basis of interface property features yields 79.25% accuracy, which is less than 2% below the best accuracy achieved by desolvation energies at the atom type level.

The results for the interface properties are shown in Table III. The classification accuracies in the table show that the LDR methods achieve better performance than the SVM in most of the cases. Looking at the interface property features (Table III), we observe that after adding the 20 area-based amino acid composition features to our primary four-feature dataset, the classification accuracy decreases. Thus, we infer that the area-based amino acid composition features do not contribute to the classification of transient and obligate complexes. Then, we added the number-based amino acid composition features to the 24-feature dataset, and the accuracy increases to 79.25%. From this, we conclude that the number-based amino acid composition features are good discriminators of obligate and transient complexes.

Table III. Classification results for interface properties. Columns: n, SVM, and FDA, HDA and CDA, each with the quadratic and the linear classifier, reported separately for the Mintseris et al. and Zhu et al. datasets. (Numeric entries are not available in this transcription.)

Desolvation energies for amino acid type features perform slightly worse than atom type features, and as well as interface property features, for all types of chain combinations and with weighted and non-weighted SASA. All of the best performances on the amino acid type feature datasets (for all types of chain combinations and with weighted and non-weighted SASA) were achieved by LDR methods combined with the QB classifier. Of these, the LDR criterion that achieves the best performance is CDA in all four kinds of datasets.

For the Zhu et al. dataset, we observe that the best performance is achieved, again, using desolvation energies for atom type features. Since in the Zhu et al. dataset there are only two interacting chains in a protein complex, there is no option here to divide it into one against one or all against all combinations. For the desolvation energies for atom type without SASA, 88.68% accuracy was achieved by LDR methods with a linear classifier combined with the HDA criterion. Among all the feature sets for the Zhu et al. dataset, desolvation energies for atom type features achieve the best classification accuracies, followed by interface properties and desolvation energies for amino acid type features. This suggests that desolvation energies for atom type are more important in classifying transient and obligate complexes. For the interface properties (Table III), we again observe higher classification accuracies for the LDR methods, which suggests that, on this dataset, the LDR methods perform better than the SVM.
Looking at the interface property features (Table III), we observe that after adding the 20 area-based amino acid composition features to our primary six-feature dataset, the classification accuracy decreases to 78.09%. Thus, we infer that the area-based amino acid composition features do not contribute to the classification of transient and obligate complexes. Then, we added the number-based amino acid composition features to the 24-feature dataset, and the accuracy increases to 81.83%. From this, we conclude that the number-based amino acid composition features are good discriminators of obligate and transient complexes. In this dataset, the accuracy obtained when using desolvation energies for atom types is much better than the accuracies obtained with the interface properties. We observe from this that the desolvation energies for atom type features lead to better performance in both datasets than the interface property features (interface area, interface area ratio, area-based amino acid composition, number-based amino acid composition, correlation between amino acid compositions of interface and protein surface, gap volume index, and conservation score of the interface).

To conclude the paper, and as a matter of comparison, we emphasize the following aspects of the proposed approach with respect to previous ones. The proposed method outperformed NOXclass [12] by using other features, namely the area-based and number-based amino acid compositions. The LDR methods outperform the SVM, even when the latter is optimized for the kernel and its parameters. The proposed method reveals that desolvation energies for atom type properties are the best discriminants for transient and obligate complexes on two well-known datasets.

VI. CONCLUSION

We have proposed a classification approach that uses desolvation energy properties to distinguish between transient and obligate protein complexes.
Our classifiers are based on linear dimensionality reduction (LDR) methods that involve homoscedastic and heteroscedastic criteria coupled with quadratic and linear Bayesian classifiers. The results on two datasets of pre-classified complexes show that the LDR schemes coupled with quadratic and linear Bayesian classifiers achieve better classification performance than an SVM with an RBF kernel, and far better performance than previous approaches to distinguishing obligate and transient interactions [12] (an increase from 75.2% to 88.68%). Our results also indicate that desolvation energies are important in distinguishing transient and obligate complexes. Our future work involves the use of this approach in different protein-protein interaction classification problems, including intra and inter domains and homo and hetero-oligomers, and the use of other features such as residue vicinity, shape of the structure of the interface, secondary structure, planarity, physicochemical features, hydrophobicity and others.

ACKNOWLEDGMENTS

This research work has been partially supported by NSERC, the Natural Sciences and Engineering Research Council of Canada, grant No. RGPIN 26360, and the University of Windsor, internal Start-up and VP research equipment grants.

REFERENCES

[1] A. Mendelsohn and R. Brent, "Protein interaction methods - toward an endgame," Science, vol. 284, no. 5422.
[2] M. C. Lawrence and P. M. Colman, "Shape complementarity at protein/protein interfaces," J Mol Biol, vol. 234, no. 4, 1993.
[3] P. Chakrabarti and J. Janin, "Dissecting protein-protein recognition sites," Proteins, vol. 47, no. 3.
[4] A. L. Gnatt, P. Cramer, J. Fu, D. A. Bushnell, and R. D. Kornberg, "Structural basis of transcription: an RNA polymerase II elongation complex at 3.3 A resolution," Science, vol. 292, no. 5523, 2001.
[5] D. Xu, C. Tsai, and R. Nussinov, "Hydrogen bonds and salt bridges across protein-protein interfaces," Protein Eng, vol. 10, no. 9, 1997.
[6] H. Shanahan and J. Thornton, "Amino acid architecture and the distribution of polar atoms on the surfaces of proteins," Biopolymers, vol. 78, no. 6.
[7] B. Ma, T. Elkayam, H. Wolfson, and R. Nussinov, "Protein-protein interactions: structurally conserved residues distinguish between binding sites and exposed protein surfaces," Proc Natl Acad Sci USA, vol. 100, no. 10.
[8] Y. Ofran and B. Rost, "Analysing six types of protein-protein interfaces," J Mol Biol, vol. 325, no. 2.
[9] I. Nooren and J. Thornton, "Diversity of protein-protein interactions," EMBO Journal, vol. 22, no. 14.
[10] S. Jones and J. M. Thornton, "Principles of protein-protein interactions," Proc Natl Acad Sci USA, vol. 93, no. 1, pp. 13-20, 1996.
[11] F. Glaser, D. M. Steinberg, I. A. Vakser, and N. Ben-Tal, "Residue frequencies and pairing preferences at protein-protein interfaces," Proteins, vol. 43, no. 2, 2001.
[12] H. Zhu, F. Domingues, I. Sommer, and T. Lengauer, "NOXclass: prediction of protein-protein interaction types," BMC Bioinformatics, vol. 7, no. 27.
[13] J. Young, "A role for surface hydrophobicity in protein-protein recognition," Protein Sci, vol. 3, 1994.
[14] L. LoConte, C. Chothia, and J. Janin, "The atomic structure of protein-protein recognition sites," J Mol Biol, vol. 285, no. 5, 1999.
[15] Y. Qi, Z. Bar-Joseph, and J. Klein-Seetharaman, "Evaluation of different biological data and computational classification methods for use in protein interaction prediction," Proteins, vol. 63, no. 3.
[16] A. J. Bordner and R. Abagyan, "Statistical analysis and prediction of protein-protein interfaces," Proteins, vol. 60, no. 3.
[17] H. J. Caffrey and S. Somaroo, "Are protein-protein interfaces more conserved in sequence than the rest of the protein surface?" Protein Science, vol. 13.
[18] S. Neuvirth and R. Raz, "ProMate: a structure based prediction program to identify the location of protein-protein binding sites," J Mol Biol, vol. 338, pp. 181-199.
[19] H. Zhou and Y. Shan, "Prediction of protein interaction sites from sequence profile and residue neighbor list," Proteins, vol. 44, no. 3, 2001.
[20] S. H. Park, J. Reyes, D. Gilbert, J. W. Kim, and S. Kim, "Prediction of protein-protein interaction types using association rule based classification," BMC Bioinformatics, vol. 10, no. 36, 2009.
[21] J. Mintseris and Z. Weng, "Structure, function, and evolution of transient and obligate protein-protein interactions," Proc Natl Acad Sci USA, vol. 102, no. 31.
[22] L. Rueda and M. Herrera, "Linear Dimensionality Reduction by Maximizing the Chernoff Distance in the Transformed Space," Pattern Recognition, vol. 41, no. 10.
[23] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. New York, NY: John Wiley and Sons, Inc.
[24] R. Fisher, "The Use of Multiple Measurements in Taxonomic Problems," Annals of Eugenics, vol. 7, 1936.
[25] M. Loog and P. Duin, "Linear Dimensionality Reduction via a Heteroscedastic Extension of LDA: The Chernoff Criterion," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 6.
[26] C. Camacho and C. Zhang, "FastContact: rapid estimate of contact and binding free energies," Bioinformatics, vol. 21, no. 10.
[27] C. Zhang, G. Vasmatzis, J. L. Cornette, and C. DeLisi, "Determination of atomic desolvation energies from the structures of crystallized proteins," J Mol Biol, vol. 267, 1997.
[28] H. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. Bhat, H. Weissig, I. Shindyalov, and P. Bourne, "The Protein Data Bank," Nucleic Acids Research, vol. 28.
[29] S. Hubbard and J. Thornton, "NACCESS," computer program, 1993.
[30] R. Laskowski, "SURFNET: a program for visualizing molecular surfaces, cavities and intermolecular interactions," J Mol Graph, vol. 13, no. 5, pp. 323-330, 1995.
[31] B.-T. A. Armon, D. Graur, "ConSurf: an algorithmic tool for the identification of functional regions in proteins by surface mapping of phylogenetic information," J Mol Biol, vol. 307, 2001.


Dimensionality Reduction Using PCA/LDA. Hongyu Li School of Software Engineering TongJi University Fall, 2014 Dimensionality Reduction Using PCA/LDA Hongyu Li School of Software Engineering TongJi University Fall, 2014 Dimensionality Reduction One approach to deal with high dimensional data is by reducing their

More information

Identification of Representative Protein Sequence and Secondary Structure Prediction Using SVM Approach

Identification of Representative Protein Sequence and Secondary Structure Prediction Using SVM Approach Identification of Representative Protein Sequence and Secondary Structure Prediction Using SVM Approach Prof. Dr. M. A. Mottalib, Md. Rahat Hossain Department of Computer Science and Information Technology

More information

PDBe TUTORIAL. PDBePISA (Protein Interfaces, Surfaces and Assemblies)

PDBe TUTORIAL. PDBePISA (Protein Interfaces, Surfaces and Assemblies) PDBe TUTORIAL PDBePISA (Protein Interfaces, Surfaces and Assemblies) http://pdbe.org/pisa/ This tutorial introduces the PDBePISA (PISA for short) service, which is a webbased interactive tool offered by

More information

Learning Kernel Parameters by using Class Separability Measure

Learning Kernel Parameters by using Class Separability Measure Learning Kernel Parameters by using Class Separability Measure Lei Wang, Kap Luk Chan School of Electrical and Electronic Engineering Nanyang Technological University Singapore, 3979 E-mail: P 3733@ntu.edu.sg,eklchan@ntu.edu.sg

More information

PATTERN CLASSIFICATION

PATTERN CLASSIFICATION PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wiley-lnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto CONTENTS

More information

Conditional Graphical Models

Conditional Graphical Models PhD Thesis Proposal Conditional Graphical Models for Protein Structure Prediction Yan Liu Language Technologies Institute University Thesis Committee Jaime Carbonell (Chair) John Lafferty Eric P. Xing

More information

A Modified Incremental Principal Component Analysis for On-line Learning of Feature Space and Classifier

A Modified Incremental Principal Component Analysis for On-line Learning of Feature Space and Classifier A Modified Incremental Principal Component Analysis for On-line Learning of Feature Space and Classifier Seiichi Ozawa, Shaoning Pang, and Nikola Kasabov Graduate School of Science and Technology, Kobe

More information

High Dimensional Discriminant Analysis

High Dimensional Discriminant Analysis High Dimensional Discriminant Analysis Charles Bouveyron 1,2, Stéphane Girard 1, and Cordelia Schmid 2 1 LMC IMAG, BP 53, Université Grenoble 1, 38041 Grenoble cedex 9 France (e-mail: charles.bouveyron@imag.fr,

More information

Beta Atomic Contacts: Identifying Critical Specific Contacts in Protein Binding Interfaces

Beta Atomic Contacts: Identifying Critical Specific Contacts in Protein Binding Interfaces Page 1 of 9 Beta Atomic Contacts: Identifying Critical Specific Contacts in Protein Binding Interfaces Qian Liu, Chee Keong Kwoh, Steven C. H. Hoi Abstract Specific binding between proteins plays a crucial

More information

A Modified Incremental Principal Component Analysis for On-Line Learning of Feature Space and Classifier

A Modified Incremental Principal Component Analysis for On-Line Learning of Feature Space and Classifier A Modified Incremental Principal Component Analysis for On-Line Learning of Feature Space and Classifier Seiichi Ozawa 1, Shaoning Pang 2, and Nikola Kasabov 2 1 Graduate School of Science and Technology,

More information

Protein structure forces, and folding

Protein structure forces, and folding Harvard-MIT Division of Health Sciences and Technology HST.508: Quantitative Genomics, Fall 2005 Instructors: Leonid Mirny, Robert Berwick, Alvin Kho, Isaac Kohane Protein structure forces, and folding

More information

Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides

Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Intelligent Data Analysis and Probabilistic Inference Lecture

More information

Protein Complex Identification by Supervised Graph Clustering

Protein Complex Identification by Supervised Graph Clustering Protein Complex Identification by Supervised Graph Clustering Yanjun Qi 1, Fernanda Balem 2, Christos Faloutsos 1, Judith Klein- Seetharaman 1,2, Ziv Bar-Joseph 1 1 School of Computer Science, Carnegie

More information

Chemical properties that affect binding of enzyme-inhibiting drugs to enzymes

Chemical properties that affect binding of enzyme-inhibiting drugs to enzymes Introduction Chemical properties that affect binding of enzyme-inhibiting drugs to enzymes The production of new drugs requires time for development and testing, and can result in large prohibitive costs

More information

schematic diagram; EGF binding, dimerization, phosphorylation, Grb2 binding, etc.

schematic diagram; EGF binding, dimerization, phosphorylation, Grb2 binding, etc. Lecture 1: Noncovalent Biomolecular Interactions Bioengineering and Modeling of biological processes -e.g. tissue engineering, cancer, autoimmune disease Example: RTK signaling, e.g. EGFR Growth responses

More information

Chemical properties that affect binding of enzyme-inhibiting drugs to enzymes

Chemical properties that affect binding of enzyme-inhibiting drugs to enzymes Chemical properties that affect binding of enzyme-inhibiting drugs to enzymes Introduction The production of new drugs requires time for development and testing, and can result in large prohibitive costs

More information

Multi-Class Linear Dimension Reduction by. Weighted Pairwise Fisher Criteria

Multi-Class Linear Dimension Reduction by. Weighted Pairwise Fisher Criteria Multi-Class Linear Dimension Reduction by Weighted Pairwise Fisher Criteria M. Loog 1,R.P.W.Duin 2,andR.Haeb-Umbach 3 1 Image Sciences Institute University Medical Center Utrecht P.O. Box 85500 3508 GA

More information

Insights into Protein Protein Interfaces using a Bayesian Network Prediction Method

Insights into Protein Protein Interfaces using a Bayesian Network Prediction Method doi:10.1016/j.jmb.2006.07.028 J. Mol. Biol. (2006) 362, 365 386 Insights into Protein Protein Interfaces using a Bayesian Network Prediction Method James R. Bradford 1, Chris J. Needham 2, Andrew J. Bulpitt

More information

Non-Iterative Heteroscedastic Linear Dimension Reduction for Two-Class Data

Non-Iterative Heteroscedastic Linear Dimension Reduction for Two-Class Data Non-Iterative Heteroscedastic Linear Dimension Reduction for Two-Class Data From Fisher to Chernoff M. Loog and R. P.. Duin Image Sciences Institute, University Medical Center Utrecht, Utrecht, The Netherlands,

More information

Myoelectrical signal classification based on S transform and two-directional 2DPCA

Myoelectrical signal classification based on S transform and two-directional 2DPCA Myoelectrical signal classification based on S transform and two-directional 2DPCA Hong-Bo Xie1 * and Hui Liu2 1 ARC Centre of Excellence for Mathematical and Statistical Frontiers Queensland University

More information

A Novel Rejection Measurement in Handwritten Numeral Recognition Based on Linear Discriminant Analysis

A Novel Rejection Measurement in Handwritten Numeral Recognition Based on Linear Discriminant Analysis 009 0th International Conference on Document Analysis and Recognition A Novel Rejection easurement in Handwritten Numeral Recognition Based on Linear Discriminant Analysis Chun Lei He Louisa Lam Ching

More information

Structural and mechanistic insight into the substrate. binding from the conformational dynamics in apo. and substrate-bound DapE enzyme

Structural and mechanistic insight into the substrate. binding from the conformational dynamics in apo. and substrate-bound DapE enzyme Electronic Supplementary Material (ESI) for Physical Chemistry Chemical Physics. This journal is the Owner Societies 215 Structural and mechanistic insight into the substrate binding from the conformational

More information

Cluster Kernels for Semi-Supervised Learning

Cluster Kernels for Semi-Supervised Learning Cluster Kernels for Semi-Supervised Learning Olivier Chapelle, Jason Weston, Bernhard Scholkopf Max Planck Institute for Biological Cybernetics, 72076 Tiibingen, Germany {first. last} @tuebingen.mpg.de

More information

Plan. Lecture: What is Chemoinformatics and Drug Design? Description of Support Vector Machine (SVM) and its used in Chemoinformatics.

Plan. Lecture: What is Chemoinformatics and Drug Design? Description of Support Vector Machine (SVM) and its used in Chemoinformatics. Plan Lecture: What is Chemoinformatics and Drug Design? Description of Support Vector Machine (SVM) and its used in Chemoinformatics. Exercise: Example and exercise with herg potassium channel: Use of

More information

Boosting: Algorithms and Applications

Boosting: Algorithms and Applications Boosting: Algorithms and Applications Lecture 11, ENGN 4522/6520, Statistical Pattern Recognition and Its Applications in Computer Vision ANU 2 nd Semester, 2008 Chunhua Shen, NICTA/RSISE Boosting Definition

More information

The prediction of membrane protein types with NPE

The prediction of membrane protein types with NPE The prediction of membrane protein types with NPE Lipeng Wang 1a), Zhanting Yuan 1, Xuhui Chen 1, and Zhifang Zhou 2 1 College of Electrical and Information Engineering Lanzhou University of Technology,

More information

Molecular dynamics simulations of anti-aggregation effect of ibuprofen. Wenling E. Chang, Takako Takeda, E. Prabhu Raman, and Dmitri Klimov

Molecular dynamics simulations of anti-aggregation effect of ibuprofen. Wenling E. Chang, Takako Takeda, E. Prabhu Raman, and Dmitri Klimov Biophysical Journal, Volume 98 Supporting Material Molecular dynamics simulations of anti-aggregation effect of ibuprofen Wenling E. Chang, Takako Takeda, E. Prabhu Raman, and Dmitri Klimov Supplemental

More information

A General Model for Amino Acid Interaction Networks

A General Model for Amino Acid Interaction Networks Author manuscript, published in "N/P" A General Model for Amino Acid Interaction Networks Omar GACI and Stefan BALEV hal-43269, version - Nov 29 Abstract In this paper we introduce the notion of protein

More information

Supporting Online Material for

Supporting Online Material for www.sciencemag.org/cgi/content/full/309/5742/1868/dc1 Supporting Online Material for Toward High-Resolution de Novo Structure Prediction for Small Proteins Philip Bradley, Kira M. S. Misura, David Baker*

More information

On Improving the k-means Algorithm to Classify Unclassified Patterns

On Improving the k-means Algorithm to Classify Unclassified Patterns On Improving the k-means Algorithm to Classify Unclassified Patterns Mohamed M. Rizk 1, Safar Mohamed Safar Alghamdi 2 1 Mathematics & Statistics Department, Faculty of Science, Taif University, Taif,

More information

Chemogenomic: Approaches to Rational Drug Design. Jonas Skjødt Møller

Chemogenomic: Approaches to Rational Drug Design. Jonas Skjødt Møller Chemogenomic: Approaches to Rational Drug Design Jonas Skjødt Møller Chemogenomic Chemistry Biology Chemical biology Medical chemistry Chemical genetics Chemoinformatics Bioinformatics Chemoproteomics

More information

Adaptive Kernel Principal Component Analysis With Unsupervised Learning of Kernels

Adaptive Kernel Principal Component Analysis With Unsupervised Learning of Kernels Adaptive Kernel Principal Component Analysis With Unsupervised Learning of Kernels Daoqiang Zhang Zhi-Hua Zhou National Laboratory for Novel Software Technology Nanjing University, Nanjing 2193, China

More information

Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence

Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence Naoto Morikawa (nmorika@genocript.com) October 7, 2006. Abstract A protein is a sequence

More information

Reconnaissance d objetsd et vision artificielle

Reconnaissance d objetsd et vision artificielle Reconnaissance d objetsd et vision artificielle http://www.di.ens.fr/willow/teaching/recvis09 Lecture 6 Face recognition Face detection Neural nets Attention! Troisième exercice de programmation du le

More information

A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery

A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery AtomNet A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery Izhar Wallach, Michael Dzamba, Abraham Heifets Victor Storchan, Institute for Computational and

More information

Structure-Activity Modeling - QSAR. Uwe Koch

Structure-Activity Modeling - QSAR. Uwe Koch Structure-Activity Modeling - QSAR Uwe Koch QSAR Assumption: QSAR attempts to quantify the relationship between activity and molecular strcucture by correlating descriptors with properties Biological activity

More information

FlexSADRA: Flexible Structural Alignment using a Dimensionality Reduction Approach

FlexSADRA: Flexible Structural Alignment using a Dimensionality Reduction Approach FlexSADRA: Flexible Structural Alignment using a Dimensionality Reduction Approach Shirley Hui and Forbes J. Burkowski University of Waterloo, 200 University Avenue W., Waterloo, Canada ABSTRACT A topic

More information

Introducing Hippy: A visualization tool for understanding the α-helix pair interface

Introducing Hippy: A visualization tool for understanding the α-helix pair interface Introducing Hippy: A visualization tool for understanding the α-helix pair interface Robert Fraser and Janice Glasgow School of Computing, Queen s University, Kingston ON, Canada, K7L3N6 {robert,janice}@cs.queensu.ca

More information

L11: Pattern recognition principles

L11: Pattern recognition principles L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction

More information

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Homology Modeling. Roberto Lins EPFL - summer semester 2005 Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,

More information

Automatic Epitope Recognition in Proteins Oriented to the System for Macromolecular Interaction Assessment MIAX

Automatic Epitope Recognition in Proteins Oriented to the System for Macromolecular Interaction Assessment MIAX Genome Informatics 12: 113 122 (2001) 113 Automatic Epitope Recognition in Proteins Oriented to the System for Macromolecular Interaction Assessment MIAX Atsushi Yoshimori Carlos A. Del Carpio yosimori@translell.eco.tut.ac.jp

More information

STUDY ON METHODS FOR COMPUTER-AIDED TOOTH SHADE DETERMINATION

STUDY ON METHODS FOR COMPUTER-AIDED TOOTH SHADE DETERMINATION INTERNATIONAL JOURNAL OF INFORMATION AND SYSTEMS SCIENCES Volume 5, Number 3-4, Pages 351 358 c 2009 Institute for Scientific Computing and Information STUDY ON METHODS FOR COMPUTER-AIDED TOOTH SHADE DETERMINATION

More information

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

CISC 636 Computational Biology & Bioinformatics (Fall 2016) CISC 636 Computational Biology & Bioinformatics (Fall 2016) Predicting Protein-Protein Interactions CISC636, F16, Lec22, Liao 1 Background Proteins do not function as isolated entities. Protein-Protein

More information

PREDICTION OF PROTEIN BINDING SITES BY COMBINING SEVERAL METHODS

PREDICTION OF PROTEIN BINDING SITES BY COMBINING SEVERAL METHODS PREDICTION OF PROTEIN BINDING SITES BY COMBINING SEVERAL METHODS T. Z. SEN, A. KLOCZKOWSKI, R. L. JERNIGAN L.H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University Ames, IA

More information

arxiv:q-bio/ v1 [q-bio.mn] 13 Aug 2004

arxiv:q-bio/ v1 [q-bio.mn] 13 Aug 2004 Network properties of protein structures Ganesh Bagler and Somdatta Sinha entre for ellular and Molecular Biology, Uppal Road, Hyderabad 57, India (Dated: June 8, 218) arxiv:q-bio/89v1 [q-bio.mn] 13 Aug

More information

Discriminant analysis and supervised classification

Discriminant analysis and supervised classification Discriminant analysis and supervised classification Angela Montanari 1 Linear discriminant analysis Linear discriminant analysis (LDA) also known as Fisher s linear discriminant analysis or as Canonical

More information

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics Jianlin Cheng, PhD Department of Computer Science University of Missouri, Columbia

More information

Protein structure sampling based on molecular dynamics and improvement of docking prediction

Protein structure sampling based on molecular dynamics and improvement of docking prediction 1 1 1, 2 1 NMR X Protein structure sampling based on molecular dynamics and improvement of docking prediction Yusuke Matsuzaki, 1 Yuri Matsuzaki, 1 Masakazu Sekijima 1, 2 and Yutaka Akiyama 1 When a protein

More information

Machine Learning Concepts in Chemoinformatics

Machine Learning Concepts in Chemoinformatics Machine Learning Concepts in Chemoinformatics Martin Vogt B-IT Life Science Informatics Rheinische Friedrich-Wilhelms-Universität Bonn BigChem Winter School 2017 25. October Data Mining in Chemoinformatics

More information

Grouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids

Grouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids Science in China Series C: Life Sciences 2007 Science in China Press Springer-Verlag Grouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids

More information

High Dimensional Discriminant Analysis

High Dimensional Discriminant Analysis High Dimensional Discriminant Analysis Charles Bouveyron LMC-IMAG & INRIA Rhône-Alpes Joint work with S. Girard and C. Schmid ASMDA Brest May 2005 Introduction Modern data are high dimensional: Imagery:

More information

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1 Tiffany Samaroo MB&B 452a December 8, 2003 Take Home Final Topic 1 Prior to 1970, protein and DNA sequence alignment was limited to visual comparison. This was a very tedious process; even proteins with

More information

SUB-CELLULAR LOCALIZATION PREDICTION USING MACHINE LEARNING APPROACH

SUB-CELLULAR LOCALIZATION PREDICTION USING MACHINE LEARNING APPROACH SUB-CELLULAR LOCALIZATION PREDICTION USING MACHINE LEARNING APPROACH Ashutosh Kumar Singh 1, S S Sahu 2, Ankita Mishra 3 1,2,3 Birla Institute of Technology, Mesra, Ranchi Email: 1 ashutosh.4kumar.4singh@gmail.com,

More information

Chemometrics: Classification of spectra

Chemometrics: Classification of spectra Chemometrics: Classification of spectra Vladimir Bochko Jarmo Alander University of Vaasa November 1, 2010 Vladimir Bochko Chemometrics: Classification 1/36 Contents Terminology Introduction Big picture

More information

Protein quality assessment

Protein quality assessment Protein quality assessment Speaker: Renzhi Cao Advisor: Dr. Jianlin Cheng Major: Computer Science May 17 th, 2013 1 Outline Introduction Paper1 Paper2 Paper3 Discussion and research plan Acknowledgement

More information

High Dimensional Discriminant Analysis

High Dimensional Discriminant Analysis High Dimensional Discriminant Analysis Charles Bouveyron LMC-IMAG & INRIA Rhône-Alpes Joint work with S. Girard and C. Schmid High Dimensional Discriminant Analysis - Lear seminar p.1/43 Introduction High

More information

Semi-supervised Dictionary Learning Based on Hilbert-Schmidt Independence Criterion

Semi-supervised Dictionary Learning Based on Hilbert-Schmidt Independence Criterion Semi-supervised ictionary Learning Based on Hilbert-Schmidt Independence Criterion Mehrdad J. Gangeh 1, Safaa M.A. Bedawi 2, Ali Ghodsi 3, and Fakhri Karray 2 1 epartments of Medical Biophysics, and Radiation

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION doi:10.1038/nature17991 Supplementary Discussion Structural comparison with E. coli EmrE The DMT superfamily includes a wide variety of transporters with 4-10 TM segments 1. Since the subfamilies of the

More information

In silico pharmacology for drug discovery

In silico pharmacology for drug discovery In silico pharmacology for drug discovery In silico drug design In silico methods can contribute to drug targets identification through application of bionformatics tools. Currently, the application of

More information

BIBC 100. Structural Biochemistry

BIBC 100. Structural Biochemistry BIBC 100 Structural Biochemistry http://classes.biology.ucsd.edu/bibc100.wi14 Papers- Dialogue with Scientists Questions: Why? How? What? So What? Dialogue Structure to explain function Knowledge Food

More information

Protein-Protein Interaction Classification Using Jordan Recurrent Neural Network

Protein-Protein Interaction Classification Using Jordan Recurrent Neural Network Protein-Protein Interaction Classification Using Jordan Recurrent Neural Network Dilpreet Kaur Department of Computer Science and Engineering PEC University of Technology Chandigarh, India dilpreet.kaur88@gmail.com

More information

Probabilistic Class-Specific Discriminant Analysis

Probabilistic Class-Specific Discriminant Analysis Probabilistic Class-Specific Discriminant Analysis Alexros Iosifidis Department of Engineering, ECE, Aarhus University, Denmark alexros.iosifidis@eng.au.dk arxiv:8.05980v [cs.lg] 4 Dec 08 Abstract In this

More information

Fluid Dynamics Models for Low Rank Discriminant Analysis

Fluid Dynamics Models for Low Rank Discriminant Analysis Yung-Kyun Noh, Byoung-Tak Zhang Daniel D. Lee GRASP Lab, University of Pennsylvania, Philadelphia, PA 94, USA Biointelligence Lab, Seoul National University, Seoul 5-74, Korea nohyung@seas.upenn.edu, btzhang@bi.snu.ac.kr,

More information

2 Spial. Chapter 1. Chapter 2 Chapter 3 Chapter 4 Chapter 5 Chapter 6. Pathway level. Atomic level. Cellular level. Proteome level.

2 Spial. Chapter 1. Chapter 2 Chapter 3 Chapter 4 Chapter 5 Chapter 6. Pathway level. Atomic level. Cellular level. Proteome level. 2 Spial Chapter Chapter 2 Chapter 3 Chapter 4 Chapter 5 Chapter 6 Spial Quorum sensing Chemogenomics Descriptor relationships Introduction Conclusions and perspectives Atomic level Pathway level Proteome

More information

Study on Classification Methods Based on Three Different Learning Criteria. Jae Kyu Suhr

Study on Classification Methods Based on Three Different Learning Criteria. Jae Kyu Suhr Study on Classification Methods Based on Three Different Learning Criteria Jae Kyu Suhr Contents Introduction Three learning criteria LSE, TER, AUC Methods based on three learning criteria LSE:, ELM TER:

More information

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti

More information

Supervised locally linear embedding

Supervised locally linear embedding Supervised locally linear embedding Dick de Ridder 1, Olga Kouropteva 2, Oleg Okun 2, Matti Pietikäinen 2 and Robert P.W. Duin 1 1 Pattern Recognition Group, Department of Imaging Science and Technology,

More information

Discriminant Kernels based Support Vector Machine

Discriminant Kernels based Support Vector Machine Discriminant Kernels based Support Vector Machine Akinori Hidaka Tokyo Denki University Takio Kurita Hiroshima University Abstract Recently the kernel discriminant analysis (KDA) has been successfully

More information

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

More information

Reconstructing Amino Acid Interaction Networks by an Ant Colony Approach

Reconstructing Amino Acid Interaction Networks by an Ant Colony Approach Author manuscript, published in "Journal of Computational Intelligence in Bioinformatics 2, 2 (2009) 131-146" Reconstructing Amino Acid Interaction Networks by an Ant Colony Approach Omar GACI and Stefan

More information

December 20, MAA704, Multivariate analysis. Christopher Engström. Multivariate. analysis. Principal component analysis

December 20, MAA704, Multivariate analysis. Christopher Engström. Multivariate. analysis. Principal component analysis .. December 20, 2013 Todays lecture. (PCA) (PLS-R) (LDA) . (PCA) is a method often used to reduce the dimension of a large dataset to one of a more manageble size. The new dataset can then be used to make

More information

Symmetric Two Dimensional Linear Discriminant Analysis (2DLDA)

Symmetric Two Dimensional Linear Discriminant Analysis (2DLDA) Symmetric Two Dimensional inear Discriminant Analysis (2DDA) Dijun uo, Chris Ding, Heng Huang University of Texas at Arlington 701 S. Nedderman Drive Arlington, TX 76013 dijun.luo@gmail.com, {chqding,

More information

Machine learning for pervasive systems Classification in high-dimensional spaces

Machine learning for pervasive systems Classification in high-dimensional spaces Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version

More information

Face Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi

Face Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi Face Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi Overview Introduction Linear Methods for Dimensionality Reduction Nonlinear Methods and Manifold

More information

Linear Dimensionality Reduction

Linear Dimensionality Reduction Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Principal Component Analysis 3 Factor Analysis

More information

IMPORTANCE OF SECONDARY STRUCTURE ELEMENTS FOR PREDICTION OF GO ANNOTATIONS
