Protein Structure Comparison Methods

D. Petrova

Key Words: Protein structure comparison; models; comparison algorithms; similarity measure

Abstract. Existing methods for protein structure comparison are examined and their basic properties are presented. The methods are analysed in terms of their three major components: the model of the structure, the comparison algorithm and the similarity measure. Each method is evaluated according to several criteria, which include the level of representation and the ease of handling of the model, the complexity and strategy of the comparison algorithm, and how meaningful the similarity measure is. The methods are grouped according to these criteria. The advantages and disadvantages of each group are discussed with respect to their application to different types of problems in contemporary structural biology.

Introduction

Proteins are long chains of amino acids and, like other biological macromolecules, are essential parts of organisms and participate in every process within cells. Many proteins are enzymes that catalyze biochemical reactions and are vital to metabolism. Proteins also have structural, mechanical or transport functions, or form a system of scaffolding that maintains cell shape. Other proteins are important in cell signaling, immune responses, cell adhesion and the cell cycle.

The number of newly determined protein structures is growing fast. When the Protein Data Bank [1] was founded it contained just 7 protein structures. Since then the number of structures has grown approximately exponentially and shows no sign of slowing. Today there are […] protein structures deposited in the Protein Data Bank, and by 2015 their number would reach […]. The need for new methods and algorithms for studying protein structure, function and evolution is evident. One of the main tasks in these studies is the precise and adequate comparison of protein structures.
Protein structure comparisons are employed in almost all branches of contemporary structural biology. They are applied for:

Protein fold classification. SCOP [2] is one of the databases in which the classification of protein structures is based on evolutionary relationships and on the principles that govern their three-dimensional structure. The method used to construct the protein classification in SCOP is essentially visual inspection and comparison of structures. CATH [3] includes an adaptation of a method for rapid structure comparison based on secondary structure matching.

Protein structure modelling for protein structure prediction [4]. Protein structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry. Its aim is to predict the three-dimensional structure of proteins from their amino acid sequences, sometimes using additional relevant information such as the structures of related proteins. In other words, it deals with predicting a protein's tertiary structure from its primary structure, where the prediction is based on comparison with proteins of known structure.

Structure-based function prediction [5]. The functions of proteins are strongly connected with their structure and chemical characteristics, and the structure of a protein can directly reveal the mechanisms of its function. For this reason protein structure comparison methods can be applied to obtain structural information that aids function prediction. Here the unknown function of a protein is inferred from detected structural similarity to proteins with known functions.

Protein Structure Comparison Methods

In order to compare two (or more) protein structures, three components are necessary: a model of the protein, an alignment algorithm and a similarity measure.
The ideal combination of these three components would be sensitive (giving high scores to similar structures), selective (giving low scores to structures that are not similar) and fast.

A: Protein Structure Representation/Model

One of the main characteristics of a protein model is its level of representation, that is, how detailed it is. Two sets of structure elements are given: $A = \{a_1, a_2, \ldots, a_n\}$ for protein $P_A$ and $B = \{b_1, b_2, \ldots, b_m\}$ for protein $P_B$. The choice of structure elements defines the following possibilities for the level of representation:

1) Low, atomic level, where the coordinates of the Cα atoms of the proteins are used to construct the model. Let the set of Cα atoms in protein $X$ be defined by eq. (1):

(1) $C\alpha(X) = \{C\alpha_1, C\alpha_2, \ldots, C\alpha_p\}$

Then the sets A and B of structure elements for the two compared proteins can be defined as:

(2) $A \equiv C\alpha(P_A)$, $B \equiv C\alpha(P_B)$

2) Atomic level of representation, where proteins are represented by fragments of fixed size. A fragment is defined as a sequence of amino acids:

(3) $f = \{C\alpha_1, C\alpha_2, \ldots, C\alpha_k\}$

The parameter $k$ in eq. (3) is the fragment size (the number of amino acids included in the fragment). Then the sets of structure elements A and B can be defined as sets of fragments of fixed size:

(4) $A = \{f_1, f_2, \ldots, f_r\}$, $B = \{f_1, f_2, \ldots, f_s\}$

3) The level of secondary structure elements (SSEs), where proteins are represented by their sets of helices and sheets/strands. Let the set of secondary structure elements in protein $X$ be defined by eq. (5):

(5) $SSE(X) = \{h_1, \ldots, h_p, \underbrace{\{s_{11}, s_{12}, \ldots, s_{1l}\}}_{SH_1}, \ldots, \underbrace{\{s_{q1}, s_{q2}, \ldots, s_{ql}\}}_{SH_q}\}$

The parameter $p$ is the number of helices ($h_i$) and $q$ is the number of sheets ($SH_j$) in the structure. Sheets are represented by the strands that compose them ($s_{jk}$). In this case the sets of structure elements for the two proteins can be defined as:

(6) $A \equiv SSE(P_A)$, $B \equiv SSE(P_B)$

4) A combination of the previous levels.

Once the level is specified, the model can be further defined by the type of distances that are compared and superimposed. If the model consists of a set of points or secondary structure elements, then intermolecular distances are computed between elements that are fixed as equivalent. This type of model does not usually consider the flexibility of the molecules. The other type of model uses intramolecular distances (distance matrices or contact maps). The model consists of all relative positions or distances between elements of one protein molecule, which have to be compared with the relative distances and positions in the other protein molecule.

The oldest and most common way to compare protein structures is to select a set of equivalent points, usually Cα atoms, and superimpose them in 3D space so as to minimize the least-squares distances between corresponding atoms of the two proteins. Proteins are represented by the sequences of their Cα atoms as points in 3D space in RAND and RAG [6], proposed by Akutsu, and in MINRMS [7]. Their sets of structure elements to be compared are defined by eq. (2). Figure 1 shows this kind of representation of two protein molecules $P_A$ and $P_B$ with their corresponding sets A and B of coordinates, which are rotated and translated to find the alignment, a set of pairs of equivalent points $(a_i, b_j)$.

Figure 1.
Protein structure alignment by selecting equivalent points in 3D space

These models are detailed and precise if the only requirement is to compare points in 3D space, namely the atomic coordinates of two different macromolecules. However, a simple comparison of the coordinates of Cα atoms is not sufficient when the objects are biological macromolecules. The disadvantage of such models is that they can miss similarities between divergent structures that are biologically similar. These models are very detailed, and in some cases the information they carry exceeds what is needed.

Combinatorial Extension (CE) of Shindyalov and Bourne [8], FlexProt [9] and FATCAT [10] are also comparison methods that examine the protein structure at the low atomic (Cα atom) level, but they group the atoms into fragments, that is, parts of the compared proteins with fixed size. Fragment pairs, whose size is defined by the number of residues that compose them, are aligned. The models of CE, FlexProt and FATCAT are defined by eq. (4).

Bostick and Vaisman made an interesting proposal for a model [11]. They applied Delaunay-based topological mapping to protein structure comparison. The model of the protein is based upon the topology of its core, specifically the Cα atoms of eq. (2), which form the protein backbone. Four-body nearest neighbors of Cα atoms are clustered and used for comparison. A Delaunay tessellation is built on the set of points that represent the amino acid residues: the four nearest neighbors are arranged at the vertices of an irregular tetrahedron, which is called a Delaunay simplex. The length of a simplex edge is defined as the number of Cα atoms that compose the segment of the protein between its simplex vertices. If all residues are enumerated with consecutive integers according to their occurrence in the primary structure of the protein, the length of a simplex edge $d_{ij}$ is defined as:

(7) $d_{ij} = j - i - 1$

The indices $i$ and $j$ in eq. (7) are the integers that the residues obtain from the enumeration. Each simplex in the tessellation has three lengths, associated with its four vertices.

Some protein structure comparison methods use the coordinates of Cα atoms to construct a model called a distance matrix. As in the group discussed above, the structure of the compared proteins is examined at the level of Cα atoms. The intramolecular distances within the sets defined in eq. (2) are computed, and the results are distance matrices, which represent the geometrical properties of the compared proteins. The idea of the DALI method [12], proposed by Holm and Sander, is that similar 3D structures have similar inter-residue distances. This method uses the distance matrix as a 2D representation of the 3D structure; it contains all pairwise distances between residue centres, i.e. the Cα atoms of a protein. Szustakowski and Weng [13] also use intramolecular distance matrices, together with the elastic similarity score developed by Holm and Sander [12], to evaluate the alignments between compared proteins.

Another type of matrix used as a model is the contact map. The contact map is a matrix of pairwise distances between atoms, residues or secondary structure elements of a protein, depending on the model chosen for analysis. The comparison of two contact maps is defined as the alignment of the set A of structure elements from the contact map of the first protein with the corresponding set B from the contact map of the second protein. Aligned components from the sets are considered equivalent. In the maximal contact map overlap (Max CMO) problem, the degree of similarity between two proteins is defined as the number of equivalent contacts between the proteins. This number is called the overlap of the contact maps and the goal is to maximize it. The Max CMO problem has been proven to be NP-complete [14]. Carr and Hart [15] simplify this representation of protein structure by defining a contact as a pair of residues that are closer than a given threshold, which usually ranges between 2 and 9 Angstroms. The result is a binary contact map, which has the advantage that structural properties of proteins can be visualized and compared more easily.

The detailed models discussed above have the disadvantage of being complicated to compare because of the huge number of elements involved (Cα atoms, fragments of Cα atoms, or distances between them): the problem belongs to the NP-complete class. It is a challenge for comparison methods that use these models to be fast and exact at the same time. Either simplifications are sought to decrease the complexity of the model, as Carr and Hart proposed, or heuristic comparison methods are used to achieve reasonable running times.

Another approach to speeding up the comparison is to use a less detailed model and to examine the protein structure at the level of helices and sheets, which are secondary structure elements. Vectors of secondary structure elements are preferred for the initial model in MATRAS [16]. Kawabata and Nishikawa use a Markov transition model of evolution, which is similar to Dayhoff's substitution model [17] for amino acids. The transition probability matrix of the Markov transition is generated by transforming the numbers of structure transitions within pairs of proteins with similar sequences. The protein structure is examined first at the level of secondary structure elements and then the similarity is refined at the residue level.
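The distance-matrix and binary contact-map models discussed above can be made concrete with a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the toy Cα coordinates and the 7 Angstrom threshold (one value inside the 2 to 9 Angstrom range quoted above) are hypothetical and are not taken from DALI, from Carr and Hart, or from any other cited method.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def distance_matrix(coords):
    """Intramolecular distance matrix: all pairwise C-alpha distances of one protein."""
    return [[dist(p, q) for q in coords] for p in coords]


def contact_map(coords, threshold=7.0):
    """Binary contact map: 1 where two different residues are closer than `threshold`.

    The 7.0 angstrom default is an assumed value, not one prescribed by a cited method.
    """
    d = distance_matrix(coords)
    n = len(coords)
    return [[1 if i != j and d[i][j] < threshold else 0 for j in range(n)]
            for i in range(n)]


def contacts(cmap):
    """List the contacts of a binary map as index pairs (i, j) with i < j."""
    n = len(cmap)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if cmap[i][j]]


# Toy C-alpha traces (hypothetical coordinates, in angstroms).
coords_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
coords_b = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]

print(contacts(contact_map(coords_a)))  # [(0, 1), (1, 2), (2, 3)]
print(contacts(contact_map(coords_b)))  # [(0, 1), (0, 2), (1, 2)]
```

Because both matrices hold only intramolecular distances, they do not change when a molecule is rotated or translated, which is why such models can be compared without first superimposing the structures.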
In MATRAS, the sets defined by eq. (2) and eq. (6) are used to construct the models of the proteins.

Other methods use graph models to represent protein secondary structure. When the level of interest is atomic and the average number of objects to be represented is about […] (the typical number of amino acids in a single-chain protein), the graph model is not appropriate: the number of vertices is large and the comparison time would be enormous. The situation is different when the level is high and secondary structure elements are represented as vertices of a graph: their average number is about 10 to 20, which makes the graph model suitable.

VAST [18], like MATRAS, first detects and aligns secondary structure elements and then refines this alignment by finding the residue equivalences, but VAST uses a graph model. All pairs of secondary structure elements (one from each protein) that are of the same type are represented as vertices in a graph. An edge is defined between two vertices if the distance and angle between the corresponding secondary structure elements are within some threshold. The resulting model of the proteins is a graph composed of structure elements defined by eq. (6). The graph model represents the correspondence between pairs of secondary structure elements that have the same type, relative orientation and connectivity.

Taylor [19] also proposes a graph model for protein structure comparison. The model is defined by the interaction between secondary structure elements, which are represented by line segments, and by other geometric properties of the protein molecules. The degree of interaction between two line segments is evaluated by the degree of overlap between the segments, and these interactions are represented by a bipartite graph.

Figure 2. Line segment overlap measure

Two line segments corresponding to secondary structure elements are shown (A -> B and C -> D) as thick lines in figure 2, with their mutually perpendicular connecting line (p and q). A series of fine lines covers the span in which the line segments overlap; their end points are equidistant from the corresponding ends of the mutual perpendicular. A measure of interaction is calculated as the sum of the lengths (x) of these lines.

A bipartite graph is constructed to compare protein structure $P_A$ and protein structure $P_B$ in the method proposed by Wang, Makedon and Ford [20]. The two parts of the vertex set of this graph represent structural elements from the two proteins: the left part for protein $P_A$ and the right part for protein $P_B$. In contrast with the graph models discussed previously, here the level of representation can vary: the structural elements can be all atoms, only Cα atoms, amino acid residues or secondary structure elements. Each vertex is connected with all of the vertices in the opposite part. The weight of an edge between two vertices is defined by the similarity measure, which can include geometric and chemical properties of the compared protein structures. Given the two sets of structure elements A and B for protein $P_A$ and protein $P_B$, an undirected weighted bipartite graph $G(V, E)$ can be constructed, where $V = A \cup B$ and $E = \{e_{ij}\}$ for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$ (figure 3).

Figure 3. Bipartite graph matching proposed by Wang, Makedon and Ford

Each edge $e_{ij}$ corresponds to a weighted connection between $a_i$ and $b_j$, and the weight $w(e_{ij})$ shows the degree of similarity between $a_i$ and $b_j$. Edges between nodes from the same part are not allowed.

In the graph model proposed by Krissinel and Henrick [21], the secondary structure elements (helices and strands, eq. (6)) are used as graph vertices with composite labels, which have one part for the type of the element and one part for the number of residues that compose it. Any two vertices of the graph are connected by an edge whose label describes the geometry of the mutual position and orientation of the connected elements. Figure 4 shows the properties that are considered when the graph is constructed. Vertices $v_i$ and $v_j$ are represented by vectors $r_{SSE}$; edge $e_{ij}$ connects their centers. The edge length $p_{ij}$ and the angles $\alpha_{ij}^{k}$, $k = 1..4$, define the mutual positions and orientations of all vertices in the graph.

Figure 4. Properties of vertices and edges of the SSE graph

Models that examine the protein structure at the level of helices and sheets/strands are more compact and easier to construct and handle. The disadvantage of using only a model of secondary structure elements is that the information may not be enough for a precise comparison. This is why many methods first use this model for a fast comparison and then refine the result with a detailed comparison at the atomic level, with the alignment of secondary structure elements already available. At the same time, geometric properties of the structures are taken into account when the SSE-level model is constructed, by detecting the mutual position and orientation of the secondary structure elements relative to each other, including distances, angles, connectivity and overlaps.

B: Comparison Algorithm/Strategy

The variety of models for protein structure representation brings a variety of comparison algorithms.
There are cases in which the models are almost the same but the algorithms that operate on them are different, or the models are different yet the search strategy for equivalences is the same. Comparison algorithms can be grouped into classes, each with specific characteristics that determine the advantages and disadvantages of the algorithms and their applications.

According to their dependence on the chain order, alignment algorithms can be:

sequence-order dependent: these use the order of atoms in the protein chain, thus reducing the problem to 3D curve matching. The comparison task becomes easier when the order of the chain is considered.

sequence-order independent: the structural similarity between the compared models is measured without requiring the matched residues to follow the same order along the two chains. Since these algorithms do not exploit the chain order, they can detect non-sequential motifs in proteins, such as molecular surface motifs and especially binding sites. One important advantage of such algorithms is that they can be applied to other molecular structures (drugs, for example), not only to proteins.

Protein structure alignment algorithms can search for global similarity, when the purpose is to compare the molecules as a whole, or for local similarity. Global comparison algorithms are mainly used for protein structure classification and for identifying evolutionary links between distant homologues. For protein function prediction, local structural comparison methods are applied. Local structural comparison refers to detecting a similar 3D arrangement of a small set of residues, possibly in the context of completely different protein structures.
A more detailed comparison of alignment algorithms is made here according to the strategies they use: branch-and-bound, dynamic programming, geometric hashing, genetic algorithms, subgraph isomorphism, bipartite graph matching, etc. Some methods combine two strategies when the comparison is made at different levels. The proposed algorithms can be compared and evaluated according to their complexity. The protein structure comparison problem is NP-complete, and reported achievements in this respect are noted below.

Branch-and-bound is a widely used strategy for solving large-scale NP-hard combinatorial optimization problems, and many comparison methods rely on it. The technique consists of a systematic enumeration of candidate solutions, using upper and lower estimated bounds of the quantity being optimized, so that large subsets of useless candidates are discarded together during the search. This is done by a recursive procedure that extends initial candidate solutions, or matching seeds. The extension stops when the algorithm determines that the current path cannot lead to solutions better than the current best one. In that case the recursion goes one step back and the candidate is extended in another direction, or another candidate is selected. The running time of these methods depends strongly on how similar the compared proteins are. If the structures are very similar, there will be a large number of seed matches to explore.

One of the comparison methods that uses this technique is DALI. The alignment algorithm compares protein $P_A$ and protein $P_B$ in two steps: 1) Their distance matrices are first decomposed into elementary contact patterns (hexapeptide submatrices). All elementary contact patterns in protein $P_A$ are compared pairwise with all elementary contact patterns in protein $P_B$. Similar contact patterns are stored in a non-exclusive list of pairs, which is the raw material for structural alignment. 2) The goal of the second step is to assemble pairs of contact patterns into a larger consistent set of pairs (a larger alignment), maximizing the similarity score. A Monte Carlo procedure is used to build up the full alignment.

MATRAS also uses a branch-and-bound algorithm for the initial alignment of SSEs. A residue-based alignment is then performed iteratively by dynamic programming, refining the previous results.

Combinatorial Extension (CE), proposed by Shindyalov and Bourne, finds an optimal alignment between two protein structures using combinatorial extension of an alignment path defined by aligned fragment pairs. It is based on local similarity detection. The algorithm first applies rigid-body superposition of the fragment pairs. It then tries to extend this alignment using a greedy heuristic, followed by optimization of the best alignment.

FATCAT also aligns fragment pairs and then uses dynamic programming to connect them, while taking the flexibility of the protein molecule into account. FATCAT aligns flexible protein molecules by allowing twists in the peptide backbone within the alignment algorithm. This allows an alignment of two domains that are structurally similar but have local structural differences that preclude a full alignment when each domain is treated as a rigid body.

FlexProt is a sequence-order dependent method for aligning two protein structures, one of which can be a flexible molecule. First, FlexProt detects congruent fragment pairs, one fragment from each protein, which can be superimposed with minimum RMSD. Matching atom pairs are extended along the protein backbone by one or more atom pairs, as long as the RMSD and the length of the matching fragments stay within given thresholds. Then FlexProt composes a directed acyclic graph, where the vertices are fragment pairs and the edges show the order of the fragments (according to the amino acid sequence).
Weights are assigned to edges to reward long matching fragments and to penalize large gaps. A single-source shortest-paths algorithm is applied to this graph, and the resulting paths are compared with respect to total size and minimum RMSD. The last step of the algorithm clusters consecutive fragment pairs that have a similar 3D transformation. The first step of the algorithm takes $O(n^2)$, the second step takes $O(n^4)$, and the clustering step takes $O(n^2)$. Thus the overall complexity is bounded by $O(n^4)$, where $n$ is the number of Cα atoms in the larger protein.

Bostick and Vaisman use comparison of 3D arrays to find similarity between protein molecules. They apply a transformation $T$ to the Delaunay simplices, which are the models of the compared proteins. $T$ maps each edge length of a simplex to an integer value and is defined by eq. (8):

(8) $T: d \mapsto a$, where the integer $a$ is determined by which of the following ten ranges contains $d$: $d = 0$; $d = 1$; $d = 2$; $d = 3$; $4 \le d \le 6$; $7 \le d \le 11$; $12 \le d \le 20$; $21 \le d \le 49$; $50 \le d \le 100$; $d \ge 101$.

The simplices are mapped into a 3D array $M$, where $M_{npr}$ is the number of simplices whose edges satisfy the following conditions: (1) the Euclidean length of each simplex edge is less than 10 Å; (2) $d_{ij} = n$; (3) $d_{jk} = p$; (4) $d_{kl} = r$. The comparison of two proteins, represented by the arrays $M$ and $M'$, is computed by evaluating the difference between their corresponding elements ($Q$ is the score value and the measure of similarity):

(9) $Q = \sum_{r=1} \sum_{p=1} \sum_{n=1} \left( M_{npr} - M'_{npr} \right)$

Geometric hashing [22] is a technique that originates from computer vision [23], [24] and was first applied to the comparison of structural biology data by Fischer [25]. The coordinates of one structure are expressed relative to several local reference frames, which can be any triplets of points (Cα atoms) of the protein. Since the points used as a reference belong to the structure itself, this representation is invariant under both rotation and translation.
The positions of the other points in each frame are used as keys in a hash table. Once this representation has been calculated, two structures can be compared using a series of fast searches: the hash table is queried with structural features from the second molecule. Each hit in the table identifies a transformation between the two molecules. Transformations that receive many hits are those likely to superimpose essential structural features of both molecules. The complexity of this algorithm is $O(n^3)$, where $n$ is the number of atoms in the protein to be compared. The method does not assume any order of the protein Cα atoms and has the advantage of sequence-order independent algorithms: it can be applied to any type of molecule, not only to proteins.

Some of the alignment techniques that examine the protein structure at the atomic level can be summarized in two distinct steps: 1) generation of all initial superpositions and 2) identification of the optimal alignment by RMSD. MINRMS compares all consecutive fragments of four residues from one protein with all such fragments from the other to generate initial superpositions. Then a dynamic programming algorithm is used to evaluate the similarity between the two protein structures at this step. MINRMS generates and fills a score pyramid, which is composed of matrices stacked on top of one another (figure 5). The lowest layer represents the score matrix for the alignment of a pair of residues. Each layer above is the score matrix for the alignment with a newly added pair of residues and is evaluated using the matrix from the layer below. The value of a cell in the pyramid is derived from one of three adjacent cells in the layer below: by row, by column or by diagonal. The optimal alignment can be reconstructed by backtracking from the maximum value of each scoring matrix. The complexity of the MINRMS algorithm is $O(m^3 n^2)$, where $m$ and $n$ are the numbers of Cα atoms in the two protein molecules to be compared.

Figure 5. Alignment with MINRMS

Akutsu proposes two algorithms, RAND and RAG, which consist of the two steps described above. RAND finds the initial superposition by a random sampling technique, while RAG uses a fragment search method. The part common to RAND and RAG is the second step, the use of bipartite graph matching for protein structure alignment. After an initial superposition between A and B is found, a bipartite graph $G(A, B, E)$ is composed from the sequences A and B, where $E$ is the set of edges between A and B (figure 6). The edge $(a_i, b_j) \in A \times B$ is contained in $E$ if the distance between $a_i$ and $b_j$ is less than $\delta$ ($\delta$ = […] Å). The complexity of this method is $O(mns)$, where $m$ and $n$ are the sequence lengths and $s$ is the number of initial superpositions.

Genetic algorithms are a general-purpose global optimization technique that provides promising results across computational structural biology [26]. Genetic algorithms mimic the process of evolution. A generation within this process comprises a set of configurations that are coded as chromosomes. Chromosomes are manipulated by genetic operators such as crossover and mutation. The information content of the chromosomes varies depending on the application.
Typically, it comprises the intramolecular matches, or a coding of the orientational degrees of freedom together with a coding of the torsional degrees of freedom when molecular flexibility is considered. The fitness function that drives selection is typically an efficiently computable similarity function. Szustakowski and Weng propose a genetic algorithm to determine the optimal alignment between protein structures. The genetic algorithm is applied after the initial superposition of all secondary structure elements of the compared proteins, which generates the initial populations of possible alignments between secondary structure elements. Each population is altered with the genetic operators mutate, hop, swap and crossover, which recombine randomly chosen alignments. The resulting alignments are accepted or rejected according to rules that ensure the validity of the alignment.

Figure 6. Bipartite graph matching, proposed by Akutsu

Carr and Hart, whose goal is to maximize the number of equivalent contacts between the contact maps of the compared proteins (this number defines the degree of similarity), also use a genetic algorithm to solve the maximal contact map overlap problem. The chromosome is a vector c of dimension n, where each position can take values in the range [-1, ..., m-1]; m is the length of the longer protein and n is the length of the shorter one. The entry c[j] at position j specifies that the j-th residue of the shorter protein is aligned to the c[j]-th residue of the longer protein. The value -1 at position j specifies that the j-th residue is not aligned to any residue of the other protein. An important aspect of the method is that infeasible configurations are not allowed. For this purpose the genetic operators are defined so as to preserve feasibility.

When the models of the compared proteins are graphs, two different types of comparison algorithm can be applied: subgraph isomorphism detection or bipartite graph matching. In complexity theory, the maximum common subgraph isomorphism (MCS) problem is an optimization problem known to be NP-hard; this approach is therefore suitable for graph models with a small number of vertices, since otherwise the comparison would be too slow. This is why common subgraph isomorphism is searched for in models that represent protein structure at the level of secondary structure elements. After this initial alignment, some of the algorithms make a refinement at the atomic level.
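The chromosome encoding of Carr and Hart described above can be illustrated with a short Python sketch of a fitness evaluation. Several points are assumptions made for the example rather than details given in the text: position j is taken to index the residues of the shorter protein and c[j] a residue of the longer one (the reading consistent with a vector of dimension n and values up to m-1), feasibility is assumed to mean that aligned indices strictly increase (a one-to-one, order-preserving alignment), and the function names and toy contact sets are hypothetical.

```python
def is_feasible(c):
    """Assumed feasibility rule: the aligned indices (entries != -1) strictly increase,
    so the alignment is one-to-one and preserves chain order."""
    matched = [v for v in c if v != -1]
    return all(x < y for x, y in zip(matched, matched[1:]))


def overlap_fitness(c, contacts_short, contacts_long):
    """Contact map overlap of chromosome c.

    contacts_short, contacts_long: sets of contacts (i, j) with i < j.
    Counts contacts (j, k) of the shorter protein whose images (c[j], c[k])
    are also contacts of the longer protein. Infeasible chromosomes score 0.
    """
    if not is_feasible(c):
        return 0
    return sum(1 for j, k in contacts_short
               if c[j] != -1 and c[k] != -1 and (c[j], c[k]) in contacts_long)


# Hypothetical contact sets: shorter protein has n = 3 residues, longer has m = 4.
contacts_short = {(0, 1), (1, 2), (0, 2)}
contacts_long = {(0, 1), (1, 2), (2, 3), (1, 3)}

print(overlap_fitness([0, 1, 2], contacts_short, contacts_long))   # 2
print(overlap_fitness([1, 2, 3], contacts_short, contacts_long))   # 3
print(overlap_fitness([1, -1, 3], contacts_short, contacts_long))  # 1
print(overlap_fitness([2, 1, 3], contacts_short, contacts_long))   # 0 (infeasible)
```

In a genetic algorithm of this kind, a function such as `overlap_fitness` would play the role of the efficiently computable similarity function used for selection; here infeasible chromosomes are simply scored 0, whereas operators that preserve feasibility, as in the method described above, would never produce them.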
VAST is one of the techniques that use subgraph isomorphism to detect similarity between protein molecules. The model of the proteins is a graph that represents the correspondence between pairs of secondary structure elements with the same type, relative orientation and connectivity. This graph is searched for cliques, and the detected cliques are the starting point for the alignment. This initial alignment is extended to a residue-level one with a Gibbs sampling technique. Since the maximum clique problem is known to be NP-hard, this is only feasible for small graphs with fewer than about 30 vertices.

SSM [21] is another tool; it includes an original procedure for matching graphs built on the protein's secondary structure elements, followed by an iterative three-dimensional alignment of the protein backbone Cα atoms. The matching uses an optimal backtracking algorithm for common subgraph isomorphism described by its authors [27], CSIA, which is an advancement of the widely known algorithm of Ullmann [28] for exact subgraph isomorphism. The time complexity of CSIA is bounded by $O(m^{n+1} n)$, which makes it applicable to graphs with up to $n, m \approx 70$ unlabelled vertices.

One of the main difficulties when aligning structures whose sequences are not similar is to determine the correspondence between equivalent residues. Typically the process either iterates between residue assignment and a minimization step, or uses a stochastic optimization procedure to find the maximal subset of equivalent residues within some constraints. Some methods first use a fast, less accurate filter to define the initial alignment, followed by a slower and more accurate residue-based alignment, which is performed only for the subsets that pass the first step. One of the most intuitive methods for initial alignment is first to detect and align secondary structure elements and then to refine this alignment by finding the residue equivalences. This method is used by VAST and SSM.
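The clique-based starting point described above for VAST can be sketched in Python as follows. This is a simplified illustration, not VAST's actual implementation: vertices are pairs of same-type SSEs, but edges compare only inter-SSE distances (angles and connectivity are omitted), the tolerance `tol` and the toy data are hypothetical, and the clique search uses the standard Bron-Kerbosch enumeration, which is swapped in here as a convenient textbook algorithm.

```python
from itertools import combinations


def correspondence_graph(types_a, types_b, dist_a, dist_b, tol=2.0):
    """Vertices are pairs (i, j): SSE i of protein A and SSE j of protein B, same type.
    Two vertices (i, j) and (k, l) are joined if they use different SSEs in both
    proteins and the distances dist_a[i][k] and dist_b[j][l] agree within `tol`."""
    vertices = [(i, j) for i, ta in enumerate(types_a)
                for j, tb in enumerate(types_b) if ta == tb]
    adj = {v: set() for v in vertices}
    for (i, j), (k, l) in combinations(vertices, 2):
        if i != k and j != l and abs(dist_a[i][k] - dist_b[j][l]) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def max_clique(adj):
    """Largest clique found by plain Bron-Kerbosch enumeration (exponential in the
    worst case, so only suitable for small graphs)."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:          # r is a maximal clique
            if len(r) > len(best):
                best = list(r)
            return
        for v in list(p):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return sorted(best)


# Toy data: "H" = helix, "E" = strand; symmetric inter-SSE distances in angstroms.
types_a = ["H", "E", "E", "H"]
types_b = ["H", "E", "H"]
dist_a = [[0, 10, 14, 20], [10, 0, 5, 12], [14, 5, 0, 6], [20, 12, 6, 0]]
dist_b = [[0, 10.5, 19], [10.5, 0, 12.5], [19, 12.5, 0]]

graph = correspondence_graph(types_a, types_b, dist_a, dist_b)
print(max_clique(graph))  # [(0, 0), (1, 1), (3, 2)]
```

In this toy case the largest clique pairs helix 0, strand 1 and helix 3 of the first protein with SSEs 0, 1 and 2 of the second; such a set of mutually consistent SSE pairs is the kind of seed that a residue-level refinement would then extend.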
In contrast, Taylor uses a representation at the secondary structure level, without any refinement at the atomic level, and applies BGMT instead of a costly search for the full subgraph isomorphism in order to construct a fast filter for protein structure comparison. The interactions between secondary structure elements are represented by a graph, and a bipartite graph-matching algorithm ("stable marriage") is used to search between the two sets of interactions. A typical comparison between two protein structures takes about a tenth of a second, which makes the method suitable as a fast filter ahead of slower and more complex algorithms for protein structure comparison.

Wang, Makedon and Ford also use bipartite graph matching in their framework for finding correspondences between structural elements (atoms, residues or secondary structure elements) in two proteins. They define a maximum weight matching as a matching in which the sum of the edge weights is maximized. A maximum weight maximum cardinality matching is then a matching that has the greatest total weight among all matchings with the maximum number of edges. Figure 3 shows an example of the maximum weight bipartite matching.

In the context of correspondences between protein structural elements, a maximum weight matching returns the correspondence with the maximum weight, but there is no guarantee of maximum cardinality. Therefore, some elements in the smaller protein may not be matched to any element in the other protein. In other words, a maximum weight matching favors good local matches. A maximum weight maximum cardinality matching, on the other hand, always returns a matching with maximum cardinality, even if some edges in the matching have relatively small weights. It guarantees that every element in the smaller protein will be matched to an element in the other protein. In other words, a maximum weight maximum cardinality matching favors good global matches.
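The difference between the two kinds of matching is easiest to see on a tiny example. The sketch below enumerates every matching by brute force, which is only practical for a handful of elements and is not the algorithm used by the authors; the weight matrix and all function names are hypothetical.

```python
from typing import Iterator, List, Optional, Tuple

Matching = List[Tuple[int, int]]
Weights = List[List[Optional[float]]]  # None means "no edge between i and j"


def all_matchings(weights: Weights) -> Iterator[Matching]:
    """Enumerate every matching between rows (protein A) and columns (protein B)."""
    n_rows, n_cols = len(weights), len(weights[0])

    def extend(i: int, used: frozenset, pairs: Matching) -> Iterator[Matching]:
        if i == n_rows:
            yield list(pairs)
            return
        yield from extend(i + 1, used, pairs)  # leave element i unmatched
        for j in range(n_cols):
            if j not in used and weights[i][j] is not None:
                yield from extend(i + 1, used | {j}, pairs + [(i, j)])

    yield from extend(0, frozenset(), [])


def total(weights: Weights, matching: Matching) -> float:
    return sum(weights[i][j] for i, j in matching)


def max_weight(weights: Weights) -> Matching:
    """Matching with the largest total weight, regardless of its size."""
    return max(all_matchings(weights), key=lambda m: total(weights, m))


def max_weight_max_cardinality(weights: Weights) -> Matching:
    """Largest matching first, total weight as the tie-breaker."""
    return max(all_matchings(weights), key=lambda m: (len(m), total(weights, m)))


# Hypothetical weights: one strong local match (0, 0) blocks element 1.
w = [[10, 1],
     [1, None]]

print(max_weight(w))                  # [(0, 0)]
print(max_weight_max_cardinality(w))  # [(0, 1), (1, 0)]
```

Here the maximum weight matching keeps the single strong pair and leaves one element unmatched (total weight 10), while the maximum cardinality variant matches both elements at a lower total weight of 2.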
The best-known strongly polynomial time bound for weighted bipartite matching is that of the classical Hungarian method due to Kuhn [29], which runs in time $O(|V|(|E| + |V| \log |V|))$. Weighted bipartite matching algorithms can be implemented efficiently and can be applied to graphs of reasonably large size (about 100,000 vertices) [30].

Once the comparison algorithms have produced their results, a similarity measure is needed to evaluate them and to support conclusions about the presence or absence of similarity between the compared objects.

C. Similarity Measure

A similarity measure must be sensitive: it should rank more similar structures higher than less similar ones when higher values indicate similarity, and the reverse when lower values do. Another important fact is that structural alignment lacks a theory that defines and describes the distribution of structural similarity scores.

The most commonly used metric is the root-mean-square deviation (RMSD), in which the root-mean-square distance between corresponding residues is calculated after an optimal rotation of one structure onto the other. This metric has a lower value if the structures are similar and a higher value otherwise. RMSD is defined as follows:

$$\mathrm{RMSD}(A,B) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\lVert x(i) - y(i)\rVert^{2}} \tag{10}$$

In eq. (10), N is the number of aligned atoms and x(i) and y(i) are their coordinates. Since the RMSD weights the distances between all residue pairs equally, a small number of local structural deviations can result in a high RMSD even when the global topologies of the compared structures are similar. Furthermore, the average RMSD of randomly related proteins depends on the length of the compared structures, which makes the absolute magnitude of RMSD meaningless on its own [31]. Carugo shows [32] that the RMSD of an alignment is linearly related to the resolution of the compared domains. Aligning two domains with low or significantly different resolution results in a higher RMSD than aligning two domains with high resolution.

For the RMSD to be meaningful, the number of aligned residues has to be considered and the metric should be normalized. For their relative RMSD, Betancourt and Skolnick [31] normalize the RMSD by the average RMSD of random structure pairs of similar size:

$$\mathrm{RRMSD}(A,B) = \frac{\mathrm{RMSD}(A,B)}{D(A,B)} \tag{11}$$

Here RMSD(A, B) is the RMSD between proteins A and B, and D(A, B) is an estimate of the average RMSD between two aligned random protein fragments of the same length. For the RMSD100 score [33], Carugo and Pongor divide the RMSD by a factor of $1 + \ln\sqrt{N/100}$, with N representing the protein length.
$$\mathrm{RMSD}_{100}(A,B) = \frac{\mathrm{RMSD}(A,B)}{1 + \ln\sqrt{N/100}} \tag{12}$$

Zemla [34,35] proposes two different metrics, which are used to evaluate protein structure prediction and serve as major assessment criteria for the results of the CASP experiments: LCS, the longest common segment, and GDT, the global distance test. The LCS is the longest continuous segment that can be aligned with an RMSD between Cα atoms below a specified value (1, 2 or 5 Å). The GDT score is calculated from the largest set of residues whose Cα atoms in the model structure fall within a defined distance cutoff of their positions in the experimental structure. It is typical to calculate the GDT score under several cutoff distances (1, 2, 4 and 8 Å are used in the CASP5 experiment), and scores generally increase with increasing cutoff. GDT aims to identify similar substructures that need not be continuous. It attempts to find the maximum number of residues in one protein that can be superimposed on the other protein within a given threshold. This measure can be applied to find the largest similar subsets.

Levitt and Gerstein [36] propose a different structure similarity measure, which takes into account the number of gaps in the alignment of the structures:

$$S_{str} = M\left(\sum \frac{1}{1 + (d_{ij}/d_0)^{2}} - \frac{N_{gap}}{2}\right) \tag{13}$$

In eq. (13), $d_{ij}$ is the distance between aligned Cα atoms, $N_{gap}$ is the number of gaps, M = 20 and $d_0$ = 5 Å. This measure and GDT are the basis for other protein structure similarity measures.

With MaxSub [37], Siew, Elofsson, Rychlewski and Fischer try to identify the maximum substructure in which the distances between equivalent residues of two structures after superposition are below some threshold value, such as 3.5 Å. MaxSub counts only the residues in the substructure, and the spatial information of the templates outside this substructure is omitted. This measure is based on principles similar to those of GDT.
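Once two structures are aligned and superimposed, several of the measures above reduce to simple arithmetic on the per-residue distances. The sketch below assumes the superposition has already been done, so it does not search for the optimal rotation required by RMSD or for the best superposition per cutoff required by a real GDT calculation. The coordinates and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def distances(xs: Sequence[Point], ys: Sequence[Point]) -> List[float]:
    """Distances between corresponding Cα atoms of two superimposed structures."""
    return [math.dist(x, y) for x, y in zip(xs, ys)]


def rmsd(d: Sequence[float]) -> float:
    """Eq. (10): root-mean-square of the per-residue distances."""
    return math.sqrt(sum(v * v for v in d) / len(d))


def rmsd100(value: float, n: int) -> float:
    """Eq. (12): RMSD rescaled to a reference length of 100 residues.

    The denominator approaches zero for very short chains (n around 14),
    so the rescaling is only sensible for longer ones.
    """
    return value / (1.0 + math.log(math.sqrt(n / 100.0)))


def fraction_within(d: Sequence[float], cutoff: float) -> float:
    """Fraction of residues within `cutoff` for this one fixed superposition."""
    return sum(1 for v in d if v <= cutoff) / len(d)


# Hypothetical four-residue example: three perfect matches, one outlier.
xs = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
ys = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 10.0)]

d = distances(xs, ys)
print(rmsd(d))  # 5.0
print([fraction_within(d, c) for c in (1.0, 2.0, 4.0, 8.0)])  # [0.75, 0.75, 0.75, 0.75]
```

A single displaced residue drives the RMSD to 5 Å although three quarters of the residues coincide exactly, while the cutoff-based fractions stay at 0.75. This is the sensitivity to local deviations that motivates the threshold-based measures.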
MaxSub computes a single scalar in the range 0 to 1, which measures the similarity between the compared structures. This scalar is a normalization of the size of the most similar subset and is computed using a variation of the formula suggested by Levitt and Gerstein [36].

Whereas RMSD is based on intermolecular distances, the DALI similarity score proposed by Holm and Sander [12] is based on intramolecular distances:

$$S(A,B) = \sum_{i}\sum_{j}\left(0.2 - \frac{\lvert d^{A}_{ij} - d^{B}_{ij}\rvert}{d^{avg}_{ij}}\right) e^{-\left(d^{avg}_{ij}/20\right)^{2}} \tag{14}$$

Here $d^{A}_{ij}$ and $d^{B}_{ij}$ are the distances between residues i and j within proteins A and B, and $d^{avg}_{ij}$ is their average.

Kawabata and Nishikawa [17] propose a theory for evaluating protein structure similarity by a log-odds score based on the Markov transition model of evolution. Their similarity score between structures i and j is defined as:

$$S(i,j) = \log\frac{P(i \to j)}{P(j)} \tag{15}$$

$P(i \to j)$ is the probability that structure i changes to structure j during the evolutionary process, and P(j) is the probability that structure j appears by chance. This is a reasonable definition of structure similarity, especially for finding evolutionarily related (homologous) structures. The probability $P(i \to j)$ is estimated by the Markov transition model, which is similar to Dayhoff's substitution model for amino acids. To evaluate pairwise similarities between proteins A and B, Kawabata and Nishikawa define the following normalized score:

$$R(A,B) = \frac{S(A,B) - S_{min}}{S_{max} - S_{min}} \times 100 \tag{16}$$

S(A, B) is the log-odds score between proteins A and B, and $S_{min}$ and $S_{max}$ are the minimum and maximum scores. This score is similar to the one proposed by Feng and Doolittle [38].

The template modeling score, or TM-score [39], also uses a variation of the Levitt-Gerstein (LG) weight factor, which weights the residue pairs at smaller distances relatively more strongly than those at larger

distances. For that reason the TM-score is more sensitive to the global topology than to local structural variations. Its value is normalized in such a way that the score magnitude relative to random structures does not depend on the protein's size, with a value of 0.17 for an average pair of randomly related structures. For aligning a template model to a native structure, the TM-score is defined as follows:

$$\text{TM-score} = \max\left[\frac{1}{L_N}\sum_{i=1}^{L_T}\frac{1}{1 + (d_i/d_0)^{2}}\right] \tag{17}$$

$L_N$ is the length of the native structure, $L_T$ is the number of residues aligned to the template structure, $d_i$ is the distance between the i-th pair of aligned residues and $d_0$ is a scale that normalizes the match difference.

Yang and Honig propose the protein structural distance (PSD) [40]: (, ) a s A B log ( max(, ) (, ), ) log a b s A A RMSD PSD A B = + x y In this equation, a is the number of SSEs in protein A, b is the number of SSEs in protein B, s(A, A) is the self-alignment score for protein A, s(A, B) is the score for the alignment of the SSEs of both proteins, and x and y are adjustable parameters. The PSD is designed to describe relationships between protein structures in quantitative rather than descriptive terms and is applicable both when two structures are very similar and when they are very different. It is calculated with a structural alignment procedure that uses double dynamic programming to align secondary structure elements and an iterative rigid body superposition that minimizes the root-mean-square deviation of Cα atoms.

Krasnogor and Pelta showed [41] how the Universal Similarity Metric (USM), introduced by Li in [42], can be used to calculate similarities between protein pairs. The USM approximates every possible similarity metric (i.e. those that exist today and those that are yet to be defined) and is based on the concept of Kolmogorov complexity. The Kolmogorov complexity K(o) of an object o is defined as the length of the shortest program for a Universal Turing Machine U that is needed to output o.
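The bracketed sum of the TM-score in eq. (17) can be illustrated for a few candidate superpositions. This is a sketch under stated assumptions: the candidate distance lists are given rather than found by a superposition search, and the value of $d_0$ is simply passed in by the caller, since the text does not state how it is chosen. All names and numbers are hypothetical.

```python
from typing import Iterable, Sequence


def tm_term(distances: Sequence[float], native_length: int, d0: float) -> float:
    """Bracketed sum of eq. (17) for one superposition.

    distances:     d_i for the L_T aligned residue pairs
    native_length: L_N, the length of the native structure
    d0:            distance scale, supplied by the caller (assumed value)
    """
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / native_length


def tm_score(candidates: Iterable[Sequence[float]], native_length: int, d0: float) -> float:
    """Maximum of the bracketed sum over the candidate superpositions."""
    return max(tm_term(d, native_length, d0) for d in candidates)


# Two hypothetical superpositions of a four-residue native structure.
candidates = [
    [0.0, 0.0, 2.0, 2.0],  # two exact matches, two pairs at distance d0
    [1.0, 1.0, 1.0, 1.0],  # every pair slightly off
]
score = tm_score(candidates, native_length=4, d0=2.0)
print(round(score, 3))  # 0.8
```

Each aligned pair contributes between 0 and 1, and dividing by $L_N$ rather than by the number of aligned pairs means that unaligned residues lower the score.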
Kolmogorov complexity is an objective measure of the amount of information contained in a given object. A related measure is the conditional Kolmogorov complexity of $o_1$ given $o_2$, which measures how much information is needed to produce object $o_1$ if object $o_2$ is known:

$$K(o_1 \mid o_2) = \min\{\lvert P\rvert : P \text{ is a program},\ U(P, o_2) = o_1\} \tag{18}$$

The information distance between two objects is equivalent to:

$$ID(o_1, o_2) = \max\{K(o_1 \mid o_2),\ K(o_2 \mid o_1)\} \tag{19}$$

The Universal Similarity Metric is a proper metric; it is universal and also normalized. The metric is formally defined as:

$$d(o_1, o_2) = \frac{\max\{K(o_1 \mid o_2^{*}),\ K(o_2 \mid o_1^{*})\}}{\max\{K(o_1),\ K(o_2)\}} \tag{20}$$

where $o_1^{*}$ (or $o_2^{*}$) denotes a shortest program for $o_1$ (or $o_2$).

The magnitudes of some of the metrics discussed above depend on the size of the evaluated proteins [39], which makes the absolute magnitude of these scoring functions meaningless. To eliminate the dependence on protein size, some authors (Levitt and Gerstein; Ortiz, Strauss and Olmea [43]) convert their structure alignment score into a statistical significance score, called the P-value, on the basis of the statistics of their random structure database. VAST also uses a P-value, which is based on the likelihood of aligning a given number of secondary structure elements of a certain length. A Z-score is defined and used by Shindyalov and Bourne for their CE and by Holm and Sander for DALI. The Z-score of CE is based on a Gaussian distribution of the similarity score between aligned fragments of proteins.

The information about the protein structure comparison methods discussed in this paper is summarized in Table 1. The properties of the three major components of each method (model of the protein structure, comparison algorithm and similarity measure) are considered. Given a suitable model, an appropriate algorithm and a well-chosen similarity score, most protein structure comparison methods detect similar proteins and assign a favourable score to their alignment.
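Kolmogorov complexity is not computable, so eq. (20) cannot be evaluated directly. A common practical stand-in, swapped in here as an assumption rather than taken from the text above, is the normalized compression distance, which replaces K by the output size of a real compressor. The sketch uses zlib from the Python standard library; the input strings are hypothetical encodings of structures, not real data.

```python
import zlib


def c(data: bytes) -> int:
    """Compressed size in bytes, used as a computable stand-in for K."""
    return len(zlib.compress(data, 9))


def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance, a practical approximation of eq. (20)."""
    ca, cb, cab = c(a), c(b), c(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)


# Hypothetical inputs: a repetitive secondary-structure string and an
# unrelated byte sequence with almost no internal redundancy.
helix_sheet = b"HHHHHHEEEELLLLHHHHHHLLEEEE" * 10
unrelated = bytes((i * 37 + 11) % 251 for i in range(260))

print(ncd(helix_sheet, helix_sheet) < ncd(helix_sheet, unrelated))  # True
```

An object compared with itself adds almost nothing to the compressed concatenation, so its distance is close to 0, whereas unrelated objects give a value close to 1. The quality of the approximation depends on the compressor and on how the structures are encoded as bytes.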
The situation is different when the comparison is made between less similar protein structures, the so-called challenging sets [44]. In such cases the alignment algorithms sometimes produce different results about the degree of detected similarity. Some of the works dedicated to this problem are [44,45,46]. Novotny [45] presents a comprehensive and critical comparative analysis and evaluation of 11 publicly available, Web-based servers for automatic fold comparison. The conclusion is that the tested algorithms differ in their performance, i.e. in how well established structural similarities are recognized. Sierk and Pearson also examine the sensitivity and selectivity of protein structure comparison methods in [46]. Seven protein structure comparison methods and two sequence comparison programs are evaluated on their ability to detect either protein homologs or domains with the same topology (fold) as defined by the CATH structure database. The programs show distinct differences in their misclassifications according to structural class. After analysing the results, the authors conclude that, with some exceptions, the relative performance of the methods tested is the same regardless of the error model, and that these results accurately reflect the general characteristics of the methods.

Mayr, Domingues and Lackner [44] also present a comparative analysis of protein structure alignment methods. They analyze and compare several methods with regard to their performance in identifying structurally or evolutionarily related proteins. Three sets of pairs of structurally related proteins are used: remote homologous proteins according to the SCOP database (the ASTRAL40 set), the SISY set, which is derived from the SISYPHUS database and includes 69 protein pairs, and 40 pairs that are challenging to align (the RIPC set). Two methods are applied to align the proteins in the ASTRAL40 set, and the resulting alignments agree on average in more than half of the aligned positions. Six methods are compared using the SISY and RIPC sets. The alignments generated by the different methods match on average more than half of the reference alignments in the SISY set. The alignments obtained for the more challenging RIPC set tend to differ considerably and match the reference alignments less successfully than the SISY set alignments. The authors come to the conclusion that the alignments produced by different methods tend to agree to a considerable extent, but that the agreement is lower for the more challenging pairs.

Table 1. Protein structure comparison methods

| Name | Properties of the model | Properties of the comparison algorithm | Similarity measure |
|---|---|---|---|
| RAND and RAG [6] | Proteins are presented with sequences of their Cα atoms as points in 3D space. | Generation of all initial superpositions and refinement with bipartite graph matching technique. | RMSD |
| MINRMS [7] | Proteins are presented with sequences of their Cα atoms as points in 3D space. | Generation of all initial superpositions, dynamic programming to evaluate the similarity and refinement with min RMSD. | RMSD |
| CE [8] | Proteins are presented with fragments of amino acids with fixed size. | Combinatorial extension of the optimal path with greedy algorithm. | Z-score |
| FlexProt [9] | Proteins are presented with fragments of amino acids with fixed size. | Superposition with min RMSD, single-source shortest path and clustering. It takes flexibility into account. | RMSD |
| FATCAT [10] | Proteins are presented with fragments of amino acids with fixed size. | Dynamic programming. It takes flexibility into account. | RMSD |
| Delaunay tessellation [11] | Topology is presented with Delaunay tessellation, a composition of simplexes. Cα atoms that are nearest neighbors are grouped and define a simplex. | Comparison of corresponding elements in 3D arrays. | Q-score |
| DALI [12] | Proteins are presented at the atomic level with distance matrices. | Branch-and-bound strategy. | Z-score |
| Szustakowski and Weng's method [13] | Proteins are presented at the atomic level with distance matrices. | Genetic algorithm. | Elastic similarity score |
| Carr and Hart's method [15] | Proteins are presented at the atomic level with contact maps. | Genetic algorithm. | Maximum contact map overlap |
| MATRAS [16] | Different levels of representation are used: secondary structure elements for the initial model, followed by a model at the amino acid level. Markov transition model of evolution is applied. | Branch-and-bound strategy for secondary structure elements comparison and dynamic programming for residue-based alignment. | Log-odds score |
| VAST [18] | Different levels of representation are used: secondary structure elements for the initial model, followed by a model at the amino acid level. Graph model is preferred here. | Subgraph isomorphism algorithm by clique detection. | P-value |
| Taylor [19] | Proteins are presented with a bipartite graph whose vertices represent secondary structure elements. | Bipartite graph matching technique. | Score that includes type, distance, angle and packing |
| MWBM [20] | The level is chosen among the following: atoms, residues or secondary structure elements. | Bipartite graph matching technique. | Weight of the match |
| SSM [21] | Proteins are presented with a graph model whose vertices are secondary structure elements. | Common subgraph isomorphism detection. | P-value |
| Geometric Hashing [25] | Hash table with the positions of all points (Cα atoms), situated according to different reference frames. | Geometric hashing. | RMSD |
The results for the comparison to reference alignments are encouraging, but also indicate that there is still room for improvement.

Conclusion

The comparison of protein structures is a task with many applications: it is part of protein classification methods and of protein structure and protein function prediction. Different approaches are used when models are constructed, and suitable algorithms are chosen according to the application. Detailed models are precise and carry a lot of structural information, but sometimes it exceeds the necessary one


More information

A General Model for Amino Acid Interaction Networks

A General Model for Amino Acid Interaction Networks Author manuscript, published in "N/P" A General Model for Amino Acid Interaction Networks Omar GACI and Stefan BALEV hal-43269, version - Nov 29 Abstract In this paper we introduce the notion of protein

More information

The Universal Similarity Metric, Applied to Contact Maps Comparison in A Two-Dimensional Space

The Universal Similarity Metric, Applied to Contact Maps Comparison in A Two-Dimensional Space The Universal Similarity Metric, Applied to Contact Maps Comparison in A Two-Dimensional Space by Sara Rahmati A thesis submitted to the School of Computing in conformity with the requirements for the

More information

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB) Protein structure databases; visualization; and classifications 1. Introduction to Protein Data Bank (PDB) 2. Free graphic software for 3D structure visualization 3. Hierarchical classification of protein

More information

Motivating the need for optimal sequence alignments...

Motivating the need for optimal sequence alignments... 1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use

More information

Bayesian Models and Algorithms for Protein Beta-Sheet Prediction

Bayesian Models and Algorithms for Protein Beta-Sheet Prediction 0 Bayesian Models and Algorithms for Protein Beta-Sheet Prediction Zafer Aydin, Student Member, IEEE, Yucel Altunbasak, Senior Member, IEEE, and Hakan Erdogan, Member, IEEE Abstract Prediction of the three-dimensional

More information

Molecular Modeling lecture 17, Tue, Mar. 19. Rotation Least-squares Superposition Structure-based alignment algorithms

Molecular Modeling lecture 17, Tue, Mar. 19. Rotation Least-squares Superposition Structure-based alignment algorithms Molecular Modeling 2019 -- lecture 17, Tue, Mar. 19 Rotation Least-squares Superposition Structure-based alignment algorithms Matrices and vectors Matrix algebra allows you to express multiple equations

More information
