BIOINF 472 Drug Design 2
Jens Krüger and Philipp Thiel, Summer 2014
Lecture 5: 3D Structure Comparison
Part 1: Rigid Superposition, Pharmacophores
Overview
- Comparison of 3D structures
- Rigid superposition
  - RMSD as distance measure
  - Optimal rigid superposition
  - RigFit
- Pharmacophores
  - Definition
  - Pharmacophore identification as a graph problem
  - Bron–Kerbosch algorithm for MAX_CLIQUE
  - Other methods
Structure Comparison in 3D
- In contrast to fingerprint- or graph-based methods, 3D structure comparison also considers geometry
- Comparison of two structures A and B
- Consider the positions of the atoms in space relative to each other
Structure Comparison in 3D
There are basically two variants:
- Rigid
  - Find an optimal superposition of A and B
  - Compute the similarity using some similarity measure
  - Flexibility can be considered implicitly by comparing several conformers
- Semiflexible/flexible
  - Compute an alignment of A and B
  - Partial or full flexibility of the structures
  - Compare the quality of different alignments using a distance/similarity measure
Similarity in 3D: form follows function
- "Form follows function is a principle associated with modern architecture and industrial design in the 20th century. The principle is that the shape of a building or object should be primarily based upon its intended function or purpose."
- Conversely, one should be able to draw conclusions about the function from the form/shape!
http://en.wikipedia.org/wiki/form_follows_function [accessed 05/12/2014, 12:0 CET]
Similarity in 3D
- According to the lock-and-key principle, two ligands binding to the same receptor should be geometrically similar
- How is similarity defined in 3D?
- Which distance/similarity measures are suitable?
- Can we apply some of the concepts used for topological similarity?
- How can similarities in 3D be identified?
Similarity of Conformations
- Simplest case of structure comparison: similarity of two topologically identical molecules (e.g., for clustering in conformational analysis)
- This results in a simplified problem:
  - Every atom of A has an equivalent atom in B
  - Find a transformation T that rotates and translates A such that A and B are optimally superimposed
Distance Measures: RMSD
- If there is a bijection mapping each atom of A onto an atom of B, the simplest distance measure is the root mean square deviation (RMSD) of the atom coordinates:
  $\mathrm{RMSD}(X, Y) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \lVert x_i - y_i \rVert^2}$
  where $X = (x_1, x_2, \ldots, x_N)$ and $Y = (y_1, y_2, \ldots, y_N)$ are the coordinate vectors of A and B
- RMSD is computed either for all atoms (all-atom RMSD) or only for heavy atoms (heavy-atom RMSD), so care must be taken when comparing published values!
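The RMSD definition can be sketched in a few lines of Python. This is a minimal illustration, assuming that both molecules are given as equally long lists of (x, y, z) tuples whose i-th entries are the paired atoms; the function name `rmsd` and the toy coordinates are hypothetical.

```python
import math

def rmsd(X, Y):
    """RMSD between two equally long lists of 3D points (atom i of X pairs with atom i of Y)."""
    if len(X) != len(Y) or not X:
        raise ValueError("X and Y must be non-empty and of equal length")
    # Sum of squared Euclidean distances between the paired atoms x_i and y_i.
    sq = sum((a - b) ** 2 for x, y in zip(X, Y) for a, b in zip(x, y))
    return math.sqrt(sq / len(X))

# Toy example (hypothetical coordinates): B is A shifted by 1 along z.
A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
B = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(A, B))
```

Note that this value depends on how A and B are placed in space; the minimum over all rigid transformations is what the following slides are after.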
Transformations
- RMSD depends on the coordinates of both molecules. We are interested in the minimum RMSD, so we need to find the transformation that minimizes the distance
- Such a transformation can be decomposed into a rotation and a translation
- Rotations around the coordinate origin can be described by orthogonal 3×3 matrices with determinant +1
- R is orthogonal ⇔ the rows (and columns) of R form an orthonormal set
Transformations
- For a molecule with coordinate vectors $x_1, \ldots, x_n$, a rotation around the origin can be expressed by a matrix multiplication: $x_i' = R x_i$
- In the general case, an additional translation $t = (t_1, t_2, t_3)$ has to be applied: $x_i' = R x_i + t$
[Figure: a molecule before and after applying the rotation R and the translation t]
Transformations
- If atom positions are represented by homogeneous coordinates, translation and rotation can both be represented by a single 4×4 matrix T composed of R and t:
  $x = (x_1, x_2, x_3)^T \rightarrow x_H = (x_1, x_2, x_3, 1)^T$, $T = \begin{pmatrix} R & t \\ 0 & 1 \end{pmatrix}$
- Applying the rotation R and the translation t, $x' = R x + t$, thus simply corresponds to $x_H' = T x_H$ in homogeneous coordinates
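The homogeneous-coordinate formulation can be sketched as follows. This is an illustrative sketch in plain Python; the helper names `make_transform`, `apply_transform` and `rot_z` are hypothetical and not part of any library.

```python
import math

def make_transform(R, t):
    """Build the 4x4 homogeneous matrix T from a 3x3 rotation R and a translation t."""
    return [list(R[i]) + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def apply_transform(T, x):
    """Apply T to a 3D point x via its homogeneous form (x1, x2, x3, 1)."""
    xh = list(x) + [1.0]
    yh = [sum(T[i][j] * xh[j] for j in range(4)) for i in range(4)]
    return tuple(yh[:3])

def rot_z(angle):
    """Rotation by `angle` (radians) around the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Rotate (1, 0, 0) by 90 degrees around z, then translate by (1, 2, 3).
T = make_transform(rot_z(math.pi / 2), (1.0, 2.0, 3.0))
p = apply_transform(T, (1.0, 0.0, 0.0))
print(p)  # approximately (1.0, 3.0, 3.0)
```

Because rotation and translation are folded into one matrix, several rigid transformations can be chained by multiplying their 4×4 matrices.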
Transformations
- We are only interested in transformations that keep the internal geometry of the molecule intact, i.e., that do not change intramolecular distances
- Such transformations are rigid transformations
- Their rotational part corresponds to an orthogonal matrix
- Optimal superposition of two molecules thus corresponds to determining an optimal rigid transformation T
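Kabsch Algorithm
- Given two conformations A and B with atom coordinates $x_i$ and $y_i$
- Find the rigid transformation $T_{\min} = (R, t)$ that maps A onto B such that the RMSD is minimized:
  $T_{\min} = \arg\min_{R, t} \sum_{i=1}^{N} \lVert (R x_i + t) - y_i \rVert^2$
- Additional constraint: $T_{\min}$ has to be a rigid transformation
  - Its rotational part R has to be an orthogonal matrix
  - R has to satisfy $R^T R = I$
W. Kabsch, Acta Cryst. (1976), A32, 922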
Kabsch Algorithm
- The objective function minimizes the sum of squared distances between paired atoms of A and B, and thus the RMSD
- An analytical solution to this optimization problem was given by Kabsch in 1976
- The constrained minimization is solved with the method of Lagrange multipliers
- The solution is then determined by solving an eigenvalue problem
W. Kabsch, Acta Cryst. (1976), A32, 922
W. Kabsch, Acta Cryst. (1978), A34, 827
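A compact way to compute the optimal superposition in practice is sketched below. Note the assumption: instead of the eigenvalue formulation mentioned on the slide, this sketch uses the widely used SVD-based variant, which solves the same least-squares problem. It assumes NumPy is available; the function name `kabsch` and the test coordinates are hypothetical.

```python
import numpy as np

def kabsch(A, B):
    """Return (R, t, rmsd) such that R @ a_i + t matches b_i best in the least-squares sense."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    ca, cb = A.mean(axis=0), B.mean(axis=0)   # centroids
    A0, B0 = A - ca, B - cb                   # centered coordinates
    H = A0.T @ B0                             # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    # Correct a possible reflection so that det(R) = +1 (proper rotation).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cb - R @ ca
    diff = (A @ R.T + t) - B
    return R, t, float(np.sqrt((diff ** 2).sum() / len(A)))

# Toy check (hypothetical data): B is A rotated by 0.7 rad around z and shifted.
A = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]], dtype=float)
c, s = np.cos(0.7), np.sin(0.7)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
B = A @ Rz.T + np.array([1.0, -2.0, 0.5])
R, t, r = kabsch(A, B)
print(r)  # close to zero, since B is an exact rigid copy of A
```

The translation is handled by centering both point sets first, so only the rotation has to be optimized under the orthogonality constraint.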
Superposition of Different Structures
- If the two molecules have different topologies, it is often difficult to find a bijection mapping atoms of A to atoms of B
- This mapping would be required to compute the RMSD
- One has to resort to different distance measures instead
[Figure: structures of dihydrofolate and methotrexate]
Overlap Volume
Idea:
- Represent atoms by three-dimensional Gaussians
- Molecule = sum of Gaussians centered at different positions in space
- If two molecules overlap significantly (i.e., they are properly aligned and similar), their Gaussians overlap
- The correlation of the Gaussians of A and B is used as a measure of similarity
Carbó, Leyda, Arnau, Int. J. Quantum Chem., 1980, 17, 1185
Overlap Volume
- The volume of the molecule is described by a three-dimensional density function, a linear combination of Gaussians centered on the atom positions
Carbó, Leyda, Arnau, Int. J. Quantum Chem., 1980, 17, 1185
Overlap Volume
- The overlap of the density functions is measured by their correlation
- The correlation of two density functions $\rho_A$ and $\rho_B$ yields a similarity measure $Z_{AB} = \int \rho_A(r)\, \rho_B(r)\, dr$, which is a measure of the overlap volume
- The larger $Z_{AB}$, the more the density functions overlap and the more similar A and B are
Carbó, Leyda, Arnau, Int. J. Quantum Chem., 1980, 17, 1185
RigFit
- Lemmen et al. developed RigFit, which is based on the correlation of density functions
- RigFit identifies the optimal superposition of two arbitrary rigid molecules
- Physicochemical properties of the molecules can also be taken into consideration; we will address this issue later
- Without loss of generality, A is kept in a fixed position, while B can be moved by a translation t and a rotation Θ
- Goal of the algorithm: identify a $t_{\max}$ and a $\Theta_{\max}$ that maximize $Z_{AB}(t, \Theta)$, i.e., for which the density functions of A and the transformed B overlap the most
Lemmen, Hiller, Lengauer, J. Comput.-Aided Mol. Des., 1998, 12, 491
RigFit
- $Z_{AB}(t, \Theta)$ is not normalized; it thus depends strongly on the size of the molecules
- To compensate for the size of the molecules, we use the Hodgkin index
  $H_{AB} = \frac{2 Z_{AB}}{Z_{AA} + Z_{BB}}$
  where $Z_{AA}$ and $Z_{BB}$ are the autocorrelations of A and B
- Since $Z_{AA}$ and $Z_{BB}$ are independent of t and Θ, the optimum of this new objective function is located at the same (t, Θ) as the optimum of $Z_{AB}$
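RigFit
- Substituting the Gaussian density functions into $Z_{AB}$ turns the integral into a double sum over all atom pairs (one atom from A, one from B); each term is again a Gaussian, now in the distance between the two atoms
http://mathworld.wolfram.com/convolution.html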
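The overlap measure and its normalization can be illustrated with a small sketch. It makes simplifying assumptions that are not taken from RigFit itself: every atom is a unit-weight Gaussian exp(-alpha·|r - r_i|²) with the same exponent `alpha`, so the overlap integral of two atoms has the closed form (π/(2·alpha))^(3/2) · exp(-alpha·d²/2) for interatomic distance d. The function names and coordinates are hypothetical.

```python
import math

def overlap(A, B, alpha=1.0):
    """Z_AB for densities that are sums of unit-weight Gaussians exp(-alpha*|r - r_i|^2)."""
    pref = (math.pi / (2.0 * alpha)) ** 1.5
    z = 0.0
    for a in A:            # double sum over all atom pairs (a from A, b from B)
        for b in B:
            d2 = sum((p - q) ** 2 for p, q in zip(a, b))
            z += pref * math.exp(-0.5 * alpha * d2)
    return z

def hodgkin(A, B, alpha=1.0):
    """Hodgkin index H_AB = 2 Z_AB / (Z_AA + Z_BB); equals 1 for identical, superimposed molecules."""
    return 2.0 * overlap(A, B, alpha) / (overlap(A, A, alpha) + overlap(B, B, alpha))

# Toy molecules (hypothetical coordinates): B_near is A shifted slightly, B_far is shifted a lot.
A = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
B_near = [(0.0, 0.5, 0.0), (1.5, 0.5, 0.0)]
B_far = [(0.0, 5.0, 0.0), (1.5, 5.0, 0.0)]
print(hodgkin(A, A), hodgkin(A, B_near), hodgkin(A, B_far))
```

The index decreases as B moves away from A, which is the behaviour an optimizer over t and the rotation exploits.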
RigFit
Algorithm (overview):
1. Optimize rotation (in Fourier space)
   - Find a good set of rotations
   - Ignore translation
   - This is achieved by the separation of rotation and translation
2. Optimize translation (in Fourier space)
   - Find the optimal translation for each rotation
   - Remove unsuitable combinations of t/Θ
3. Final optimization (real space)
   - Perform a (local) six-dimensional optimization of t and Θ in real space to obtain the best transformation
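RigFit: Optimizing the Rotation
- The correlation function can be Fourier transformed: by Parseval's theorem, the real-space integral over the product of the two density functions equals an integral over the product of their Fourier transforms (one of them complex conjugated)
- For periodic density functions, this integral can be converted to a sum over the reciprocal lattice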
RigFit: Optimizing the Rotation
- In Fourier space, the Gaussian density functions can easily be transformed into (equivalent) Patterson functions
- Patterson functions are well known from crystallography and have the convenient property of being translationally invariant
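The translational invariance can be checked numerically. The sketch below assumes NumPy and uses a random array as a stand-in for a gridded, periodic density; it relies on the fact that the Fourier transform of the Patterson function (the autocorrelation of the density) is the squared amplitude |F|² of the density's Fourier transform.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = rng.random((8, 8, 8))                          # stand-in for a gridded density
shifted = np.roll(rho, (2, 5, 1), axis=(0, 1, 2))    # periodic translation of the density

# A translation only changes the phases of the Fourier coefficients,
# so the squared amplitudes are identical for both densities.
patterson_ft = np.abs(np.fft.fftn(rho)) ** 2
patterson_ft_shifted = np.abs(np.fft.fftn(shifted)) ** 2
print(np.allclose(patterson_ft, patterson_ft_shifted))  # True
```

This is why a rotation search based on Patterson functions does not need to know the translation.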
RigFit: Optimizing the Rotation
- The objective function $P_{AB}(\Theta)$ is independent of the translation t
- The rotation can thus be optimized independently of the translation
- The calculations can also be performed very efficiently in Fourier space
- Θ is optimized by a systematic search (grid search) over regularly spaced rotation angles
RigFit: Optimizing the Rotation
- After good rotations have been identified, a nonlinear optimization using quasi-Newton methods is performed
- The gradient is approximated by difference quotients of $R_{AB}(\Theta)$
- The optimization is good at finding local optima
- A sufficiently dense sampling of starting rotations is needed so that the global optimum is among the results
RigFit: Optimizing the Translation
- After the rotation search has identified a set of good rotations, the translations for these rotations are determined
- Again, the calculations are sped up by Fourier transformation
- For a fixed rotation, our objective function depends on the translation t alone: $Z_{AB}(t) = \int \rho_A(r)\, \rho_B(r - t)\, dr$
RigFit: Optimizing the Translation
- This function can be evaluated very efficiently in Fourier space by applying the convolution theorem
- The Fourier transform of the similarity function Z can thus be computed by a simple multiplication in Fourier space instead of an integration
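The idea of replacing the integration by a multiplication in Fourier space can be sketched on a discrete grid. This is not the RigFit implementation; it is a minimal illustration assuming NumPy, periodic gridded densities, and hypothetical helper names (`gaussian_density`, `best_shift`) and atom positions.

```python
import numpy as np

def gaussian_density(centers, n=16, alpha=0.5):
    """Sum of Gaussians exp(-alpha*|r - c|^2) sampled on an n x n x n grid."""
    g = np.arange(n)
    X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
    rho = np.zeros((n, n, n))
    for cx, cy, cz in centers:
        rho += np.exp(-alpha * ((X - cx) ** 2 + (Y - cy) ** 2 + (Z - cz) ** 2))
    return rho

def best_shift(rho_a, rho_b):
    """Grid shift t maximizing sum_r rho_a(r) * rho_b(r - t) for periodic densities."""
    Fa = np.fft.fftn(rho_a)
    Fb = np.fft.fftn(rho_b)
    # One multiplication in Fourier space yields the correlation for all shifts at once.
    corr = np.fft.ifftn(Fa * np.conj(Fb)).real
    idx = np.unravel_index(np.argmax(corr), corr.shape)
    return tuple(int(i) for i in idx)

# Toy example (hypothetical atom positions): a is b translated by (3, 1, 2) grid points.
b = gaussian_density([(4, 4, 4), (6, 5, 4), (5, 7, 6)])
a = np.roll(b, (3, 1, 2), axis=(0, 1, 2))
print(best_shift(a, b))  # (3, 1, 2)
```

Evaluating all n³ translations directly would require n³ sums of n³ terms each; the FFT route obtains the whole correlation map in O(n³ log n).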
RigFit: Optimizing the Translation
- The maxima of $Z_{AB}(t)$ in real space can again be found by optimization
- These optimizations are performed systematically for all good orientations and their corresponding translations
[Figure: two-dimensional cut through an example objective function Z]
Lemmen, Hiller, Lengauer, J. Comput.-Aided Mol. Des., 1998, 12, 491
RigFit: Final Optimization
- For the best combinations Θ/t, a final local optimization in six-dimensional space (t and Θ) is performed
- The best combination of Θ/t is selected from the results of this optimization
- RigFit yields excellent results for the superposition of small, rigid structures
- It is also possible to search for small active substructures in larger databases
Lemmen, Hiller, Lengauer, J. Comput.-Aided Mol. Des., 1998, 12, 491
RigFit: Results
[Figure: two ligands superimposed optimally using RigFit. Left: the Gaussians, represented as spheres (ligand A: bright spheres, ligand B: dark spheres). Right: stick model]
Pharmacophore Definitions
- Paul Ehrlich (1909): "a molecular framework that carries (phoros) the essential features responsible for a drug's (= pharmacon's) biological activity" (Ehrlich. Dtsch. Chem. Ges. 1909, 42: p. 17)
- Peter Gund (1977): "a set of structural features in a molecule that is recognized at a receptor site and is responsible for that molecule's biological activity" (Gund. Prog. Mol. Subcell. Biol. 1977, 5: pp. 117–143)
- IUPAC definition: "an ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target and to trigger (or block) its biological response"
http://en.wikipedia.org/wiki/pharmacophore [accessed 05/12/2014, 1:20 CET]
Example: ACE Inhibitors
[Figure: structures of several ACE inhibitors]
- What is common to these structures?
- What are the parts interacting with the lock?
BKK, p. 14
Example: ACE Inhibitors
[Figure: structures of several ACE inhibitors with the common groups highlighted]
- Common to all structures is the occurrence of a carboxylate group, a carbonyl group, and a thiol or carboxyl group at similar distances
BKK, p. 14
Example: Opiates
[Figure: structures of methadone and morphine]
- Alignment of the molecules maps relevant parts of the molecules onto each other (here: phenyl and amino groups)
LG, p. 11
Pharmacophore
- Yet another definition: the spatial arrangement of the functional groups of a ligand that contribute to receptor binding is called its pharmacophore
[Figure: functional groups of a ligand connected by the distances d1, d2 and d3]
- Problems:
  - Which groups are part of the pharmacophore?
  - How can it be identified efficiently?
  - Which other molecules contain the pharmacophore?
Pharmacophore Mapping
- Pharmacophore mapping = derivation of a pharmacophore from a given set of structures
- Manually
  - Manual comparison/superposition of structures
- Automatically
  - DISCO: clique-based method
  - HipHop/Catalyst: maximum likelihood method
  - GASP (Genetic Algorithm Superposition Program)
DISCO
- DISCO (DIStance COmparisons)
- Classifies heavy atoms into
  - H-bond acceptors
  - H-bond donors
  - Positively charged
  - Negatively charged
  - Hydrophobic
- Considers a set of conformers
- Computes the commonalities of the molecules as a maximum clique in an association graph
Martin et al., J. Comput.-Aided Mol. Des., 1993, 7, 83
DISCO
- Simplified problem: just two structures A and B
- Represent the classified atoms in A and B as graphs
  - Nodes are labeled by their atom class
  - Here: two classes only (A, a = acceptor and D, d = donor)
  - Edges connect all pairs of atoms and are labeled by the atom distance, giving a complete graph
  - Here: only integer distances (for simplicity)
[Figure: complete labeled graphs of molecule A (nodes A1, A2, D1, D2) and molecule B (nodes a1, a2, d1, d2) with integer edge distances]
LG, p. 44
DISCO
- Molecule A is represented by the graph $G_1(V_1, E_1, \alpha_1, \beta_1)$ and B by $G_2(V_2, E_2, \alpha_2, \beta_2)$, with node/edge labels $\alpha_1$/$\beta_1$ and $\alpha_2$/$\beta_2$
- Nodes $u \in V_1$ and $v \in V_2$ are compatible ⇔ $\alpha_1(u) = \alpha_2(v)$
- Edges $s \in E_1$ and $t \in E_2$ are compatible ⇔ $\beta_1(s) = \beta_2(t)$
[Figure: labeled graphs of molecules A and B]
LG, p. 44
DISCO
- Compatible nodes and edges of $G_1$ and $G_2$ define the association graph $G_A(V_A, E_A)$ with $V_A \subseteq V_1 \times V_2$ and $E_A \subseteq E_1 \times E_2$
- An edge $e_A = ((u_1, v_1), (u_2, v_2))$ in $G_A$ implies that the two node pairs $(u_1, v_1)$ and $(u_2, v_2)$ as well as the edges $(u_1, u_2) \in E_1$ and $(v_1, v_2) \in E_2$ between them are compatible
[Figure: labeled graphs of molecules A and B]
LG, p. 44
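DISCO
$G_A(V_A, E_A)$ with $V_A \subseteq V_1 \times V_2$ and $E_A \subseteq E_1 \times E_2$:
$V_A = \{(u, v) \mid u \in V_1,\ v \in V_2,\ \alpha_1(u) = \alpha_2(v)\}$
$E_A = \{((u, s), (v, t)) \mid (u, v) \in E_1,\ (s, t) \in E_2,\ \beta_1((u, v)) = \beta_2((s, t)),\ \alpha_1(u) = \alpha_2(s),\ \alpha_1(v) = \alpha_2(t)\}$
[Figure: labeled graphs of molecules A and B]
LG, p. 44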
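DISCO
[Figure: association graph G_A for the example, drawn next to the labeled graphs of A and B; its nodes pair acceptors with acceptors and donors with donors, e.g. (A1,a1), (A1,a2), (A2,a1), (A2,a2), (D1,d1), (D1,d2), (D2,d1)]
LG, p. 44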
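The construction of the association graph can be sketched as follows. The data are a hypothetical toy example (three atoms per molecule, integer distances), not the example from the figure; the function name `association_graph` and the tolerance parameter `tol` are likewise assumptions made for illustration.

```python
from itertools import combinations

def association_graph(labels1, dist1, labels2, dist2, tol=0.0):
    """Nodes: pairs (u, v) with equal atom class. Edges: pairs of nodes with matching distances."""
    nodes = [(u, v) for u in labels1 for v in labels2 if labels1[u] == labels2[v]]
    edges = set()
    for (u1, v1), (u2, v2) in combinations(nodes, 2):
        if u1 == u2 or v1 == v2:
            continue  # an atom cannot be mapped onto two different atoms
        d1 = dist1[frozenset((u1, u2))]
        d2 = dist2[frozenset((v1, v2))]
        if abs(d1 - d2) <= tol:
            edges.add(frozenset(((u1, v1), (u2, v2))))
    return nodes, edges

# Hypothetical molecules: one acceptor and two donors each.
labels1 = {"A1": "acc", "D1": "don", "D2": "don"}
dist1 = {frozenset(("A1", "D1")): 4, frozenset(("A1", "D2")): 5, frozenset(("D1", "D2")): 7}
labels2 = {"a1": "acc", "d1": "don", "d2": "don"}
dist2 = {frozenset(("a1", "d1")): 5, frozenset(("a1", "d2")): 4, frozenset(("d1", "d2")): 7}

nodes, edges = association_graph(labels1, dist1, labels2, dist2)
print(len(nodes), len(edges))  # 5 nodes, 4 edges
```

In this toy case the nodes (A1,a1), (D1,d2) and (D2,d1) are pairwise connected, i.e. they form a clique of size three that corresponds to a common three-point arrangement. A non-zero `tol` would allow distances to differ slightly, which real coordinates require.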
CLIQUE
- Clique: complete subgraph of G(V, E)
- Maximal clique: clique to which no further node of G can be added (including its induced edges) such that the resulting graph is again a clique
- Maximum clique: a clique with the largest number of nodes in G; every maximum clique is maximal, but not vice versa
DISCO
[Figure: association graph G_A with a maximum clique highlighted]
- A maximum clique in $G_A$ corresponds to the largest subgraphs of $G_1$ and $G_2$ that are entirely compatible with each other
- These largest subgraphs correspond to the largest possible pharmacophore
LG, p. 44
DISCO
[Figure: the maximum clique in the association graph G_A and the corresponding matched subgraphs of A and B]
LG, p. 44
MAX_CLIQUE
- MAX_CLIQUE is the problem of finding a maximum clique in a graph
- SAT can be reduced to the decision version of the clique problem, which is therefore NP-complete; MAX_CLIQUE itself is NP-hard
- Real-world problem instances for pharmacophore search are nevertheless solvable in acceptable time
- There are many well-known algorithms for solving MAX_CLIQUE
- A very popular (simple) algorithm used in chemoinformatics is the Bron–Kerbosch algorithm (although much more efficient algorithms exist)
Bron–Kerbosch Algorithm
[Figure: example graph G = (V, E) with nodes 1 to 5]
- Let us consider the simple example G = (V, E) above
- Bron–Kerbosch implements a simple recursive tree search with backtracking
- It uses three sets (lists) of nodes for this purpose:
  - C: candidate nodes that can still extend the current clique
  - M: the clique currently being built
  - N ("nots"): nodes that have already been tested and must not be added again
Bron, Kerbosch, Comm. ACM, 1973, 16, 575
Bron–Kerbosch Algorithm
FINDCLIQUES(M_D, C_D, N_D):
  IF ∃ v ∈ N_D adjacent to all u ∈ C_D THEN RETURN
  FORALL v ∈ C_D:
    M_{D+1} ← M_D ∪ {v}                  // add candidate to M_D
    C_{D+1} ← {u ∈ C_D | (u,v) ∈ E}      // new candidates are adjacent to v
    N_{D+1} ← {u ∈ N_D | (u,v) ∈ E}      // new non-candidates are adjacent to v
    IF C_{D+1} = N_{D+1} = {} THEN PRINT M_{D+1}   // M_{D+1} is a maximal clique
    ELSE FINDCLIQUES(M_{D+1}, C_{D+1}, N_{D+1})
    C_D ← C_D \ {v}; N_D ← N_D ∪ {v}     // finished with v
  END
Initial call (D = 0): M = {}, C = {1, 2, 3, 4, 5}, N = {}
Bron, Kerbosch, Comm. ACM, 1973, 16, 575
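Bron–Kerbosch Algorithm
Maximal cliques: {1,2,4}, {1,3,4}, {2,5}, {3,5}, ...
Trace of the search (one row per step):
  D | M_D   | C_D       | N_D
  0 | -     | 1,2,3,4,5 | -
  1 | 1     | 2,3,4     | -
  2 | 1,2   | 4         | -
  3 | 1,2,4 | -         | -
  2 | 1,2   | -         | 4
  1 | 1     | 3,4       | 2
  2 | 1,3   | 4         | -
  3 | 1,3,4 | -         | -
  2 | 1,3   | -         | 4
  1 | 1     | 4         | 2,3
  ...
Bron, Kerbosch, Comm. ACM, 1973, 16, 575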
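The recursion can be written down almost literally in Python. The sketch below follows the basic variant from the pseudocode (no pivoting). The edge list of the example graph is an assumption: it is inferred from the trace and the listed cliques, since the figure itself is not reproduced here. The name `find_cliques` is hypothetical.

```python
def find_cliques(adj, M, C, N, out):
    """Append all maximal cliques that extend M by nodes from C (avoiding N) to `out`."""
    # Prune: if some node in N is adjacent to every candidate,
    # no clique found from here can be maximal.
    if any(all(u in adj[v] for u in C) for v in N):
        return
    for v in sorted(C):
        M1 = M | {v}                              # add candidate v to the clique
        C1 = {u for u in C if u in adj[v]}        # new candidates are adjacent to v
        N1 = {u for u in N if u in adj[v]}        # new non-candidates are adjacent to v
        if not C1 and not N1:
            out.append(sorted(M1))                # M1 is a maximal clique
        else:
            find_cliques(adj, M1, C1, N1, out)
        C = C - {v}                               # finished with v
        N = N | {v}

# Example graph inferred from the trace (assumed edge list).
edge_list = [(1, 2), (1, 3), (1, 4), (2, 4), (3, 4), (2, 5), (3, 5)]
adj = {n: set() for n in range(1, 6)}
for a, b in edge_list:
    adj[a].add(b)
    adj[b].add(a)

cliques = []
find_cliques(adj, set(), set(adj), set(), cliques)
print(cliques)  # [[1, 2, 4], [1, 3, 4], [2, 5], [3, 5]]
```

The largest of the reported maximal cliques are the maximum cliques; here these are {1,2,4} and {1,3,4}.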
Bron–Kerbosch Algorithm
- Generates all maximal cliques in a graph; the maximum cliques are the largest among them
- Worst-case runtime is exponential in the number of nodes
- Also used for related problems (e.g., maximum common substructure)
- The popularity of the algorithm is due to its trivial implementation
- Much more advanced algorithms exist (cf. 2nd DIMACS Challenge); however, they are often very tricky to implement
- Good algorithms for approximate cliques often yield very good results as well
David S. Johnson and Michael A. Trick (eds.): Cliques, Coloring, and Satisfiability, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 1996, Vol. 26, AMS
MCS vs. DISCO
- DISCO is a variant of MCS (Maximum Common Substructure)
- MCS can be seen as DISCO without distance constraints on the edges
- The distance constraints restrict the association graph, so DISCO is thus easier than MCS
- The Bron–Kerbosch algorithm (or a similar one) also allows the solution of MCS (see the lecture on MCS)
Pharmacophore Mapping: Other Methods
- Maximum likelihood methods: HipHop, Barnum et al. (1996)
- Systematic search: Motoc et al. (1986)
- Evolutionary algorithms: GASP, Jones et al. (1995)
Motoc et al., QSAR, 1986, 5, 99
Jones et al., J. Comput.-Aided Mol. Des., 1995, 9, 532
Barnum et al., J. Chem. Inf. Comput. Sci., 1996, 36, 563
Summary
- In the simplest case, 3D similarity maps rigid structures onto each other
  - This is trivial for topologically identical structures
  - Popular distance measures: RMSD, overlap volume
- Only a substructure (the pharmacophore) is responsible for the biological activity
- Pharmacophore mapping identifies the common structure in a set of given (active) structures
- DISCO is an algorithm for pharmacophore mapping
  - Related to MCS
  - Idea: find maximum cliques in an association graph
  - MAX_CLIQUE is NP-hard (its decision version is NP-complete)
  - Method: Bron–Kerbosch algorithm
References
Books
- [Lea] Andrew Leach: Molecular Modelling: Principles and Applications, 2nd ed., Prentice Hall, 2001
- [LG] Andrew Leach, Valerie Gillet: An Introduction to Chemoinformatics, Kluwer, 2003
- [GE] Johann Gasteiger, Thomas Engel: Chemoinformatics. A Textbook, Wiley-VCH, 2003
- [BKK] Böhm, Klebe, Kubinyi: Wirkstoffdesign, Spektrum, 2002
- Johann Gasteiger (ed.): Handbook of Chemoinformatics, Wiley-VCH, 2003
Review papers
- Christian Lemmen, Thomas Lengauer: Computational methods for the structural alignment of molecules, J. Comput.-Aided Mol. Des. (2000), 14, 215
- Andrew Brint, Peter Willett: Algorithms for the Identification of Three-dimensional Maximal Common Substructures, J. Chem. Inf. Comput. Sci. (1987), 27, 152