BIOINF Drug Design 2. Jens Krüger and Philipp Thiel Summer Lecture 5: 3D Structure Comparison Part 1: Rigid Superposition, Pharmacophores


BIOINF 4372 Drug Design 2, Jens Krüger and Philipp Thiel, Summer 2014
Lecture 5: 3D Structure Comparison Part 1: Rigid Superposition, Pharmacophores

Overview
Comparison of 3D structures: rigid superposition, RMSD as a distance measure, optimal rigid superposition, RigFit.
Pharmacophores: definition, pharmacophore identification as a graph problem, the Bron–Kerbosch algorithm for MAX_CLIQUE, other methods.

Structure Comparison in 3D
In contrast to fingerprint- or graph-based methods, 3D structure comparison also considers geometry. To compare two structures A and B, we consider the relative orientation of their atom positions in space (relative to each other).

Structure Comparison in 3D
There are basically two variants:
Rigid: find an optimal superposition of A and B, then compute their similarity with some similarity measure. Flexibility can be taken into account implicitly by considering several conformers.
Semiflexible/flexible: compute an alignment of A and B with partial or full flexibility of the structures, and compare the quality of different alignments using a distance/similarity measure.

Similarity in 3D: form follows function
"Form follows function is a principle associated with modern architecture and industrial design in the 20th century. The principle is that the shape of a building or object should be primarily based upon its intended function or purpose." Conversely, one should be able to draw conclusions about the function from the form/shape!
http://en.wikipedia.org/wiki/form_follows_function [accessed 05/12/2014]

Similarity in 3D
According to the lock-and-key principle, two ligands binding to the same receptor should be geometrically similar.
How is similarity defined in 3D? Which distance/similarity measures can be used? Can we apply some of the concepts used for topological similarity? How can similarities in 3D be identified?

Similarity of Conformations
The simplest case of structure comparison is the similarity of two topologically identical molecules (e.g., for clustering in conformational analysis). This results in a simplified problem: every atom of A has an equivalent atom in B, and we need to find a transformation T that rotates and translates A such that A and B are optimally superimposed.

Distance Measures: RMSD
If there is a bijection mapping each atom of A onto an atom of B, the simplest distance measure is the root mean square deviation (RMSD) of the atom coordinates,
$\mathrm{RMSD}(A, B) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \lVert x_i - y_i \rVert^2}$,
where $X = (x_1, x_2, \ldots, x_N)$ and $Y = (y_1, y_2, \ldots, y_N)$ are the coordinate vectors of A and B. RMSD is computed either for all atoms (all-atom RMSD) or only for heavy atoms (heavy-atom RMSD), so care must be taken when comparing published values!
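The following is a minimal NumPy sketch of the RMSD formula. It assumes that the atom correspondence is given by row order (row i of X matches row i of Y) and that no superposition is applied first. The helper `heavy_atom_rmsd` and its `elements` argument are illustrative names, not taken from the lecture. The usage lines show how all-atom and heavy-atom RMSD can differ for the same pair of structures.

```python
import numpy as np

def rmsd(X, Y):
    """RMSD of two (N, 3) coordinate arrays; row i of X corresponds to row i of Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape != Y.shape or X.ndim != 2 or X.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    diff = X - Y
    return float(np.sqrt(np.mean(np.sum(diff * diff, axis=1))))

def heavy_atom_rmsd(X, Y, elements):
    """RMSD restricted to non-hydrogen atoms; `elements` lists one element symbol per row."""
    mask = np.array([e.upper() != "H" for e in elements])
    return rmsd(np.asarray(X)[mask], np.asarray(Y)[mask])

X = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
Y = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]]
print(rmsd(X, Y))                          # all atoms: sqrt(25 / 2), about 3.536
print(heavy_atom_rmsd(X, Y, ["C", "H"]))   # heavy atoms only: 0.0
```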

Transformations
RMSD depends on the coordinates of both molecules. We are interested in the minimum RMSD, so we need to find the transformation that minimizes the distance. Such a transformation can be decomposed into a rotation and a translation. Rotations around the coordinate origin can be described by orthogonal 3x3 matrices: R is orthogonal ⇔ the rows (and columns) of R form an orthonormal set, i.e., $R^T R = R R^T = I$. (Orthogonal matrices with $\det R = -1$ are reflections, which invert the chirality of a molecule; proper rotations satisfy $\det R = +1$.)

Transformations
For a molecule with coordinate vectors $x_1, \ldots, x_n$, a rotation around the origin can be expressed by a matrix multiplication, $x_i' = R x_i$. In the general case, an additional translation $t = (t_1, t_2, t_3)$ has to be applied: $x_i' = R x_i + t$.
[Figure: a molecule rotated by R, and rotated by R and then translated by t]

Transformations
If atom positions are represented by homogeneous coordinates, translation and rotation can both be represented by a single 4x4 matrix composed of R and t:
$x = (x_1, x_2, x_3) \mapsto x_H = (x_1, x_2, x_3, 1)$, $T = \begin{pmatrix} R & t \\ 0^T & 1 \end{pmatrix}$.
Applying rotation R and translation t, $x' = R x + t$, thus simply corresponds to $x'_H = T x_H$ in homogeneous coordinates.
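The homogeneous-coordinate form can be checked numerically. This small sketch assumes NumPy and (N, 3) coordinate arrays; `homogeneous` and `transform` are hypothetical helper names. It builds T from R and t, confirms that $T x_H$ reproduces $R x + t$, and shows that two transformations compose by a single matrix product.

```python
import numpy as np

def homogeneous(R, t):
    """Build the 4x4 matrix T = [[R, t], [0 0 0, 1]] from a 3x3 matrix R and a 3-vector t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def transform(T, X):
    """Apply a 4x4 homogeneous transformation to an (N, 3) array of coordinates."""
    X = np.asarray(X, dtype=float)
    XH = np.hstack([X, np.ones((len(X), 1))])  # x_H = (x1, x2, x3, 1) for every atom
    return (XH @ T.T)[:, :3]                   # row-wise T @ x_H, then drop the trailing 1

# Example: 90 degree rotation about z followed by a translation
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t = np.array([1.0, 2.0, 3.0])
X = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
T = homogeneous(R, t)
print(transform(T, X))   # rows (1, 3, 3) and (1, 2, 5), identical to X @ R.T + t
```

Because the last row of T is (0, 0, 0, 1), the product `T2 @ T` applies T first and then T2. This is the main practical advantage of the 4x4 form.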

Transformations
We are only interested in transformations that keep the internal geometry of the molecule intact, i.e., that do not change intramolecular distances. Such transformations are called rigid transformations; their rotational part is an orthogonal matrix. Optimal superposition of two molecules thus corresponds to the determination of an optimal rigid transformation T.
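To make the notion of a rigid transformation concrete, the following sketch checks $R^T R = I$. It assumes NumPy, and `is_proper_rotation` and `distance_matrix` are hypothetical helper names. Going slightly beyond the slide, it also checks $\det R = +1$. The extra check is needed because a reflection also preserves all distances: orthogonality alone does not exclude it. Finally, it verifies that a rotation plus translation leaves all intramolecular distances unchanged.

```python
import numpy as np

def is_proper_rotation(R, tol=1e-9):
    """True if R is orthogonal (R^T R = I) and has determinant +1 (no reflection)."""
    R = np.asarray(R, dtype=float)
    orthogonal = np.allclose(R.T @ R, np.eye(3), atol=tol)
    return bool(orthogonal and abs(np.linalg.det(R) - 1.0) < tol)

def distance_matrix(X):
    """All pairwise Euclidean distances of an (N, 3) coordinate array."""
    X = np.asarray(X, dtype=float)
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

c, s = np.cos(0.4), np.sin(0.4)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])   # rotation about z
mirror = np.diag([1.0, 1.0, -1.0])                          # orthogonal, but det = -1

X = np.random.default_rng(0).normal(size=(6, 3))            # arbitrary test "molecule"
moved = X @ R.T + np.array([2.0, -1.0, 0.5])                # rigid motion x' = R x + t
print(is_proper_rotation(R), is_proper_rotation(mirror))    # True False
print(np.allclose(distance_matrix(X), distance_matrix(moved)))   # True
```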

Kabsch Algorithm
Given two conformations A and B, find the rigid transformation $T_{\min}$ that maps A onto B such that the RMSD is minimized:
$T_{\min} = \arg\min_{T} \sum_{i=1}^{N} \lVert T x_i - y_i \rVert^2$.
Additional constraint: $T_{\min}$ has to be a rigid transformation, i.e., its rotation matrix R has to be orthogonal and has to satisfy $R^T R = I$.
W. Kabsch, Acta Cryst. (1976), A32, 922

Kabsch Algorithm
The objective function minimizes the sum of squared distances between corresponding atoms of A and B, and thus the RMSD. An analytical solution to this optimization problem was suggested by Kabsch in 1976: the orthogonality constraint is handled with Lagrange multipliers, and the solution is then determined by solving an eigenvalue problem.
W. Kabsch, Acta Cryst. (1976), A32, 922; W. Kabsch, Acta Cryst. (1978), A34, 827
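The slides describe Kabsch's original derivation via a Lagrangian formulation and an eigenvalue problem. A widely used equivalent formulation instead computes the optimal rotation from a singular value decomposition of the 3x3 covariance matrix. The sketch below uses that SVD variant rather than the original eigenvalue route. It assumes NumPy, and the function name `kabsch` is chosen here. It includes the usual sign correction that prevents a reflection from being returned, and it assumes a fixed atom correspondence, as for the RMSD.

```python
import numpy as np

def kabsch(A, B):
    """Rotation R and translation t minimising sum_i ||R a_i + t - b_i||^2.

    A, B: (N, 3) arrays whose rows correspond (atom i of A <-> atom i of B).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    ca, cb = A.mean(axis=0), B.mean(axis=0)      # centroids
    H = (A - ca).T @ (B - cb)                    # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)                  # H = U diag(S) Vt
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T      # proper rotation (det R = +1)
    t = cb - R @ ca                              # translation aligns the centroids
    return R, t

def superposed_rmsd(A, B):
    """Minimum RMSD of A and B after optimal rigid superposition."""
    R, t = kabsch(A, B)
    diff = np.asarray(A, dtype=float) @ R.T + t - np.asarray(B, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

# Example: B is A rotated and shifted, so the superposed RMSD is essentially zero
c, s = np.cos(0.7), np.sin(0.7)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
Rx = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
R_true = Rz @ Rx
A = np.random.default_rng(1).normal(size=(10, 3))
B = A @ R_true.T + np.array([1.0, -2.0, 0.5])
R, t = kabsch(A, B)
print(np.allclose(R, R_true), superposed_rmsd(A, B) < 1e-8)   # True True
```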

Superposition of Different Structures
If different topologies are considered, it is often difficult to find a bijection mapping the atoms of A to the atoms of B. Such a mapping would be required to compute the RMSD, so one has to resort to different distance measures instead.
[Figure: dihydrofolate and methotrexate]


Overlap Volume
Idea: represent atoms by three-dimensional Gaussians, so that a molecule is a sum of Gaussians centered at different positions in space. If two molecules overlap significantly (i.e., they are similar and properly aligned), their Gaussians overlap. The correlation of the Gaussians of A and B is used as a measure of similarity.
Carbó, Leyda, Arnau, Int. J. Quantum Chem., 1980, 17, 1185

Overlap Volume
The volume of a molecule is described by a three-dimensional density function, a linear combination of Gaussians centered on the atom positions $x_i$:
$\rho_A(r) = \sum_{i=1}^{N_A} w_i \exp\left(-\alpha_i \lVert r - x_i \rVert^2\right)$.
Carbó, Leyda, Arnau, Int. J. Quantum Chem., 1980, 17, 1185

Overlap Volume
The overlap of the density functions is measured by their correlation. Correlating the density functions of A and B yields a similarity measure $Z_{AB}$, which is a measure of the overlap volume:
$Z_{AB} = \int \rho_A(r)\, \rho_B(r)\, dr$.
The larger $Z_{AB}$, the more the density functions overlap and the more similar A and B are.
Carbó, Leyda, Arnau, Int. J. Quantum Chem., 1980, 17, 1185

RigFit
Lemmen et al. developed RigFit, which is based on the correlation of density functions. RigFit identifies the optimal superposition of two arbitrary rigid molecules. Physicochemical properties of the molecules can also be taken into consideration; we will address this issue later.
Without loss of generality, A is kept in a fixed position, and B can be moved by a translation t and a rotation R. The goal of the algorithm is to identify $t_{\max}$ and $R_{\max}$ that maximize $Z_{AB}(t, R)$, i.e., for which the density functions of A and of the transformed B overlap the most.
Lemmen, Hiller, Lengauer, J. Comput.-Aided Mol. Des., 1998, 12, 491

RigFit
$Z_{AB}(t, R)$ is not normalized, so it depends strongly on the size of the molecules. To compensate for molecular size, we use the Hodgkin index
$H_{AB}(t, R) = \frac{2\, Z_{AB}(t, R)}{Z_{AA} + Z_{BB}}$,
where $Z_{AA}$ and $Z_{BB}$ are the autocorrelations of A and B. Since $Z_{AA}$ and $Z_{BB}$ are independent of t and R, the optimum of this new objective function lies at the same t and R as the optimum of $Z_{AB}$.

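RigFit
Substitute the Gaussian density functions $\rho_A(r) = \sum_i w_i e^{-\alpha_i \lVert r - x_i \rVert^2}$ and $\rho_B(r) = \sum_j v_j e^{-\beta_j \lVert r - y_j \rVert^2}$ into $Z_{AB}$, with $y_j$ the transformed atom positions of B. This yields:
$Z_{AB} = \sum_{i}\sum_{j} w_i v_j \left(\frac{\pi}{\alpha_i + \beta_j}\right)^{3/2} \exp\left(-\frac{\alpha_i \beta_j}{\alpha_i + \beta_j} \lVert x_i - y_j \rVert^2\right)$
http://mathworld.wolfram.com/convolution.html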
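The closed-form overlap can be evaluated directly from atom coordinates. This minimal sketch assumes NumPy, a single Gaussian exponent `alpha`, and unit weight for every atom. These simplifications and the function names are assumptions: the lecture's densities allow per-atom weights and exponents. The sketch also computes the Hodgkin index from the previous slide, which equals 1 for identical, superimposed molecules and decreases as they move apart.

```python
import numpy as np

def gaussian_overlap(XA, XB, alpha=1.0):
    """Z_AB for rho(r) = sum_i exp(-alpha |r - x_i|^2), using the closed-form Gaussian integral."""
    XA = np.asarray(XA, dtype=float)
    XB = np.asarray(XB, dtype=float)
    d2 = np.sum((XA[:, None, :] - XB[None, :, :]) ** 2, axis=-1)   # |x_i - y_j|^2
    a = b = alpha
    prefactor = (np.pi / (a + b)) ** 1.5
    return float(prefactor * np.sum(np.exp(-a * b / (a + b) * d2)))

def hodgkin_index(XA, XB, alpha=1.0):
    """H_AB = 2 Z_AB / (Z_AA + Z_BB); independent of molecule size and at most 1."""
    zab = gaussian_overlap(XA, XB, alpha)
    return 2.0 * zab / (gaussian_overlap(XA, XA, alpha) + gaussian_overlap(XB, XB, alpha))

A = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]])
for shift in (0.0, 0.5, 2.0):
    # translate a copy of A along x and watch the similarity drop
    print(shift, round(hodgkin_index(A, A + [shift, 0.0, 0.0]), 3))
```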

RigFit
Algorithm (overview):
1. Optimize the rotation (in Fourier space): find a good set of rotations while ignoring the translation. This is achieved by separating rotation and translation.
2. Optimize the translation (in Fourier space): find the optimal translation for each rotation and remove unsuitable combinations of t and R.
3. Final optimization (in real space): perform a local six-dimensional optimization of t and R to obtain the best transformation.

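RigFit: Optimizing the Rotation
Let $F_A$ and $F_B$ denote the Fourier transforms of $\rho_A$ and $\rho_B$. Up to a normalization constant, the correlation function can then be Fourier transformed to
$Z_{AB} = \int F_A(k)\, \overline{F_B(k)}\, dk$.
For periodic density functions, the integral can be converted to a sum over the reciprocal lattice vectors k:
$Z_{AB} = \sum_{k} F_A(k)\, \overline{F_B(k)}$.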

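RigFit: Optimizing the Rotation
In Fourier space, the Gaussian densities $\rho$ can easily be transformed into (equivalent) Patterson functions, whose Fourier transforms are $\lvert F(k) \rvert^2$. Patterson functions are well known from crystallography and have the convenient property of being translationally invariant. Translating a density by t only multiplies $F(k)$ by the phase factor $e^{-i k \cdot t}$, and this factor cancels in $\lvert F(k) \rvert^2$.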
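The translation invariance of the Patterson function can be verified on a periodic grid. This sketch assumes NumPy; a random grid stands in for a sampled molecular density, and the function names are hypothetical. It shows that a cyclic shift of the density leaves $\lvert F(k) \rvert^2$ (the squared magnitude of its discrete Fourier transform) unchanged. It also shows that the inverse transform of $\lvert F(k) \rvert^2$ is the Patterson (autocorrelation) function.

```python
import numpy as np

def patterson_spectrum(rho):
    """|F(k)|^2 of a real density sampled on a periodic grid."""
    return np.abs(np.fft.fftn(rho)) ** 2

def patterson(rho):
    """Patterson function P(u) = sum_r rho(r) rho(r + u) on the periodic grid."""
    return np.real(np.fft.ifftn(patterson_spectrum(rho)))

rng = np.random.default_rng(2)
rho = rng.random((16, 16, 16))                               # stand-in for a sampled density
shifted = np.roll(rho, shift=(3, -2, 5), axis=(0, 1, 2))     # the same density, translated
print(np.allclose(patterson_spectrum(rho), patterson_spectrum(shifted)))   # True
print(np.isclose(patterson(rho)[0, 0, 0], np.sum(rho ** 2)))               # True: P(0) = sum rho^2
```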

RigFit: Optimizing the Rotation
The objective function $P_{AB}(R)$ is the correlation of the Patterson functions of A and of B rotated by R, and it is independent of the translation t. The rotation can therefore be optimized independently of the translation, and the calculations can be performed very efficiently in Fourier space. R is optimized by a systematic (grid) search over regularly spaced rotation angles.

RigFit: Optimizing the Rotation
After good rotations have been identified, a nonlinear local optimization using quasi-Newton methods is performed; the gradient is approximated by difference quotients of the objective function. Such an optimization reliably finds local optima, and good initial sampling makes it likely that the global optimum is among them.

RigFit: Optimizing the Translation
Once the rotation search has identified a set of good rotations, the translations for these rotations are determined. Again, the calculations are sped up by Fourier transformation. For a fixed rotation R, let $\rho_B^R$ denote the density of B rotated by R. The objective function then depends on the translation t alone:
$Z_{AB}(t) = \int \rho_A(r)\, \rho_B^R(r - t)\, dr$.

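RigFit: Optimizing the Translation
This function can be evaluated very efficiently in Fourier space by applying the convolution (correlation) theorem:
$\mathcal{F}[Z_{AB}](k) = F_A(k)\, \overline{F_B^R(k)}$.
The Fourier transform of the similarity function $Z_{AB}(t)$ can thus be computed by a simple multiplication in Fourier space instead of an integration. A single inverse Fourier transform then yields $Z_{AB}(t)$ for all translations t on the grid at once.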
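On a periodic grid, the correlation theorem turns the translation scan into FFTs of the two densities, one elementwise product, and one inverse FFT. This sketch assumes NumPy, that both densities are already sampled on the same grid, and that B has already been rotated; `translation_scan` and `best_shift` are hypothetical names. It evaluates $Z_{AB}(t) = \sum_r \rho_A(r)\,\rho_B(r - t)$ for every cyclic grid shift t at once and reports the best shift.

```python
import numpy as np

def translation_scan(rho_a, rho_b):
    """Z(t) = sum_r rho_a(r) * rho_b(r - t) for all cyclic grid shifts t, via FFT."""
    Fa = np.fft.fftn(rho_a)
    Fb = np.fft.fftn(rho_b)
    return np.real(np.fft.ifftn(Fa * np.conj(Fb)))   # correlation theorem

def best_shift(rho_a, rho_b):
    """Grid shift t (as a tuple of ints) that maximises Z(t)."""
    Z = translation_scan(rho_a, rho_b)
    return tuple(int(i) for i in np.unravel_index(np.argmax(Z), Z.shape))

rng = np.random.default_rng(3)
rho_b = rng.random((16, 16, 16))
rho_a = np.roll(rho_b, shift=(3, 5, 1), axis=(0, 1, 2))   # A = B translated by (3, 5, 1) cells
print(best_shift(rho_a, rho_b))                           # (3, 5, 1)
```

A brute-force scan would need one full grid sum per translation. The FFT route costs $O(n \log n)$ for all n grid shifts together, which is the speed-up the slide refers to.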

RigFit: Optimizing the Translation
The maxima of $Z_{AB}(t)$ in real space are again located by optimization. The figure on the right shows a two-dimensional cut through an example objective function Z. These optimizations are performed systematically for all good orientations and their corresponding translations.
Lemmen, Hiller, Lengauer, J. Comput.-Aided Mol. Des., 1998, 12, 491

RigFit: Final Optimization
For the best combinations of R and t, a final local optimization in six-dimensional space (t and R) is performed, and the best combination is selected from the results. RigFit yields excellent results for the superposition of small, rigid structures. It is also possible to search larger databases for small active substructures.
Lemmen, Hiller, Lengauer, J. Comput.-Aided Mol. Des., 1998, 12, 491

RigFit: Results
Two ligands superimposed optimally using RigFit. Left: the Gaussians, represented as spheres (ligand A: bright spheres, ligand B: dark spheres). Right: stick model.

Pharmacophore Definitions
Paul Ehrlich (1909): "a molecular framework that carries (phoros) the essential features responsible for a drug's (=pharmacon's) biological activity" (Ehrlich, Dtsch. Chem. Ges. 1909, 42, p. 17)
Peter Gund (1977): "a set of structural features in a molecule that is recognized at a receptor site and is responsible for that molecule's biological activity" (Gund, Prog. Mol. Subcell. Biol. 1977, 5, pp. 117-143)
IUPAC definition: "an ensemble of steric and electronic features that is necessary to ensure the optimal supramolecular interactions with a specific biological target and to trigger (or block) its biological response"
http://en.wikipedia.org/wiki/pharmacophore [accessed 05/12/2014]

Example: ACE Inhibitors
[Figure: structures of several ACE inhibitors]
What is common to these structures? Which parts interact with the lock?
BKK, p. 14

Example: ACE Inhibitors
[Figure: structures of the ACE inhibitors]
Common to all structures is the occurrence of a carboxylate group, a carbonyl group, and a thiol or carboxyl group at similar distances.
BKK, p. 14

Example: Opiates
[Figure: methadone and morphine]
Alignment of the molecules maps the relevant parts of the molecules onto each other (here: the phenyl and amino groups).
LG, p. 11

Pharmacophore
Yet another definition: the spatial arrangement of the functional groups of a ligand that contribute to receptor binding is called a pharmacophore.
[Figure: three functional groups with pairwise distances d1, d2, d3]
Problems: Which groups are part of the pharmacophore? How can it be identified efficiently? Which other molecules contain the pharmacophore?
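The third question, which other molecules contain a given pharmacophore, can be illustrated with a brute-force check. This sketch is an illustration only and not one of the methods named in the lecture. The pharmacophore is a list of typed feature points. A molecule matches if some assignment of its own features reproduces all feature types and pairwise distances (d1, d2, d3, ...) within a tolerance. The feature type names, the default tolerance, and the function name are assumptions. Trying all assignments grows factorially with the number of features, which is one motivation for the clique-based formulation used by DISCO.

```python
import itertools
import math

def contains_pharmacophore(query, features, tol=0.5):
    """True if some ordered choice of `features` matches `query` in type and pairwise distances.

    query, features: lists of (feature_type, (x, y, z)) tuples; tol is a distance tolerance.
    """
    for combo in itertools.permutations(features, len(query)):
        if any(q_type != f_type for (q_type, _), (f_type, _) in zip(query, combo)):
            continue                                      # feature types must agree
        if all(abs(math.dist(query[i][1], query[j][1]) - math.dist(combo[i][1], combo[j][1])) <= tol
               for i, j in itertools.combinations(range(len(query)), 2)):
            return True                                   # every pairwise distance agrees
    return False

query = [("acceptor", (0, 0, 0)), ("donor", (3, 0, 0)), ("hydrophobic", (0, 4, 0))]
# The same arrangement rotated by 90 degrees about z and shifted, plus one extra feature
mol_1 = [("donor", (9, 9, 9)), ("acceptor", (1, 1, 1)), ("donor", (1, 4, 1)), ("hydrophobic", (-3, 1, 1))]
# Acceptor-hydrophobic distance is 6 instead of 4
mol_2 = [("acceptor", (0, 0, 0)), ("donor", (3, 0, 0)), ("hydrophobic", (0, 6, 0))]
print(contains_pharmacophore(query, mol_1), contains_pharmacophore(query, mol_2))   # True False
```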

Pharmacophore Mapping
Pharmacophore mapping is the derivation of a pharmacophore from a given set of structures. It can be done in two ways.
Manually: by manual comparison/superposition of structures.
Automatically: DISCO, clique-based methods, HipHop/Catalyst, maximum likelihood method, GASP (Genetic Algorithm Superposition Program).

DISCO
DISCO (DIStance COmparisons) classifies heavy atoms into H-bond acceptors, H-bond donors, positively charged, negatively charged, and hydrophobic atoms. It considers a set of conformers and computes the commonalities of the molecules as a maximal clique in an association graph.
Martin et al., J. Comput.-Aided Mol. Des., 1993, 7, 83

DISCO
Simplified problem: just two structures, A and B. Represent the classified atoms of A and B as graphs:
Nodes are labeled by their atom class (here only two classes: A/a = acceptor and D/d = donor).
Edges connect all pairs of atoms (complete graph) and are labeled by the interatomic distance (here only integer distances, for simplicity).
[Figure: graphs of molecule A (acceptors A1, A2; donors D1, D2) and molecule B (acceptors a1, a2; donors d1, d2) with integer distances as edge labels]
LG, p. 44

DISCO
Molecule A is represented by the graph $G_1 = (V_1, E_1, \lambda_1, \mu_1)$ and B by $G_2 = (V_2, E_2, \lambda_2, \mu_2)$, with node labels $\lambda_1, \lambda_2$ and edge labels $\mu_1, \mu_2$.
Nodes $u \in V_1$ and $v \in V_2$ are compatible $\Leftrightarrow \lambda_1(u) = \lambda_2(v)$.
Edges $s \in E_1$ and $t \in E_2$ are compatible $\Leftrightarrow \mu_1(s) = \mu_2(t)$.
LG, p. 44

DISCO
Compatible nodes and edges of $G_1$ and $G_2$ define the association graph $G_A = (V_A, E_A)$ with $V_A \subseteq V_1 \times V_2$ and $E_A \subseteq E_1 \times E_2$. An edge $e_A = ((u_1, v_1), (u_2, v_2))$ in $G_A$ means that the node pairs $(u_1, v_1)$ and $(u_2, v_2)$ are compatible. It also means that the induced edges $(u_1, u_2) \in E_1$ and $(v_1, v_2) \in E_2$ are compatible.
LG, p. 44

DISCO
$G_A = (V_A, E_A)$ with $V_A \subseteq V_1 \times V_2$ and $E_A \subseteq E_1 \times E_2$:
$V_A = \{(u, v) \mid u \in V_1, v \in V_2, \lambda_1(u) = \lambda_2(v)\}$
$E_A = \{((u, s), (v, t)) \mid (u, v) \in E_1, (s, t) \in E_2, \mu_1((u, v)) = \mu_2((s, t)), \lambda_1(u) = \lambda_2(s), \lambda_1(v) = \lambda_2(t)\}$
LG, p. 44
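The definitions of $V_A$ and $E_A$ translate almost literally into code. This sketch assumes that each molecule is given as a dictionary from atom name to (class label, coordinates). Distances are compared with a tolerance `tol` instead of exact equality; with `tol=0` and integer distances this reduces to the slide's exact comparison. The function name and the small example molecules are hypothetical and do not reproduce the lecture's figure. Variable names follow the set definition: node (u, s) pairs atom u of A with atom s of B.

```python
import itertools
import math

def association_graph(mol_1, mol_2, tol=0.0):
    """Nodes: label-compatible atom pairs. Edges: node pairs whose induced distances agree.

    mol_1, mol_2: dict atom_name -> (label, (x, y, z)).
    """
    nodes = [(u, s) for u in mol_1 for s in mol_2 if mol_1[u][0] == mol_2[s][0]]
    edges = set()
    for (u, s), (v, t) in itertools.combinations(nodes, 2):
        if u == v or s == t:
            continue                                   # each atom may be matched at most once
        d_1 = math.dist(mol_1[u][1], mol_1[v][1])      # edge (u, v) in G_1
        d_2 = math.dist(mol_2[s][1], mol_2[t][1])      # edge (s, t) in G_2
        if abs(d_1 - d_2) <= tol:
            edges.add(frozenset([(u, s), (v, t)]))
    return nodes, edges

A = {"A1": ("acc", (0, 0, 0)), "D1": ("don", (3, 0, 0)), "A2": ("acc", (0, 4, 0))}
B = {"a1": ("acc", (0, 0, 0)), "d1": ("don", (3, 0, 0)), "a2": ("acc", (0, 7, 0))}
nodes, edges = association_graph(A, B)
print(len(nodes), sorted(sorted(e) for e in edges))   # 5 [[('A1', 'a1'), ('D1', 'd1')]]
```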

DISCO
[Figure: association graph $G_A$ for the example; its nodes are compatible pairs such as (A1, a1), (A1, a2), (A2, a1), (A2, a2), (D1, d1), (D1, d2), (D2, d1)]
LG, p. 44

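CLIQUE
Clique: a complete subgraph of G = (V, E).
Maximal clique: a clique for which there is no node in G that can be added to the clique (including its induced edges) such that the resulting graph is again a clique.
Maximum clique: a clique with the largest number of nodes in G. Every maximum clique is maximal, but not every maximal clique is maximum.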

DISCO
A maximum clique in $G_A$ corresponds to the largest subgraphs of $G_1$ and $G_2$ that are entirely compatible with each other. These largest subgraphs correspond to the largest possible pharmacophore.
LG, p. 44

DISCO
[Figure: a maximum clique in $G_A$ and the corresponding compatible subgraphs of A and B]
LG, p. 44

MAX_CLIQUE
MAX_CLIQUE is the problem of finding a maximum clique in a graph. The decision version of CLIQUE is in NP, and SAT can be reduced to it, so it is NP-complete; finding a maximum clique is therefore NP-hard. Nevertheless, real-world problem instances from pharmacophore search can be solved in acceptable time. There are many well-known algorithms for MAX_CLIQUE. A very popular (simple) one in chemoinformatics is the Bron–Kerbosch algorithm, although much more efficient algorithms exist.

Bron–Kerbosch Algorithm
[Figure: example graph G = (V, E) with nodes 1 to 5]
Let us consider the simple example graph G = (V, E) above. Bron–Kerbosch implements a simple recursive tree search with backtracking. It uses three sets (lists) of nodes:
C: candidate nodes that can still extend the current clique
M: the current clique
N: "nots", nodes already processed in this branch; cliques containing them have already been enumerated
Bron, Kerbosch, Comm. ACM, 1973, 16, 575

Bron–Kerbosch Algorithm
    FINDCLIQUES(M_D, C_D, N_D):
        IF ∃ v ∈ N_D adjacent to all u ∈ C_D THEN RETURN
        FORALL v ∈ C_D:
            M_{D+1} ← M_D ∪ {v}                  // add candidate v to the clique
            C_{D+1} ← {u ∈ C_D | (u, v) ∈ E}     // new candidates are adjacent to v
            N_{D+1} ← {u ∈ N_D | (u, v) ∈ E}     // new non-candidates are adjacent to v
            IF C_{D+1} = N_{D+1} = {} THEN PRINT M_{D+1}   // M_{D+1} is a maximal clique
            ELSE FINDCLIQUES(M_{D+1}, C_{D+1}, N_{D+1})
            C_D ← C_D \ {v}                      // finished with v: move it from
            N_D ← N_D ∪ {v}                      // the candidates to the nots
        END
    Initial call: FINDCLIQUES({}, V, {})
Bron, Kerbosch, Comm. ACM, 1973, 16, 575

Bron Kerbosch Algorithm 1 2 4 5 FINDCLIQUES(M D, C D, N D ): IF v N D adjacent to all u C D THEN RETURN FORALL v C D : M D C D N D D=0-1,2,, 4,5 - D=1 - - - END M D+1 M D {v} // add candidate to M D C D+1 {u C D (uv) E} // new candidates are adjacent to v N D+1 {u N D (uv) E} // new non-candidates are adjacent to IF C D+1 = N D+1 = {} THEN PRINT M D+1 // M D+1 is max. clique ELSE FINDCLIQUES(M D+1, C D+1, N D+1 ) N D N D {v} // finished with v Bron, Kerbosch, Comm. ACM, 197, 16, 575 51

Bron Kerbosch Algorithm 1 2 4 5 FINDCLIQUES(M D, C D, N D ): IF v N D adjacent to all u C D THEN RETURN FORALL v C D : M D C D N D D=0-1,2,, 4,5 - D=1 - - - END M D+1 M D {v} // add candidate to M D C D+1 {u C D (uv) E} // new candidates are adjacent to v N D+1 {u N D (uv) E} // new non-candidates are adjacent to IF C D+1 = N D+1 = {} THEN PRINT M D+1 // M D+1 is max. clique ELSE FINDCLIQUES(M D+1, C D+1, N D+1 ) N D N D {v} // finished with v Bron, Kerbosch, Comm. ACM, 197, 16, 575 52

Bron Kerbosch Algorithm 1 2 4 5 FINDCLIQUES(M D, C D, N D ): IF v N D adjacent to all u C D THEN RETURN FORALL v C D : M D C D N D D=0-1,2,, 4,5 - D=1 1 - - END M D+1 M D {v} // add candidate to M D C D+1 {u C D (uv) E} // new candidates are adjacent to v N D+1 {u N D (uv) E} // new non-candidates are adjacent to IF C D+1 = N D+1 = {} THEN PRINT M D+1 // M D+1 is max. clique ELSE FINDCLIQUES(M D+1, C D+1, N D+1 ) N D N D {v} // finished with v Bron, Kerbosch, Comm. ACM, 197, 16, 575 5

Bron Kerbosch Algorithm 1 2 4 5 FINDCLIQUES(M D, C D, N D ): IF v N D adjacent to all u C D THEN RETURN FORALL v C D : M D C D N D D=0-1,2,, 4,5 - D=1 1 2,,4 - END M D+1 M D {v} // add candidate to M D C D+1 {u C D (uv) E} // new candidates are adjacent to v N D+1 {u N D (uv) E} // new non-candidates are adjacent to IF C D+1 = N D+1 = {} THEN PRINT M D+1 // M D+1 is max. clique ELSE FINDCLIQUES(M D+1, C D+1, N D+1 ) N D N D {v} // finished with v Bron, Kerbosch, Comm. ACM, 197, 16, 575 54

Bron Kerbosch Algorithm 1 2 4 5 FINDCLIQUES(M D, C D, N D ): IF v N D adjacent to all u C D THEN RETURN FORALL v C D : M D C D N D D=0-1,2,, 4,5 - D=1 1 2,,4 - END M D+1 M D {v} // add candidate to M D C D+1 {u C D (uv) E} // new candidates are adjacent to v N D+1 {u N D (uv) E} // new non-candidates are adjacent to IF C D+1 = N D+1 = {} THEN PRINT M D+1 // M D+1 is max. clique ELSE FINDCLIQUES(M D+1, C D+1, N D+1 ) N D N D {v} // finished with v Bron, Kerbosch, Comm. ACM, 197, 16, 575 55

Bron Kerbosch Algorithm 1 2 4 5 FINDCLIQUES(M D, C D, N D ): IF v N D adjacent to all u C D THEN RETURN FORALL v C D : M D C D N D D=0-1,2,, 4,5 - D=1 1 2,,4 - END M D+1 M D {v} // add candidate to M D C D+1 {u C D (uv) E} // new candidates are adjacent to v N D+1 {u N D (uv) E} // new non-candidates are adjacent to IF C D+1 = N D+1 = {} THEN PRINT M D+1 // M D+1 is max. clique ELSE FINDCLIQUES(M D+1, C D+1, N D+1 ) N D N D {v} // finished with v Bron, Kerbosch, Comm. ACM, 197, 16, 575 56

Bron Kerbosch Algorithm 1 2 4 5 FINDCLIQUES(M D, C D, N D ): IF v N D adjacent to all u C D THEN RETURN FORALL v C D : M D C D N D D=0-1,2,, 4,5 - D=1 1 2,,4 - END M D+1 M D {v} // add candidate to M D C D+1 {u C D (uv) E} // new candidates are adjacent to v N D+1 {u N D (uv) E} // new non-candidates are adjacent to IF C D+1 = N D+1 = {} THEN PRINT M D+1 // M D+1 is max. clique ELSE FINDCLIQUES(M D+1, C D+1, N D+1 ) N D N D {v} // finished with v Bron, Kerbosch, Comm. ACM, 197, 16, 575 57

Bron Kerbosch Algorithm 1 2 4 5 FINDCLIQUES(M D, C D, N D ): IF v N D adjacent to all u C D THEN RETURN FORALL v C D : M D C D N D D=1 1 2,,4 - D=2 - - - END M D+1 M D {v} // add candidate to M D C D+1 {u C D (uv) E} // new candidates are adjacent to v N D+1 {u N D (uv) E} // new non-candidates are adjacent to IF C D+1 = N D+1 = {} THEN PRINT M D+1 // M D+1 is max. clique ELSE FINDCLIQUES(M D+1, C D+1, N D+1 ) N D N D {v} // finished with v Bron, Kerbosch, Comm. ACM, 197, 16, 575 58

Bron Kerbosch Algorithm 1 2 4 5 FINDCLIQUES(M D, C D, N D ): IF v N D adjacent to all u C D THEN RETURN FORALL v C D : M D C D N D D=1 1 2,,4 - D=2 - - - END M D+1 M D {v} // add candidate to M D C D+1 {u C D (uv) E} // new candidates are adjacent to v N D+1 {u N D (uv) E} // new non-candidates are adjacent to IF C D+1 = N D+1 = {} THEN PRINT M D+1 // M D+1 is max. clique ELSE FINDCLIQUES(M D+1, C D+1, N D+1 ) N D N D {v} // finished with v Bron, Kerbosch, Comm. ACM, 197, 16, 575 59

Bron Kerbosch Algorithm 1 2 4 5 FINDCLIQUES(M D, C D, N D ): IF v N D adjacent to all u C D THEN RETURN FORALL v C D : M D C D N D D=1 1 2,,4 - D=2 1,2 - - END M D+1 M D {v} // add candidate to M D C D+1 {u C D (uv) E} // new candidates are adjacent to v N D+1 {u N D (uv) E} // new non-candidates are adjacent to IF C D+1 = N D+1 = {} THEN PRINT M D+1 // M D+1 is max. clique ELSE FINDCLIQUES(M D+1, C D+1, N D+1 ) N D N D {v} // finished with v Bron, Kerbosch, Comm. ACM, 197, 16, 575 60

Bron Kerbosch Algorithm 1 2 4 5 FINDCLIQUES(M D, C D, N D ): IF v N D adjacent to all u C D THEN RETURN FORALL v C D : M D C D N D D=1 1 2,,4 - D=2 1,2 4 - END M D+1 M D {v} // add candidate to M D C D+1 {u C D (uv) E} // new candidates are adjacent to v N D+1 {u N D (uv) E} // new non-candidates are adjacent to IF C D+1 = N D+1 = {} THEN PRINT M D+1 // M D+1 is max. clique ELSE FINDCLIQUES(M D+1, C D+1, N D+1 ) N D N D {v} // finished with v Bron, Kerbosch, Comm. ACM, 197, 16, 575 61

Bron Kerbosch Algorithm 1 2 4 5 FINDCLIQUES(M D, C D, N D ): IF v N D adjacent to all u C D THEN RETURN FORALL v C D : M D C D N D D=1 1 2,,4 - D=2 1,2 4 - END M D+1 M D {v} // add candidate to M D C D+1 {u C D (uv) E} // new candidates are adjacent to v N D+1 {u N D (uv) E} // new non-candidates are adjacent to IF C D+1 = N D+1 = {} THEN PRINT M D+1 // M D+1 is max. clique ELSE FINDCLIQUES(M D+1, C D+1, N D+1 ) N D N D {v} // finished with v Bron, Kerbosch, Comm. ACM, 197, 16, 575 62

Bron Kerbosch Algorithm 1 2 4 5 FINDCLIQUES(M D, C D, N D ): IF v N D adjacent to all u C D THEN RETURN FORALL v C D : M D C D N D D=1 1 2,,4 - D=2 1,2 4 - END M D+1 M D {v} // add candidate to M D C D+1 {u C D (uv) E} // new candidates are adjacent to v N D+1 {u N D (uv) E} // new non-candidates are adjacent to IF C D+1 = N D+1 = {} THEN PRINT M D+1 // M D+1 is max. clique ELSE FINDCLIQUES(M D+1, C D+1, N D+1 ) N D N D {v} // finished with v Bron, Kerbosch, Comm. ACM, 197, 16, 575 6

Bron Kerbosch Algorithm 1 2 4 5 FINDCLIQUES(M D, C D, N D ): IF v N D adjacent to all u C D THEN RETURN FORALL v C D : M D C D N D D=2 1,2 4 - D= - - - END M D+1 M D {v} // add candidate to M D C D+1 {u C D (uv) E} // new candidates are adjacent to v N D+1 {u N D (uv) E} // new non-candidates are adjacent to IF C D+1 = N D+1 = {} THEN PRINT M D+1 // M D+1 is max. clique ELSE FINDCLIQUES(M D+1, C D+1, N D+1 ) N D N D {v} // finished with v Bron, Kerbosch, Comm. ACM, 197, 16, 575 64

Bron Kerbosch Algorithm 1 2 4 5 FINDCLIQUES(M D, C D, N D ): IF v N D adjacent to all u C D THEN RETURN FORALL v C D : M D C D N D D=2 1,2 4 - D= - - - END M D+1 M D {v} // add candidate to M D C D+1 {u C D (uv) E} // new candidates are adjacent to v N D+1 {u N D (uv) E} // new non-candidates are adjacent to IF C D+1 = N D+1 = {} THEN PRINT M D+1 // M D+1 is max. clique ELSE FINDCLIQUES(M D+1, C D+1, N D+1 ) N D N D {v} // finished with v Bron, Kerbosch, Comm. ACM, 197, 16, 575 65

Bron-Kerbosch Algorithm
[Figure: example graph with nodes 1, 2, 3, 4, 5]
Maximal cliques: {1,2,4}, {1,3,4}, {2,5}, {3,5}

D   M_D      C_D         N_D
0   -        1,2,3,4,5   -
1   1        2,3,4       -
2   1,2      4           -
3   1,2,4    -           -
2   1,2      -           4
1   1        3,4         2
2   1,3      4           -
3   1,3,4    -           -
2   1,3      -           4
1   1        4           2,3
...

Bron, Kerbosch, Comm. ACM, 1973, 16, 575

Bron-Kerbosch Algorithm
Generates all maximal cliques of a graph; a maximum clique is the largest of them.
Runtime is exponential in the number of nodes in the worst case.
Also used for related problems (e.g., maximum common substructure).
The algorithm is popular mainly because it is simple to implement.
Much more advanced algorithms exist (cf. the 2nd DIMACS Implementation Challenge), but they are often very tricky to implement.
Good algorithms for approximate cliques often yield very good results as well.
David S. Johnson and Michael A. Trick (eds.): Cliques, Coloring, and Satisfiability, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 26, AMS, 1996
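Part of the reason the runtime cannot be polynomial is the size of the output itself: Moon and Moser showed that a graph with n nodes can have up to 3^(n/3) maximal cliques, attained by the complement of n/3 disjoint triangles (a complete multipartite graph with parts of size 3). The sketch below checks this for small n with a deliberately naive brute-force counter that is independent of Bron-Kerbosch and itself exponential; the function names are illustrative.

```python
from itertools import combinations
from typing import Dict, Iterable, Set

Graph = Dict[int, Set[int]]  # node -> set of adjacent nodes


def moon_moser_graph(k: int) -> Graph:
    """Complement of k disjoint triangles: nodes 3i, 3i+1, 3i+2 form group i,
    and two nodes are adjacent iff they belong to different groups."""
    n = 3 * k
    return {u: {v for v in range(n) if v // 3 != u // 3} for u in range(n)}


def is_clique(adj: Graph, nodes: Iterable[int]) -> bool:
    """True if every pair of the given nodes is connected."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))


def count_maximal_cliques_brute_force(adj: Graph) -> int:
    """Check every non-empty node subset; only feasible for tiny graphs."""
    nodes = sorted(adj)
    count = 0
    for r in range(1, len(nodes) + 1):
        for subset in combinations(nodes, r):
            if not is_clique(adj, subset):
                continue
            s = set(subset)
            # Maximal: no outside node is adjacent to every member of s.
            if not any(s <= adj[x] for x in nodes if x not in s):
                count += 1
    return count


for k in range(1, 5):
    print(3 * k, count_maximal_cliques_brute_force(moon_moser_graph(k)))
# prints: 3 3, 6 9, 9 27, 12 81 (one pair per line)
```

The count triples with every three added nodes, so any algorithm that lists all maximal cliques, Bron-Kerbosch included, needs exponential time on such graphs.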

MCS vs. DISCO
DISCO is a variant of MCS (maximum common substructure).
MCS can be seen as DISCO without distance constraints on the edges.
DISCO is thus easier than MCS.
The Bron-Kerbosch algorithm (or a similar one) can also be used to solve MCS (see lecture ).
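To make the role of the distance constraints concrete, here is a minimal sketch of an association graph for two molecules, under stated assumptions: each molecule is reduced to a list of pharmacophore points (feature type, 3D coordinates); a node pairs two points of the same type, one from each molecule; and two nodes are joined if their intramolecular distances agree within a tolerance `tol` (0.5 Å here, an arbitrary choice). A maximum clique is then a largest set of mutually consistent feature correspondences. This pairwise, single-conformer version is a simplification, not the DISCO implementation, and `maximum_clique` uses a plain exhaustive search rather than Bron-Kerbosch so that the block stays self-contained. All names and coordinates are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

Point = Tuple[str, Tuple[float, float, float]]  # (feature type, xyz coordinates)
Pair = Tuple[int, int]                           # (index in A, index in B)


def association_graph(a: List[Point], b: List[Point], tol: float = 0.5) -> Dict[Pair, Set[Pair]]:
    # Nodes: all pairs of same-type features, one from each molecule.
    nodes = [(i, j) for i, (ta, _) in enumerate(a) for j, (tb, _) in enumerate(b) if ta == tb]
    adj: Dict[Pair, Set[Pair]] = {node: set() for node in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # each feature may take part in at most one correspondence
        # Edge only if the two correspondences are geometrically consistent.
        d_a = math.dist(a[i][1], a[k][1])
        d_b = math.dist(b[j][1], b[l][1])
        if abs(d_a - d_b) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def maximum_clique(adj: Dict[Pair, Set[Pair]]) -> List[Pair]:
    best: List[Pair] = []

    def extend(clique: List[Pair], candidates: Set[Pair]) -> None:
        nonlocal best
        if len(clique) > len(best):
            best = list(clique)
        for v in sorted(candidates):
            extend(clique + [v], candidates & adj[v])
            candidates = candidates - {v}  # branch on v is done; exclude it from now on

    extend([], set(adj))
    return best


# Molecule B repeats A's triangle of features (distances 3, 4, 5) at another
# position and carries one extra acceptor that matches nothing geometrically.
A = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("aromatic", (0, 4, 0))]
B = [("donor", (1, 1, 1)), ("acceptor", (1, 4, 1)), ("aromatic", (5, 1, 1)),
     ("acceptor", (10, 10, 10))]
print(maximum_clique(association_graph(A, B)))  # prints [(0, 0), (1, 1), (2, 2)]
```

Dropping the distance test would leave only the type constraint, which corresponds to the unconstrained matching the slide attributes to MCS; the distance test removes most edges, which is what keeps the clique search small here.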

Pharmacophore Mapping: Other Methods
Maximum likelihood methods: HipHop (Barnum et al., 1996)
Systematic search: Motoc et al. (1986)
Evolutionary algorithms: GASP (Jones et al., 1995)
Motoc et al., QSAR, 1986, 5, 99
Jones et al., J. Comput.-Aided Mol. Des., 1995, 9, 532
Barnum et al., J. Chem. Inf. Comput. Sci., 1996, 36, 563

Summary
3D similarity, in the simplest case, maps rigid structures onto each other.
This is trivial for topologically identical structures, because the atom correspondence is known.
Popular distance measures: RMSD, overlap volume.
Only a substructure (the pharmacophore) is responsible for the biological activity.
Pharmacophore mapping identifies the common structure in a set of given (active) structures.
DISCO is an algorithm for pharmacophore mapping:
  related to MCS;
  idea: find maximum cliques in an association graph;
  MAX_CLIQUE is NP-complete;
  method: Bron-Kerbosch algorithm.
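For topologically identical structures, where atom i of one structure corresponds to atom i of the other, the rigid mapping and its RMSD have a closed-form solution. A common choice is the Kabsch algorithm: center both coordinate sets, take the SVD of their covariance matrix, and correct for a possible reflection. The sketch below assumes NumPy and uses this technique; the superposition method presented earlier in the lecture may differ in detail, and the function name is illustrative.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between two (n, 3) coordinate sets with known correspondence
    (row i of P matches row i of Q) after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # 1. Remove the translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # 2. Covariance matrix of the centered sets and its SVD.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # 3. Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # 4. Rotate P onto Q and take the root of the mean squared atom distance.
    diff = Pc @ R.T - Qc
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


if __name__ == "__main__":
    P = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]], dtype=float)
    Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)  # 90° about z
    Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])  # rotated and translated copy
    print(kabsch_rmsd(P, Q))  # essentially zero, up to floating-point error
```

Restricting R to proper rotations matters for molecules: a mirror image (enantiomer) of a non-planar structure cannot be superposed with zero RMSD, which is the desired behaviour.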

References
Books
[Lea] Andrew Leach: Molecular Modelling: Principles and Applications, 2nd ed., Prentice Hall, 2001
[LG] Andrew Leach, Valerie Gillet: An Introduction to Chemoinformatics, Kluwer, 2003
[GE] Johann Gasteiger, Thomas Engel: Chemoinformatics: A Textbook, Wiley-VCH, 2003
[BKK] Böhm, Klebe, Kubinyi: Wirkstoffdesign, Spektrum, 2002
Johann Gasteiger (ed.): Handbook of Chemoinformatics, Wiley-VCH, 2003
Review papers
Christian Lemmen, Thomas Lengauer: Computational methods for the structural alignment of molecules, J. Comput.-Aided Mol. Des., 2000, 14, 215
Andrew Brint, Peter Willett: Algorithms for the identification of three-dimensional maximal common substructures, J. Chem. Inf. Comput. Sci., 1987, 27, 152