Similarity Search. Uwe Koch

Similar documents
Navigation in Chemical Space Towards Biological Activity. Peter Ertl Novartis Institutes for BioMedical Research Basel, Switzerland

Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME

Machine Learning Concepts in Chemoinformatics

KNIME-based scoring functions in Muse 3.0. KNIME User Group Meeting 2013 Fabian Bös

Molecular Complexity Effects and Fingerprint-Based Similarity Search Strategies

Chemoinformatics and information management. Peter Willett, University of Sheffield, UK

Structural biology and drug design: An overview

Chemogenomic: Approaches to Rational Drug Design. Jonas Skjødt Møller

Drug Design 2. Oliver Kohlbacher. Winter 2009/ QSAR Part 4: Selected Chapters

Introduction. OntoChem

How Diverse Are Diversity Assessment Methods? A Comparative Analysis and Benchmarking of Molecular Descriptor Space

Overview. Descriptors. Definition. Descriptors. Overview 2D-QSAR. Number Vector Function. Physicochemical property (log P) Atom

Structure-Activity Modeling - QSAR. Uwe Koch

Design and characterization of chemical space networks

Using Phase for Pharmacophore Modelling. 5th European Life Science Bootcamp March, 2017

Biologically Relevant Molecular Comparisons. Mark Mackey

BIOINF Drug Design 2. Jens Krüger and Philipp Thiel Summer Lecture 5: 3D Structure Comparison Part 1: Rigid Superposition, Pharmacophores

Universities of Leeds, Sheffield and York

Performing a Pharmacophore Search using CSD-CrossMiner

Cheminformatics analysis and learning in a data pipelining environment

Virtual screening in drug discovery

Similarity methods for ligandbased virtual screening

Computational chemical biology to address non-traditional drug targets. John Karanicolas

Solved and Unsolved Problems in Chemoinformatics

An Integrated Approach to in-silico

Computational Chemistry in Drug Design. Xavier Fradera Barcelona, 17/4/2007

Dr. Sander B. Nabuurs. Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre

Chemical Databases: Encoding, Storage and Search of Chemical Structures

COMPARISON OF SIMILARITY METHOD TO IMPROVE RETRIEVAL PERFORMANCE FOR CHEMICAL DATA

Spatial chemical distance based on atomic property fields

Supplementary information

est Drive K20 GPUs! Experience The Acceleration Run Computational Chemistry Codes on Tesla K20 GPU today

Hit Finding and Optimization Using BLAZE & FORGE

Data Mining in the Chemical Industry. Overview of presentation

Clustering Ambiguity: An Overview

Receptor Based Drug Design (1)

SmallWorld: Efficient Maximum Common Subgraph Searching of Large Chemical Databases

Development of a Structure Generator to Explore Target Areas on Chemical Space

QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression

Farewell, PipelinePilot Migrating the Exquiron cheminformatics platform to KNIME and the ChemAxon technology

The Schrödinger KNIME extensions

Developing CAS Products for Substructure Searching by Chemists. Linda Toler

Creating a Pharmacophore Query from a Reference Molecule & Scaffold Hopping in CSD-CrossMiner

The PhilOEsophy. There are only two fundamental molecular descriptors

In silico pharmacology for drug discovery

Chemical Similarity Searching

Plan. Day 2: Exercise on MHC molecules.

Fast similarity searching making the virtual real. Stephen Pickett, GSK

Ignasi Belda, PhD CEO. HPC Advisory Council Spain Conference 2015

Spacer conformation in biologically active molecules*

Studying the effect of noise on Laplacian-modified Bayesian Analysis and Tanimoto Similarity

Patrick: An Introduction to Medicinal Chemistry 5e Chapter 01

Interactive Feature Selection with

Ultrafast Shape Recognition to Search Compound Databases for Similar Molecular Shapes

Using Self-Organizing maps to accelerate similarity search

Characterization of Pharmacophore Multiplet Fingerprints as Molecular Descriptors. Robert D. Clark 2004 Tripos, Inc.

Bioisosteres in Medicinal Chemistry

Comparison of Descriptor Spaces for Chemical Compound Retrieval and Classification

Analysis of Activity Landscapes, Activity Cliffs, and Selectivity Cliffs. Jürgen Bajorath Life Science Informatics University of Bonn

Molecular Substructure Mining Approaches for Computer-Aided Drug Discovery: A Review

Introduction to Chemoinformatics

LigandScout. Automated Structure-Based Pharmacophore Model Generation. Gerhard Wolber* and Thierry Langer

Chemoinformatics: the first half century

Molecular Descriptors Theory and tips for real-world applications

has its own advantages and drawbacks, depending on the questions facing the drug discovery.

Design and Synthesis of the Comprehensive Fragment Library

Nonlinear QSAR and 3D QSAR

Bioengineering & Bioinformatics Summer Institute, Dept. Computational Biology, University of Pittsburgh, PGH, PA

Research Article. Chemical compound classification based on improved Max-Min kernel

Evaluation of Molecular Similarity and Molecular Diversity Methods Using Biological Activity Data

György M. Keserű H2020 FRAGNET Network Hungarian Academy of Sciences

ICM-Chemist-Pro How-To Guide. Version 3.6-1h Last Updated 12/29/2009

Analysis of Random Fragment Profiles for the Detection of Structure-Activity Relationships

Development and application of ligand-based computational methods for de-novo drug. design and virtual screening. Alexander Richard Geanes.

Analyzing Building Blocks Diversity for DNA Encoded Library Design. Cresset User Group Meeting Nik Stiefl & Finton Sirockin, Novartis

This is a repository copy of Chemoinformatics techniques for data mining in files of two-dimensional and three-dimensional chemical molecules.

Representation of molecular structures. Coutersy of Prof. João Aires-de-Sousa, University of Lisbon, Portugal

Universities of Leeds, Sheffield and York

Three-Dimensional Compound Comparison Methods and Their Application in Drug Discovery

Chemoinformatics and Drug Discovery

Notes of Dr. Anil Mishra at 1

Hydrogen Bonding & Molecular Design Peter

Xia Ning,*, Huzefa Rangwala, and George Karypis

Pipeline Pilot Integration

DATA FUSION APPROACHES IN LIGAND-BASED VIRTUAL SCREENING: RECENT DEVELOPMENTS OVERVIEW

JCICS Major Research Areas

Towards Physics-based Models for ADME/Tox. Tyler Day

Author Index Volume

DivCalc: A Utility for Diversity Analysis and Compound Sampling

Alkane/water partition coefficients and hydrogen bonding. Peter Kenny

Universities of Leeds, Sheffield and York

Introduction to Spark

Chemical Space. Space, Diversity, and Synthesis. Jeremy Henle, 4/23/2013

Protein-Ligand Docking

Quest for the Rings. In Silico Exploration of Ring Universe To Identify Novel Bioactive Heteroaromatic Scaffolds

Using AutoDock for Virtual Screening


DISCRETE TUTORIAL. Agustí Emperador. Institute for Research in Biomedicine, Barcelona APPLICATION OF DISCRETE TO FLEXIBLE PROTEIN-PROTEIN DOCKING:

Jonathan S. Mason,, Isabelle Morize, Paul R. Menard,*, Daniel L. Cheney, Christopher Hulme, and Richard F. Labaudiniere

Drug Informatics for Chemical Genomics...

Transcription:

Similarity Search Uwe Koch

Similarity Search The similar property principle: strurally similar molecules tend to have similar properties. However, structure property discontinuities occur frequently. Relevance in medicinal chemistry: ligand based virtual screening identify new series to avoid liabilities in back up Identify dissimilar compounds for screening collection

Similarity Search Similarity search probably, together with substructure searches, the cheminformatic method most used by chemists All similarity measures comprise three basic components: - the representation that characterizes each molecule, the molecular descriptors - the weighting scheme that is used to (de)prioritise different parts of the representation to reflect their relative importance - the similarity coefficient that provides a numeric value for the degree of similarity between two weighted representations

Similarity Search:Descriptors Representation of a molecule molecular descriptors: numerical values describing the properties of a molecule Descriptors representing properties of complete molecules: - log P, dipole moment, polarizability Descriptors calculated from 2D graphs: - topological indices, 2D fingerprints, maximum common substructures Descriptors requiring 3D representations: - Pharmacophore descriptors, fingerprints, molecular fields, shape Todeschini & Consonni, Handbook of molecular descroptors, Wiley-VCH 2009

Similarity Search:Descriptors 2D fingerprints: binary vectors MACCS: Each bit in the bit string represents one molecular fragment, where 1 indicates the presence of the functional group, 0 ist absence. 166 strcutural fragments. Other types of fingerprint: path (daylight), circular (EFCP, Morgan) Similarity is quantified by determining the number of common bits Todeschini & Consonni, Handbook of molecular descroptors, Wiley-VCH 2009

Similarity Search:Descriptors ECFP2, ECFP4, FCPF2 : encoding circular substructure Encodes information on the arrangement of heavy atoms around each central atom. Layer 0: Carbonyl C (sp2) Layer 1: Aliphatic carbon (sp3) Oxygen (sp2) Nitrogen (sp2) ECFP fragments encode atomic type, charge and mass FCFP fragments encode six generalized atom types 2, 4, 6 denotes the diameter (in bonds) of the circular substructure Todeschini & Consonni, Handbook of molecular descroptors, Wiley-VCH 2009

Maximum Common Subgraph similarity Pattern (graph) matching. Graph matching is based on node (atom) and edge (bond) correspondence using the graph theoretic concept of a clique. Clique is a sub-graph in which every node is connected to every other node Application: Maximum common substructure Similarity search Clustering Reaction mapping Molecule Alignment

Similarity Coefficient The similarity coefficient measures the overlap between descriptors for two compounds. The Tanimoto coefficient (Tc) is calculated as the ratio between conserved features and the total number of features of each molecule. The reaches from 1 (total similarity) to 0 no overlap. But neighbourhood behavior depends on both the fingerprint being uses and the similarity coefficient. a: features compound A b: features compound B c: features common to A and B Tanimoto: Tc = c/(a+b-c) = c /((a-c)+(b-c)+c) Tc is the fraction of features shared by A and B and the total number of features

Similarity Coefficient a: features compound A b: features compound B c: features common to A and B Tanimoto: Tc = c/(a+b-c) = c /((a-c)+(b-c)+c) Tc is the fraction of features shared by A and B and the total number of features Tversky: Tv = c / (α (a-c) + β (b-c) + c ) Introduces user defined weighing factors. If α = 1 and β = 0, Tv = c / a: fraction of features it has in common with reference compound. It would be 1 if all features of A are present also in B.

Similarity Search: An example Descriptor Highest ranked 2nd 3rd Fp atom pairs (AP) Reference compound Search for similars using the same Chembl data set FP radial (ECFP4) Tanimoto = 0.73 (AP) 0.11 (rad), 0.96 (MACCS) Tanimoto = 0.6 (AP) 0.14 (rad), 0.83 (MACCS) Tanimoto = 0.55 (AP) 0.12(rad), 0.82 (MACCS) MACCS Tanimoto = 0.26 (rad) 0.36 (AP), 0.69 (MACCS) Tanimoto = 0.24 (rad) 0.29 (AP), 0.58 (MACCS) Tanimoto = 0.23 (rad) 0.29 (AP), 0.6 (MACCS) Tanimoto = 0.96 (MACCS) 0.11 (rad), 0.73 (AP) Tanimoto = 0.86 (MACCS) 0.07 (rad), 0.28 (AP) Tanimoto = 0.83 (MACCS) 0.14 (rad), 0.6 (AP) Ranking depends on descriptors used

Similarity Search: An example Similarity values are strongly dependent on molecular description. Many MACCS features are often found in compounds. ECFP4 encodes often rare molecular environments. Thus Tc(MACCS) is often larger than Tc(ECFP4). Similarity values also tend to increase with molecular size and complexity due to increasing fingerprint bit density. Different combinations of fingerprints and and similarity coeffcients produce different similarity value distributions. Difficult to relate a specific similarity value to a probability of having similar biological activity.

Scaffold hopping Usually the compounds most similar to a reference are close structural analogs To identify alternative lead series if problems due to ADME, Tox or IP arise requires identification of structurally different compounds by modifying the core stcucture of the molecule (Scaffold hopping) Descriptors suitable for scaffold hopping: - Reduced graphs - Topological pharmacophore keys - 3D descriptors Lagdon, Ertl & Brown, Molecular Informatics 2010, 29, 366

Scaffold hopping What is a scaffold? Definition by Murcko & Bemis: The molecule is dissected into ring systems, linkers and side chains. The Murcko framework is the union of ring systems and linkers. Raloxifen (SERM) Osteoporosis and Invasive breast canver Red:rings Blue: linkers Green: side chains Murcko scaffold Graph Markush scaffold Bemis & Murcko, J. Med. Chem. 1996, 39, 2887.

Scaffold hopping Scaffold hopping example: Raloxifen (SERM) Osteoporosis and Invasive breast cancer Tamoxifen (SERM) Invasive breast cancer 2D fingerprints: For several cases it has been shown that the top 1% of a screening database select on average 25% of the scaffolds -> iterative focused screening. However, it is not possible to identify a generally preferred similarity value range. Bajorath et al., J. Med. Chem. 2010, 53, 5707.

Reduced graph Reduced graphs provide summary representations of chemical structures by collapsing groups of connected atoms into single nodes while preserving the topology of the original structures. Shaded spheres map common functional groups revealing similarities notevident when using coventional 2D fingerprints Similarities based on daylight FPs and the Tanimoto coefficient Linkers and feature nodes are used to describe molecules. Different levels of hierarchy according to a more detailed description of node properties Gillet et al. JCICS 2003, 43, 338

3D similarity search Molecules with similar 3D shape and properties could share biological activity, even while their 1D and 2D representations are not similar. Conformational flexibility to be taken into account. Five classes of 3D representations of molecules Pharmacophore Atomic distance Gaussian function Surface Field

Pharmacophore searching Pharmacophore is an abstract description of molecular features that are necessary for molecular recognition of a ligand by a biological macromolecule. A pharmacophore model explains how structurally diverse ligands can bind to a common receptor site (Wikipedia). Pharmacophore generation: Requires set of actives Molecular features: HB Donors, HB Acceptors, hydrophobic groups Active Ligands aligned such that corresponding features are overlaid Conformational space explored Scoring of pharmacophore hypothesis taking into account: number of features, goodness of fit, volume of overlay

Pharmacophore searching Raloxifene Lasofoxifene Tamoxifene

Pharmacophore searching Raloxifene Lasofoxifene Tamoxifene

Pharmacophore searching Raloxifene Lasofoxifene Tamoxifene Cyp2d6

Pharmacophore searching Tamoxifene Lipophilic HB-Donor Basic Features: Lipohilic, ring, HB donor, basic. Each feature is represented as a sphere. The radius indicates the toleranceon the deviation from the exact position. The pharamcophore features are used as queries for searching databases.

3D similarity search Atomic distance: ESHAPE3D uses a distance matrix with distances between all heavy atoms. Eigenvalues are calculated. Difficult to include physicochemical properties. Gaussian functions: Similarity as volume overlap between two molecules after superimposition. ROCS searches for superposition that maximizes Volume overlap. Similarity is quantified by Volume Tamimoto. Several atom types are used and considered for similarity. Surface similarity: Surface calculation eg by triangularization. Surface points may be characterized by interaction energies using GRID. Each surface point associated by a vector. Similarity by comparing the vectors.

3D similarity search Field based: Blaze (Cresset): Electrostatic, hydrophobic and shape properties of molecules are represented by field patterns. Field patterns of a reference comcpound are compared to a database of precalculated field patterns of commercial compounds. Pharmacophore based: Initiates with identification of pharmacophore including features such as HBdonors and acceptors, hydrophobicity core and charges. Three of four points are connected to form a triangle of tetrahedron.

Molecular similarity depends on molecular representation, similarity measures and compound class. It is not always clear how molecular similarity is related to biological activity. However as a tool to identify groups of compounds to further the knowledge on structure property relationships similarity search is indispensable.

Thank you very much Questions?