Descriptors: Overview of 2D-QSAR


Overview: 2D-QSAR
Peter Gedeck
Novartis Institutes for BioMedical Research, RC, Horsham, UK

Overview
Definition
Examples
  Feature counts
  Topological indices
  2D fingerprints and fragment counts
  R-group descriptors
How good are 2D descriptors in practice?
Summary

Definition: What are descriptors?
Object described: atom, group or molecule
Form of the descriptor: number, vector or function
Kinds of descriptor:
  Physicochemical property (log P)
  Derived properties (distribution of surface electrostatic potential)
  Abstract properties (fingerprint = fragment count)
2D-QSAR models are based on descriptors derived from a two-dimensional graph representation of a molecule.
  1D: molecular formula
  2D: molecular connectivity / topology
  3D: molecular geometry / stereochemistry
  4D/5D/…: conformational ensembles
[Figure: example molecule with its molecular formula and molecular weight; values not recoverable]

Overview
Good descriptors should characterise molecular properties important for molecular interactions: hydrophobic, electronic, steric / size / shape, hydrogen bonding.
A recently published encyclopaedia describes thousands of molecular descriptors used in QSAR and molecular modelling:
R. Todeschini, V. Consonni, Handbook of Molecular Descriptors, Wiley, 2000.
We cannot cover them all, so here is a selection.

Feature counts
Hydrogen bond donors
Hydrogen bond acceptors
Number of rings
Number of rotatable bonds
Features are usually defined using substructures or SMARTS patterns. [SMARTS examples on the slide not recoverable]
A SMILES and SMARTS tutorial can be found at www.daylight.com

Feature counts: Application (AlogP)
Ghose and Crippen developed an atom-based model for logP (AlogP).
  Ghose AK, Crippen GM. Atomic physicochemical parameters for three-dimensional structure-directed quantitative structure-activity relationships. I. Partition coefficients as a measure of hydrophobicity. J Comput Chem.
  Wildman SA, Crippen GM. Prediction of physicochemical parameters by atomic contributions. J Chem Inf Comput Sci.
The atoms of a molecule are classified into different atom types (aromatic carbon; primary, secondary aliphatic carbon; ...).
Linear model for logP, where n_i is the number of atoms of type i and f_i is the contribution of that atom type:
$$\log P = \sum_{i \in \text{atom types}} f_i \, n_i$$
The approach has been extended to molar refractivity.

Feature counts: Application (Polar Surface Area, PSA)
The polar surface area (PSA) is the sum of the surface contributions of polar atoms (usually oxygens, nitrogens and attached hydrogens).
This descriptor is easy to interpret and, most importantly, it correlates very well with drug transport properties.
  Ertl P, Rohde B, Selzer P. J Med Chem (2000).
[Figure: fragment-based PSA versus 3D-PSA; number of molecules and correlation coefficient not recoverable]

Generalisation of feature counts based on 2D fingerprints
Fragment dictionary fingerprints
  Defined structural features, e.g. (public) keys
  Pre-defined fragments may not be suitable for the dataset
Hashed fingerprints
  Automatically generated fragments
  Convert each fragment to a unique number to obtain the fingerprint
  Either fold the large fingerprint into a short representation (Daylight, HQSAR) or use it as is (SciTegic)
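The linear atom-contribution model above can be sketched in a few lines of Python. This is only an illustration of the formula logP = sum of f_i * n_i: the atom-type names and the contribution values below are hypothetical placeholders, not the published Ghose-Crippen or Wildman-Crippen parameters, and atom typing is done by hand rather than by substructure matching.

```python
from collections import Counter

# Hypothetical contributions f_i per atom type (illustrative values only,
# NOT the published Ghose-Crippen parameters).
CONTRIBUTIONS = {
    "C_aromatic": 0.30,
    "C_aliphatic": 0.15,
    "O_hydroxyl": -0.30,
    "H": 0.10,
}

def atom_contribution_logp(atom_types, contributions=CONTRIBUTIONS):
    """Return sum_i f_i * n_i, where n_i is the count of atoms of type i."""
    counts = Counter(atom_types)  # n_i for every atom type present
    return sum(contributions[atom_type] * n for atom_type, n in counts.items())

# Toy phenol-like molecule with atom types assigned by hand.
phenol_like = ["C_aromatic"] * 6 + ["O_hydroxyl"] + ["H"] * 6
print(round(atom_contribution_logp(phenol_like), 2))
```

In a real implementation the atom types would be assigned by matching SMARTS patterns against the molecule; the summation step stays the same.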

HQSAR: Fragment-based fingerprints
Break the structure into fragments and count occurrences.
Build the list of unique fragments.
Convert each fragment to a unique number.
Combine the counts for all possible fragments into a vector of numbers = hologram.
Interpret the numbers as bits or counts.
Reduce the length of the vector by folding = reduced hologram.
[Figure: example structure, its fragments with occurrence counts, and the resulting hologram; values not recoverable]

HQSAR: Application
Parameters:
  Level of detail encoded
    Atoms
    Atoms / bonds
    Atoms / bonds / connections
    Atoms / bonds / connections / chirality (R/S)
  Minimum and maximum size of fragments
  Length of the reduced hologram
A large-scale QSAR study comparing various methods is described later.

R-group descriptor
Example of a descriptor developed for a very specific application.
  Hirons L, Holliday JD, Jelfs SP, Willett P, Gedeck P. Use of the R-Group Descriptor for Alignment-Free QSAR. QSAR Comb. Sci.
Lead optimisation datasets
  Series of compounds with a common core structure
  Systematic variation of substituents
  Modification often localised at a small part of the molecule
  Muszynski IC et al.
[Figure: core structure with substituent positions R1, R2, ...]

Scientific Rationale: What determines binding?
Protein-ligand interactions
  hydrophobic
  hydrogen bond
  electrostatic
The position of the pharmacophore in space is important.
[Figure: ligand in a binding site with glutamate, glutamine and isoleucine residues]
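The hologram steps above (fragment, count, hash, fold) can be illustrated with a toy example. The sketch below makes several simplifying assumptions that are not part of HQSAR itself: the molecule is a plain linear chain of atom symbols, fragments are contiguous sub-chains, `zlib.crc32` stands in for the fragment hashing function, and folding is done with a modulo operation. The function names are hypothetical.

```python
import zlib

def linear_fragments(atoms, min_size=2, max_size=3):
    """Enumerate contiguous fragments of a linear atom chain.

    Toy stand-in for enumerating all connected subgraphs of a molecule.
    """
    fragments = []
    for size in range(min_size, max_size + 1):
        for start in range(len(atoms) - size + 1):
            frag = atoms[start:start + size]
            # Canonicalise direction so C-C-O and O-C-C count as one fragment.
            fragments.append("-".join(min(frag, frag[::-1])))
    return fragments

def reduced_hologram(atoms, length=8, min_size=2, max_size=3):
    """Count fragment occurrences and fold them into `length` bins."""
    hologram = [0] * length
    for frag in linear_fragments(atoms, min_size, max_size):
        fragment_id = zlib.crc32(frag.encode("utf-8"))  # fragment -> number
        hologram[fragment_id % length] += 1             # folding step
    return hologram

chain = list("CCCO")  # toy propanol-like heavy-atom chain
print(linear_fragments(chain))
print(reduced_hologram(chain))
```

Because folding maps many fragments onto few bins, different fragments can collide in the same position; the hologram length parameter controls how often that happens.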

[Figure: atomic property profiles plotted against distance from the attachment point; axis values not recoverable]

Scientific Rationale: How to capture binding information?
The influence of the core on binding is constant for a lead series.
Differences in substituent properties cause differences in binding.
The influence of substituents is almost additive for the same binding mode.
The descriptor needs to
  encode the position of pharmacophoric features in space
  have properties that correlate with binding interactions
    hydrophobic: atomic polarisability
    hydrogen bond: hydrogen bond donor/acceptor counts, polar surface area
    electrostatic: atomic charge
The distance of functional groups from the core is important for binding.
Only substituents with a single attachment point are of interest.
Substituents are fairly small.

R-Group descriptor
Assign properties to atoms.
Determine the distance of each atom to the attachment point.
Combine properties and distances to form the descriptor.
The resulting descriptor is a vector with one entry per distance from the attachment point (eight entries in the slide example; values not recoverable).
Example substituents: phenyl, aminophenyl, fluorophenyl and hydroxycyclohexyl, each shown with its atomic polarisability and atomic charge profile.

Variations
Atom-based: based upon the sum of atomic properties
  Atomic weights and partial charges
  Atomic contributions to LogP, MR and PSA
  H-bond acceptor and donor counts (HBA and HBD)
Surface-based: based upon maximum-positive and minimum-negative surface potentials
  Molecular Electrostatic Potentials (MEP)
  Molecular Lipophilicity Potentials (MLP)
Field-based: based upon molecular interaction fields (MIF, GRID)
  Dry probe: hydrophobic interactions
  Carbonyl oxygen probe: HBD interactions
  Amide nitrogen probe: HBA interactions

QSAR
Data, Structure, Descriptors, Model, Predictions
Descriptors represent properties of the structure.
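The three construction steps of the R-group descriptor (assign atomic properties, determine distances to the attachment point, combine) can be sketched as follows. The combination rule used here, summing the property of all atoms at the same topological (bond-count) distance, is an assumption made for illustration, as are the function name, the eight-entry length and the toy property values; the published descriptor may combine properties and distances differently.

```python
from collections import deque

def rgroup_descriptor(bonds, atom_property, attachment=0, max_distance=7):
    """Sum an atomic property per topological distance from the attachment atom.

    bonds: list of (atom_a, atom_b) index pairs within the substituent.
    atom_property: dict mapping atom index -> property value.
    Returns a vector with max_distance + 1 entries (distance 0..max_distance).
    """
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    # Breadth-first search gives the bond-count distance of every atom.
    distance = {attachment: 0}
    queue = deque([attachment])
    while queue:
        atom = queue.popleft()
        for nxt in neighbours.get(atom, []):
            if nxt not in distance:
                distance[nxt] = distance[atom] + 1
                queue.append(nxt)

    descriptor = [0.0] * (max_distance + 1)
    for atom, d in distance.items():
        if d <= max_distance:
            descriptor[d] += atom_property[atom]
    return descriptor

# Toy phenyl ring: six atoms in a cycle, atom 0 is bonded to the core.
phenyl_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
toy_property = {i: 1.0 for i in range(6)}  # hypothetical property values
print(rgroup_descriptor(phenyl_bonds, toy_property))
```

Using a fixed vector length regardless of substituent size is what makes descriptors of different R-groups directly comparable without any alignment step.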

R-group QSAR: Development
Descriptor generation
  Compounds are split into their R-groups.
  Atomic properties are calculated for each R-group.
  One block of descriptors is generated per substitution site (R1, R2, ...).
  The descriptors form a data matrix X with one row per compound and one column per property (variable).
QSAR: a function relating descriptors to biological activity
  activity = f(molecular descriptors)
  It explains which molecular features are responsible for activity.
  It helps to design new compounds with enhanced features.

R-group QSAR: Data sets
Four data sets selected from the literature: benzodiazepines, serotonin, triazines, tropanes.
[Figure: core structures of the four data sets with their substitution sites]

R-group QSAR: Results
[Table: R², Q² and pred-r² for the R-group QSAR (PLS) models, compared with CoMFA where available, for the benzodiazepine, serotonin, triazine and three tropane data sets; numeric values not recoverable]

R-group QSAR: Serotonin data set
The serotonin data set is the exception (q² = 0.[…]); this is not surprising.
Literature result using CoMFA: r² = […], q² = […]
The substituents are large and structurally very diverse.
This demonstrates the limitation of R-group descriptors.

Simulated Lead-Optimisation
Initialisation: retrieve the initial lead compounds.
Optimisation loop:
  Remaining compounds? If false, stop.
  If true, generate a QSAR model.
  Select the best predictions and add them to the tested compounds.
  Repeat.
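The simulated lead-optimisation loop (initialise, build a model, select the best predictions, repeat while compounds remain) can be sketched as below. To keep the example self-contained, a one-descriptor least-squares line stands in for the PLS model on R-group descriptors used in the study; the function names, batch size and data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit y = slope * x + intercept (stand-in for a PLS model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x

def simulate_lead_optimisation(descriptors, activities, n_initial=2, batch=2):
    """Return the compound indices tested in each iteration."""
    tested = list(range(n_initial))                 # initial lead compounds
    remaining = list(range(n_initial, len(descriptors)))
    iterations = [list(tested)]
    while remaining:                                # "Remaining compounds?"
        slope, intercept = fit_line(
            [descriptors[i] for i in tested],
            [activities[i] for i in tested],
        )                                           # "Generate QSAR"
        remaining.sort(key=lambda i: slope * descriptors[i] + intercept,
                       reverse=True)                # rank by predicted activity
        chosen, remaining = remaining[:batch], remaining[batch:]
        tested.extend(chosen)                       # "Select best predictions"
        iterations.append(chosen)
    return iterations

# Hypothetical single descriptor and pIC50-like activities.
toy_descriptors = [1.0, 2.0, 5.0, 3.0, 6.0, 4.0]
toy_activities = [5.1, 5.9, 9.2, 7.0, 9.8, 8.1]
print(simulate_lead_optimisation(toy_descriptors, toy_activities))
```

In a retrospective analysis the measured activity of each selected compound is already known, so "testing" a compound simply means revealing its value to the next model.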

Simulated Lead-Optimisation
Retrospective analysis using three in-house datasets with known time course (three programmes).
Distribution of activities (pIC50 values)
Iterations of […]0 compounds
[Figure: activity versus iteration for each of the three programmes]

Simulated Lead-Optimisation
Box plots improve the clarity of the visualisation: outliers, upper adjacent value, upper quartile, median, lower quartile, lower adjacent value.
[Figure: box plots of activity versus iteration]

Simulated Lead-Optimisation
Two strategies
  Chronological starting point
  Diverse starting point
QSAR-supported lead optimisation identifies potent compounds more rapidly.
[Figure: box plots of activity versus iteration for the chemist, chronos and diverse strategies in each programme]

How good are 2D descriptors in practice? Datasets
Approximately […]0 datasets extracted from the corporate database
  They contain between […]0 and […]000 datapoints.
  […] contain estimated data (e.g. > […] µM; full DS).
  […] contain only exact measurements (pruned DS).
  Overlap: […]0 datasets
  Average size: […]00 datapoints
  Average spread: 1.[…] log(mol/l)

Descriptors
Nine different descriptors studied
3D descriptors (GRID): single conformation used (Concord); default settings; DRY and […] probes
2D descriptors
  […]: counts of atom types
  FCFCx (three levels x; SciTegic): counts of extended connectivity fragments using pharmacophore atom typing; three levels of complexity
  HQSAR: count of fragment occurrences; default settings; hologram length […]
  Similog: descriptor based on counts of pharmacophore triplets
Fingerprints
  […]: public key fingerprint
  […]: Novartis-developed fingerprint, optimised for searching/filtering in the corporate database
PCA is required for FCFCx, […] and Similog due to the large number of descriptors.

Model building
Datasets are sorted by activity and split into a training and a test set.
  50-50: every other data point is used for the test set (interpolation).
  Extrapolation split: the top and bottom […]% of the dataset are used for testing.
PLS model (implementation: Sybyl)
The optimal number of components is determined using cross-validation of the training set.

Characterisation of model performance
Predictive performance of the model on the test set is measured in two ways.
Multivariate predictive r²pred:
$$r^2_{\text{pred}} = 1 - \frac{\sum_{i \in \text{test}} \left( y_i^{\text{pred}} - y_i^{\text{act}} \right)^2}{\sum_{i \in \text{test}} \left( y_i^{\text{act}} - \bar{y}^{\text{act}} \right)^2}$$
Correlation of actual versus predicted values, r²corr:
$$r^2_{\text{corr}} = \frac{\left[ \sum_{i \in \text{test}} \left( y_i^{\text{pred}} - \bar{y}^{\text{pred}} \right) \left( y_i^{\text{act}} - \bar{y}^{\text{act}} \right) \right]^2}{\sum_{i \in \text{test}} \left( y_i^{\text{pred}} - \bar{y}^{\text{pred}} \right)^2 \sum_{i \in \text{test}} \left( y_i^{\text{act}} - \bar{y}^{\text{act}} \right)^2}$$

Validation through randomisation experiments
[…] datasets using the […] descriptors
Random test/training set splits: median standard deviation of r²pred values: […]
y-scrambling: r²pred values dropped to […] with a median standard deviation of […]
[Figure: distribution of r²pred per dataset over the randomisation experiments]

Results
1. Performance of individual descriptors
2. Dependence on dataset characteristics
3. Comparing descriptors
4. r²pred or r²corr?

1. Performance of individual descriptors
50-50 experiment, full dataset; dependence on the r²pred cut-off for good models
None of the descriptors is best all the time.
HQSAR and […] perform best; these descriptors are biased towards features of the dataset.
FCFCx should be similar, but too many features introduce too much noise.
One FCFCx level is slightly better than the other two.
AlogP, two of the FCFCx levels, […] and Similog occupy the middle ground.
[…] performs worst.
[Figure: percentage of good models versus r²pred cut-off for each descriptor]
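The two split schemes and the two performance measures defined above can be written out directly. This is a minimal sketch: the function names are hypothetical, the size of the extrapolation tails is a parameter because the percentage used in the study is not recoverable from the transcript, and the mean of the actual values is taken over the test set, following the summation index in the formulas.

```python
def interpolation_split(activities):
    """Sort by activity and send every other data point to the test set."""
    order = sorted(range(len(activities)), key=lambda i: activities[i])
    return order[0::2], order[1::2]          # (train indices, test indices)

def extrapolation_split(activities, tail_fraction):
    """Use the bottom and top tail_fraction of the sorted data for testing."""
    order = sorted(range(len(activities)), key=lambda i: activities[i])
    k = int(len(order) * tail_fraction)
    test = order[:k] + order[len(order) - k:]
    train = order[k:len(order) - k]
    return train, test

def r2_pred(actual, predicted):
    """1 - sum (pred - act)^2 / sum (act - mean_act)^2 over the test set."""
    mean_act = sum(actual) / len(actual)
    ss_res = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_act) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def r2_corr(actual, predicted):
    """Squared Pearson correlation between actual and predicted values."""
    mean_act = sum(actual) / len(actual)
    mean_pred = sum(predicted) / len(predicted)
    cov = sum((p - mean_pred) * (a - mean_act) for a, p in zip(actual, predicted))
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    var_act = sum((a - mean_act) ** 2 for a in actual)
    return cov ** 2 / (var_pred * var_act)

# Predictions with a constant offset: perfectly correlated, yet poorly predictive.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(r2_corr(actual, predicted), r2_pred(actual, predicted))
```

The offset example shows why the two measures can disagree: r²corr ignores systematic shifts and scaling of the predictions, whereas r²pred penalises them.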

1. Performance of individual descriptors
Number of good models: r²pred > 0.[…]
All descriptors combined:
  50-50 experiment: […]% pruned DS, […]% full DS
  Extrapolation experiment: […]% pruned DS, […]% full DS
Adding estimated data (= inactives) improves QSAR models (50-50 experiment).
[Figure: percentage of good models per descriptor for the pruned and full datasets in both experiments]

2. Dataset dependence: dataset size
Dataset size for the 50-50 experiment
The red line is a local LOESS regression.
Similar results were obtained for the extrapolation experiment.
The trend is as expected: good models are easier to achieve for larger datasets.

2. Dataset dependence: spread of biological activities
Spread of biological activities for the 50-50 experiment
The red line is a local LOESS regression.
Similar results were obtained for the extrapolation experiment.
The trend is as expected.
  log(1 mol/l): minimum requirement for good models
  log([…] mol/l): better

3. Comparing descriptors
Example: […] versus […]
50-50 experiment, full dataset, negative r²pred values set to 0
The two are highly correlated, yet the more complex descriptors are consistently better.

3. Comparing descriptors
The graphs compare r²pred values calculated for different descriptors using densities (50-50 experiment, full dataset).
[…] and […]: high correlation, but shifted curve
The overview shows that Similog and […] descriptors are different.
Two of the FCFCx levels behave very similarly.
The visualisation is too complex.

3. Comparing descriptors
Visualisation of correlation matrices
r²pred is used to cluster the descriptors and to calculate the correlation matrix.
The descriptors are reordered in each graph using hierarchical clustering.
Colours correspond to correlation coefficients: 0.[…] (blue), 0.[…] (green), 0.[…] (yellow), 0.[…] (orange).
FCFC, Similog and […] descriptors are very different.
[…] and […] are highly correlated, but the quality of the models is very different.
[Figure: clustered correlation matrices of the descriptors]

4. r²pred or r²corr?
HQSAR, full dataset
The black line is the identity.
Similar results were obtained for the other descriptors.
50-50 experiment: only little difference between the two statistical measures
Extrapolation experiment: accurate prediction of extrapolated activity data is difficult.

Summary
The study compares different QSAR methods using […]00 real-life datasets.
Nine different types of descriptors were used.
Some descriptors are better than others, but none is perfect.
Why only […]-[…]% good models?
  Quality of biological data
  Small datasets
  QSAR is unreliable (but not useless!)
Maybe it looks worse than it is: […]% good models for a cut-off of r²pred > 0.[…]
The performance of the 3D descriptors was disappointing, but it may improve if more care is taken to identify the correct conformations.
For a new dataset, try HQSAR (and […]) first: it is fast and often performs well.

Acknowledgements
Christian Bartels, Bernd Rohde (GPS, NIBR, Basel, Switzerland): large-scale QSAR study
Peter Ertl, Paul Selzer (GPS, NIBR, Basel, Switzerland): PSA model
Steven Jelfs, Prof. Peter Willett, Dr. John Holliday, Linda Hirons (University of Sheffield, UK): R-group descriptors