Exploring the black box: structural and functional interpretation of QSAR models.

Similar documents
Structural interpretation of QSAR models a universal approach

Interpretation of QSAR models

Structure-Activity Modeling - QSAR. Uwe Koch

Condensed Graph of Reaction: considering a chemical reaction as one single pseudo molecule

UniStra activities within the BigChem project:

Machine learning for ligand-based virtual screening and chemogenomics!

In Silico Prediction of ADMET properties with confidence: potential to speed-up drug discovery

Next Generation Computational Chemistry Tools to Predict Toxicity of CWAs

In silico pharmacology for drug discovery

An Integrated Approach to in-silico

Screening and prioritisation of substances of concern: A regulators perspective within the JANUS project

Virtual screening in drug discovery

Chemical Space: Modeling Exploration & Understanding

Machine Learning Concepts in Chemoinformatics

OCHEM. Product features and highlights

Plan. Day 2: Exercise on MHC molecules.


Gaussian Processes: We demand rigorously defined areas of uncertainty and doubt

Computational Chemistry in Drug Design. Xavier Fradera Barcelona, 17/4/2007

Introduction. OntoChem

QSAR/QSPR modeling. Quantitative Structure-Activity Relationships Quantitative Structure-Property-Relationships

Chemical library design

E. Muratov 1, E. Varlamova 2, A. Artemenko 2, D. Fourches 1, V. Kuz'min 2, A. Tropsha 1

QSAR Modeling of Human Liver Microsomal Stability Alexey Zakharov

Translating Methods from Pharma to Flavours & Fragrances

OECD QSAR Toolbox v.4.1. Step-by-step example for building QSAR model

Predicting Binding Affinity of CSAR Ligands Using Both Structure- Based and Ligand-Based Approaches

DOCKING TUTORIAL. A. The docking Workflow

Structural biology and drug design: An overview

Hierarchical QSAR technology based on the Simplex representation of molecular structure

QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression

Applications of multi-class machine

Materials Informatics: Statistical Modeling in Material Science

Emerging patterns mining and automated detection of contrasting chemical features

Docking. GBCB 5874: Problem Solving in GBCB

KNIME-based scoring functions in Muse 3.0. KNIME User Group Meeting 2013 Fabian Bös

Estimating Predictive Uncertainty for Ensemble Regression Models by Gamma Error Analysis

Ligand-receptor interactions

Advanced Medicinal Chemistry SLIDES B

The PhilOEsophy. There are only two fundamental molecular descriptors

Quantitative Structure-Activity Relationship (QSAR) computational-drug-design.html

CS6220: DATA MINING TECHNIQUES

Modeling Mutagenicity Status of a Diverse Set of Chemical Compounds by Envelope Methods

(e.g.training and prediction set, algorithm, ecc...). 2.9.Availability of another QMRF for exactly the same model: No other information available

Overview. Descriptors. Definition. Descriptors. Overview 2D-QSAR. Number Vector Function. Physicochemical property (log P) Atom

QSAR in Green Chemistry

Chemoinformatics and information management. Peter Willett, University of Sheffield, UK

Biologically Relevant Molecular Comparisons. Mark Mackey

User Guide for LeDock

CheS-Mapper 2.0 for visual validation of (Q)SAR models

Pose and affinity prediction by ICM in D3R GC3. Max Totrov Molsoft

Qsar study of anthranilic acid sulfonamides as inhibitors of methionine aminopeptidase-2 using different chemometrics tools

A Tiered Screen Protocol for the Discovery of Structurally Diverse HIV Integrase Inhibitors

Rapid Application Development using InforSense Open Workflow and Daylight Technologies Deliver Discovery Value

Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity

Towards Physics-based Models for ADME/Tox. Tyler Day

A Review on Computational Methods in Developing Quantitative Structure-Activity Relationship (QSAR)

RSC Publishing. Principles and Applications. In Silico Toxicology. Liverpool John Moores University, Liverpool, Edited by

ICM-Chemist-Pro How-To Guide. Version 3.6-1h Last Updated 12/29/2009

Introduction to Chemoinformatics and Drug Discovery

OECD QSAR Toolbox v.3.3. Step-by-step example of how to build a userdefined

BioSolveIT. A Combinatorial Approach for Handling of Protonation and Tautomer Ambiguities in Docking Experiments

Contents 1 Open-Source Tools, Techniques, and Data in Chemoinformatics

Development of a Structure Generator to Explore Target Areas on Chemical Space

CS6220: DATA MINING TECHNIQUES

5.1. Hardwares, Softwares and Web server used in Molecular modeling

Drug Design 2. Oliver Kohlbacher. Winter 2009/ QSAR Part 4: Selected Chapters

Plan. Lecture: What is Chemoinformatics and Drug Design? Description of Support Vector Machine (SVM) and its used in Chemoinformatics.

Kristin P. Bennett. Rensselaer Polytechnic Institute

Bridging the Dimensions:

molecules ISSN

1.QSAR identifier 1.1.QSAR identifier (title): QSAR model for Toxicokinetics, Transfer Index (TI) 1.2.Other related models:

Molecular Descriptors Family on Structure Activity Relationships 5. Antimalarial Activity of 2,4-Diamino-6-Quinazoline Sulfonamide Derivates

Click Prediction and Preference Ranking of RSS Feeds

BioSolveIT. A Combinatorial Docking Approach for Dealing with Protonation and Tautomer Ambiguities

OECD QSAR Toolbox v.4.1. Step-by-step example for predicting skin sensitization accounting for abiotic activation of chemicals

Quantitative structure activity relationship and drug design: A Review

Machine-Learning Methods in Property Predictions: Quo Vadis?

Practical QSAR and Library Design: Advanced tools for research teams

Using AutoDock for Virtual Screening

Molecular Modeling Study of Some Anthelmintic 2-phenyl Benzimidazole-1- Acetamides as β-tubulin Inhibitor

Retrieving hits through in silico screening and expert assessment M. N. Drwal a,b and R. Griffith a

Using Phase for Pharmacophore Modelling. 5th European Life Science Bootcamp March, 2017

OECD QSAR Toolbox v.4.1. Tutorial on how to predict Skin sensitization potential taking into account alert performance

Identification of Active Ligands. Identification of Suitable Descriptors (molecular fingerprint)

Data Mining in the Chemical Industry. Overview of presentation

Coefficient Symbol Equation Limits

Gradient Boosting, Continued

Statistical concepts in QSAR.

(Big) Data analysis using On-line Chemical database and Modelling platform. Dr. Igor V. Tetko

Use of data mining and chemoinformatics in the identification and optimization of high-throughput screening hits for NTDs

Generating Small Molecule Conformations from Structural Data

Estimation of Melting Points of Brominated and Chlorinated Organic Pollutants using QSAR Techniques. By: Marquita Watkins

PROVIDING CHEMINFORMATICS SOLUTIONS TO SUPPORT DRUG DISCOVERY DECISIONS

Dr. Sander B. Nabuurs. Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre

Tutorials on Library Design E. Lounkine and J. Bajorath (University of Bonn) C. Muller and A. Varnek (University of Strasbourg)

DivCalc: A Utility for Diversity Analysis and Compound Sampling

Xia Ning,*, Huzefa Rangwala, and George Karypis

Quantitative Structure Activity Relationships: An overview

Building predictive unbound brain-to-plasma concentration ratio (K p,uu,brain ) models

Transcription:

EMBL-EBI Industry workshop: In Silico ADMET prediction 4-5 December 2014, Hinxton, UK Exploring the black box: structural and functional interpretation of QSAR models. (Automatic exploration of datasets using QSAR) Pavel Polishchuk A.V. Bogatsky Physico-Chemical Institute of NAS of Ukraine, Odessa, Ukraine pavel_polishchuk@ukr.net

Outline Introduction and existed approaches Structural QSAR interpretation theory and practical examples Functional QSAR interpretation theory, practical examples and comparison with docking studies Automatic exploration of chemical dataset 2

QSAR interpretation: interpretability vs. complexity Model interpretability Popular misbelief MLR PLS DT knn RF SVM ANN models ensembles Model complexity 3

Importance of QSAR structural interpretation Extract SAR information in a chemically meaningful way detection of structural alerts, creation of structural filters or set of rules fragment-based drug design Model validation interpretation results should not contradict with experimental observations 4

QSAR interpretation approaches Model-specific approaches: Rule-based (Decision tree) Regression coefficients (MLR, PLS) Latent variables (PLS) Weights and biases (ANN) Model-independent approaches: Local gradients or partial derivatives I.I. Baskin et al., SAR QSAR Environ Sci, 2002, 35-41 G. Marcou et al., Molecular informatics, 2012, 639-642 C i f(x i ) f(x x i i Δx i ) 5

QSAR interpretation: common workflow Model Variables contributions Structureproperty relationship f(x) Var_1 Var_2 Mol_1-0.23 1.82 Mol_2 2.36 1.27 Mol_3 5.01 2.30 Mol_4 0.69-0.58 6

Matched molecular pairs & molecular transformations logs = -3.18 logs = -0.60 ΔlogS = 2.58 H OH ΔlogS = 1.59 logs = -2.21 logs = -0.62 Leech A.G. et al, J. Med. Chem. 2006, 49, 6672-6682 Sheridan R.P. et al., J. Chem. Inf. Model. 2006, 46, 180-192 7

Exemplified dataset 8

Structural QSAR interpretation - = logs pred = -1.55 logs pred = -1.61 ΔlogS pred = 0.06 - = logs pred = -1.55 logs pred = -1.35 ΔlogS pred = -0.20 Polishchuk P.G. et al. Molecular Informatics 2013,32, 843-853 9

Structural QSAR interpretation - = logs pred = -1.93 logs pred = -4.32 ΔlogS pred = 2.39 Polishchuk P.G. et al. Molecular Informatics 2013,32, 843-853 10

Limitations of existed descriptors (Dragon, etc) Dragon fails - = Dragon is OK - = Computational MMP H COOH Polishchuk P.G. et al. Molecular Informatics 2013,32, 843-853 11

Simplex representation of molecular structure (SiRMS) Simplex generation example Atom-property labeling Kuz min, V. E. et al, Journal of Molecular Modeling 2005, 11, 457-467. Kuz min, V. E. et al, Journal of Computer-Aided Molecular Design 2008, 22, 403-421. 12

Local and global interpretation Local interpretation analysis of single compounds Global interpretation reveal trends 13

Interpretation: fragmentation Case number Do specific interactions of a ligand with its target exist or important? Is an orientation of a ligand relatively its target known? Fragments selection and grouping 1 NO (e.g. passive diffusion through membranes, solubility, lipophilicity, etc) not relevant can be done by the researcher based on his own knowledge 2 YES (ligand-receptor interactions, host-guest complexes, etc) YES 3 NO consider fragments positions relatively to the target and observed or predicted interactions MMP can be applied, silently assumed that all compounds have the same interaction mode 14

Examples of structural interpretation 15

Solubility (1033 compounds) Endpoint Solubility, logs 5-fold external cross validation results SiRMS Dragon Model R 2 CV RMSE R 2 CV RMSE PLS 0.84 0.82 0.91 0.60 RF 0.88 0.71 0.91 0.62 SVM 0.87 0.72 0.92 0.59 Polishchuk P.G. et al., Molecular Informatics, 2013, 843-853 16

Mutagenicity (Ames, 4361 compounds) 5-fold external cross validation results Descriptors Algorithm Balanced Accuracy SiRMS RF 0.817 SVM 0.800 Dragon RF 0.816 SVM 0.793 Polishchuk P.G. et al. Molecular Informatics, 2013, 843-853 17

Combined contribution (effect) of fragments RF+SiRMS Contribution = 0 (non-mutagen) Contribution = 0 (non-mutagen) Contribution = 1 (mutagen) 18

Questions How does the fragment influence the property? WHY?! 19

Functional interpretation of QSAR models Structural interpretation A pic 50 = f(a 1, A 2, A 3 ) = x - = B pic 50 = f(b 1, B 2, B 3 ) = y C Contribution(C) = x - y Functional interpretation A - = B C pic 50 = f(a 1, A 2, A 3 ) = x pic 50 = f(a 1, A 2, B 3 ) = y Contribution 3 (C) = x - y 1, 2, 3 groups of descriptors represented different physico-chemical factors (charge, H-bonding, etc) of compound A and B. 20

Antagonists of fibrinogen receptor (functional interpretation example) 21

Fragment examples Antagonists of fibrinogen receptor: dataset Arg-Gly-Asp Arg-mimetic Linker Asp-mimetic 338 compounds 22

Antagonists of fibrinogen receptor: models 5-fold external cross validation results Algorithm R 2 RMSE RF 0.72 0.80 SVM (RBF kernel) 0.69 0.84 SVM (linear) 0.67 0.88 PLS 0.67 0.87 23

Structural interpretation (global) Linker Asp-mimetic Arg-mimetic 24

Functional interpretation (global) Arg-mimetic Linker Asp-mimetic 25

Functional interpretation of RF model (local) RF model Asp224 Phe160 Tyr190 Arg214 0.14 0.29 0.04 0.81 Ser225 Mg 2+ 0.34 0.74 0.07 1.77 0.05 0.6 0.42 1.03 0.05 0.46 1.02 1.09 electrostatic H-bonding hydrophobicity polarizability 26

Functional interpretation of SVM model (local) SVM-RBF model Asp224 Phe160 Tyr190 Arg214 0.39 0.14 0.13 0.1 Ser225 Mg 2+ 0.22 0.21 0.72 0.71 0.56 0.16 0.51 0.08 0.81 0.22 0.46 0.11 electrostatic H-bonding hydrophobicity polarizability 27

Automatic exploration of datasets of chemical compounds (dataset mining) 28

SiRMS-QSAR software http://qsar4u.com/pages/sirms_qsar.php QSAR model building 1 2 utilizes ncpu - 1 3 4 29

SiRMS-QSAR software Calculation of fragments contributions not implemented yet 30

SiRMS-QSAR software Plot fragments contributions structural interpretation functional interpretation 31

SiRMS-QSAR software External visualization tool https://pavel.shinyapps.io/sirms-qsar-vis/ 32

Interpretation workflow scheme Create sdf file with property values Build models (regression or classification) Look at models stat (if all models are bad reconsider dataset) Calculate fragment contributions Plot contributions of desired models selected from statistically significant ones 33

ADME/Tox examples (SAR trends, global interpretation) Datasets taken from: 1) Cheng W. et al., J. Chem. Inf. Model., 2012, 3099-3105 2) Kovdienko N.A. et al., Molecular informatics, 2010, 394-406 3) Polishchuk P.G. et al., J. Chem. Inf. Model., 2009, 2481-2488 4) in-house data 34

Permeability (structural interpretation) consensus of RF, GBM, SVM models 35

Permeability (functional interpretation) consensus of RF, GBM, SVM models 36

Toxicity (structural interpretation) consensus of RF, GBM, SVM models 37

Toxicity (functional interpretation) consensus of RF, GBM, SVM models 38

Summary SiRMS Descriptors Others (Dragon, CDK, etc) Models Fragments Interpretation Regression + + Classification + + Terminal (substituent) + + Scaffold/linker + - Structural + + Functional +? 39

Conclusions Almost any QSAR model can be interpreted using the proposed schemes. Results of structural and functional interpretation obtained from different models are well correlated between models and correspond to observed trends. Structural interpretation allows to reveal trends in SAR, rank fragments, find potential structural alerts, etc. Functional interpretation may provide a guess about factors which are dominated and influence on the investigated property. 40

Perspectives Smart automatic fragmentation approaches Detection of potential activity cliffs in local interpretation Testing on other types of descriptors Usage of datasets which include mixtures of compounds Application of this approach for wider range of structurally diverse datasets with different end-points and comparison to MMP 41

Useful web links A.V. Bogatsky Physico-Chemical Institute, Chemoinformatic group: http://qsar4u.com SiRMS project on GitHub: https://github.com/drrdom/sirms SiRMS-QSAR (dataset analysis): http://qsar4u.com/pages/sirms_qsar.php External web-based visualization: https://pavel.shinyapps.io/sirms-qsar-vis/ 42

Acknowledgement A.V. Bogatsky Physico-Chemical Institute (Odessa, Ukraine) Strasbourg University (France) Prof. V. Kuz min Dr. T. Khristova Dr. L. Ognichenko A. Kosinskaya E. Mokshina M. Kulinskiy Prof. A. Varnek Dr. D. Horvath 43