Emerging patterns mining and automated detection of contrasting chemical features

Similar documents
Automated detection of structural alerts (chemical fragments) in (eco)toxicology

Virtual affinity fingerprints in drug discovery: The Drug Profile Matching method

QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression

Development of Pharmacophore Model for Indeno[1,2-b]indoles as Human Protein Kinase CK2 Inhibitors and Database Mining

Exploring the black box: structural and functional interpretation of QSAR models.

Navigation in Chemical Space Towards Biological Activity. Peter Ertl Novartis Institutes for BioMedical Research Basel, Switzerland

Chemical library design

Retrieving hits through in silico screening and expert assessment M. N. Drwal a,b and R. Griffith a

UniStra activities within the BigChem project:

Computational chemical biology to address non-traditional drug targets. John Karanicolas

A Tiered Screen Protocol for the Discovery of Structurally Diverse HIV Integrase Inhibitors

Mining Molecular Fragments: Finding Relevant Substructures of Molecules

Condensed Graph of Reaction: considering a chemical reaction as one single pseudo molecule

Σοφια: how to make FCA polynomial?

Molecular descriptors and chemometrics: a powerful combined tool for pharmaceutical, toxicological and environmental problems.

Plan. Lecture: What is Chemoinformatics and Drug Design? Description of Support Vector Machine (SVM) and its used in Chemoinformatics.

Research Article. Chemical compound classification based on improved Max-Min kernel

Structural biology and drug design: An overview

Notes of Dr. Anil Mishra at 1

Enamine Golden Fragment Library

IN SILICO STUDY OF NEW STRUCTURAL ALERTS OF AGENTS CLASTOGENIC SUMMARY

Antonio de la Vega de León, Jürgen Bajorath

Priority Setting of Endocrine Disruptors Using QSARs

In silico pharmacology for drug discovery

Antonio de la Vega de León, Jürgen Bajorath

Next Generation Computational Chemistry Tools to Predict Toxicity of CWAs

QSAR Modeling of Human Liver Microsomal Stability Alexey Zakharov

Evaluation of Different Methods for Identification of Structural Alerts Using Chemical Ames Mutagenicity Data Set as a Benchmark

BIOAUTOMATION, 2009, 13 (4),

Mechanism-based Molecular Design of Chemicals with Low Carcinogenicity Potential

An Integrated Approach to in-silico

OECD QSAR Toolbox v.4.1

OECD QSAR Toolbox v.3.3

Introduction to Chemoinformatics and Drug Discovery

1.QSAR identifier 1.1.QSAR identifier (title): QSAR for female rat carcinogenicity (TD50) of nitro compounds 1.2.Other related models:

Machine learning for ligand-based virtual screening and chemogenomics!

OECD QSAR Toolbox v.3.4

Data Mining in the Chemical Industry. Overview of presentation

Mining (Soft-)Skypatterns using Constraint Programming

Structural interpretation of QSAR models a universal approach

Docking. GBCB 5874: Problem Solving in GBCB

OECD QSAR Toolbox v.4.1. Step-by-step example for building QSAR model

OECD QSAR Toolbox v.3.3. Predicting skin sensitisation potential of a chemical using skin sensitization data extracted from ECHA CHEM database

OECD QSAR Toolbox v.3.2. Step-by-step example of how to build and evaluate a category based on mechanism of action with protein and DNA binding

QSAR in Green Chemistry

Plan. Day 2: Exercise on MHC molecules.

Organometallics & InChI. August 2017

LIFE Project Acronym and Number ANTARES LIFE08 ENV/IT/ Deliverable Report. Deliverable Name and Number

Chemical Data Retrieval and Management

Modeling Mutagenicity Status of a Diverse Set of Chemical Compounds by Envelope Methods

Analysis of Activity Landscapes, Activity Cliffs, and Selectivity Cliffs. Jürgen Bajorath Life Science Informatics University of Bonn

Dispensing Processes Profoundly Impact Biological, Computational and Statistical Analyses

OECD QSAR Toolbox v.3.3. Step-by-step example of how to build and evaluate a category based on mechanism of action with protein and DNA binding

OECD QSAR Toolbox v.3.4. Step-by-step example of how to build and evaluate a category based on mechanism of action with protein and DNA binding

Statistical concepts in QSAR.

Computational Chemistry in Drug Design. Xavier Fradera Barcelona, 17/4/2007

Chemoinformatics and information management. Peter Willett, University of Sheffield, UK

(e.g.training and prediction set, algorithm, ecc...). 2.9.Availability of another QMRF for exactly the same model: No other information available

OECD QSAR Toolbox v.4.0. Tutorial on how to predict Skin sensitization potential taking into account alert performance

Using Web Technologies for Integrative Drug Discovery

Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME

Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity

OECD QSAR Toolbox v.3.3. Step-by-step example of how to build a userdefined

De Novo molecular design with Deep Reinforcement Learning

Data Analysis in the Life Sciences - The Fog of Data -

Use of data mining and chemoinformatics in the identification and optimization of high-throughput screening hits for NTDs

QMRF# Title. number and title in JRC QSAR Model Data base 2.0 (new) number and title in JRC QSAR Model Data base 1.0

Comparison of Descriptor Spaces for Chemical Compound Retrieval and Classification

OECD QSAR Toolbox v.4.1. Tutorial on how to predict Skin sensitization potential taking into account alert performance

Biclustering Gene-Feature Matrices for Statistically Significant Dense Patterns

Screening and prioritisation of substances of concern: A regulators perspective within the JANUS project

LigandScout. Automated Structure-Based Pharmacophore Model Generation. Gerhard Wolber* and Thierry Langer

Bioengineering & Bioinformatics Summer Institute, Dept. Computational Biology, University of Pittsburgh, PGH, PA

Introduction. OntoChem

Quantitative Structure Activity Relationships: An overview

Chemical Space. Space, Diversity, and Synthesis. Jeremy Henle, 4/23/2013

Raed Saeed Khashan. Chapel Hill Approved by: Dr. Alexander Tropsha. Dr. Weifan Zheng. Dr. Alexander Golbraikh. Dr. Wei Wang. Dr.

Contents 1 Open-Source Tools, Techniques, and Data in Chemoinformatics

Using Phase for Pharmacophore Modelling. 5th European Life Science Bootcamp March, 2017

Read-Across or QSARs?

The In Silico Model for Mutagenicity

Drug Informatics for Chemical Genomics...

Evaluation of Molecular Similarity and Molecular Diversity Methods Using Biological Activity Data

Design and characterization of chemical space networks

Molecular Fragment Mining for Drug Discovery

1.3.Software coding the model: QSARModel Molcode Ltd., Turu 2, Tartu, 51014, Estonia

Machine Learning Concepts in Chemoinformatics

ICH M7 Impact on DMF Review Process. David Green Quality Assessment Lead (acting) CDER/OPQ/ONDP/DLCAPI November 4, 2015

Mining (Soft-) Skypatterns Using Dynamic CSP

Keywords: anti-coagulants, factor Xa, QSAR, Thrombosis. Introduction

OECD QSAR Toolbox v.3.3. Predicting acute aquatic toxicity to fish of Dodecanenitrile (CAS ) taking into account tautomerism

Structure-Activity Modeling - QSAR. Uwe Koch

György M. Keserű H2020 FRAGNET Network Hungarian Academy of Sciences

Data Quality Issues That Can Impact Drug Discovery

QMRF identifier (JRC Inventory): QMRF Title: QSAR model for female rat carcinogenicity (TD50) of nitro compounds Printing Date:21.01.

bcl::cheminfo Suite Enables Machine Learning-Based Drug Discovery Using GPUs Edward W. Lowe, Jr. Nils Woetzel May 17, 2012

Analysis of the ToxCast & Tox21 compound set using regulatorderived GHS toxicity annotations and in silico-derived protein-target descriptors

Visualization and manipulation of Matched Molecular Series for decision support

A Submodular Framework for Graph Comparison

RSC Publishing. Principles and Applications. In Silico Toxicology. Liverpool John Moores University, Liverpool, Edited by

Transcription:

Emerging patterns mining and automated detection of contrasting chemical features Alban Lepailleur Centre d Etudes et de Recherche sur le Médicament de Normandie (CERMN) UNICAEN EA 4258 - FR CNRS 3038 INC3M - SF 4206 ICORE Université de Caen Normandie, France Chemoinformatics Strasbourg Summer School 2016

Overview Emerging Pattern (EP) mining Generalities Our contributions Case study I : Detection of structural alerts for the mutagenicity endpoint Case study II : Polypharmacology of kinases Chemoinformatics Strasbourg Summer School 2016 2

Data mining in chemoinformatics With the explosion of the availability of data, we need new methods to identify structure-activity relationships in large databases LeadScope, ChEMBL, PubChem, The calculation of the frequency of a descriptor is often at the core of the process Algorithms for the calculation of frequent descriptors often lead to the generation of myriads of such descriptors Chemoinformatics Strasbourg Summer School 2016 3

Emerging patterns mining To limit the number of generated descriptors, methods have been proposed for finding representative and significant subsets Emerging pattern mining is a data mining technique introduced by Dong and Li [1,2] that captures differentiating features between 2 classes of data Descriptors d1 d2 d3 d4 d5 mol1 X X mol2 X X X X mol3 X mol4 X X mol5 X X X mol6 X X X mol7 X mol8 X mol9 X X X mol10 X X [1] Dong G. & Li J. Efficient mining of Emerging Patterns: Discovering trends and differences. Proc. ACM SIGKDD Int. Conf. Knowl. Discovery Data Min., 5th 1999, 43-52. [2] Contrast Data Mining: Concepts, Algorithms, and Applications; Dong G. & Bailey J., Eds.; CRC Press: Boca Raton, FL, 2013. Chemoinformatics Strasbourg Summer School 2016 4

Emerging patterns mining To limit the number of generated descriptors, methods have been proposed for finding representative and significant subsets Emerging pattern mining is a data mining technique introduced by Dong and Li [1,2] that captures differentiating features between 2 classes of data Descriptors d1 d2 d3 d4 d5 mol1 X X mol2 X X X X mol3 X mol4 X X mol5 X X X mol6 X X X mol7 X mol8 X mol9 X X X mol10 X X Emerging Pattern (EP) {d1,d2} is supported by molecules [2,4,5] and molecule [6] Growth-rate ρ = 3 [1] Dong G. & Li J. Efficient mining of Emerging Patterns: Discovering trends and differences. Proc. ACM SIGKDD Int. Conf. Knowl. Discovery Data Min., 5th 1999, 43-52. [2] Contrast Data Mining: Concepts, Algorithms, and Applications; Dong G. & Bailey J., Eds.; CRC Press: Boca Raton, FL, 2013. Chemoinformatics Strasbourg Summer School 2016 5

Emerging patterns mining To limit the number of generated descriptors, methods have been proposed for finding representative and significant subsets Emerging pattern mining is a data mining technique introduced by Dong and Li [1,2] that captures differentiating features between 2 classes of data Descriptors d1 d2 d3 d4 d5 mol1 X X mol2 X X X X mol3 X mol4 X X mol5 X X X mol6 X X X mol7 X mol8 X mol9 X X X mol10 X X Jumping Emerging Pattern (JEP) {d4} is supported by molecules [3,5] Growth-rate ρ = [1] Dong G. & Li J. Efficient mining of Emerging Patterns: Discovering trends and differences. Proc. ACM SIGKDD Int. Conf. Knowl. Discovery Data Min., 5th 1999, 43-52. [2] Contrast Data Mining: Concepts, Algorithms, and Applications; Dong G. & Bailey J., Eds.; CRC Press: Boca Raton, FL, 2013. Chemoinformatics Strasbourg Summer School 2016 6

Applications of EP mining in chemoinformatics Auer and Bajorath were the firsts to apply EP mining in chemoinformatics [3,4] Particularly, they introduced the notion of emerging chemical patterns (ECPs) for molecular classification [3] Auer J. & Bajorath J. Emerging Chemical Patterns: A new methodology for molecular classification and compound selection. J. Chem. Inf. Model. 2006, 46, 2502-2514. [4] Namasivayam V.et al. Classification of compounds with distinct or overlapping multi-target activities and diverse molecular mechanisms using Emerging Chemical Patterns. J. Chem. Inf. Model. 2013, 53, 1272 1281. Sherhod and co-workers also applied EP mining for the identification of toxicophores for various toxicological endpoints [5,6] Their method has been successfully used to implement new structural alerts for mutagenicity in the Derek Nexus expert system [7] [5] Sherhod R.et al. Automating knowledge discovery for toxicity prediction using Jumping Emerging Pattern mining. J. Chem. Inf. Model. 2012, 52, 3074-3087. [6] Sherhod R. et al. Emerging Pattern mining to aid toxicological knowledge discovery. J. Chem. Inf. Model. 2014, 54, 1864-1879. [7] Coquin L. et al. New structural alerts for Ames mutagenicity discovered using Emerging Pattern mining techniques. Toxicol. Res. 2015, 4, 46-56. Our contributions We related the occurrences of jumping fragments to aquatic toxicity data [8] We introduced the enumeration of combinations of chemical fragments [9,10] [8] Lozano S. et al. Introduction of jumping fragments in combination with QSARs for the assessment of classification in ecotoxicology. J. Chem. Inf. Model. 2010, 50, 1330 1139. [9] Poezevara G. et al. Extracting and summarizing the frequent emerging graph patterns from a dataset of graphs. J. Intel. Inf. Syst. 2011, 37, 333 353. [10] Cuissart B. et al. Emerging Patterns as Structural Alerts for Computational Toxicology. In Contrast Data Mining: Concepts, Algorithms and Applications; Dong, G., Bailey, J., Eds.; Chapman and Hall/CRC, 2013; pp 269 281. Chemoinformatics Strasbourg Summer School 2016 7

Our method What makes our method [10] original: We operate directly from the molecular graphs [11] Metivier, J.P. et al. Discovering structural alerts for mutagenicity using Stable Emerging Molecular Patterns. J. Chem. Inf Model. 2015, 55, 925-940 Chemoinformatics Strasbourg Summer School 2016 8

Our method What makes our method [11] original: We operate directly from the molecular graphs [11] Metivier, J.P. et al. Discovering structural alerts for mutagenicity using Stable Emerging Molecular Patterns. J. Chem. Inf Model. 2015, 55, 925-940 Chemoinformatics Strasbourg Summer School 2016 9

Our method What makes our method [11] original: We operate directly from the molecular graphs We enumerate the frequent closed patterns [11] Metivier, J.P. et al. Discovering structural alerts for mutagenicity using Stable Emerging Molecular Patterns. J. Chem. Inf Model. 2015, 55, 925-940 Chemoinformatics Strasbourg Summer School 2016 10

Our method What makes our method [11] original: We operate directly from the molecular graphs [11] Metivier, J.P. et al. Discovering structural alerts for mutagenicity using Stable Emerging Molecular Patterns. J. Chem. Inf Model. 2015, 55, 925-940 We enumerate the frequent closed patterns to extract the emerging patterns Chemoinformatics Strasbourg Summer School 2016 11

Our method What makes our method [11] original: We operate directly from the molecular graphs We enumerate the frequent closed patterns to extract the emerging patterns We organize the patterns in a Hasse diagram [11] Metivier, J.P. et al. Discovering structural alerts for mutagenicity using Stable Emerging Molecular Patterns. J. Chem. Inf Model. 2015, 55, 925-940 Hasse diagram Chemoinformatics Strasbourg Summer School 2016 12

Case study (I): Structural alerts Search of structural alerts One of the most interesting approach of predictive toxicology Define the key features of a molecule that are required to initiate a toxicological pathway Examples of domain experts rules The Tennant and Ashby s set for DNA reactivity The Benigni and Bossa s set for mutagenic and carcinogenic potential ToxAlerts The updating of a knowledge base is very time consuming since it requires strong investment of domain experts and a detailed analysis of the scientific literature EP mining should reduce the time and efforts needed to identify new structural alerts Chemoinformatics Strasbourg Summer School 2016 13

Case study (I): Structural alerts Search of structural alerts for mutagenicity The Hansen benchmark dataset (http://doc.ml.tu-berlin.de/toxbenchmark/) 6512 compounds from the literature annotated with Ames mutagenicity data 3503 Ames + and 3009 Ames - Use of a 0.36% frequency threshold (support of 20 molecules) 15000 Eps 10 JEPs only present in the mutagens Nitro aromatic groups Azide group Polycyclic planar hydrocarbon system Polycyclic aromatic amines Generalized toxicophores Nitrosamine group Chemoinformatics Strasbourg Summer School 2016 14

Case study (I): Structural alerts Search of structural alerts for mutagenicity Comparison with ToxAlerts (https://ochem.eu/) 32 out of 50 toxicophores Chemoinformatics Strasbourg Summer School 2016 15

Case study (I): Structural alerts Search of structural alerts for mutagenicity Extract of the Hasse diagram Example: stimulation [12] of aromatic rings by the addition of a nitro group EP_921 ρ = 3.65 s = 42 EP_13941 ρ = 2.71 s = 166 EP_968 ρ = 4.05 s = 829 EP_920 ρ = 5.26 s = 57 EP_915 ρ = 26.63 s = 32 EP_916 ρ = s = 37 EP_914 ρ = s = 89 EP_16868 ρ = 1.23 s = 4310 EP_13923 ρ = 2,11 s = 256 EP_597 ρ = 4.63 s = 926 EP_13617 ρ = 2.22 s = 86 EP_13620 ρ = 3.36 s = 54 EP_13594 ρ = 2.08 s = 161 [12] Bissell-Siders R.et al. On the stimulation of patterns. Lect. Notes Comput. Sci. 2010, 6208, 56-69. Chemoinformatics Strasbourg Summer School 2016 16

Case study (I): Structural alerts Our interactive visualization tool https://chemoinfo.greyc.fr/2014_metivier/ Navigation in the Hasse diagram List of patterns Extent of the current pattern in the dataset Chemoinformatics Strasbourg Summer School 2016 17

The one drug one target one disease paradigm Case study (II): Polypharmacology Drug Target Phenotype Disease Chemoinformatics Strasbourg Summer School 2016 18

Case study (II): Polypharmacology The one drug one target one disease paradigm Polypharmacological drug behavior Many known drugs elicit their therapeutic effects by acting on multiple targets But such drugs can also bind antitargets responsible for side effects Drug Antitargets Targets Side effect Phenotype Disease Chemoinformatics Strasbourg Summer School 2016 19

Case study (II): Polypharmacology Polypharmacology of kinases [13,14] Most tumors can escape from the inhibition of any single kinase The GSK Published Kinase Inhibitor Set (PKIS) as a source of knowledge Annotation of the kinase inhibitors using affinity data for 220 kinases (active/inactive) 367 small molecules ATP-competitive kinase inhibitors Data available from [13] Knight, Z.A. et al. Targeting the Cancer Kinome through Polypharmacology. Nat. Rev. Cancer 2010, 10, 130 137. [14] Wu, P. et al. Small-Molecule Kinase Inhibitors: An Analysis of FDA-Approved Drugs. Drug Discovery Today 2016, 21 (1), 5 10. Chemoinformatics Strasbourg Summer School 2016 20

Case study (II): Polypharmacology Kinase Miner Interactive tool dedicated to polypharmacology of kinases Chemoinformatics Strasbourg Summer School 2016 21

Case study (II): Polypharmacology Kinase Miner Interactive tool dedicated to polypharmacology of kinases Example: dual inhibition of ERBB2 and EGFR CHEMBL1962281 ERBB2 Tyr kinase family CHEMBL1962259 EGFR Tyr kinase family Chemoinformatics Strasbourg Summer School 2016 22

Case study (II): Polypharmacology Kinase Miner Interactive tool dedicated to polypharmacology of kinases Example: dual inhibition of ERBB2 and EGFR CHEMBL1962281 ERBB2 Tyr kinase family CHEMBL1962259 EGFR Tyr kinase family Molecules in agreement with the dual ERBB2 and EGFR inhibition Chemoinformatics Strasbourg Summer School 2016 23

Kinase Miner Interactive tool dedicated to polypharmacology of kinases Example: dual inhibition of ERBB2 and EGFR Case study (II): Polypharmacology CHEMBL1962281 ERBB2 Tyr kinase family CHEMBL1962259 EGFR Tyr kinase family CHEMBL529066 ERBB2 ERBB4 EGFR 73.03 % inh @ 1μM 96.64 % inh @ 1μM 94.13 % inh @ 1μM Chemoinformatics Strasbourg Summer School 2016 24

Case study (II): Polypharmacology Kinase Miner Interactive tool dedicated to polypharmacology of kinases Example: dual inhibition of ERBB2 and EGFR CHEMBL1962281 ERBB2 Tyr kinase family CHEMBL1962259 EGFR Tyr kinase family Chemoinformatics Strasbourg Summer School 2016 25

Case study (II): Polypharmacology Kinase Miner Interactive tool dedicated to polypharmacology of kinases Example: dual inhibition of ERBB2 and EGFR Ancestors less specific Current Successors more specific CHEMBL1962281 ERBB2 Tyr kinase family CHEMBL1962259 EGFR Tyr kinase family Chemoinformatics Strasbourg Summer School 2016 26

Case study (II): Polypharmacology Kinase Miner Interactive tool dedicated to polypharmacology of kinases Example: dual inhibition of ERBB2 and EGFR Ancestors less specific Current Successors more specific CHEMBL1962281 ERBB2 Tyr kinase family CHEMBL1962259 EGFR Tyr kinase family Chemoinformatics Strasbourg Summer School 2016 27

Case study (II): Polypharmacology Kinase Miner Interactive tool dedicated to polypharmacology of kinases Example: dual inhibition of ERBB2 and EGFR Ancestors less specific Current Successors more specific CHEMBL1962281 ERBB2 Tyr kinase family CHEMBL1962259 EGFR Tyr kinase family Chemoinformatics Strasbourg Summer School 2016 28

Last development ❶ Initial method EPs ❷ Addition of the distances 8 19 13 2 8 3 Topological EPs ❸ Pharmacophoric definitions HBA 13 20 14 HBA 5 Hyd 11 Ar 3 2D-pharmacophoric EPs Chemoinformatics Strasbourg Summer School 2016 29

Conclusion and perspectives The aim of the EP mining described here is to support the knowledge discovery in large and multidimensional data sets Validation Identification of toxicophores Understanding of the polypharmacological profile of kinase inhibitors Development of a workflow tool Poster P17 Métivier et al. Automated Generation of 2D-Pharmacophores from Large Datasets Chemoinformatics Strasbourg Summer School 2016 30

Acknowledgments Bertrand Cuissart Bruno Crémilleux Ronan Bureau Jean-Philippe Métivier Financial supports MinOmics program Aleksey Buzmakov Amedeo Napoli Guillaume Poezevara www.quiid.tech - contact@quiid.tech Sergei Kuznetsov Jérémie Le Goff Chemoinformatics Strasbourg Summer School 2016 31