Development of a Structure Generator to Explore Target Areas on Chemical Space

Similar documents
Bridging the Dimensions:

Molecular Dynamics Graphical Visualization 3-D QSAR Pharmacophore QSAR, COMBINE, Scoring Functions, Homology Modeling,..

Notes of Dr. Anil Mishra at 1

De Novo molecular design with Deep Reinforcement Learning

Machine learning for ligand-based virtual screening and chemogenomics!

Using AutoDock for Virtual Screening

A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery

Chemogenomic: Approaches to Rational Drug Design. Jonas Skjødt Møller

Structure-Activity Modeling - QSAR. Uwe Koch

QSAR in Green Chemistry

QSAR Modeling of Human Liver Microsomal Stability Alexey Zakharov

Similarity Search. Uwe Koch

A Tiered Screen Protocol for the Discovery of Structurally Diverse HIV Integrase Inhibitors

An Integrated Approach to in-silico

Introduction to Chemoinformatics and Drug Discovery

Machine Learning Concepts in Chemoinformatics

Analysis of a Large Structure/Biological Activity. Data Set Using Recursive Partitioning and. Simulated Annealing

Ping-Chiang Lyu. Institute of Bioinformatics and Structural Biology, Department of Life Science, National Tsing Hua University.

What is a property-based similarity?

Xia Ning,*, Huzefa Rangwala, and George Karypis

Cheminformatics Role in Pharmaceutical Industry. Randal Chen Ph.D. Abbott Laboratories Aug. 23, 2004 ACS

Chemical Space. Space, Diversity, and Synthesis. Jeremy Henle, 4/23/2013

Marginal Balance of Spread Designs

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

Dr. Sander B. Nabuurs. Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre

KNIME-based scoring functions in Muse 3.0. KNIME User Group Meeting 2013 Fabian Bös

In silico pharmacology for drug discovery

Similarity methods for ligandbased virtual screening

Quantitative Structure Activity Relationships: An overview

Improved Machine Learning Models for Predicting Selective Compounds

Navigation in Chemical Space Towards Biological Activity. Peter Ertl Novartis Institutes for BioMedical Research Basel, Switzerland

Studying the effect of noise on Laplacian-modified Bayesian Analysis and Tanimoto Similarity

Statistical concepts in QSAR.

Assessing Synthetic Accessibility of Chemical Compounds Using Machine Learning Methods

Chemical library design

Selecting Diversified Compounds to Build a Tangible Library for Biological and Biochemical Assays

Introduction. OntoChem

Data Mining in the Chemical Industry. Overview of presentation

Lecture for Molekylär bioinformatik X3 Feb Computational Chemistry in Drug Discovery. Mats Kihlén. Head of Research Informatics.

Quantitative Structure-Activity Relationship (QSAR) computational-drug-design.html

Pose and affinity prediction by ICM in D3R GC3. Max Totrov Molsoft

Computational chemical biology to address non-traditional drug targets. John Karanicolas

Priority Setting of Endocrine Disruptors Using QSARs

Functional Group Fingerprints CNS Chemistry Wilmington, USA

Electrical and Computer Engineering Department University of Waterloo Canada

Using Self-Organizing maps to accelerate similarity search

UniStra activities within the BigChem project:

Can we estimate the accuracy of ADMET predictions?

Style-aware Mid-level Representation for Discovering Visual Connections in Space and Time

Retrieving hits through in silico screening and expert assessment M. N. Drwal a,b and R. Griffith a

Translating Methods from Pharma to Flavours & Fragrances

COMPARISON OF SIMILARITY METHOD TO IMPROVE RETRIEVAL PERFORMANCE FOR CHEMICAL DATA

Artificial Intelligence (AI) Common AI Methods. Training. Signals to Perceptrons. Artificial Neural Networks (ANN) Artificial Intelligence

Identification of Active Ligands. Identification of Suitable Descriptors (molecular fingerprint)

Receptor Based Drug Design (1)

Research Article. Chemical compound classification based on improved Max-Min kernel

Plan. Lecture: What is Chemoinformatics and Drug Design? Description of Support Vector Machine (SVM) and its used in Chemoinformatics.

Design and characterization of chemical space networks

Exploring the black box: structural and functional interpretation of QSAR models.

Virtual screening in drug discovery

Next Generation Computational Chemistry Tools to Predict Toxicity of CWAs

Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME

Towards Physics-based Models for ADME/Tox. Tyler Day

Quantitative structure activity relationship and drug design: A Review

est Drive K20 GPUs! Experience The Acceleration Run Computational Chemistry Codes on Tesla K20 GPU today

OECD QSAR Toolbox v.4.1. Tutorial illustrating new options of the structure similarity

Deep Learning: What Makes Neural Networks Great Again?

Structural biology and drug design: An overview

MM-GBSA for Calculating Binding Affinity A rank-ordering study for the lead optimization of Fxa and COX-2 inhibitors

Artificial Intelligence Methods for Molecular Property Prediction. Evan N. Feinberg Pande Lab

October 6 University Faculty of pharmacy Computer Aided Drug Design Unit

Hydrogen Bonding & Molecular Design Peter

Development and application of ligand-based computational methods for de-novo drug. design and virtual screening. Alexander Richard Geanes.

QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression

QSAR/QSPR modeling. Quantitative Structure-Activity Relationships Quantitative Structure-Property-Relationships

Interactive Feature Selection with

Statistical learning theory, Support vector machines, and Bioinformatics

Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets

Kernel Density Topic Models: Visual Topics Without Visual Words

Prediction in bioinformatics applications by conformal predictors

AMRI COMPOUND LIBRARY CONSORTIUM: A NOVEL WAY TO FILL YOUR DRUG PIPELINE

Quantitative Bioinformatics

Patrick: An Introduction to Medicinal Chemistry 5e Chapter 01

Biologically Relevant Molecular Comparisons. Mark Mackey

Generative Models for Chemical Structures

Development of Pharmacophore Model for Indeno[1,2-b]indoles as Human Protein Kinase CK2 Inhibitors and Database Mining

Machine Learning and Deep Learning! Vincent Lepetit!

CHEMINFORMATICS MODELING OF DIVERSE AND DISPARATE BIOLOGICAL DATA AND THE USE OF MODELS TO DISCOVER NOVEL BIOACTIVE MOLECULES

DATA FUSION APPROACHES IN LIGAND-BASED VIRTUAL SCREENING: RECENT DEVELOPMENTS OVERVIEW

Medicinal Chemist s Relationship with Additivity: Are we Taking the Fundamentals for Granted?

A graph based approach to construct target focused libraries for virtual screening

Solved and Unsolved Problems in Chemoinformatics

Computational Methods and Drug-Likeness. Benjamin Georgi und Philip Groth Pharmakokinetik WS 2003/2004

1.QSAR identifier 1.1.QSAR identifier (title): QSAR for Relative Binding Affinity to Estrogen Receptor 1.2.Other related models:

Author Index Volume

Core Electron Binding Energy (CEBE) Shifts Applied to Structure Activity Relationship (SAR) Analysis of Neolignans

Cheminformatics analysis and learning in a data pipelining environment

EMPIRICAL VS. RATIONAL METHODS OF DISCOVERING NEW DRUGS

Universities of Leeds, Sheffield and York

Keywords: anti-coagulants, factor Xa, QSAR, Thrombosis. Introduction

Transcription:

Development of a Structure Generator to Explore Target Areas on Chemical Space Kimito Funatsu Department of Chemical System Engineering, This materials will be published on Molecular Informatics

Drug Development Conditions lead compounds are required to meet Biological activity for specific target Various properties (ADME-Tox, synthetic accessibility, etc ) On the first stage of drug development, various structures with high activity are required 1

Structure Generators and SAR Structures Descriptors Electronic properties Molecular weight Number of rings etc. Structure-activity relationships (SAR) Activity Structure generators that aim to generate highly active structures are proposed. [1][2] [1] B. Pirard, Expert Opin. Drug Discov., 6, 225, 2011 [2] H. Mauser, W. Guba, Curr. Opin. Drug Discov. Devel., 11, 365, 2008 2

Chemical Space Chemical space is defined by descriptors. X 2 Active Generated Inactive X 1 It is necessary to consider the distribution of the generated structures on the chemical space 3

Objective Development of a structure generator for searching target areas in chemical space X 2 Target Active Inactive X 1 4

Visualization of chemical space High-dimensional space 2D map x 1 Dimension reduction u 2 x 3 x 2 u 1 A system for structure generation on 2D maps, de novo Design Algorithm for Exploring Chemical Space DAECS 5

u 2 Systems Target u 1 Seed structures Pooled structures 1Input of initial seeds 2Generation of new structures 3Filtering and recording 4Probabilistic selection of new seeds 6

u 2 Systems Target Seed structures Pooled structures u 1 1Input of initial seeds 2Generation of new structures 3Filtering and recording New structures 4Probabilistic selection of new seeds 6

u 2 Systems Target Seed structures Pooled structures u 1 1Input of initial seeds 2Generation of new structures 3Filtering and recording New structures 4Probabilistic selection of new seeds 6

u 2 Systems Target Seed structures Pooled structures u 1 1Input of initial seeds 2Generation of new structures 3Filtering and recording 4Probabilistic selection of new seeds 6

u 2 Systems Target Seed structures Pooled structures u 1 1Input of initial seeds 2Generation of new structures 3Filtering and recording New structures 4Probabilistic selection of new seeds 6

u 2 Systems Target Seed structures Pooled structures u 1 1Input of initial seeds 2Generation of new structures 3Filtering and recording New structures 4Probabilistic selection of new seeds 6

Case Study α2a data :GVK [1] database ligand-binding affinity for α 2A adrenergic receptor training data 300, test data 335 Descriptors : Fingerprints from PubChem [2] (460 bit) Ex. >= 4 H >= 2 saturated or aromatic nitrogen containing ring size 6 C(-C)(-H)(=N) Visualization : Generative Topographic Mapping(GTM) [3] [1] GVK Bio Databases, http://www.gvkbio.com/informatics.html [2] PubChem, http://pubchem.ncbi.nlm.nih.gov/ [3] C. M. Bishop, Markus Svensen, Neaural Computation, 10, 215, 1998. 7

Visualization of Chemical Space Activity Training data Test data Predicted activity calculated with 3-nearest neighbors method RMSE cv = 0.45 RMSE pred = 0.42 8

Structure Generation Structure Generation: 1.DAECS 2.Conventional method : selection of new seeds by predicted activity (SVR [1] ) Initial seeds: 7 structures with > 8 activity values Cycles:100 Trial:10 times Target coordinates [1] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006. 9

Structure Generation Distribution of generated structures 1 DAECS 2 Conventional method 258,068 structures 311,975 structures 10

frequency frequency Diversity and Activity Distributions of similarity with seed structures and predicted activity Similarity : tanimoto coefficient Activity : 3-NN method 1 DAECS 2 Conventional method activity similarity activity similarity 11

Example of Generated Structures Similarity = 0.60 Similarity = 0.98 Seed Activity = 8.18 Similarity = 0.79 Predicted activity = 7.48 Predicted activity = 7.84 Predicted activity = 7.85 12

Conclusions A structure generator DAECS was developed. DAECS aims to generate structures in target areas on visualized chemical space. In a case study with GVK data, visualization of the chemical space and structure generation were performed. Distribution, diversity and activity of generated structures were verified. It was showed that DAECS can generate diverse structures, which are distributed in the target area on visualized chemical space. 13

APPENDIX 19

Path Seed Generated 20

Calculation of Similarity Structure A Structure B Fingerprints 0 1 0 1 0 0 0 1 0 0 1 1 1 1 0 1 0 0 1 1 0 0 1 0 1 1 1 1 0 1 0 0 t M 01 M M 11 10 M 11 Number of bits that satisfy below M 01 M 10 M 11 :A 0, B 1 :A 1, B 0 :A 1, B 1 21

Selection of Next Seeds Probabilistic selection (roulette) based on a scoring function SCORE r 2 r, d exp exp 2 2 σr σ d d 2 r : Distance from target on the map d : Distance from the map σ r : Hyper-parameter σ d : Hyper-parameter Example of Scoring function SCORE 22

Distribution of a Specified Descriptor 145 >= 1 saturated or aromatic nitrogen-containing ring size 5 1 1 0.5 0 0.5 0 True False -0.5-0.5-1 -1 23-1 The -0.5 University of 0Tokyo 0.5