De Novo molecular design with Deep Reinforcement Learning

Similar documents
Deep Reinforcement Learning for de-novo Drug Design

In silico generation of novel, drug-like chemical matter using the LSTM deep neural network

In silico generation of novel, drug-like chemical matter using the LSTM neural network

Deep Learning: What Makes Neural Networks Great Again?

Deep Generative Models for Graph Generation. Jian Tang HEC Montreal CIFAR AI Chair, Mila

Virtual screening in drug discovery

Introduction. OntoChem

Development of a Structure Generator to Explore Target Areas on Chemical Space

Machine learning for ligand-based virtual screening and chemogenomics!

Molecular Dynamics Graphical Visualization 3-D QSAR Pharmacophore QSAR, COMBINE, Scoring Functions, Homology Modeling,..

Structure-Activity Modeling - QSAR. Uwe Koch

bcl::cheminfo Suite Enables Machine Learning-Based Drug Discovery Using GPUs Edward W. Lowe, Jr. Nils Woetzel May 17, 2012

Computational chemical biology to address non-traditional drug targets. John Karanicolas

Navigation in Chemical Space Towards Biological Activity. Peter Ertl Novartis Institutes for BioMedical Research Basel, Switzerland

Multi objective de novo drug design with conditional graph generative model

Introduction to Chemoinformatics and Drug Discovery

PREDICTION OF HETERODIMERIC PROTEIN COMPLEXES FROM PROTEIN-PROTEIN INTERACTION NETWORKS USING DEEP LEARNING

October 6 University Faculty of pharmacy Computer Aided Drug Design Unit

Chemical Data Retrieval and Management

Introducing a Bioinformatics Similarity Search Solution

A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery

Xia Ning,*, Huzefa Rangwala, and George Karypis

QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression

Machine Learning Concepts in Chemoinformatics

arxiv: v1 [physics.chem-ph] 6 Apr 2018

Hit Finding and Optimization Using BLAZE & FORGE

Development and application of ligand-based computational methods for de-novo drug. design and virtual screening. Alexander Richard Geanes.

Progress of Compound Library Design Using In-silico Approach for Collaborative Drug Discovery

MSc Drug Design. Module Structure: (15 credits each) Lectures and Tutorials Assessment: 50% coursework, 50% unseen examination.

Capturing Chemistry. What you see is what you get In the world of mechanism and chemical transformations

An Integrated Approach to in-silico

Chemical Space. Space, Diversity, and Synthesis. Jeremy Henle, 4/23/2013

Raed Saeed Khashan. Chapel Hill Approved by: Dr. Alexander Tropsha. Dr. Weifan Zheng. Dr. Alexander Golbraikh. Dr. Wei Wang. Dr.

The use of Design of Experiments to develop Efficient Arrays for SAR and Property Exploration

CHEMINFORMATICS MODELING OF DIVERSE AND DISPARATE BIOLOGICAL DATA AND THE USE OF MODELS TO DISCOVER NOVEL BIOACTIVE MOLECULES

Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME

Building innovative drug discovery alliances. Just in KNIME: Successful Process Driven Drug Discovery

Cheminformatics analysis and learning in a data pipelining environment

Few-shot learning with KRR

How IJC is Adding Value to a Molecular Design Business

Integrated Cheminformatics to Guide Drug Discovery

CSC321 Lecture 20: Reversible and Autoregressive Models

Using AutoDock for Virtual Screening

Chemoinformatics and information management. Peter Willett, University of Sheffield, UK

A Study of Complex Deep Learning Networks on High Performance, Neuromorphic, and Quantum Computers

Cross Discipline Analysis made possible with Data Pipelining. J.R. Tozer SciTegic

Cheminformatics Role in Pharmaceutical Industry. Randal Chen Ph.D. Abbott Laboratories Aug. 23, 2004 ACS

Similarity methods for ligandbased virtual screening

Chemogenomic: Approaches to Rational Drug Design. Jonas Skjødt Møller

Pose and affinity prediction by ICM in D3R GC3. Max Totrov Molsoft

Supporting Information

Contents 1 Open-Source Tools, Techniques, and Data in Chemoinformatics

Deep Learning Srihari. Deep Belief Nets. Sargur N. Srihari

Patent Searching using Bayesian Statistics

Memory-Augmented Attention Model for Scene Text Recognition

Studying the effect of noise on Laplacian-modified Bayesian Analysis and Tanimoto Similarity

AMRI COMPOUND LIBRARY CONSORTIUM: A NOVEL WAY TO FILL YOUR DRUG PIPELINE

A Tiered Screen Protocol for the Discovery of Structurally Diverse HIV Integrase Inhibitors

Ignasi Belda, PhD CEO. HPC Advisory Council Spain Conference 2015

The Molecule Cloud - compact visualization of large collections of molecules

Deep Learning for Gravitational Wave Analysis Results with LIGO Data

Artificial Intelligence Methods for Molecular Property Prediction. Evan N. Feinberg Pande Lab

Structure based drug design and LIE models for GPCRs

Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści

DATA FUSION APPROACHES IN LIGAND-BASED VIRTUAL SCREENING: RECENT DEVELOPMENTS OVERVIEW

Similarity Search. Uwe Koch

Increasing the Coverage of Medicinal Chemistry-Relevant Space in Commercial Fragments Screening

A reliable computational workflow for the selection of optimal screening libraries

Juergen Gall. Analyzing Human Behavior in Video Sequences

Introduction to Chemoinformatics

Deep Learning Autoencoder Models

Targeting protein-protein interactions: A hot topic in drug discovery

Topology based deep learning for biomolecular data

Prototype-Based Compound Discovery using. Deep Generative Models

est Drive K20 GPUs! Experience The Acceleration Run Computational Chemistry Codes on Tesla K20 GPU today

Structural biology and drug design: An overview

JCICS Major Research Areas

Docking. GBCB 5874: Problem Solving in GBCB

Next Generation Computational Chemistry Tools to Predict Toxicity of CWAs

LIBRARY DESIGN FOR COLLABORATIVE DRUG DISCOVERY: EXPANDING DRUGGABLE CHEMOGENOMIC SPACE

PROVIDING CHEMINFORMATICS SOLUTIONS TO SUPPORT DRUG DISCOVERY DECISIONS

Introduction to Machine Learning Midterm Exam

Practical QSAR and Library Design: Advanced tools for research teams

Translating Methods from Pharma to Flavours & Fragrances

DivCalc: A Utility for Diversity Analysis and Compound Sampling

Exploring the chemical space of screening results

Design and Synthesis of the Comprehensive Fragment Library

What I Did Last Summer

Deep learning on 3D geometries. Hope Yao Design Informatics Lab Department of Mechanical and Aerospace Engineering

Tutorials on Library Design E. Lounkine and J. Bajorath (University of Bonn) C. Muller and A. Varnek (University of Strasbourg)

Tasks ADAS. Self Driving. Non-machine Learning. Traditional MLP. Machine-Learning based method. Supervised CNN. Methods. Deep-Learning based

Unsupervised Learning

Artificial Neural Networks. Introduction to Computational Neuroscience Tambet Matiisen

CHAPTER 6 QUANTITATIVE STRUCTURE ACTIVITY RELATIONSHIP (QSAR) ANALYSIS

Dr. Sander B. Nabuurs. Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre

Interactive Feature Selection with

Data Mining. CS57300 Purdue University. March 22, 2018

QSAR/QSPR modeling. Quantitative Structure-Activity Relationships Quantitative Structure-Property-Relationships

Development of Pharmacophore Model for Indeno[1,2-b]indoles as Human Protein Kinase CK2 Inhibitors and Database Mining

BioSolveIT. A Combinatorial Approach for Handling of Protonation and Tautomer Ambiguities in Docking Experiments

Transcription:

De Novo molecular design with Deep Reinforcement Learning @olexandr Olexandr Isayev, Ph.D. University of North Carolina at Chapel Hill olexandr@unc.edu http://olexandrisayev.com

About me Ph.D. in Chemistry (computational) Minor in CS/ML Worked in Federal research lab on HPC & GPU computing to solve chemical problems Now I am faculty at the University of North Carolina, Chapel Hill We use ML & AI to solve challenging problems in chemistry http://olexandrisayev.com Twitter: @olexandr olexandr@unc.edu

A public-private partnership that supports the discovery of new medicines through open access research www.thesgc.org

The Long and Winding Road to Drug Discovery Data Science approaches useful across the pipeline, but very different techniques aim for success, but if not: fail early, fail cheap

internal rate of return (IRR) Source: Endpoints News https://endpts.com

Drowning in Data but starving for Knowledge

The growing appreciation of molecular modeling and informatics 7

Behold the rise of the machines

Summary of recent AI-based studies on chemical library design Molecular representations Generative models Method of biasing generated compounds Fingerprints SMILES Graphs Autoencoders Generative adversarial models (GANs) Recurrent neural networks (RNNs) Convolutional neural networks (CNNs) None Latent space optimization Fine-tuning on small subset of molecules with the desired property Reinforcement Learning

De Novo molecular design with Deep Reinforcement Learning General Approach Application to Molecular design Tm; LogP; pic50; etc Predictive Deep Network Molecules Generative Deep Network Patent pending arxiv:1711.10907

Drug discovery pipeline CHEMICAL STRUCTURES CHEMICAL DESCRIPTORS PREDICTIVE QSAR MODELS PROPERTY/ ACTIVITY CHEMICAL DATABASE QSAR MAGIC VIRTUAL SCREENING HITS (confirmed actives) ~10 6 10 9 molecules INACTIVES (confirmed inactives)

Design of the ReLeaSE* method Challenges: Generate chemically feasible SMILES Develop SMILESbased QSAR model Employ Predictive ML model to bias library generation *Popova, Mariya, Olexandr Isayev, and Alexander Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Language of SMILEs

Generative model 1.5M molecules from ChEMBL <START>c1ccc(O)cc1<END> c1cc) ( F) cc1<end> c1ccc(o)cc1 NO <START> c1ccc( O) cc1 + loss YES Did the training converge? Softmax loss

Reinforcement learning for chemical design Generative model Predictive model FC(F)COc1ccc2c(Nc3ccc(Cl)c(Cl)c3)ncnc2c1 M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Reinforcement learning for chemical design Generative model Predictive model FC(F)COc1ccc2c(Nc3ccc(Cl)c(Cl)c3)ncnc2c1 M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Reinforcement learning for chemical design Generative model INACTIVE! Predictive model M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Reinforcement learning for chemical design Generative model INACTIVE! Predictive model M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Reinforcement learning for chemical design Generative model INACTIVE! Predictive model M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Reinforcement learning for chemical design Generative model Predictive model M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Reinforcement learning for chemical design Generative model Predictive model Fc1ccc2c(Nc3ccc(F)c(F)c3)ncnc2c1 M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Reinforcement learning for chemical design Generative model Predictive model Fc1ccc2c(Nc3ccc(F)c(F)c3)ncnc2c1 M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Reinforcement learning for chemical design Generative model ACTIVE! Predictive model M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Reinforcement learning for chemical design Generative model ACTIVE! Predictive model M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Reinforcement learning for chemical design Generative model ACTIVE! Predictive model M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Reinforcement learning for chemical design Generative model Predictive model M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Technical details Models were trained on Nvidia Titan X and Titan V GPUs Training generative model on ChEMBL took ~ 25 days Training predictive models took ~ 2 hours Biasing generative model with reinforcement learning for one property ~ 1 day Generative model produces 1000s compounds per minute

Results: Biasing target properties in the designed libraries Optimized Baseline * -2 0 2 4 6 8 10 12-200 -100 0 100 200 300 JAK2 Inhibition (pic50) Melting temperature (T m ), o C 400 Partition coefficient (logp) M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

JAK2 (Kinase) inhibition Train data distribution Maximized property distribution Minimized property distribution NEW CHEMOTYPE CAS 236-084-2 (buffer reagent) ZINC37859566 SIMILAR SCAFFOLDS New molecule arxiv:1711.10907

Results: analysis of similarity Distribution of Tanimoto similarity to the nearest neighbor in training dataset for compounds predicted to be active for EGFR by consensus of QSAR models: Similarity= 0.69 Similarity= 0.57 Similarity = 0.86 0.5 0.6 0.7 0.8 0.9 Tanimoto similarity 1.0

Results: Synthetic accessibility score* of the designed libraries *Ertl, Peter, and Ansgar Schuffenhauer. "Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions." Journal of cheminformatics 1.1 (2009): 8.

Target predictions for generated compounds using SEA* *Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ, Shoichet BK. Relating protein pharmacology by ligand chemistry. Nat Biotech 25 (2), 197-206 (2007).

Target predictions for generated compounds using SEA* *Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ, Shoichet BK. Relating protein pharmacology by ligand chemistry. Nat Biotech 25 (2), 197-206 (2007).

Model visualization for JAK2 (projection using t-sne) ZINC19982368 pic50 = 8.64 ZINC66347860 pic50 = 3.31 ZINC469992 pic50 = 8.23 ZINC3549031 pic50 = 3.76 ZINC2876515 pic50 = 8.39 pic50 = 10.37 pic50 = 0.63 10 9 8 7 6 5 4 3 2 1 M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Examples of Stack-RNN cells with interpretable gate activations M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Summary AI methods coupled with SMILES representation afford biased libraries generation The system naturally embeds reinforcement to produce novel structure with the desired property The system can be tuned to bias libraries towards specific property ranges Next phase is experimental validation of hits by UNC SGC team