De Novo molecular design with Deep Reinforcement Learning

De Novo molecular design with Deep Reinforcement Learning @olexandr Olexandr Isayev, Ph.D. University of North Carolina at Chapel Hill olexandr@unc.edu http://olexandrisayev.com

About me Ph.D. in Chemistry (computational) Minor in CS/ML Worked in Federal research lab on HPC & GPU computing to solve chemical problems Now I am faculty at the University of North Carolina, Chapel Hill We use ML & AI to solve challenging problems in chemistry http://olexandrisayev.com Twitter: @olexandr olexandr@unc.edu

A public-private partnership that supports the discovery of new medicines through open access research www.thesgc.org

The Long and Winding Road to Drug Discovery Data Science approaches useful across the pipeline, but very different techniques aim for success, but if not: fail early, fail cheap

internal rate of return (IRR) Source: Endpoints News https://endpts.com

Drowning in Data but starving for Knowledge

The growing appreciation of molecular modeling and informatics 7

Behold the rise of the machines

Summary of recent AI-based studies on chemical library design Molecular representations Generative models Method of biasing generated compounds Fingerprints SMILES Graphs Autoencoders Generative adversarial models (GANs) Recurrent neural networks (RNNs) Convolutional neural networks (CNNs) None Latent space optimization Fine-tuning on small subset of molecules with the desired property Reinforcement Learning

De Novo molecular design with Deep Reinforcement Learning General Approach Application to Molecular design Tm; LogP; pic50; etc Predictive Deep Network Molecules Generative Deep Network Patent pending arxiv:1711.10907

Drug discovery pipeline CHEMICAL STRUCTURES CHEMICAL DESCRIPTORS PREDICTIVE QSAR MODELS PROPERTY/ ACTIVITY CHEMICAL DATABASE QSAR MAGIC VIRTUAL SCREENING HITS (confirmed actives) ~10 6 10 9 molecules INACTIVES (confirmed inactives)

Design of the ReLeaSE* method Challenges: Generate chemically feasible SMILES Develop SMILESbased QSAR model Employ Predictive ML model to bias library generation *Popova, Mariya, Olexandr Isayev, and Alexander Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Language of SMILEs

Generative model 1.5M molecules from ChEMBL <START>c1ccc(O)cc1<END> c1cc) ( F) cc1<end> c1ccc(o)cc1 NO <START> c1ccc( O) cc1 + loss YES Did the training converge? Softmax loss

Reinforcement learning for chemical design Generative model Predictive model FC(F)COc1ccc2c(Nc3ccc(Cl)c(Cl)c3)ncnc2c1 M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Reinforcement learning for chemical design Generative model INACTIVE! Predictive model M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Reinforcement learning for chemical design Generative model Predictive model M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Reinforcement learning for chemical design Generative model Predictive model Fc1ccc2c(Nc3ccc(F)c(F)c3)ncnc2c1 M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Reinforcement learning for chemical design Generative model ACTIVE! Predictive model M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Technical details Models were trained on Nvidia Titan X and Titan V GPUs Training generative model on ChEMBL took ~ 25 days Training predictive models took ~ 2 hours Biasing generative model with reinforcement learning for one property ~ 1 day Generative model produces 1000s compounds per minute

Results: Biasing target properties in the designed libraries Optimized Baseline * -2 0 2 4 6 8 10 12-200 -100 0 100 200 300 JAK2 Inhibition (pic50) Melting temperature (T m ), o C 400 Partition coefficient (logp) M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

JAK2 (Kinase) inhibition Train data distribution Maximized property distribution Minimized property distribution NEW CHEMOTYPE CAS 236-084-2 (buffer reagent) ZINC37859566 SIMILAR SCAFFOLDS New molecule arxiv:1711.10907

Results: analysis of similarity Distribution of Tanimoto similarity to the nearest neighbor in training dataset for compounds predicted to be active for EGFR by consensus of QSAR models: Similarity= 0.69 Similarity= 0.57 Similarity = 0.86 0.5 0.6 0.7 0.8 0.9 Tanimoto similarity 1.0

Results: Synthetic accessibility score* of the designed libraries *Ertl, Peter, and Ansgar Schuffenhauer. "Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions." Journal of cheminformatics 1.1 (2009): 8.

Target predictions for generated compounds using SEA* *Keiser MJ, Roth BL, Armbruster BN, Ernsberger P, Irwin JJ, Shoichet BK. Relating protein pharmacology by ligand chemistry. Nat Biotech 25 (2), 197-206 (2007).

Model visualization for JAK2 (projection using t-sne) ZINC19982368 pic50 = 8.64 ZINC66347860 pic50 = 3.31 ZINC469992 pic50 = 8.23 ZINC3549031 pic50 = 3.76 ZINC2876515 pic50 = 8.39 pic50 = 10.37 pic50 = 0.63 10 9 8 7 6 5 4 3 2 1 M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Examples of Stack-RNN cells with interpretable gate activations M. Popova, O. Isayev, A. Tropsha. "Deep reinforcement learning for de-novo drug design." arxiv preprint arxiv:1711.10907 (2017).

Summary AI methods coupled with SMILES representation afford biased libraries generation The system naturally embeds reinforcement to produce novel structure with the desired property The system can be tuned to bias libraries towards specific property ranges Next phase is experimental validation of hits by UNC SGC team