bcl::cheminfo Suite Enables Machine Learning-Based Drug Discovery Using GPUs
Edward W. Lowe, Jr. and Nils Woetzel
May 17, 2012

Outline
- Machine Learning
- Cheminformatics Framework
- QSPR: logP
- QSAR: mGluR5, CYP3A4, Malaria, KRas

BCL: BioChemistry Library
- C++ library for small-molecule and protein modeling
- Machine learning techniques
- OpenCL GPU acceleration

bcl::cheminfo Goal
- MySQL
- GPU
- HPC
- Automation

Machine Learning Calculates Properties from Numerical Description
[Figure: a chemical structure is encoded as a numerical description (panels a to f; plot of I(s) versus s, with s from 0 to 14) and mapped to a predicted value.]

Encoding Chemical Data
- Scalar descriptors: weight, H-bond donors, H-bond acceptors, topological polar surface area (TPSA)
- 2D/3D autocorrelation
- Radial distribution function
- van der Waals surface area
- 60 descriptor groups, 1,284 numerical descriptor values

Radial Distribution Functions Describe 3D Shape
[Figure: RDF (identity) plotted against distance d / Å from 0 to 12.6 Å; interatomic distances of 3.26 Å, 5.19 Å, and 5.79 Å are marked.]
Symbols in the RDF equation (the equation itself is missing from the transcript): d_ij, distance between two atoms; B, temperature factor, here 100.

...but Can Also Encode Chemical Properties such as Partial Charge
[Figure: RDF (partial charge) plotted against distance d / Å from 0 to 12.6 Å; interatomic distances of 3.26 Å, 5.19 Å, and 5.79 Å are marked.]
Symbols in the RDF equation (the equation itself is missing from the transcript): d_ij, distance between two atoms; A_i, A_j, atom properties, here lone-pair electronegativity; B, temperature factor, here 100.
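
The two RDF slides define the symbols d_ij, A_i, A_j, and B, but the equation did not survive. A minimal sketch, assuming the textbook radial distribution function code, RDF(r) = sum over atom pairs i < j of A_i * A_j * exp(-B * (r - d_ij)^2), which is consistent with those symbols. The function name `rdf_code`, the 0.1 Å grid step, and the toy coordinates are illustrative assumptions, not the BCL API.

```python
import math
from itertools import combinations


def rdf_code(coords, props=None, b=100.0, r_max=12.6, step=0.1):
    """Evaluate an RDF descriptor on a regular distance grid.

    coords: list of (x, y, z) atom positions in Angstrom.
    props:  per-atom property A_i; None means A_i = 1 (the "identity" RDF).
    b:      temperature (smoothing) factor, 100 on the slides.
    """
    if props is None:
        props = [1.0] * len(coords)
    n_bins = int(round(r_max / step)) + 1
    r_values = [k * step for k in range(n_bins)]
    rdf = [0.0] * n_bins
    # Each atom pair contributes a Gaussian centred at its distance d_ij.
    for i, j in combinations(range(len(coords)), 2):
        d_ij = math.dist(coords[i], coords[j])
        weight = props[i] * props[j]
        for k, r in enumerate(r_values):
            rdf[k] += weight * math.exp(-b * (r - d_ij) ** 2)
    return r_values, rdf


# Toy molecule (hypothetical): three atoms with pair distances 3, 4, and 5 Angstrom.
toy_atoms = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
r_grid, shape_rdf = rdf_code(toy_atoms)

# Weighting each pair by an atom property (made-up partial charges here)
# turns the same code into a property-encoding descriptor.
toy_charges = [-0.5, 0.2, 0.3]
_, charge_rdf = rdf_code(toy_atoms, props=toy_charges)
```

With the identity weighting, the descriptor peaks at the three interatomic distances, so it captures shape only. With property weighting, the peak heights and signs also reflect how the property is distributed over the molecule.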

Protocol for Model Training
- Data split: 80% training, 10% monitoring, 10% independent
- Forward feature (descriptor) selection
- 5-fold cross-validated models
- Consensus prediction
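
The slide does not say how the 80/10/10 split and the five folds are combined. A minimal sketch of one plausible reading, assumed here: the data are cut into ten equal chunks and each fold uses a different chunk for the independent set and another for the monitoring set, leaving eight chunks for training. The function name `make_folds` and the rotation scheme are illustrative assumptions.

```python
import random


def make_folds(n_samples, n_folds=5, n_chunks=10, seed=0):
    """Return n_folds dicts of sample indices: training / monitoring / independent."""
    if 2 * n_folds > n_chunks:
        raise ValueError("need two distinct held-out chunks per fold")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Interleaved slicing gives n_chunks chunks of (almost) equal size.
    chunks = [indices[c::n_chunks] for c in range(n_chunks)]
    folds = []
    for f in range(n_folds):
        independent_id, monitoring_id = 2 * f, 2 * f + 1
        training = [
            i
            for c, chunk in enumerate(chunks)
            if c not in (independent_id, monitoring_id)
            for i in chunk
        ]
        folds.append(
            {
                "training": training,                  # 80%: used to fit the model
                "monitoring": chunks[monitoring_id],   # 10%: e.g. early stopping
                "independent": chunks[independent_id], # 10%: final, untouched evaluation
            }
        )
    return folds


folds = make_folds(1000)
```

In this scheme every compound is held out as independent data in exactly one fold, so the five models together yield one unbiased prediction per compound.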

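Forward Feature Selection
$\mathrm{cv} \cdot \frac{n(n+1)}{2} = 9150$
This matches cv = 5 cross-validation folds and n = 60 descriptor groups: $5 \cdot \frac{60 \cdot 61}{2} = 9150$.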
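
The count on this slide can be reproduced with a small greedy forward-selection loop. This is a sketch under stated assumptions: selection runs until all 60 descriptor groups are ranked, each candidate evaluation counts as one model per cross-validation fold, and the scoring function is a made-up stand-in for training and cross-validating a real model.

```python
def forward_feature_selection(n_groups, score):
    """Greedily add the descriptor group that most improves the score.

    Returns the selection order and the number of candidate evaluations.
    """
    selected, remaining = [], list(range(n_groups))
    n_evaluations = 0
    while remaining:
        best_group, best_score = None, None
        for group in remaining:
            candidate_score = score(selected + [group])
            n_evaluations += 1
            if best_score is None or candidate_score > best_score:
                best_group, best_score = group, candidate_score
        selected.append(best_group)
        remaining.remove(best_group)
    return selected, n_evaluations


# Hypothetical per-group usefulness; a real score would train a model.
N_GROUPS = 60
CV_FOLDS = 5
usefulness = [(g * 37) % N_GROUPS for g in range(N_GROUPS)]


def toy_score(groups):
    return sum(usefulness[g] for g in groups)


order, evaluations = forward_feature_selection(N_GROUPS, toy_score)
models_trained = CV_FOLDS * evaluations  # 5 * (60 + 59 + ... + 1)
```

Round k evaluates the n - k + 1 groups not yet selected, which sums to n(n+1)/2 evaluations. The quadratic growth is why GPU-accelerated training matters for this step.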

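GPU Performance

Data Set ID   Actives   Inactives
884           3,438     7,066
893           5,398     65,259
1445          883       206,897

ML Method   884           893             1445
ANN         109/1 (109)   1151/10 (115)   3660/32 (114)
SVM         14/0.4 (35)   145/5 (29)      441/14 (32)
KNN         7/0.4 (18)    714/25 (29)     6118/90 (68)

In each entry x/y (z), z is the ratio x/y; presumably x is the CPU time, y the GPU time, and z the speedup. The slide does not state the units.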

GPU Performance: Similarity Measures

Data sets 884, 893, and 1445 as listed on the previous slide.

Similarity Measure   884           893              1445
Tanimoto             53/0.2 (265)  147/0.55 (267)   3.4/0.02 (170)
Cosine               47/0.2 (235)  150/0.53 (283)   3.5/0.02 (175)
Dice                 52/0.2 (260)  145/0.54 (269)   3.9/0.02 (195)
Euclidean            27/0.2 (138)  95/0.51 (186)    2.3/0.01 (230)
Manhattan            20/0.2 (100)  56/0.52 (108)    1.6/0.01 (160)

In each entry x/y (z), z is the ratio x/y; presumably x is the CPU time, y the GPU time, and z the speedup. The slide does not state the units.
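
For reference, the five measures named in the table can be written out as follows. This is a sketch using the textbook definitions for real-valued descriptor vectors. The slide does not say which variants bcl::cheminfo implements, so the formulas and function names are assumptions. Tanimoto, cosine, and Dice are similarities, while Euclidean and Manhattan are distances.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def tanimoto(a, b):
    """a.b / (|a|^2 + |b|^2 - a.b); reduces to Jaccard for binary vectors."""
    ab = _dot(a, b)
    return ab / (_dot(a, a) + _dot(b, b) - ab)


def cosine(a, b):
    """a.b / (|a| |b|)."""
    return _dot(a, b) / math.sqrt(_dot(a, a) * _dot(b, b))


def dice(a, b):
    """2 a.b / (|a|^2 + |b|^2)."""
    return 2.0 * _dot(a, b) / (_dot(a, a) + _dot(b, b))


def euclidean(a, b):
    """Straight-line distance between the two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Two hypothetical binary fingerprints sharing two of their three set bits.
fp_a = [1, 0, 1, 1]
fp_b = [1, 1, 0, 1]
results = {
    "tanimoto": tanimoto(fp_a, fp_b),
    "cosine": cosine(fp_a, fp_b),
    "dice": dice(fp_a, fp_b),
    "euclidean": euclidean(fp_a, fp_b),
    "manhattan": manhattan(fp_a, fp_b),
}
```

An all-against-all comparison applies one of these functions to every pair of molecules independently, so the work parallelizes naturally across GPU threads.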

bcl::cheminfo Suite
- Molecule -> feature vectors (descriptors)
- Feature selection (FFS, BFS, ISA, PCA*; PBS)
- Diverse objective functions
- ANN*, SVR*, kNN*, Kohonen, DT
- MySQL
- Model analysis
- Virtual screening
- Similarity analysis*
Note: * = GPU-accelerated
Lowe Jr, E.W., et al. GPU-Accelerated Machine Learning Techniques Enable QSAR Modeling of Large HTS Data. In Symposium on Computational Intelligence in Bioinformatics and Computational Biology. 2012. San Diego, CA: IEEE.

logP Prediction
- Metric of hydrophobicity
- Important for molecule fate
- Logarithm of the octanol-water partition coefficient
- 22,500 compounds from MDDR, Reaxys, SciFinder

logP Prediction
[Figure: four scatter plots of predicted versus experimental logP values, one each for kNN, SVM, ANN, and XLogP; axes run from -5 to 8.]

Consensus Prediction of ANN, SVM, and kNN
[Figure: scatter plot of ANN+SVM+kNN consensus-predicted versus experimental logP values; axes run from -5 to 8.]
Lowe, E.W., Jr., et al., Comparative Analysis of Machine Learning Techniques for the Prediction of LogP, in SSCI 2011 CIBCB - 2011 Symposium on Computational Intelligence in Bioinformatics and Computational Biology. 2011: Paris, France.
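
The slide does not state how the three models are combined. A minimal sketch, assuming the consensus is the unweighted mean of the per-model predictions, scored by root-mean-square deviation against experiment. All numbers below are made-up toy values, not results from the talk.

```python
import math


def consensus(*model_predictions):
    """Unweighted mean of several models' predictions, compound by compound."""
    return [sum(values) / len(values) for values in zip(*model_predictions)]


def rmsd(predicted, experimental):
    """Root-mean-square deviation between predicted and experimental values."""
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical logP predictions for three compounds.
ann_pred = [1.0, 2.0, 3.0]
svm_pred = [1.2, 1.8, 3.3]
knn_pred = [0.8, 2.2, 2.7]
experimental_logp = [1.1, 1.9, 3.2]

consensus_pred = consensus(ann_pred, svm_pred, knn_pred)
consensus_rmsd = rmsd(consensus_pred, experimental_logp)
```

Averaging helps when the individual models make partly uncorrelated errors, which is the usual motivation for combining methods as different as ANN, SVM, and kNN.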

High-Throughput Screen Yields 1,387 PAMs and 345 NAMs of mGluR5
- 150,000 compounds were tested for allosteric modulation of mGluR5
- 1,387 (0.94%) compounds were verified as PAMs of mGluR5
- 345 (0.23%) compounds were verified as NAMs of mGluR5
Niswender, C. M.; Johnson, K. A.; Luo, Q.; Ayala, J. E.; Kim, C.; Conn, P. J.; Weaver, C. D. Mol Pharmacol 2008, 73, 1213-24.

Virtual Screen for Highly Active Compounds and Novel Leads
- vHTS training optimization (ROC curves)
- [Figure: ROC curve of true positives (%) versus false positives (%), with regions labeled A) true positive, B) false negative, C) false positive, D) true negative.]
- Enrichment of active compounds by 43x
$\mathrm{Enrichment} = \frac{TP / (TP + FP)}{P / (P + N)}$
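
The enrichment formula compares the hit rate among predicted actives with the hit rate of the whole screen. A short sketch follows, with the function name chosen for illustration. The example plugs in counts quoted on the neighboring slides: 1,387 PAMs among 150,000 screened compounds, and 232 confirmed PAMs among 824 predicted compounds.

```python
def enrichment(tp, fp, p, n):
    """Enrichment = [TP / (TP + FP)] / [P / (P + N)].

    tp, fp: true and false positives among the compounds predicted active.
    p, n:   actives and inactives in the reference (unbiased) screen.
    """
    precision = tp / (tp + fp)  # hit rate among predicted actives
    base_rate = p / (p + n)     # hit rate expected from random selection
    return precision / base_rate


# mGluR5 PAM numbers taken from the slides.
pam_enrichment = enrichment(tp=232, fp=824 - 232, p=1387, n=150_000 - 1387)
```

With these inputs the function returns about 30.4, in line with the enrichment of 30 reported for the PAM screen. An enrichment of 1 would mean the model does no better than picking compounds at random.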

Experimental Results: mGluR5 Positive Allosteric Modulators
- ~450,000 ChemBridge compounds
- 824 compounds predicted with EC50 < 1 μM by QSAR model
- 232 compounds (28.1%) were confirmed as mGluR5 PAMs
- Enrichment = 28.1% / 0.96% = 30
Mueller, R., et al., Identification of Metabotropic Glutamate Receptor Subtype 5 Potentiators Using Virtual High-Throughput Screening. ACS Chemical Neuroscience, 2010. 1(4): p. 288-305.

Experimental Results: mGluR5 Negative Allosteric Modulators
- ~750,000 ChemBridge compounds
- 749 compounds with novel scaffolds predicted with EC50 < 10 μM by QSAR model
- 12 compounds (3.6%) were confirmed as mGluR5 NAMs
- Enrichment = 3.6% / 0.23% = 16
- Example compounds [structure drawings not reproduced]: VU0240790-4, EC50 = 75 nM; VU0360620-1, EC50 = 124 nM
Mueller, R., et al., Discovery of 2-(2-Benzoxazoyl amino)-4-aryl-5-cyanopyrimidine as Negative Allosteric Modulators (NAMs) of Metabotropic Glutamate Receptor 5 (mGlu5): From an Artificial Neural Network Virtual Screen to an In Vivo Tool Compound. ChemMedChem, 2012. 7(3): p. 406-414.

CYP3A4
- Metabolism of xenobiotics
- Oxidizes the largest range of substrates of all CYPs
- Present in the largest quantity in the liver
- Involved in the metabolism of half of the drugs used today
- Activates many toxins
- Data set: 3,438 actives, 7,066 inactives

CYP3A4 Model Performance

Method            Average Enrichment*   Number of Features
ANN               2.78                  298
SVM               2.67                  392
KNN               2.78                  73
Kohonen           2.71                  94
DT                1.43                  332
ANN/KNN/Kohonen   2.89

* $\mathrm{Enrichment} = \frac{TP / (TP + FP)}{P / (P + N)}$

Malaria
- Parasitic disease
- High fevers, flu-like symptoms, anemia
- 250 million cases of fever and ~1 million deaths annually
- [Figure: world map of malaria-risk and malaria-free regions.]
- Parasite digests hemoglobin
- Free heme is toxic to host cells
- Parasite crystallizes heme to hemozoin
- Hemozoin crystallization is the target of malaria therapeutics

Malaria Model Optimization Workflow
- ~134,000 compounds screened for inhibition of hemozoin crystallization
- 1,314 inhibitors were found
- Workflow steps: 134K compounds; 1,314 hits; train consensus QSAR model; virtually screen GSK library; acquire predicted hits (vendor)

Malaria Model Performance
Quality measures:
- Integral under ROC curve: 0.85
- RMSD: 0.29
Enrichments for different cutoffs:

Cutoff (False Positive Rate)   Enrichment
top 1%                         33.2
top 2%                         27.1
top 5%                         19.0
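
Both quality measures on this slide can be computed from a ranked list of model scores. A minimal sketch under stated assumptions: the ROC integral is computed as the probability that a random active outranks a random inactive, and the cutoff is read as the top fraction of the score-ranked list. The slide labels the cutoff as a false positive rate, so the authors' exact definition may differ. The scores and labels are toy values.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((a > i) + 0.5 * (a == i) for a in actives for i in inactives)
    return wins / (len(actives) * len(inactives))


def enrichment_at_top(scores, labels, fraction):
    """Hit rate in the top-scored fraction divided by the overall hit rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_hit_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_hit_rate / base_rate


# Hypothetical model scores for ten compounds (1 = inhibitor, 0 = inactive).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

toy_auc = roc_auc(toy_scores, toy_labels)
toy_enrichment = enrichment_at_top(toy_scores, toy_labels, fraction=0.2)
```

Enrichment typically falls as the cutoff widens, as in the table above, because the highest-ranked compounds are the ones the model is most confident about.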

KRas
- GTPase
- Implicated in cancer: leukemia, colon, pancreatic, lung

KRas NMR Fragment Screen
- ~10k fragments screened
- Hits with K_d values: 242
- Virtual screen of PubChem and ChemBridge (~40M compounds)
- Rank list of the top 2,500

Acknowledgments
- Nils Woetzel, Mariusz Butkiewicz, Ralf Mueller, Matthew Spellings, Albert Omlor, Zollie White, Jens Meiler
- Collaborators: Conn, Wright, Fesik
- www.meilerlab.org
Funding
- NIH 5T90DA022873-02 (Integrative Training in Therapeutic Discovery; PI: Marnett)
- NIH 1R21MH082254 and 1R01MH090192 (NIMH; PI: Meiler)
- NSF OCI-1122919 (Transformative Computational Science using CyberInfrastructure; PI: Lowe)