A Tiered Screen Protocol for the Discovery of Structurally Diverse HIV Integrase Inhibitors

Similar documents
Chemical Space: Modeling Exploration & Understanding

Computational chemical biology to address non-traditional drug targets. John Karanicolas

Introduction. OntoChem

Drug Informatics for Chemical Genomics...

An Integrated Approach to in-silico

Structural biology and drug design: An overview

Virtual screening in drug discovery

Next Generation Computational Chemistry Tools to Predict Toxicity of CWAs

Chemogenomic: Approaches to Rational Drug Design. Jonas Skjødt Møller

Chemical library design

Machine Learning Concepts in Chemoinformatics

The PhilOEsophy. There are only two fundamental molecular descriptors

Development of a Structure Generator to Explore Target Areas on Chemical Space

Dr. Sander B. Nabuurs. Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre

Practical QSAR and Library Design: Advanced tools for research teams

Virtual Screening: How Are We Doing?

Using Phase for Pharmacophore Modelling. 5th European Life Science Bootcamp March, 2017

Kinome-wide Activity Models from Diverse High-Quality Datasets

Structure-Activity Modeling - QSAR. Uwe Koch

Studying the effect of noise on Laplacian-modified Bayesian Analysis and Tanimoto Similarity

Machine learning for ligand-based virtual screening and chemogenomics!

QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression

Clustering Ambiguity: An Overview

Bioengineering & Bioinformatics Summer Institute, Dept. Computational Biology, University of Pittsburgh, PGH, PA

Modeling Mutagenicity Status of a Diverse Set of Chemical Compounds by Envelope Methods

Drug Design 2. Oliver Kohlbacher. Winter 2009/ QSAR Part 4: Selected Chapters

Early Stages of Drug Discovery in the Pharmaceutical Industry

A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery

Prediction in bioinformatics applications by conformal predictors

Rapid Application Development using InforSense Open Workflow and Daylight Technologies Deliver Discovery Value

Analysis of a Large Structure/Biological Activity. Data Set Using Recursive Partitioning and. Simulated Annealing

KNIME-based scoring functions in Muse 3.0. KNIME User Group Meeting 2013 Fabian Bös

Virtual affinity fingerprints in drug discovery: The Drug Profile Matching method

Decision Trees. CS57300 Data Mining Fall Instructor: Bruno Ribeiro

Functional Group Fingerprints CNS Chemistry Wilmington, USA

Exploring the chemical space of screening results

CS7267 MACHINE LEARNING

Using Self-Organizing maps to accelerate similarity search

Priority Setting of Endocrine Disruptors Using QSARs

Emerging patterns mining and automated detection of contrasting chemical features

Chemical Space. Space, Diversity, and Synthesis. Jeremy Henle, 4/23/2013

Using AutoDock for Virtual Screening

Molecular Complexity Effects and Fingerprint-Based Similarity Search Strategies

Fast similarity searching making the virtual real. Stephen Pickett, GSK

Gaussian Processes: We demand rigorously defined areas of uncertainty and doubt

De Novo molecular design with Deep Reinforcement Learning

An Enhancement of Bayesian Inference Network for Ligand-Based Virtual Screening using Features Selection

Retrieving hits through in silico screening and expert assessment M. N. Drwal a,b and R. Griffith a

Variable Selection in Random Forest with Application to Quantitative Structure-Activity Relationship

Xia Ning,*, Huzefa Rangwala, and George Karypis

MM-GBSA for Calculating Binding Affinity A rank-ordering study for the lead optimization of Fxa and COX-2 inhibitors

Translating Methods from Pharma to Flavours & Fragrances

Cheminformatics analysis and learning in a data pipelining environment

Data Mining in the Chemical Industry. Overview of presentation

Chemical Data Retrieval and Management

1. Some examples of coping with Molecular informatics data legacy data (accuracy)

est Drive K20 GPUs! Experience The Acceleration Run Computational Chemistry Codes on Tesla K20 GPU today

UVA CS 4501: Machine Learning

Building innovative drug discovery alliances. Just in KNIME: Successful Process Driven Drug Discovery

Chemical Data Mining of the NCI Human Tumor Cell Line Database

COMBINATORIAL CHEMISTRY: CURRENT APPROACH

Universities of Leeds, Sheffield and York

Comparison of Descriptor Spaces for Chemical Compound Retrieval and Classification

Exploring the black box: structural and functional interpretation of QSAR models.

Similarity Search. Uwe Koch

Machine-learning scoring functions for docking

In Silico Investigation of Off-Target Effects

Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME

Enamine Golden Fragment Library

The Changing Requirements for Informatics Systems During the Growth of a Collaborative Drug Discovery Service Company. Sally Rose BioFocus plc

CHEMINFORMATICS MODELING OF DIVERSE AND DISPARATE BIOLOGICAL DATA AND THE USE OF MODELS TO DISCOVER NOVEL BIOACTIVE MOLECULES

QSAR/QSPR modeling. Quantitative Structure-Activity Relationships Quantitative Structure-Property-Relationships

QSAR Modeling of Human Liver Microsomal Stability Alexey Zakharov

has its own advantages and drawbacks, depending on the questions facing the drug discovery.

Supplementary Material

Docking. GBCB 5874: Problem Solving in GBCB

Data Mining. CS57300 Purdue University. Bruno Ribeiro. February 8, 2018

Contents 1 Open-Source Tools, Techniques, and Data in Chemoinformatics

Large scale classification of chemical reactions from patent data

The Schrödinger KNIME extensions

Ignasi Belda, PhD CEO. HPC Advisory Council Spain Conference 2015

LigandScout. Automated Structure-Based Pharmacophore Model Generation. Gerhard Wolber* and Thierry Langer

Molecular Similarity Searching Using Inference Network

The Conformation Search Problem

Variable Selection and Weighting by Nearest Neighbor Ensembles

Classification using stochastic ensembles

Selecting Diversified Compounds to Build a Tangible Library for Biological and Biochemical Assays

Implementation of novel tools to facilitate fragment-based drug discovery by NMR:

Electrical and Computer Engineering Department University of Waterloo Canada

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

COMPARISON OF SIMILARITY METHOD TO IMPROVE RETRIEVAL PERFORMANCE FOR CHEMICAL DATA

Holdout and Cross-Validation Methods Overfitting Avoidance

MULTIVARIATE PATTERN RECOGNITION FOR CHEMOMETRICS. Richard Brereton

Navigation in Chemical Space Towards Biological Activity. Peter Ertl Novartis Institutes for BioMedical Research Basel, Switzerland

Research Article. Chemical compound classification based on improved Max-Min kernel

Benchmarking of Multivariate Similarity Measures for High-Content Screening Fingerprints in Phenotypic Drug Discovery

MultiscaleMaterialsDesignUsingInformatics. S. R. Kalidindi, A. Agrawal, A. Choudhary, V. Sundararaghavan AFOSR-FA

In silico pharmacology for drug discovery

Development of Pharmacophore Model for Indeno[1,2-b]indoles as Human Protein Kinase CK2 Inhibitors and Database Mining

Plan. Lecture: What is Chemoinformatics and Drug Design? Description of Support Vector Machine (SVM) and its used in Chemoinformatics.

Transcription:

A Tiered Screen Protocol for the Discovery of Structurally Diverse HIV Integrase Inhibitors Rajarshi Guha, Debojyoti Dutta, Ting Chen and David J. Wild School of Informatics Indiana University and Dept. Computational Biology University of Southern California 28 th March, 2007 Chicago

Background Most drugs target HIV reverse transcriptase or protease HIV integrase is vital for viral replication No drugs have been approved for HIV integrase 1BIS

Previous Approaches Pharmacophores Docking Disadvantages Requires a reliable receptor structure Computationally intensive Not necessarily diverse Nicklaus, M.C. et al., J. Med. Chem., 1997, 40, 920 929 Nair, V; Chi, G.; Neamati, N.; J. Med. Chem., 2006, 49, 445 447 Dayam, R; Sanchez, T.; Neamati, N.; J. Med. Chem., 2005, 48, 8009 8015

Goals High speed Be able to process large libraries rapidly Avoid docking till required Try and use connectivity information only Reliability Use consensus approaches for prediction Use molecular similarity Novely Try to obtain a diverse set of hits Try to obtain hits suitable for lead hopping

Datasets & Tools Training Data Curated dataset 529 actives 395 inactives Vendor Database 50,000 compounds 2D structures Software MOE R 2.2.0 MASS, randomforest, fingerprint

Overview of the Screening Protocol

Descriptor Calculations Restricted to topological descriptors Calculated 142 descriptors Removed low variance and correlated descrirptors Size of reduced pool was 45 Only considers 2D connectivity Very fast to compute

Predictive Models Linear discriminant analysis Why? Simple and may be sufficient How? Used a genetic algorithm to search for descriptor subsets Finally used a 6-descriptor model Accuracy On the whole dataset, 72% On TSET/PSET, 72% / 71% With leave-10%-out, 71%

Predictive Models Random Forest Why? No feature selection required Does not overfit May capture non-linearities How? Used 500 trees Sampled 6 features at each split Accuracy On the whole dataset, 72% On TSET/PSET, 75% 70% Breiman, L. et al., Classification and Regresion Trees, CRC Press, Boca Raton, FL. 1984 Borisov, A. et al., Intel Technology Journal, 2005, 9

Consensus Predictions We first consider compounds predicted active by both models For the LDA model we consider compounds with a posterior > P For the RF model we consider compounds which had a majority > M We then apply the similarity filter This results in two hit lists A final hit list is obtained from the intersection of the hit lists for the individual model

Similarity Filter The goal is to predict actives from the vendor database Evaluate 166 bit MACCS fingerprints The Tanimoto Similarity between two compounds is defined by S = N c N a + N b Nc For a compound predicted active Calculate average similarity to TSET actives (S 1 ) Calculate average similarity to TSET inactives (S 2 ) Select compounds where S 1 S 2 ɛ

Suggesting Good Leads Suggest further prioritization? Look for TSET actives that are not in the bulk of the dataset These compounds may be good starting points for further modifications Evaluate similarity of the hit list compounds to these isolated TSET compounds Hit list compounds most similar to the most isolated TSET active may be good leads Saeh, J.C.; Lyne, P.D.; Takasaki, B.K.; Cosgrove, D.A.; J. Chem. Inf. Comput. Sci., 2005, 45, 1122 1133. Guha, R.; Dutta, D.; Jurs, P.C; Chen, T.; J. Chem. Inf. Model., 2006, 46, 1713 1722

Time Considerations Model development Descriptor calculation is rapid and a one time event Building individual LDA or RF models is fast Time required to obtain optimal model can be large when a GA is used (partly due to interpreted code) Predictions for 50,000 compounds < 1min Similarity calculation is very time consuming Detecting spatial outliers is slow Can be improved with approximate NN algorithms Dutta, D.; Guha, R.; Jurs, P.C.; Chen, T.; J. Chem. Inf. Model., 2006, 46, 321 333

Results Parameters P, M > 0.7 ɛ 0.01 Of the 34 hits, 7 have a similarity greater than 0.5 with the most outlying TSET active We obtained 66 more hits from an ensemble of LDA models LDA - 313 hits RF - 57 hits Consensus - 34 hits Average similarity = 0.64 None were in common with pharmacophore predictions

Results Parameters P, M > 0.7 ɛ 0.01 Of the 34 hits, 7 have a similarity greater than 0.5 with the most outlying TSET active We obtained 66 more hits from an ensemble of LDA models LDA - 313 hits RF - 57 hits Consensus - 34 hits Average similarity = 0.64 None were in common with pharmacophore predictions

The Search for β diketones A recently published inhibitor exhibited a β diketone moiety Searched the vendor DB for structures with this feature and predicted them LDA model : 244 actives (posterior > 0.8) RF model : 4 actives (score > 0.85) The intersection set contained 4 compounds We missed these hits due to our similarity constraints being too strict R [C,N] R O O Nair, V.; Chi, G.; Ptak, R.; Neamati, N.; J. Med. Chem., 2006, 45, 445 447

Similarity to the Published Compound Published Predicted

Future Work Investigate similarity to known inhibitors in terms of pharmacophore similarity Dock our best hits (may not be conclusive) Investigate the distribution of vendor compounds in descriptor space Cluster the vendor database and predict representative members of clusters Perform assays

Conclusions A consensus approach to predictive models allows us to increase confidence in activity predictions Prioritization based on similarity is intuitive, but may not work well with homogenous datasets The protocol appears to have identified possible leads which await confirmation

Acknowledgements Prof. Nouri Neamati Dr. Raveendra Dayam