Kinome-wide Activity Models from Diverse High-Quality Datasets

Similar documents
Structure-Activity Modeling - QSAR. Uwe Koch

QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression

Studying the effect of noise on Laplacian-modified Bayesian Analysis and Tanimoto Similarity

A Tiered Screen Protocol for the Discovery of Structurally Diverse HIV Integrase Inhibitors

QSAR/QSPR modeling. Quantitative Structure-Activity Relationships Quantitative Structure-Property-Relationships

Data Quality Issues That Can Impact Drug Discovery

Capturing Chemistry. What you see is what you get In the world of mechanism and chemical transformations

Kinase SAR Knowledgebase Hot Targets

Virtual affinity fingerprints in drug discovery: The Drug Profile Matching method

QSAR Modeling of Human Liver Microsomal Stability Alexey Zakharov

Supplementary information

Dispensing Processes Profoundly Impact Biological, Computational and Statistical Analyses

Introducing a Bioinformatics Similarity Search Solution

An Integrated Approach to in-silico

EMPIRICAL VS. RATIONAL METHODS OF DISCOVERING NEW DRUGS

Computational chemical biology to address non-traditional drug targets. John Karanicolas

Chemical Data Retrieval and Management

Introduction. OntoChem

Gaussian Processes: We demand rigorously defined areas of uncertainty and doubt

Machine Learning Concepts in Chemoinformatics

Cheminformatics analysis and learning in a data pipelining environment

Chemogenomic: Approaches to Rational Drug Design. Jonas Skjødt Møller

Virtual screening in drug discovery

Use of data mining and chemoinformatics in the identification and optimization of high-throughput screening hits for NTDs

CH MEDICINAL CHEMISTRY

has its own advantages and drawbacks, depending on the questions facing the drug discovery.

Notes of Dr. Anil Mishra at 1

1. Some examples of coping with Molecular informatics data legacy data (accuracy)

Using AutoDock for Virtual Screening

The use of Design of Experiments to develop Efficient Arrays for SAR and Property Exploration

Keywords: anti-coagulants, factor Xa, QSAR, Thrombosis. Introduction

De Novo molecular design with Deep Reinforcement Learning

Chemical Space: Modeling Exploration & Understanding

Design and Synthesis of the Comprehensive Fragment Library

Characterization of Reversible Kinase Inhibitors using Microfluidic Mobility-Shift Assays

Navigation in Chemical Space Towards Biological Activity. Peter Ertl Novartis Institutes for BioMedical Research Basel, Switzerland

Practical QSAR and Library Design: Advanced tools for research teams

FRAUNHOFER IME SCREENINGPORT

Development of Pharmacophore Model for Indeno[1,2-b]indoles as Human Protein Kinase CK2 Inhibitors and Database Mining

Development and Evaluation of Methods for Predicting Protein Levels from Tandem Mass Spectrometry Data. Han Liu

Mechanistic insight into inhibition of two-component system signaling

The PhilOEsophy. There are only two fundamental molecular descriptors

Rapid Application Development using InforSense Open Workflow and Daylight Technologies Deliver Discovery Value

FRAGMENT SCREENING IN LEAD DISCOVERY BY WEAK AFFINITY CHROMATOGRAPHY (WAC )

COMPARISON OF SIMILARITY METHOD TO IMPROVE RETRIEVAL PERFORMANCE FOR CHEMICAL DATA

Xia Ning,*, Huzefa Rangwala, and George Karypis

Hit Finding and Optimization Using BLAZE & FORGE

Reaxys Pipeline Pilot Components Installation and User Guide

In silico pharmacology for drug discovery

LIBRARY DESIGN FOR COLLABORATIVE DRUG DISCOVERY: EXPANDING DRUGGABLE CHEMOGENOMIC SPACE

In Silico Investigation of Off-Target Effects

Holdout and Cross-Validation Methods Overfitting Avoidance

Structure based drug design and LIE models for GPCRs

Cross Discipline Analysis made possible with Data Pipelining. J.R. Tozer SciTegic

DivCalc: A Utility for Diversity Analysis and Compound Sampling

Development of a Structure Generator to Explore Target Areas on Chemical Space

Ignasi Belda, PhD CEO. HPC Advisory Council Spain Conference 2015

bcl::cheminfo Suite Enables Machine Learning-Based Drug Discovery Using GPUs Edward W. Lowe, Jr. Nils Woetzel May 17, 2012

AMRI COMPOUND LIBRARY CONSORTIUM: A NOVEL WAY TO FILL YOUR DRUG PIPELINE

Exploring the black box: structural and functional interpretation of QSAR models.

MM-PBSA Validation Study. Trent E. Balius Department of Applied Mathematics and Statistics AMS

Plan. Lecture: What is Chemoinformatics and Drug Design? Description of Support Vector Machine (SVM) and its used in Chemoinformatics.

Quantitative Structure-Activity Relationship (QSAR) computational-drug-design.html

Structural interpretation of QSAR models a universal approach

Supporting Information

Assessing and Improving the Reliability of In Silico Property Predictions by Incorporating Inhouse. Pranas Japertas

Retrieving hits through in silico screening and expert assessment M. N. Drwal a,b and R. Griffith a

Computational Chemical Biology Research Group. Connecting Compound Structures to Biological Effects by Gene-Expression Profiling

Lecture Notes for Fall Network Modeling. Ernest Fraenkel

Computational Chemistry in Drug Design. Xavier Fradera Barcelona, 17/4/2007

Reaxys Medicinal Chemistry Fact Sheet

Implementation of novel tools to facilitate fragment-based drug discovery by NMR:

Prediction in bioinformatics applications by conformal predictors

Statistical Clustering of Vesicle Patterns Practical Aspects of the Analysis of Large Datasets with R

Molecular Complexity Effects and Fingerprint-Based Similarity Search Strategies

Drug Design 2. Oliver Kohlbacher. Winter 2009/ QSAR Part 4: Selected Chapters

est Drive K20 GPUs! Experience The Acceleration Run Computational Chemistry Codes on Tesla K20 GPU today

Dr. Sander B. Nabuurs. Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre

Similarity Search. Uwe Koch

Analyzing Building Blocks Diversity for DNA Encoded Library Design. Cresset User Group Meeting Nik Stiefl & Finton Sirockin, Novartis

Overview. Descriptors. Definition. Descriptors. Overview 2D-QSAR. Number Vector Function. Physicochemical property (log P) Atom

Merck Virtual Library (MVL): Deployment, Application, and Future Enhancement

Data Mining in the Chemical Industry. Overview of presentation

High-Throughput Protein Quantitation Using Multiple Reaction Monitoring

Large Scale FEP on Congeneric Ligand Series Have Practical Free Energy Calculations arrived at Last?

Homology and Information Gathering and Domain Annotation for Proteins

Research Article. Chemical compound classification based on improved Max-Min kernel

Genetic Mutations. Jing Lu

Outline. Terminologies and Ontologies. Communication and Computation. Communication. Outline. Terminologies and Vocabularies.

György M. Keserű H2020 FRAGNET Network Hungarian Academy of Sciences

ChemBioNet: Chemical Biology supported by Networks of Chemists and Biologists. Affinity Proteomics Meeting Alpbach. Michael Lisurek 15.3.

Contents 1 Open-Source Tools, Techniques, and Data in Chemoinformatics

CS-E3210 Machine Learning: Basic Principles

HIGH-THROUGHPUT DETERMINATION OF AQUEOUS SOLUBILITY, PRECIPITATION OR COLLOIDAL AGGREGATE FORMATION IN SMALL MOLECULE SCREENING SETS

Farewell, PipelinePilot Migrating the Exquiron cheminformatics platform to KNIME and the ChemAxon technology

Mining Classification Knowledge

STA 414/2104, Spring 2014, Practice Problem Set #1

Similar compounds versus similar conformers: complementarity between PubChem 2 D and 3 D neighboring sets

CHEMISTRY (CHE) CHE 104 General Descriptive Chemistry II 3

MSc Drug Design. Module Structure: (15 credits each) Lectures and Tutorials Assessment: 50% coursework, 50% unseen examination.

Transcription:

Kinome-wide Activity Models from Diverse High-Quality Datasets Stephan C. Schürer*,1 and Steven M. Muskal 2 1 Department of Molecular and Cellular Pharmacology, Miller School of Medicine and Center for Computational Science, University of Miami, Miami, FL 33136, USA. 2 Eidogen-Sertanty, Inc. 3460 Marron Rd #103-475, Oceanside, CA 92056, USA. Steven Muskal Eidogen-Sertanty, Inc smuskal@eidogen-sertanty.com

FDA Approved Protein Kinase Inhibitors Reference: dx.doi.org/10.1021/jm3003203 J. Med. Chem. 2012, 55, 6243 6262

Outline Kinase Data (KKB) Regression Models Conclusions Naïve Bayesian Classifier Models Conclusions

Kinase Knowledgebase (Q2 2012) - Hot Targets

Kinase Knowledgebase (KKB) - Q2 2012

Kinase Summary Statistics - Q2 2012

Kinase Data Used in this Study - Q4 2009 [Q2 2012] Biological Activity Data Points: > 437,000 [> 649K] Unique kinase molecules w/assay data: >162,000 [> 241K] Unique kinase molecules patents/articles: > 507,000 [> 586K] Number of unique kinase targets with assay data: 394 [480] Number of annotated assay protocols: 20,593 [25,472]

Data Pre-Processing Starting point: KKB-Q2 2009 Only enzymatic (homogeneous) assays with defined target Only high quality data (IC50, Ki, Kd) Standardized chemical structures (salt forms, stereochemistry, E/Z geometry, tautomers, ionization) Kinase target Entrez Gene names and SwissProt accessions 233,667 unique data points (411 kinases) 126,114 unique chemical structures

KinomeScan Data (Experimental Validation Data) NIH HMS LINCS DataBase (Harvard Medical School LINCS center) http://lincs.hms.harvard.edu/resources/software/hms-lincs-database/ The LINCS program develops a library of molecular signatures based on gene expression and other cellular changes in response to perturbing agents across a variety of cell types using various high-throughput screening approaches 25,064 total datapoints downloaded: 60 unique compounds (43 with defined/known chemical structure) against 486 targets Kinase activity screened at 10 μm concentration Targets mapped to KKB targets by UniProt accessions Data not in KKB Result: 4,796 datapoints from 43 compounds

Outline Kinase Data (KKB) Regression Models Conclusions Naïve Bayesian Classifier Models Conclusions

Quantitative Regression Models K nearest neighbors (knn) [20 nn, Gaussian weighting 0.5] Partial least squares (PLS) [components restricted to 20] Works best with congeneric series Sensitive to outliers and noise 168 kinase datasets have 20 Dps PipelinePilot - ECFP4 (circular) fingerprints as descriptors Full 10 fold cross validation for both PLS and knn (20 repetitions), report R 2, q 2, RMSE

Conclusions - Regression Models Work well for many kinase data sets knn performs slightly better than PLS Larger numbers of data points improve both PLS and knn models Best results for kinases with 50 data points Regression models improve with increased activity range

Outline Kinase Data (KKB) Regression Models Conclusions Naïve Bayesian Classifier Models Conclusions

Laplacien-modified Naïve Bayesian Classifiers Scale linearly, work in high-dimensional spaces (no overfitting), good for structurally diverse cmpds, multiple activity classes, robust to outliers Define data sets by unique kinase gene IDs with active compounds defined as pic50 6 189 kinase data sets with at least 10 active molecules Data were treated in two ways: Known Active - Known Inactive (KA-KI) Presumed Inactive (PI): 126,114 unique chemical structures - Ninact ECFP4 (circular) fingerprints Leave-one-out cross-validation and repetitive train/test evaluation measuring ROC and enrichment factors

* At least 25 actives

Neratinib (HKI 272) CSK Kd = 550 nm Hunt, et al. 2011

Conclusions - Classification Models Work very well for known actives - known inactives (KA-KI) Relevant and applicable for real world, highly unbalanced data sets (KA-PI) Leave-one-out ROC is a good guide of model quality Naïve Bayesian classification is excellent for the majority of kinases (>140) Performance increases markedly with 50 active compounds Very useful for virtual screening and rapid profiling

Eidogen s iphone, ipad, and Android Apps See: eidogen.com, kinasedb.com or kinasedata.com

MobileApps Support Real Scientific Workflows Bioactivity searching (e.g. kinase SAR) Commercial availability Synthesis planning

Acknowledgements Dr. Rajan Sharma and Prof. Stephan Schurer ioncology Dr. Maurizio Bronzetti Dr. Alex Clark MMDSLib: Dr. Tony Yuan JSDraw: SPRESImobile SPRESImobile Dr. Peter Löw, Dr. Josef Eiblmaier, et al. SPRESImobile