QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression

Similar documents
In Silico Investigation of Off-Target Effects

Reaxys Medicinal Chemistry Fact Sheet

Elsevier R&D Solutions. Tool Sheet. Exploring a chemical reaction

The shortest path to chemistry data and literature

Reaxys Improved synthetic planning with Reaxys

Reaxys Pipeline Pilot Components Installation and User Guide

Solution Story: Dr. Carsten Schauerte, Managing Director of solid-chem GmbH

In silico pharmacology for drug discovery

A powerful site for all chemists CHOICE CRC Handbook of Chemistry and Physics

Introducing a Bioinformatics Similarity Search Solution

Geofacets EDUCATION & RESEARCH

PIOTR GOLKIEWICZ LIFE SCIENCES SOLUTIONS CONSULTANT CENTRAL-EASTERN EUROPE

Modifying natural products: a fresh look at traditional medicine

Navigation in Chemical Space Towards Biological Activity. Peter Ertl Novartis Institutes for BioMedical Research Basel, Switzerland

Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME

Chemically Intelligent Experiment Data Management

TRAINING REAXYS MEDICINAL CHEMISTRY

DRUG DISCOVERY TODAY ELN ELN. Chemistry. Biology. Known ligands. DBs. Generate chemistry ideas. Check chemical feasibility In-house.

Retrieving hits through in silico screening and expert assessment M. N. Drwal a,b and R. Griffith a

Structure-Activity Modeling - QSAR. Uwe Koch

October 6 University Faculty of pharmacy Computer Aided Drug Design Unit

Reaxys The Highlights

Kinome-wide Activity Models from Diverse High-Quality Datasets

SciFinder Premier CAS solutions to explore all chemistry MethodsNow, PatentPak, ChemZent, SciFinder n

Ignasi Belda, PhD CEO. HPC Advisory Council Spain Conference 2015

The Case for Use Cases

Chemoinformatics and information management. Peter Willett, University of Sheffield, UK

CHAPTER 6 QUANTITATIVE STRUCTURE ACTIVITY RELATIONSHIP (QSAR) ANALYSIS

Similarity Search. Uwe Koch

Tutorials on Library Design E. Lounkine and J. Bajorath (University of Bonn) C. Muller and A. Varnek (University of Strasbourg)

OECD QSAR Toolbox v.3.3. Step-by-step example of how to build a userdefined

Drug Design 2. Oliver Kohlbacher. Winter 2009/ QSAR Part 4: Selected Chapters

Plan. Lecture: What is Chemoinformatics and Drug Design? Description of Support Vector Machine (SVM) and its used in Chemoinformatics.

Notes of Dr. Anil Mishra at 1

Reaxys Managing Complexity

Statistical concepts in QSAR.

How to Create a Substance Answer Set

OECD QSAR Toolbox v.4.1. Step-by-step example for building QSAR model

Geofacets. Do more with less OIL & GAS

Structural biology and drug design: An overview

Receptor Based Drug Design (1)

Database Speaks. Ling-Kang Liu ( 劉陵崗 ) Institute of Chemistry, Academia Sinica Nangang, Taipei 115, Taiwan

CSD. Unlock value from crystal structure information in the CSD

Chemical Data Retrieval and Management

De Novo molecular design with Deep Reinforcement Learning

A Tiered Screen Protocol for the Discovery of Structurally Diverse HIV Integrase Inhibitors

Contents 1 Open-Source Tools, Techniques, and Data in Chemoinformatics

Next Generation Computational Chemistry Tools to Predict Toxicity of CWAs

Supplementary information

Searching Substances in Reaxys

Data Quality Issues That Can Impact Drug Discovery

How Diverse Are Diversity Assessment Methods? A Comparative Analysis and Benchmarking of Molecular Descriptor Space

Practical QSAR and Library Design: Advanced tools for research teams

for XPS surface analysis

OECD QSAR Toolbox v.4.1. Tutorial on how to predict Skin sensitization potential taking into account alert performance

Introduction to Chemoinformatics and Drug Discovery

Chemogenomic: Approaches to Rational Drug Design. Jonas Skjødt Møller

Schrodinger ebootcamp #3, Summer EXPLORING METHODS FOR CONFORMER SEARCHING Jas Bhachoo, Senior Applications Scientist

Priority Setting of Endocrine Disruptors Using QSARs

Plan. Day 2: Exercise on MHC molecules.

Capturing Chemistry. What you see is what you get In the world of mechanism and chemical transformations

Development of Pharmacophore Model for Indeno[1,2-b]indoles as Human Protein Kinase CK2 Inhibitors and Database Mining

Kristin P. Bennett. Rensselaer Polytechnic Institute

Structure and Reaction querying in Reaxys

Computational Chemistry in Drug Design. Xavier Fradera Barcelona, 17/4/2007

JCICS Major Research Areas

Bisphosphonate Inhibition of a Plasmodium Farnesyl Diphosphate. Synthase and a General Method for Predicting Cell-Based Activity.

OECD QSAR Toolbox v.3.2. Step-by-step example of how to build and evaluate a category based on mechanism of action with protein and DNA binding

Supplementary Materials: Phenolic Constituents of Medicinal Plants with Activity against Trypanosoma brucei

MSc Drug Design. Module Structure: (15 credits each) Lectures and Tutorials Assessment: 50% coursework, 50% unseen examination.

Evaluation of Molecular Similarity and Molecular Diversity Methods Using Biological Activity Data

Emerging patterns mining and automated detection of contrasting chemical features

CHEMINFORMATICS MODELING OF DIVERSE AND DISPARATE BIOLOGICAL DATA AND THE USE OF MODELS TO DISCOVER NOVEL BIOACTIVE MOLECULES

5.1. Hardwares, Softwares and Web server used in Molecular modeling

User Guide for LeDock

Creating a Pharmacophore Query from a Reference Molecule & Scaffold Hopping in CSD-CrossMiner

Caspase 3 Inhibitor Drug Detection Kit

OECD QSAR Toolbox v.3.3. Step-by-step example of how to build and evaluate a category based on mechanism of action with protein and DNA binding

OECD QSAR Toolbox v.3.4. Step-by-step example of how to build and evaluate a category based on mechanism of action with protein and DNA binding

Using Phase for Pharmacophore Modelling. 5th European Life Science Bootcamp March, 2017

COMPARISON OF SIMILARITY METHOD TO IMPROVE RETRIEVAL PERFORMANCE FOR CHEMICAL DATA

Machine learning for ligand-based virtual screening and chemogenomics!

Exploring the black box: structural and functional interpretation of QSAR models.

Introduction. OntoChem

Dispensing Processes Profoundly Impact Biological, Computational and Statistical Analyses

Virtual screening in drug discovery

Condensed Graph of Reaction: considering a chemical reaction as one single pseudo molecule

Bioengineering & Bioinformatics Summer Institute, Dept. Computational Biology, University of Pittsburgh, PGH, PA

Chemical Space: Modeling Exploration & Understanding

Application integration: Providing coherent drug discovery solutions

1.3.Software coding the model: QSARModel Molcode Ltd., Turu 2, Tartu, 51014, Estonia

EMPIRICAL VS. RATIONAL METHODS OF DISCOVERING NEW DRUGS

Machine Learning Concepts in Chemoinformatics

Virtual affinity fingerprints in drug discovery: The Drug Profile Matching method

Design and Synthesis of the Comprehensive Fragment Library

An Integrated Approach to in-silico

DECEMBER 2014 REAXYS R201 ADVANCED STRUCTURE SEARCHING

Computational chemical biology to address non-traditional drug targets. John Karanicolas

DivCalc: A Utility for Diversity Analysis and Compound Sampling

Using Self-Organizing maps to accelerate similarity search

Transcription:

APPLICATION NOTE QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression GAINING EFFICIENCY IN QUANTITATIVE STRUCTURE ACTIVITY RELATIONSHIPS ErbB1 kinase is the cell-surface receptor for epidermal growth factor. Its dysfunction has been implicated in diseases such as cancer. This paper illustrates the use of the Reaxys Medicinal Chemistry knowledgebase to build efficient QSAR models using ErbB1 kinase as an example.

QSAR relates molecular descriptors to biological activity for known compounds. Reaxys Medicinal Chemistry can provide computational chemists with large and diverse datasets for building predictive models. These models can be used for virtual screening or during lead optimization. Concisely, QSAR (quantitative structure activity relationships) modeling is based on a mathematical equation that relates molecular descriptors to biological activity for known series of compounds to create a model for evaluating the activity of new chemical entities (1, 2). This paper illustrates the use of the Reaxys Medicinal Chemistry knowledgebase to build efficient QSAR models. ErbB1 kinase will be used as an example. DATASET SELECTION The first step in building a suitable dataset for QSAR modeling is to collect all pertinent chemical and biological information relating to the ErbB1 target. Searching by target name followed by selecting the specific isoform enables easy retrieval of all the relevant chemical and biological data for ErbB1 (Figure 1). Similar searches were conducted for the other ErbB kinases to illustrate the extent of chemical and biological data available in Reaxys Medicinal Chemistry (Table 1). To facilitate comparisons of bioactivity data from different publications and assay types, all in vitro data points in Reaxys Medicinal Chemistry have px values. px values are calculated by transforming parameters such as EC 50, IC 50, K i into the log equivalent (pec 50, pic 50, pk i ). These are normalized values assigned to the data that enable easy quantification of compound target affinity and comparison of information from all around the world. Figure 1. EGFR (ErbB) kinase query (Note: VEGFR not included in this search) 2

Target Number of bioactivities Number of substances Number of citations Number of bioactivities with px > 7 Number of substances with px > 7 EGFR all* 201,803 80,905 3,147 23,049 10,099 ErbB1 4,255 3,776 271 749 659 ErbB2 57,687 34,251 1,190 4,494 3,154 ErbB3 2,295 2,201 103 42 20 ErbB4 15,295 11,252 485 344 272 Table 1. Statistics on the EGFR (ErbB) kinase data available in Reaxys Medicinal Chemistry. *VEGFR not included, but ErbB variants, isoforms and mutants are included An activity profile for the most potent inhibitors of all EGFR (ErbB) receptors with a px value of greater than 7.0 (affinity <100 nm) can be generated and viewed as a Heatmap (Figure 2). The Heatmap visualizes the relationships between compounds and their targets in terms of key parameters, allowing rapid identification of relevant compound target interactions. The highest px values were selected for display in the Heatmap. Figure 2. Heatmap for inhibitors of all EGFR kinases with px activity above 7.0 (affinity <100 nm) 3

In the Heatmap, biological affinities or activities are quantified as a px value and displayed from 1 (low activity) in blue to 15 (high activity) in red. The color of the Heatmap cells represents the maximal px retrieved for a given compound (line) against a given target (column). The thumbnail provides an overview of the entire Heatmap with a panel highlighting the section of the map currently displayed on screen. The dataset can be analyzed using the data density display, which shows the number of compounds retrieved per target. At the time of the query, a total of 3,776 compounds were recorded in Reaxys Medicinal Chemistry as being inhibitors of the ErbB1 (EGFR-1) kinase. 4,255 biological activities coming from 271 citations were registered. When the query was limited to humans, Reaxys Medicinal Chemistry retrieved 880 compounds with 1,227 associated bioactivities (Figure 3). A B Figure 3. A Heatmap for inhibitors of ErbB1 kinase. The dataset can be filtered by target species as highlighted. B The most potent ErbB1 inhibitors with px activity above 7.0 (affinity < 100 nm) were selected and the Heatmap was sorted by activity against ErbB1 with the compound highlighted having a px = 9.5 (IC 50 = 0.33 nm). 4

Filtering the dataset further by parameter and selecting only those with IC 50 values, 746 compounds with 982 associated bioactivities were retrieved. Finally, the most potent ErbB1 inhibitors were selected (those with a px >7; i.e., IC 50 < 100 nm). Using the Substances tab, detailed information can be obtained for each compound (Figure 4). Figure 4. Selection of ErbB1 kinase inhibitors as shown in the Substances tab in Reaxys Medicinal Chemistry DATASET SELECTION The first step in building a suitable dataset for QSAR modeling is to collect all pertinent COMPUTATION Prior to building models, 56 2D molecular descriptors were computed for a set of 66 ErbB1 compounds. They included 32 P_VSA based descriptors, 12 BCUT descriptors and 12 GCUT descriptors (3). These descriptors encode the three main physicochemical properties: hydrophobicity (SlogP_VSA, BCUT_SlogP, GCUT_SlogP), polarizability (SMR_VSA, BCUT_ SMR, GCUT_SMR) and electrostatic interactions (PEOE_VSA, BCUT_PEOE, GCUT_PEOE). To build present regression models, QuaSAR-Evolution was used, which is a genetic-based algorithm implemented in the MOE cheminformatic suite. QuaSAR-Evolution applies the genetic algorithm to the problem of descriptor selection in QSAR. Descriptors selected at random are combined and a population of regression models is generated. The default setting was used for descriptor selection and the initial length (the number of descriptors) was set to 4. 5

RESULTS AND DISCUSSION An initial model was built considering the set of 66 ErbB1 compounds as a training set. The data training set was selected by removing ErbB1 variants and mutants and a subset of bioassays was used. Among the resulting QSAR equations, the best-performing one was: pic 50 = 11.195 + 13.613GCUT_SlogP_1 0.015PEOE_VSA+0 0.177PEOE_ VSA-3 + 0.037SMR_VSA4 N = 66 ; n = 4 ; R 2 = 0.84 ; RMSE = 0.47. N is the number of data points; n is the number of molecular descriptor (initial length). Despite the chemical diversity of the training set, only 4 descriptors were sufficient to explain the biological activity leading to a satisfactory goodness of fit R 2 of 0.84 (Figure 5). Figure 5. Predicted pic 50 vs observed pic 50 in model 1 (left) and model 2 (right) The selected descriptors exhibit a very small inter-correlation, as can be shown on their correlation matrix (Figure 6). Figure 6. Matrix correlation of four molecular descriptors selected by the GA(right) 6

The leave-one-out (LOO) cross-validation led to a cross-validated Q2 of 0.81, revealing the model validity. Examination of selected descriptors indicates a strong contribution of hydrophobic interactions expressed by GCUT_SlogP_1 descriptor. As mentioned previously, the full set of 66 molecules were used in this initial model to include as much structural information as possible. However, it is worthwhile to validate the approach using an external dataset. The set of 66 ErbB1 inhibitors were therefore divided into two sets: a training set of 50 molecules for modeling and a test set of 16 molecules for prediction. The best-performing equation was: pic 50 = 11.1061 + 13.4414GCUT_SLOGP_1 0.0124712PEOE_VSA+ 0 0.222155PEOE_VSA-3 + 0.0352586SMR_VSA4 N = 66 ; n = 4 ; R 2 = 0.86 ; RMSE = 0.46 ; RMSE_Test = 0.49. The equation was found to be formulated with the same set of molecular descriptors and associated coefficients close to those obtained in model 1. This may be because the training set comprised the top 50 dissimilar molecules, covering all the chemical diversity of the entire dataset. Szántai-Kis et al. reported on a 2D QSAR model using automatic Variable Subset Selection by Genetic Algorithm (3). However, these authors considered all the ligands tested on EGFR kinase without any further specification regarding the target (ErbB1, ErbB2, ErbB3 or ErbB4), target type or species. These additional filters typically help to extract homogeneous datasets and minimize errors that may arise from biological data. Many other EGFR QSAR studies have been reported but all have used 3D approaches with structurally close chemical cores. Regarding the chemical diversity covered by this analysis, the local/global character of the present model cannot be determined due to the dataset being limited to 66 ErbB1 compounds. Yet, the selected ErbB1 dataset shows reasonable chemical diversity (Figure 7). Figure 7. ErbB1 dataset projected on the first three principal components computed from used 2D molecular descriptors 7

Reaxys Medicinal Chemistry encompasses more than 447,000 literature sources, 12,700 targets and over 29 million bioactivity data points. WHY USE REAXYS MEDICINAL CHEMISTRY FOR QSAR MODELING? The database of this research solution is organized around compounds, targets and biological activities. Each element is described and organized into logical hierarchies according to experimental protocols appearing in the literature. Taken as a whole, the information compiled by Reaxys Medicinal Chemistry creates a global pharmacology space encompassing more than 447,000 literature sources, 12,700 targets and over 29 million bioactivity data points. The user interface enables scientists to navigate this pharmacology space and conduct a variety of searches, including substructure, chemical similarity and target-specific searches to explore the bioactivity profile for targets, cell lines or drugs/compounds. Retrieved data is displayed in the interactive results screen, where the user can filter data to exclude or limit certain data sets, focus on compounds or targets or view a complete list of citations. Reaxys Medicinal Chemistry displays compound target data as an interactive Heatmap so that researchers can rapidly visualize, navigate and filter results based on various parameters such as activity, species, bioassay protocols, publication type and standard target classification hierarchies. Importantly for QSAR and other chemoinformatics modeling, Reaxys Medicinal Chemistry provides multiple options to use and integrate the content into existing tools and in-house systems. Data can easily be exported for use in a number of popular modeling packages such as MOE (Molecular Operating Environment, Chemical Computing Group), Schrödinger Small-Molecule Drug Discovery Suite, Biovia (Dassault Systèmes) and ChemAxon products or integrated into in-house software. The extensive database can be used to conduct pharmacophoric similarity searches, chemical space analyses, structural analog searching, virtual screening and quantitative structure activity or structure property relationship (QSAR/QSPR) modeling. ESSENTIAL DRUG DISCOVERY SOLUTION Reaxys Medicinal Chemistry is an extensive database containing chemical information linked to in vitro and in vivo biological activities extracted from over 300,000 articles, 90,000 patents and 5,000 journals. More than 6 million chemical compounds are associated with their biological data (> 29 million bioactivity data points) and linked to information on 12,700 pharmacological targets, allowing the scientists to reveal connections between compounds, effects and targets. The data is indexed and normalized for maximum searchability and consistency. Note: This application note is for illustrative purposes only: the information in this report should not be referenced or relied upon as a basis for further research and development. 8

REFERENCES 1. Engel, T. (2006) Basic Overview of Cheminformatics. J. Chem. Inf. Model. 46: 2267 2277. 2. Polanski, J. Bak, A., Gielciak, R. and Magdiarz, T. (2006) Modeling robust QSAR. J. Chem. Inf. Model. 46: 2310 2318. 3. Szántai-Kis, C., Kövesdi, I., Eros, D., Bánhegyi, P., Ullrich, A. (2006) Prediction oriented QSAR modelling of EGFR inhibition. Current Medicinal Chemistry 13: 277 287. 9

LEARN MORE To request information or a product demonstration, please visit elsevier.com/reaxys or email us at reaxys@elsevier.com. Visit elsevier.com/rd-solutions or contact your nearest Elsevier office. ASIA AND AUSTRALIA Tel: +65 6349 0222 Email: sginfo@elsevier.com JAPAN Tel: +81 3 5561 5034 Email: jpinfo@elsevier.com KOREA AND TAIWAN Tel: +82 2 6714 3000 Email: krinfo.corp@elsevier.com EUROPE, MIDDLE EAST AND AFRICA Tel: +31 20 485 3767 Email: nlinfo@elsevier.com NORTH AMERICA, CENTRAL AMERICA AND CANADA Tel: +1 888 615 4500 Email: usinfo@elsevier.com SOUTH AMERICA Tel: +55 21 3970 9300 Email: brinfo@elsevier.com REAXYS is a trademark of Reed Elsevier Properties SA, used under license. MOE is a trademark of Chemical Computing Group. Schrödinger is a registered trademark of Schrödinger, LLC. Biovia is a registered trademark of Dassault Systèmes. Copyright 2015 Elsevier B.V. All rights reserved. November 2015