The use of Design of Experiments to develop Efficient Arrays for SAR and Property Exploration

Chris Luscombe, Computational Chemistry, GlaxoSmithKline

Summary of Talk
- Traditional approaches: SAR, Free-Wilson
- Design of experiments
- Examples
- Learnings
- Conclusions

Objectives of Lead Optimisation
- Design array experiments to answer SAR questions and enhance potency.
- Improve physicochemical properties.
- Discover new monomer groups of interest.
[Figure: generic template, LHS group - Core/Linker - RHS group.]

How do we traditionally determine SAR array design?
- Optimisation at a single position allows easy synthesis planning and a detailed understanding of SAR.
- It assumes Free-Wilson (FW) type additivity: substituent contributions at different positions are independent and additive.
- This approach is widely used and very successful.

Free-Wilson theory (R1-Core-R2)
- The first mathematical technique for quantitative SAR.
- Response = effect of core + effect of R1 substituent + effect of R2 substituent.
- Assumptions: the core makes a constant contribution; all contributions are additive; there is no interaction between core and substituent; there is no interaction between substituents.
- Limitation: the model can only explore the chemical space defined by the R-group combinations in the training set.
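The additive response on this slide can be written compactly. The symbols below are notation chosen here for illustration, not taken from the talk: y_ij is the measured response of the compound carrying R1 substituent i and R2 substituent j, mu is the constant core contribution, a_i and b_j are the substituent contributions, and epsilon_ij is the residual.

```latex
\[
  y_{ij} = \mu + a_i + b_j + \varepsilon_{ij}
\]
```

Under the stated assumptions there is no term that depends on i and j jointly; a non-additive series would need an extra interaction term such as (ab)_{ij}.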

Assessing Additivity Assumptions
Yogendra Patel, Valerie J. Gillet, Trevor Howe, Joaquin Pastor, Julen Oyarzabal, and Peter Willett. Assessment of Additive/Nonadditive Effects in Structure-Activity Relationships: Implications for Iterative Drug Design. J. Med. Chem. 2008, 51, 7552-7562.

Design of Experiments (DOE)
- Experimental design approaches are well established for the optimisation of multi-factor experiments, such as reaction conditions.
- Typically these domains use continuous variables such as temperature, addition rate and time.
- Can the same techniques be used where each variable is categorical?

DOE in Medicinal Chemistry?
- We propose that Design of Experiments (DOE) based approaches can be applied to array scenarios where the full (e.g. M x N) array cannot be synthesized for practical reasons.
- By treating each monomer in the array as a categorical factor of the design, a balanced fractional ("sparse") array design can be generated.
- This novel approach can be used successfully to understand and exploit the SAR of a late-stage optimisation programme.

Example of a Sparse Array
A 1/3 fraction from a 6 x 12 array.
[Figure: scatter plots of amine (amine1 to amine6) against Phenol_ID.]

Questions
- Is the fraction selected sufficient to explore the chemistry space?
- Can we adequately assess monomer potential?
- Can we predict the missing compounds?
- Is it a practical way to direct chemistry synthesis?
- Is it an efficient process?
- Does it work?

Sparse Array: What are the key steps?
- Monomers: identify and select which monomers to incorporate into the design.
- Design: create an experimental design template appropriate to the investigational space defined by the monomer numbers.
- Optimise: allocate monomers into the design to define which compounds to actually synthesize.
- Analyse: measure assay endpoints and build Free-Wilson models to understand the SAR.

Monomer Selection
- Identify appropriate monomers at each position.
- Use diversity, physicochemical, ADME and scientific rationale to reduce the monomer lists.
- Calculate the average desirability score for each monomer across the whole virtual library.
- Select the higher-scoring monomers for inclusion in the final DOE array design.
[Figure: mean NR profile score by amine (A1 to A24) and by aldehyde (B1 to B22), with a histogram of the binned NR profile across the virtual library.]
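The averaging step above can be sketched in a few lines of Python. This is a minimal illustration, assuming the virtual library is available as a mapping from (amine, aldehyde) pairs to a desirability score; the monomer names, scores and function names are hypothetical, not data from the talk.

```python
from collections import defaultdict

def mean_score_by_monomer(scores, position):
    """Average the compound scores over every library member that uses each monomer.

    scores: dict mapping (amine, aldehyde) tuples to a desirability score.
    position: 0 to profile amines, 1 to profile aldehydes.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pair, value in scores.items():
        monomer = pair[position]
        totals[monomer] += value
        counts[monomer] += 1
    return {m: totals[m] / counts[m] for m in totals}

def select_top(mean_scores, n):
    """Return the n monomers with the highest mean score (ties broken by name)."""
    return sorted(mean_scores, key=lambda m: (-mean_scores[m], m))[:n]

# Hypothetical 3 x 2 virtual library with made-up desirability scores.
library = {
    ("A1", "B1"): 0.6, ("A1", "B2"): 0.4,
    ("A2", "B1"): 0.3, ("A2", "B2"): 0.5,
    ("A3", "B1"): 0.7, ("A3", "B2"): 0.5,
}
amine_means = mean_score_by_monomer(library, 0)
top_amines = select_top(amine_means, 2)
```

Averaging over the whole virtual library scores a monomer by how it behaves across all its partners rather than in one favourable compound.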

Design Creation (Sparse Arrays)
- Create an incomplete, balanced D-optimal design.
- Even numbers of monomers at each R position.
- D-optimality, with balance forced.
- Many software packages can generate these types of experimental design.
[Figure: scatter plot of the design, levels 1 to 12 of factor A against levels 1 to 6 of factor B (Amine ID).]
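To make the idea of a balanced fractional design concrete, the sketch below builds one with a simple cyclic construction. This is a stand-in for illustration only: it is not a D-optimal search and not the software used in the talk, and the function name and parameters are assumptions. For 12 levels of A and 6 levels of B with 2 compounds per A level, it gives 24 of the 72 possible compounds (a 1/3 fraction).

```python
from collections import Counter

def cyclic_sparse_design(n_a, n_b, per_a):
    """Pair each level of factor A with `per_a` consecutive levels of factor B.

    Balance on B requires n_a to be a multiple of n_b. Using consecutive
    offsets keeps the design connected, so every monomer contribution can
    be compared with every other in an additive model.
    """
    if n_a % n_b != 0 or not 0 < per_a <= n_b:
        raise ValueError("need n_a a multiple of n_b and 0 < per_a <= n_b")
    return [(a, (a + k) % n_b) for a in range(n_a) for k in range(per_a)]

design = cyclic_sparse_design(n_a=12, n_b=6, per_a=2)
uses_of_a = Counter(a for a, _ in design)  # every A level used twice
uses_of_b = Counter(b for _, b in design)  # every B level used four times
```

A D-optimal generator would additionally search over candidate subsets to maximise the information in the model matrix; the cyclic layout only guarantees balance and connectedness.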

Which Monomer at which Position?
- In principle, monomers could be allocated into the DOE array in any order, including at random.
- GSK use an in-house algorithmic approach to allocate monomers into the defined positions in the DOE array, so as to optimise the compounds to be synthesized against another property, e.g. diversity, lead-likeness, logP.
[Figure: scatter plot of the design, levels 1 to 12 of factor A against levels 1 to 6 of factor B (Amine ID).]

Compound  A: Phenols  B: Amines
1         Phenol 3    Amine 2
2         Phenol 6    Amine 6
3         Phenol 7    Amine 5
4         Phenol 3    Amine 3
5         Phenol 9    Amine 5
6         Phenol 6    Amine 1
7         Phenol 10   Amine 4
8         Phenol 10   Amine 2
9         Phenol 8    Amine 4
10        Phenol 5    Amine 3
11        Phenol 4    Amine 6
12        Phenol 5    Amine 1
13        Phenol 1    Amine 1
14        Phenol 11   Amine 5
15        Phenol 12   Amine 2
16        Phenol 12   Amine 6
17        Phenol 1    Amine 5
18        Phenol 8    Amine 3
19        Phenol 7    Amine 4
20        Phenol 11   Amine 6
21        Phenol 2    Amine 1
22        Phenol 9    Amine 3
23        Phenol 4    Amine 2
24        Phenol 2    Amine 4
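The in-house allocation algorithm is not described in the talk. As a rough stand-in, the sketch below uses a plain random search: it tries many assignments of real monomers to the abstract design levels and keeps the one whose compounds score best on a property. The design, the monomer names, the "size" values and the score function are all hypothetical.

```python
import random

def allocate_monomers(design, monomers_a, monomers_b, score, n_trials=500, seed=0):
    """Random search over assignments of monomers to design levels.

    design: list of (level_a, level_b) index pairs.
    score:  function (monomer_a, monomer_b) -> number, higher is better.
    Returns (best_total, best_compounds).
    """
    rng = random.Random(seed)
    order_a, order_b = list(monomers_a), list(monomers_b)
    best_total, best_compounds = None, None
    for trial in range(n_trials):
        if trial > 0:  # trial 0 keeps the given order as a baseline
            rng.shuffle(order_a)
            rng.shuffle(order_b)
        compounds = [(order_a[i], order_b[j]) for i, j in design]
        total = sum(score(a, b) for a, b in compounds)
        if best_total is None or total > best_total:
            best_total, best_compounds = total, compounds
    return best_total, best_compounds

# Hypothetical 4 x 4 space, 8 compounds, two per phenol and two per amine.
design = [(a, (a + k) % 4) for a in range(4) for k in range(2)]
phenols = ["P1", "P2", "P3", "P4"]
amines = ["N1", "N2", "N3", "N4"]
size = {"P1": 1.0, "P2": 2.0, "P3": 3.0, "P4": 4.0,
        "N1": 1.0, "N2": 2.0, "N3": 3.0, "N4": 4.0}

def score(a, b):
    # Made-up property: prefer compounds whose combined size is close to 5.
    return -abs(size[a] + size[b] - 5.0)

best_total, best_compounds = allocate_monomers(design, phenols, amines, score)
baseline_total = sum(score(phenols[i], amines[j]) for i, j in design)
```

Because only the labels on the design levels change, every allocation keeps the balance of the underlying design; the search changes which compounds are made, not how many times each monomer is used.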

FW analysis of monomer contribution
- A Free-Wilson analysis is a regression-based approach to establish monomer contributions to a predictive model.
- A high degree of fit suggests that the potency profile could be additive in nature.
- The presence of outliers may imply non-additive behaviour.
- Assess potential interaction terms between monomers if the output appears to be non-additive.
[Figure: Design-Expert plot of predicted vs. actual GTPgS_CCR4_pIC50, points coloured by value (5.5 to 7.7).]
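The regression described above can be sketched with ordinary least squares on indicator variables. This is a minimal illustration using NumPy rather than the Design-Expert software shown on the slide; the design, the monomer effects and the function name are hypothetical, and the data are generated to be perfectly additive so that the fit is exact.

```python
import numpy as np

def fit_free_wilson(compounds, y, n_a, n_b):
    """Least-squares Free-Wilson fit: intercept plus one indicator per monomer.

    compounds: list of (a_index, b_index) pairs that were made and measured.
    Returns (coefficients, fitted_values). The indicator coding is redundant,
    so lstsq returns the minimum-norm solution; fitted values are unaffected.
    """
    X = np.zeros((len(compounds), 1 + n_a + n_b))
    X[:, 0] = 1.0                      # constant core contribution
    for row, (a, b) in enumerate(compounds):
        X[row, 1 + a] = 1.0            # R1 monomer indicator
        X[row, 1 + n_a + b] = 1.0      # R2 monomer indicator
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef, X @ coef

# Hypothetical sparse array: 8 of the 12 compounds in a 4 x 3 space.
compounds = [(a, (a + k) % 3) for a in range(4) for k in range(2)]
a_effect = [0.0, 0.5, 1.0, -0.5]       # made-up R1 contributions
b_effect = [0.2, -0.2, 0.0]            # made-up R2 contributions
y = np.array([6.0 + a_effect[a] + b_effect[b] for a, b in compounds])

coef, fitted = fit_free_wilson(compounds, y, n_a=4, n_b=3)
residuals = y - fitted                 # large residuals would flag non-additivity
```

With real assay data the residuals will not be zero; plotting fitted against measured values and inspecting the largest residuals is the outlier check the slide describes.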

Example 1: Sparse array to evaluate a defined N x M combinatorial space with a fractional subset
- Design: 12 indazoles (R1), identified using classical SAR approaches.
- 48 sulphonyl chloride monomers (R2), selected from the library using a variety of criteria.
- 12 monomers per R1 and 3 monomers per R2 (12 x 12 = 48 x 3 = 144 compounds).
[Figure: scatter plot of R2 against indazole R1, with lead-likeness score.]

Measured Potency for the Sparse Array
- 142 of 144 compounds from the patchwork array were synthesised and tested.
- It is clear that some indazoles are more promising than others.
[Figure: array plotted by indazole R1, coloured by potency and sized by ligand efficiency.]

Sparse Array Data Analysis
- Statistical analysis was done to evaluate additivity.
- Free-Wilson model: predicted potencies were plotted against measured potencies.
- The FW model shows potentially excellent additivity, with no outliers.
[Figure: scatter plot of predicted potency from the FW model (FW-Fit, GTPgS_CCR4_Human_Antagonist_pic50_Value) against measured potency, both axes 5 to 7.5.]

Predicted potency for the complete array of 576 compounds (fit and predict); only actives (pIC50 > 6.5) are shown.
[Figure: array of RG-R2 (48 variants) against indazole RG-R1 (12 variants).]

Find the predicted most potent compounds that haven't already been synthesized.
[Figure: array of RG-R2 (48 variants) against indazole RG-R1 (12 variants), with compounds C1 to C7 highlighted.]
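Once monomer contributions have been fitted, filling in the rest of the array is a matter of enumerating every combination and summing the contributions. The sketch below assumes an additive model with an intercept and one contribution per monomer; the monomer names, contribution values and the set of already-made compounds are hypothetical.

```python
def rank_unsynthesised(intercept, r1_effects, r2_effects, made, threshold):
    """Predict every R1 x R2 combination and rank the ones not yet made.

    r1_effects, r2_effects: dicts of monomer -> additive contribution.
    made: set of (r1, r2) pairs already synthesised.
    Returns (predicted, r1, r2) tuples above `threshold`, most potent first.
    """
    hits = []
    for r1, e1 in r1_effects.items():
        for r2, e2 in r2_effects.items():
            if (r1, r2) in made:
                continue
            predicted = intercept + e1 + e2
            if predicted > threshold:
                hits.append((predicted, r1, r2))
    return sorted(hits, key=lambda h: (-h[0], h[1], h[2]))

# Made-up contributions for three indazoles and two sulphonyl chlorides.
r1_effects = {"I1": 0.8, "I2": 0.2, "I3": -0.4}
r2_effects = {"S1": 0.6, "S2": 0.1}
made = {("I1", "S2"), ("I3", "S1")}

ranked = rank_unsynthesised(6.0, r1_effects, r2_effects, made, threshold=6.5)
candidates = [(r1, r2) for _, r1, r2 in ranked]
```

The 6.5 cut-off mirrors the activity threshold used on the preceding slide; any other threshold can be passed in.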

Predicted potent compounds
- All compounds subsequently synthesized had measured potencies within +/- 0.2 pIC50 of the predicted value.
- This validated the additivity assumption.
- Promising alternatives were identified and sent for further PK analysis as potential back-ups to the current pre-candidate.

Compound  Predicted GTPgS  Measured  BEI
C1        7.6              7.6       16.0
C2        7.5              7.6       13.5
C3        7.5              7.3       14.8
C4        7.5              7.4       14.2
C5        7.6              7.5       15.6

CAT-friendly example: Sparse Array Automation
- CAT: an automated array chemistry system.
- A particular design (nicknamed the Tetris array) is friendly to array automation and so allows these investigational approaches to be carried out efficiently from a synthetic perspective.

Exploration of chemical space coverage for a dual-targeting programme
- 32 R4 and 12 R6 monomers were chosen for inclusion in the sparse array.
[Figure: scatter plot showing status of exploration, R4 monomers (96, labelled by SMILES fragment, e.g. [4*]Cc1cccnc1) against RG-R6 monomers (67).]

CAT-friendly 8 (from 32) x 12 Tetris Array
- The experimental design chosen is an 8 x 12 array (96 compounds) selected from a potential 32 x 12 fully enumerated array (384 potential compounds), i.e. a 1/4 fraction.
- Each coloured block represents one of the 32 R4 monomers.
- Each R4 monomer is used 3 times.
- Each R6 monomer is used 8 times (96 compounds / 12 R6 monomers, as the factor bar chart shows).
[Figure: scatter plot of the design, levels 1 to 32 of factor A (R4 monomers) against factor 2 (R6 monomers), with a bar chart showing a count of 8 for each of the 12 R6 monomers.]
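The exact Tetris layout is only shown as a figure, so the sketch below is one hypothetical way to build an array with the same counts. With 96 compounds, 32 R4 monomers and 12 R6 monomers, balance implies 96 / 32 = 3 uses per R4 monomer and 96 / 12 = 8 uses per R6 monomer. Each of the 8 rows is one linear 1 x 12 array, split into four blocks of three wells, and the block boundaries shift by one well per row so that the blocks interlock.

```python
from collections import Counter

def tetris_array(n_rows=8, n_cols=12, block=3):
    """Hypothetical Tetris-style layout as a list of (r4_index, r6_index) pairs.

    Each row holds n_cols // block different R4 monomers, each filling
    `block` cyclically adjacent wells; column c always carries R6 monomer c.
    """
    per_row = n_cols // block
    design = []
    for row in range(n_rows):
        for col in range(n_cols):
            slot = ((col + row) % n_cols) // block  # shift blocks row by row
            design.append((row * per_row + slot, col))
    return design

design = tetris_array()
uses_of_r4 = Counter(r4 for r4, _ in design)  # 32 monomers, 3 wells each
uses_of_r6 = Counter(r6 for _, r6 in design)  # 12 monomers, 8 wells each
```

Shifting the blocks means neighbouring rows pair their R4 monomers with overlapping but different sets of R6 monomers, which keeps the design connected for an additive analysis.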

Sparse array results
- Using the CAT, the synthesis was done efficiently and effectively.
- Synthesis was actually done using 8 linear (1 x 12) arrays.
- For the sparse array, synthesis of 77 of the 96 compounds was achieved and the compounds were delivered to screening.
- This is approximately 80% of the full sparse array (77/96), but only 20% of the fully enumerated array (77/384).
[Figure: bar charts of compounds per RG_R6 code (C15, C18, C21, C23, C30, C36, C39, C53, C54, C57, C60, C66) for the design and for the compounds made; bar values as extracted: 7, 8, 8, 6, 5, 8, 8, 4, 7, 7, 2, 7 (total 77).]

Monomer contribution
- The programme team concluded that the chemistry within this area of chemical space was well understood with respect to target potency.
- The programme team predicted potent analogues with targeted physchem profiles for synthesis.
[Figure: bar charts of coefficients by RG_R4 code (B11 to B93, axis -1.5 to 2) and by RG_R6 code (C15 to C66, axis -2 to 0.5).]

Further Developments: Dual Target
- The monomers chosen in the array were selected to create primary actives and were not thought likely to have any potential in the secondary target assay.
- Surprisingly, however, 14 compounds were found to be active in the second assay.
- These are currently being followed up in the programme team as potential dual antagonists.
[Figure: orthogonal dual assay plot, FW values (5.2 to 6.4) by RG_R6 code (C15 to C66), sized by primary potency and coloured by R4 group; annotation: further analogue expansion around this monomer.]

Example 3: Extended 3 RG Tetris Sparse Array
- 3 points of change on the molecule: acids, core and phenols.
[Figure: template, Acids - Core - Phenols.]

Extended 3 RG Tetris Sparse Array
- Cores (A) = 3 (these were used to explore a stereochemistry question).
- Phenols (B) = 4.
- Acids (C) = 24; all acids are represented 3 times.
- 3 x 4 x 24 = 288 compounds; 25% of the full array was synthesised.
- The distribution is balanced.
[Figure: extended Tetris array, coloured by acid monomer group.]
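The counts on this slide can be reproduced with a small cyclic construction. The talk does not say how its array was generated, so the rule below is an assumption made for illustration: each acid is paired once with every core, and the phenol is rotated with the acid index so that all core and phenol combinations are used equally. With 24 acids used 3 times each, this gives 72 compounds, which is 25% of 288.

```python
from collections import Counter

def extended_three_rg_design(n_cores=3, n_phenols=4, n_acids=24):
    """Hypothetical balanced design as (core, phenol, acid) index triples.

    Every acid meets every core exactly once; the phenol index rotates with
    the acid index. Balance on phenols needs n_acids divisible by n_phenols.
    """
    if n_acids % n_phenols != 0:
        raise ValueError("n_acids must be a multiple of n_phenols")
    return [(core, (acid + core) % n_phenols, acid)
            for acid in range(n_acids)
            for core in range(n_cores)]

design = extended_three_rg_design()
uses_of_acid = Counter(acid for _, _, acid in design)      # 3 each
uses_of_core = Counter(core for core, _, _ in design)      # 24 each
uses_of_phenol = Counter(ph for _, ph, _ in design)        # 18 each
uses_of_core_phenol = Counter((c, p) for c, p, _ in design)  # 6 each
```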

Other Design Types: Latin Squares
- Symmetrical design spaces.
- Useful for n x n x n problems, where n = number of monomers at each RG position, e.g. 6 R1 x 6 R2 x 6 R3.
- A 1/n fraction is selected.
[Figure: template with three points of change, R1, R2 and R3.]

Three RG positions: Latin Squares
- Predictive Array Design: Lipkin, Rose, SAR and QSAR in Environmental Research, 2002, Vol 13 (3-4), pp 425-432.
- Each possible pair of monomers is present once, and each monomer is present an equal number of times, defined by the array dimensions.
[Figure: scatter plot with pie-chart markers, B1 to B6 against A1 to A6 (Factor 1).]
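A Latin square for three R-group positions can be generated with the standard cyclic rule, where the third monomer index is the sum of the first two modulo n. The sketch below is an illustration of that textbook construction, not the procedure from the cited paper; for n = 6 it selects 36 of the 216 possible compounds, a 1/6 fraction.

```python
from collections import Counter

def latin_square_design(n):
    """Cyclic Latin square: one (r1, r2, r3) triple per cell of an n x n grid."""
    return [(i, j, (i + j) % n) for i in range(n) for j in range(n)]

design = latin_square_design(6)

# Every pair of monomers from two different positions occurs exactly once.
pairs_r1_r2 = {(r1, r2) for r1, r2, _ in design}
pairs_r1_r3 = {(r1, r3) for r1, _, r3 in design}
pairs_r2_r3 = {(r2, r3) for _, r2, r3 in design}

# Every monomer at every position is used n times.
uses_of_r3 = Counter(r3 for _, _, r3 in design)
```

In practice the rows, columns and symbols would usually be randomly permuted before monomers are assigned, which preserves the pair-once property.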

Pros and Cons of Sparse Array Approaches
Pros:
- Objective exploration generates an optimal data set for ANOVA / Free-Wilson analysis.
- Complete evaluation of the potency response within the design space from only a fraction of the possible compounds.
- Defined endpoint to the work.
- Excellent data set for QSAR.
Cons:
- Chemistry may be more difficult to carry out.
- Needs a reasonable resource commitment upfront.
- Needs the majority of compounds chosen in the array to be made and measured for the analysis to be robust.
- Assumes additivity (but then so does linear SAR exploration).

Learnings from experience
- Ideally, a minimum of 3 examples of each monomer within the design, although 2 will work for a robust assay and chemistry.
- Need to have confidence in getting some active compounds: if all the compounds are inactive, it is difficult to fit a model!
- Need confidence in the ability to synthesize the compounds: some loss of particular compounds can be tolerated, but if whole reactions fail then the array design will be compromised.

Summary
- Experimental design may provide an alternative or complementary strategy which may be suitable in some circumstances, e.g. initial exploration of new monomer space, identification of back-up compounds, or establishing additivity in the series.
- Efficient lead optimisation by exploring more than one point of change at the same time on the molecular template.
- Can unearth some surprises which may never have been found by traditional processes.
- There are different design types for different situations.
- Software is available to create the designs.
- Works well in situations where the bespoke synthesis is contracted out.

Acknowledgements
- Tony Cooper, Heather Hobbs: Medicinal Chemistry
- Nick Barton, Stephen Pickett, Darren Green: Computational Chemistry

References to literature to date
- How Design Concepts Can Improve Experimentation: Mager, 1997. Use of iterative search techniques to find monomers which have the required levels of particular physicochemical properties to fit into an ideal experimental design.
- Statistical Molecular Design of BB for CombiChem: Linusson, Wold et al., 1999. Selection of building blocks (BBs) using t-scores of the enumerated library as variables and then applying D-optimal, space-filling or cluster-based selection strategies.
- Predictive Array Design: Lipkin, Rose et al., 2002. Use of Latin squares; simulated annealing for monomer assignment.