The use of Design of Experiments to develop Efficient Arrays for SAR and Property Exploration Chris Luscombe, Computational Chemistry GlaxoSmithKline
Summary of Talk Traditional approaches SAR Free-Wilson Design of experiments Examples Learnings Conclusions
Objectives of Lead Optimisation Design Array experiments to answer SAR questions to enhance potency Improve physicochemical properties Discover new monomer groups of interest. LHS O N H Core/Linker O RHS
How do we traditionally determine SAR array design Optimisation at a single position allows Easy synthesis planning Detailed understanding of SAR Assumes FW type additivity Substituent contributions at different positions are independent and additive This approach is widely used and very successful
Free Wilson theory R1-Core-R2 First mathematical technique for quantitative SAR Response = effect of Core + effect R1 substituent + effect of R2 substituent Assumptions Core makes a constant contribution All contributions are additive No interactions between core and substituent No interaction between substituents Can only explore chemical space defined by R-group combinations in the training set
Assessing Additivity Assumptions Assessment of Additive/Nonadditive Effects in Structure-Activity Relationships: Implications for Iterative Drug Design J. Med. Chem. 2008, 51, 7552 7562 Yogendra Patel, Valerie J. Gillet, Trevor Howe, Joaquin Pastor, Julen Oyarzabal, and Peter Willett
Design of Experiments (DOE) Experimental Design approaches are well established for the optimization of multi-factor experiments, such as reaction conditions. Typically these domains utilize continuous variables such as temperature, addition rate, time etc Can these same techniques be use where each variable is categorical?
DOE in Medicinal Chemistry? We propose that Design of Experiments (DOE) based approaches can be applied to array scenarios where the full (e.g. M x N) array cannot be synthesized for practical reasons. By treating each monomer in the array as a categorical factor of the design, a balanced fractional ( Sparse ) array design can be generated. This novel approach can be successfully used to understand and exploit the SAR of a late stage optimisation programme
Example of a Sparse Array 1/3 rd fraction from an 6 x 12 array Scatter Plot Scatter Plot amine6 amine6 amine5 amine5 amine4 amine4 amine3 amine3 amine2 amine2 amine1 amine1 Phenol_ID Phenol_ID
Questions Is the fraction selected sufficient to explore the chemistry space? Can we adequately assess monomer potential? Can we predict the missing compounds? Is it a practical way to direct chemistry synthesis? Is it an efficient process? Does it work?
Sparse Array :What are the key steps? Monomers Identify/ select which monomers to incorporate into the design Design Create an experimental design template appropriate to the investigational space define from the monomer numbers Optimise Allocate monomers into the design so define which compounds to actually synthesize Analyse Measure assay endpoints and build free-wilson models to understand the SAR
Monomer Selection Identify appropriate monomers at each position Use diversity, physico-chemical, ADME and scientific rationale to reduce the monomer lists Calculate the average desirability score from each monomer across the whole virtual library. Select the higher scoring ones to be included in the Final DOE array design landscape A9 A8 A7 A6 A5 A4 A3 A24 A23 A22 A21 A20 A2 A19 A18 A17 A16 A15 A14 A13 A12 A11 A10 A1 amines 0.35 0.4 0.45 0.5 0.55 0.6 Mean_NR profile_by_amine B9 B8 B7 B6 B5 B4 B3 B22 B21 B20 B2 B19 B18 B17 B16 B15 B14 B13 B12 B11 B10 B1 nr score aldehydes 0.4 0.45 0.5 0.55 0.6 Mean_NR profile_by_aldehyde 1 2820 0.9 2500 2437 0.8 2000 0.7 0.6 1500 1298 1382 1624 0.5 1000 901 897 0.4 500 0.3 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 NR profile 0 109 146 2 0 0.09 0.18 0.27 0.36 0.45 0.54 0.63 0.72 0.81 0.9 Binned NR profile
Sparse Array :What are the key steps? Monomers Identify/ select which monomers to incorporate into the design Design Create an experimental design template define from the monomer numbers Optimise Allocate monomers into the design so define which compounds to synthesize Analyse Measure assay endpoints and build free-wilson models to understand the SAR
Design Creation (Sparse arrays) Create an in-complete balanced D-Optimal design Even numbers of monomers at each R position D Optimality Force balance Scatter Plot Level 9 of A Level 8 of A Level 7 of A Level 6 of A Level 5 of A Level 4 of A Level 3 of A Level 2 of A Level 12 of A Level 11 of A Level 10 of A Many software packages around which can generate these types of Experimental Design Level 1 of A Level 1 of B Level 2 of B Level 3 of B Level 4 of B Level 5 of B Level 6 of B Amine ID
Sparse Array :What are the key steps? Monomers Identify/ select which monomers to incorporate into the design Design Create an experimental design template appropriate to the investigational space define from the monomer numbers Optimise Allocate monomers into the design to define which compounds to actually synthesize Analyse Measure assay endpoints and build free-wilson models to understand the SAR
Which Monomer at which Position? Scatter Plot Level 9 of A Level 8 of A Level 7 of A Level 6 of A Level 5 of A Level 4 of A Level 3 of A Level 2 of A Level 12 of A Level 11 of A Level 10 of A Level 1 of A In principle monomers could be allocated in any order, including random, into the DOE array GSK use an in-house algorithmic approach to allocate monomers into the define positions in the DOE array so as to optimise the compounds to be synthesized against another property Eg diversity, lead-likeness, logp etc Level 1 of B Level 2 of B Level 3 of B Level 4 of B Level 5 of B Level 6 of B Amine ID Compounds A:Phenols B:Amines 1 Phenol 3 Amine 2 2 Phenol 6 Amine 6 3 Phenol 7 Amine 5 4 Phenol 3 Amine 3 5 Phenol 9 Amine 5 6 Phenol 6 Amine 1 7 Phenol 10 Amine 4 8 Phenol 10 Amine 2 9 Phenol 8 Amine 4 10 Phenol 5 Amine 3 11 Phenol 4 Amine 6 12 Phenol 5 Amine 1 13 Phenol 1 Amine 1 14 Phenol 11 Amine 5 15 Phenol 12 Amine 2 16 Phenol 12 Amine 6 17 Phenol 1 Amine 5 18 Phenol 8 Amine 3 19 Phenol 7 Amine 4 20 Phenol 11 Amine 6 21 Phenol 2 Amine 1 22 Phenol 9 Amine 3 23 Phenol 4 Amine 2 24 Phenol 2 Amine 4
Sparse Array :What are the key steps? Monomers Identify/ select which monomers to incorporate into the design Design Create an experimental design template appropriate to the investigational space define from the monomer numbers Optimise Allocate monomers into the design so define which compounds to actually synthesize Analyse Measure assay endpoints and build free- Wilson models to understand the SAR
Predicted FW analysis of monomer contribution A Free Wilson analysis is a regression based approach Design-Expert to Software GTPgS_CCR4_pIC50 establish monomer contributions to a predictive model Color points by value of GTPgS_CCR4_pIC50: 7.7 A high degree of fit suggests 5.5 that the potency profile could be additive in nature. The presence of outliers may imply non-additive behaviour Assess potential interaction terms between monomers if the output appears to be non-additive 7.80 7.13 6.45 5.78 5.10 2 2 2 Predicted vs. Actual 5.00 5.46 5.91 6.37 6.83 7.28 7.74 Actual
Example 1 Sparse array to evaluate defined N x M combinatorial space with a fractional subset Design 12 Indazoles (R1) Identified using classical SAR approaches Scatter Plot 48 sulphonyl chlorides monomers (R2) selected from library using a variety of criteria Lead-likeness score R2 Indazoles R1 12 monomers per R1 3 monomers per R2
Measured Potency for the Sparse array 142 of 144 compounds from patchwork array were synthesised and tested Coloured for potency, sized by ligand efficiency Clear that some Indazoles are more promising than others Array Indazole R1
Predicted potency from FW model Sparse Array Data Analysis Scatter Plot (2) 7.5 Statistical analysis was done to evaluate additivity 7 6.5 6 5.5 Free Wilson model: Predicted potencies were plotted against measured potencies The FW model show potential excellent additivity with no outliers. 5 5.5 6 6.5 7 7.5 FW-Fit:RG-All:mol8_GR203498:GTPgS_CCR4_Human_Antagonist_pic50_Value (1) Measured potency
Predicted Potency for the complete array of 576 compounds (Fit and Predict), only Actives (pic50>6.5 shown) Array RG-R2 (48 Variants) Indazole R1 RG-R1 (12 Variants)
Find the predicted most potent compounds that haven t already been synthesized Array C1 C2 C3 C4 RG-R2 (48 variants) C5 C6 C7 Indazole R1 RG-R1 (12 variants)
Predicted potent compounds All compounds subsequently synthesized had measured potencies within +/- 0.2 pic50 of the predicted value Validated the Additivity assumption Identified promising alternatives which were sent for further PK analysis potential back up to the current pre-candidate C1 Predicted GTPgS = 7.6 BEI = 16.0 Measured = 7.6 C4 Predicted GTPgS = 7.5 BEI = 14.2 Measured = 7.4 C2 Predicted GTPgS = 7.5 BEI = 13.5 Measured = 7.6 C5 Predicted GTPgS = 7.6 BEI = 15.6 Measured = 7.5 C3 Predicted GTPgS = 7.5 BEI = 14.8 Measured = 7.3
CAT friendly example Sparse Array Automation CAT : Automated array chemistry system A particular design (nicknamed the Tetris array) which is array automation friendly and thus allows these investigational approaches to be carried out efficiently from a synthetic perspective.
Exploration of Chemical space coverage for a Dual targetting programme R4 monomers (96) Scatter Plot [4*]Cc1cccnc1 [4*]Cc1ccc(Cl)cc1 Status of exploration [4*]CCn1cccn1 [4*]CCc1ccccc1OCC [4*]CCc1c[nH]c(n1)C(C)C [4*]CCN1CCOCC1 [4*]CCCOC [4*]CCC [4*]CC(=C)C [4*]C(CCO)C1CC1 [4*]C(C)C1(C)CC1 [4*]C R6 RG-R6 monomers (67) 32 R4 and 12 R6 monomers were chosen for inclusion in Sparse array
R4 Monomers CAT friendly 8 (from 32) x 12 Tetris Array The experimental design chosen is a 8 x12 chosen from a potential 32 x 12 fully enumerated array (384 potential compounds). (¼ fraction) Level 9 of A Level 8 of A Level 7 of A Level 6 of A Level 5 of A Level 4 of A Level 32 of A Level 31 of A Level 30 of A Level 3 of A Level 29 of A Level 28 of A Level 27 of A Level 26 of A Level 25 of A Level 24 of A Level 23 of A Level 22 of A Level 21 of A Level 20 of A Level 2 of A Level 19 of A Level 18 of A Level 17 of A Level 16 of A Level 15 of A Level 14 of A Level 13 of A Level 12 of A Level 11 of A Level 10 of A Level 1 of A Scatter Plot (2) Each coloured block represents one of the 32 R4 monomers Each R4 monomer is used 3 times Each R6 monomer is used 12 times 8 7 6 5 4 3 2 1 Factor Bar Chart 2 (2) 8 8 8 8 8 8 8 8 8 8 8 8 0 Factor 2 R6 Monomers
Sparse array results R6 bar Using the CAT the synthesis was done efficiently and effectively Synthesis was actually done using 8 linear (1x12) arrays 8 7 6 5 4 3 2 7 8 8 6 5 8 8 4 7 7 2 7 For the Sparse array synthesis of 77 of the 96 compounds was achieved and the compounds delivered to screening. This is approximately 75% of full sparse array Only 20% of the fully enumerated array 1 0 B94 B93 B92 B91 B9 B88 B87 B80 B76 B73 B70 B66 B63 B61 B60 B58 B57 B56 B55 B54 B46 B44 B41 B40 B39 B37 B35 B34 B33 B16 B13 B11 C15 C18 C21 C23 C30 C36 C39 C53 C54 C57 C60 C66 RG_R6 Code design C15 C18 C21 C23 C30 C36 C39 C53 C54 C57 C60 C66 RG_R6 Code
Monomer contribution The Programme team concluded that the chemistry within this area of chemical space was well understood wrt target potency. 2 1.5 1 0.5 0-0.5-1 -1.5 R4 Coeff bar B11 B16 B34 B37 B40 B44 B54 B56 B58 B61 B66 B73 B80 B88 B91 B93 RG_R4 Code The Programme team predicted potent analogues with targetted physchem profiles for synthesis 0.5 0-0.5-1 -1.5-2 R6 coeff bar C15 C18 C21 C23 C30 C36 C39 C53 C54 C57 C60 C66 RG_R6 Code
Orthogonal dual assay Further Developments Dual target The monomers chosen in the array were selected to create Primary actives but were not thought likely to have any potential in Secondary target assay However, surprisingly 14 compounds were found to be active in the second assay Currently being followed up in the programme team as potential dual antagonists 6.4 6.2 6 5.8 5.6 5.4 5.2 FW (2) Further analogue expansion around this monomer C15 C18 C21 C23 C30 C36 C39 C53 C54 C57 C60 C66 RG_R6 Code Sized by Primary potency coloured by R4 group
EXAMPLE 3 EXTENDED 3 RG TETRIS SPARSE ARRAY 3 points of change on the molecule Acids O N H Core O Phenols
Extended 3 RG Tetris Sparse array Cores (A) = 3 (These were used to explore a stereo chemistry question) Phenols (B) = 4 Acids (C) = 24 All acids represented 3 times 3x4x24 = 288 compounds 25% of full array synthesised Distribution balanced Extended TETRIS array Coloured by Acid monomer group
OTHER DESIGN TYPES Latin Squares: Symmetrical design spaces Useful for n x n x n problems where n = number of monomers in each RG position Eg 6 R1 x 6 R2 x 6 R3 A 1/n fraction is selected R2 N R1 R3
Three RG positions Latin Squares Predictive Array Design: LIPKIN, ROSE,SAR and QSAR in Environmental Research, 2002 Vol 13 (3-4) pp425-432 Scatter Plot Pie Chart B6 B5 B4 B3 B2 B1 A1 A2 A3 A4 A5 A6 Factor 1 Each possible pair of monomers is present once and that each monomer is present an equal number of times defined by the array dimensions.
Pros and Cons of Sparse array approaches Objective exploration generates an optimal data set for ANOVA / Free-Wilson analysis. Chemistry may be more difficult to carry-out Complete evaluation of potency response within the design space from only a fraction of the possible compounds Needs a reasonable resource commitment upfront Defined endpoint to the work Needs majority of compounds chosen in the array to be made and measured for the analysis to be robust Excellent data set for QSAR Assumes Additivity (but then so does Linear SAR exploration)
Learnings from experience Ideally 3 examples minimum for each monomer within the design, although 2 will work for a robust assay and chemistry Need to have confidence in getting some active compounds If all the compounds are inactive its difficult to fit a model! Confidence in ability to synthesize compounds Some loss of particular compounds can be tolerated but if whole reactions fail then the array design will be compromised
Summary Experimental Design may provide an alternative /complementary strategy which may be suitable in some circumstances E.g. Initial exploration of new monomer space Identification of back up compounds Establish Addivity in the series Efficient Lead Optimisation by exploring more than one point of change at the same time on the molecular template Can unearth some surprises which may never have been found by traditional processes There are different design types for different situations Software is available to create the designs Work well in situations where the bespoke synthesis is contracted out
Acknowledgements Tony Cooper Heather Hobbs Medicinal Chemistry Nick Barton Stephen Pickett Darren Green Computational Chemistry
References to literature to date How design Concepts can Improve Experimentation: Mager 1997 Use of iterative search techniques to find monomers which have the required levels of particular physico chemical properties to fit into an ideal experimental design Statistical Molecular Design of BB for CombiChem: Linusson, Wold etal 1999 Selection of BB s using t- scores of the enumerated library as variables and then applying D-Optimal, Space Filling or Cluster based selection strategies Predictive Array Design: LIPKIN, ROSE etal, 2002 Use of Latin Squares Simulated Annealing for Monomer assignment