Chemical library design
Pavel Polishchuk
Institute of Molecular and Translational Medicine, Palacky University
pavlo.polishchuk@upol.cz
Drug development workflow Vistoli G., et al., Drug Discovery Today, 2008, 13, 285-294
Leeson P.D. and Springthorpe B., Nature Reviews Drug Discovery, 2007, 6, 881-890
Applicability of approaches

 | Unknown protein structure | Known protein structure
Unknown ligand structures | Screening (brute force) | De novo
Known ligand structures | Ligand-based: similarity searching, pharmacophores, QSAR | Structure-based: molecular docking, pharmacophores
Size of explored chemical space
Real datasets (commercial and free): ~110 M compounds; ~75 M compounds; ~88 M compounds
Virtually enumerated dataset: GDB-17, 166 B compounds = 1.66 × 10^11
Estimated size of chemical space

Number of compounds | Size limit | Composition | Other limitations | Method | Reference
6.2 × 10^13 | 40 atoms* | C, H | Acyclic alkanes without stereoisomers | exhaustive enumeration | H.R. Henze, C.M. Blair, 1931 [4]
1.3 × 10^15 | 38 atoms* | C, H | Acyclic stereoisomeric alkanes | exhaustive enumeration | C.M. Blair, H.R. Henze, 1932 [5]
10^21 | < 7 Å | 40 functional groups | Neurological drugs | combinatorial enumeration | D.F. Weaver, C.A. Weaver, 2011 [8]
10^23 | 36 atoms | C, N, O, S, P, Se, Si, Hal | Scaffold with 2 or 3 attachment points | combinatorial estimation | P. Ertl, 2002 [7]
10^26 | 50 atoms | C, N, O, S, Cl | - | combinatorial enumeration | K. Ogata et al., 2007 [24]
10^33 | 750 Da | C, N, O, F | Heptanes and hexanes including stereoisomers | combinatorial enumeration | D. Weininger, 2002 [23]
10^36 | 36 atoms, 500 Da | C, N, O, S, Hal | Stable compounds including stereoisomers | Learning of exhaustively enumerated structures from GDB-17 | This work*
10^60 | 30 atoms | C, N, O, S | - | combinatorial enumeration | R.S. Bohacek et al., 1996 [6]
10^390 | 300 amino acid residues | Natural amino acids | Proteins | Possible number of combinations of amino acids | C.M. Dobson, 2004 [28]

*Polishchuk, P. G.; Madzhidov, T. I.; Varnek, A. J Comput Aided Mol Des 2013, 27, 675-679
For comparison: 10^82 atoms in the Universe (http://www.universetoday.com/36302/atoms-in-the-universe/)
How to select compounds for screening? Human experts?
Human experts Lajiness M.S. et al., J. Med. Chem., 2004, 47, 4891-4896
How to select compounds for screening? Human experts? Physico-chemical filters?
Physico-chemical filters (rule of 5)
Lipinski C.A. et al. analyzed 2245 small molecules that had reached phase II clinical trials and set each threshold so that about 90% of the compounds fall within it.
Properties relevant for absorption and permeation:
- Molecular weight (MW) ≤ 500
- Lipophilicity (CLogP) ≤ 5
- H-bond donors ≤ 5
- H-bond acceptors ≤ 10
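The four thresholds above can be applied as a simple filter. The following is a minimal sketch, assuming the descriptors have already been computed by some other tool; the `Properties` container and the `max_violations` parameter are hypothetical names introduced for illustration, not part of the original rule.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mw: float     # molecular weight, Da
    clogp: float  # calculated logP
    hbd: int      # number of H-bond donors
    hba: int      # number of H-bond acceptors


def rule_of_five_violations(p: Properties) -> int:
    """Count how many of the four rule-of-5 thresholds are exceeded."""
    return sum([
        p.mw > 500,
        p.clogp > 5,
        p.hbd > 5,
        p.hba > 10,
    ])


def passes_rule_of_five(p: Properties, max_violations: int = 0) -> bool:
    """Return True if the compound exceeds at most max_violations thresholds.

    max_violations is an assumed tuning knob; 0 means every threshold must hold.
    """
    return rule_of_five_violations(p) <= max_violations
```

With illustrative (made-up) descriptor values, `passes_rule_of_five(Properties(180.2, 1.2, 1, 4))` accepts the compound, while a compound with MW 650 and CLogP 6.1 is rejected unless two violations are tolerated.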
Physico-chemical filters for drug-likeness

MW | MlogP | HBD | HBA | RTB | nrings | Formal charge | TPSA, Å^2 | Year | Author
500 | 4.15 | 5 | 10 | | | | | 1997 | Lipinski
200-450 | -2.0 to 4.5 | 5 | 1-8 | 1-9 | 5 | | | 2000 | Oprea
200-500 | -5.0 to 5.0 | 5 | 10 | 8 | | -2 to +2 | | 2002 (1998) | Walters
 | | | | 10 | | | 140 | 2002 | Veber
500 | 5 | 5 | 10 | 10 | | | 140 | 2003 | Rishton

C.A. Lipinski et al., Advanced Drug Delivery Reviews, 1997, 23, 3-25
T.I. Oprea et al., J. Comp.-Aided Mol. Design, 2000, 14, 251-264
W.P. Walters and M.A. Murcko, Adv. Drug Deliv. Rev., 2002, 54, 255-271
D.F. Veber et al., J. Med. Chem., 2002, 45, 2615-2623
G.M. Rishton, Drug Discov. Today, 2003, 8, 86-96
Lead-likeness filters
Property changes on going from lead to drug: ΔMW +69; ΔCMR +1.8; ΔnRings +1; ΔRTB +2; ΔHBD 0; ΔHBA +1; ΔCLogP +0.43; ΔLogD(7.4) +0.97
Lead-like compounds: MW ≤ 350; CLogP ≤ 3
Oprea T.I. et al., J. Chem. Inf. Comput. Sci., 2001, 41, 1308-1315
Teague S.J. et al., Angew. Chem. Int. Ed., 1999, 38, 3743-3748
How to select compounds for screening? Human experts? Physico-chemical filters? Structural filters?
Structural filters
- Remove potentially toxic and mutagenic compounds
- Remove metabolically labile compounds
- Remove false positives:
  - interference with signal detection (e.g. dyes in fluorescence assays)
  - reactive groups
  - aggregates
  - non-specific binders (sticky compounds)
Structural filters: AstraZeneca filters
- reactive structures: Michael acceptors (C=C-C=O, C=C-CN, C=C-NO2); anhydride; alpha haloketone; peroxide
- frequent hitters: more than two nitro groups; dihydroxybenzene
- dye-like structures: two nitro groups on the same aromatic ring
- unlikely drug candidates: large rings (>C9); crown ethers; conjugated alkenes (C=CC=CC=C)
- 'ugly' halogens: 2- or 3-valent halogens; triflates (SO2CX3)
- 'ugly' oxygen: 5 or more OH groups; formic acid esters
- 'ugly' nitrogen: hydrazines (not in ring); oxime; carbodiimide; 3 or more guanidines
- 'ugly' sulfur: 5 or more S atoms; disulfide; thiocyanate; thiol
Cumming J.G. et al., Nature Reviews Drug Discovery, 2013, 948-962
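Several of the rules above are count thresholds ("5 or more OH groups") and others fire on any occurrence ("peroxide"). The sketch below shows one way to encode both kinds, assuming that functional-group counts per compound are already available from a substructure search done elsewhere. The dictionary keys are hypothetical labels chosen for this example, and only a subset of the listed rules is included.

```python
# Largest count that is still acceptable; one more than this raises an alert.
# Keys are hypothetical group labels, thresholds follow the rules listed above.
MAX_ALLOWED = {
    "nitro": 2,        # "more than two nitro groups"
    "hydroxyl": 4,     # "5 or more OH groups"
    "guanidine": 2,    # "3 or more guanidines"
    "sulfur_atom": 4,  # "5 or more S atoms"
}

# Groups that raise an alert on any occurrence.
FORBIDDEN = {
    "anhydride", "alpha_haloketone", "peroxide",
    "disulfide", "thiocyanate", "thiol", "carbodiimide", "oxime",
}


def structural_alerts(group_counts):
    """Return the sorted list of group labels that violate a rule.

    group_counts maps a group label to its number of occurrences in a compound.
    """
    alerts = []
    for group, count in group_counts.items():
        if count <= 0:
            continue  # group absent, nothing to flag
        if group in FORBIDDEN:
            alerts.append(group)
        elif group in MAX_ALLOWED and count > MAX_ALLOWED[group]:
            alerts.append(group)
    return sorted(alerts)
```

A compound passes the structural filter when `structural_alerts(...)` returns an empty list. In practice the counts would come from SMARTS-based substructure matching, which is outside the scope of this sketch.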
Promiscuous compounds
(two compounds compared; structures shown on the slide)

Measurement | Compound 1 | Compound 2
malarial protease, IC50 (μM) | --- | 8
β-lactamase, IC50 (μM) | 0.2 | 10
chymotrypsin, IC50 (μM) | --- | 55
IC50 with incubation | no change | 22-fold
IC50 with 10x β-lactamase | no change | 40-fold
DLS concentration, μM | 100 | 10
particle diameter, nm | no particles | 394.6 ± 12.5
IC50 in presence of guanidine | --- | 6-fold
IC50 in presence of urea | --- | 4-fold
IC50 in presence of BSA | --- | > 50-fold
K3PO4, 5 mM | 0.2 | 4
K3PO4, 50 mM | 0.2 | 10
K3PO4, 500 mM | 0.3 | 15

McGovern S.L. et al., J. Med. Chem., 2002, 1712-1722
Promiscuity as a function of structure Leeson P.D. and Springthorpe B., Nature Reviews Drug Discovery, 2007, 6, 881-890
Promiscuity as a function of structure Hopkins A.L. et al., Current Opinion in Structural Biology, 2006, 16, 127-136
Pan-assay interference compounds (PAINS) Baell J. and Walters M., Nature, 2014, 513, 481-483
Pan-assay interference compounds (PAINS) Baell J.B. and Holloway G.A., J. Med. Chem., 2010, 53, 2719-2740
Pan-assay interference compounds (PAINS) Figure labels: not passed / passed. Baell J.B. and Holloway G.A., J. Med. Chem., 2010, 53, 2719-2740
Promiscuous compounds: conclusion
Promiscuous compounds tend to have:
- higher lipophilicity
- low complexity
- reactive groups
Dark matter of HTS libraries Macarron R., Nature Chemical Biology, 2015, 11, 904-905
Dark matter of HTS libraries
Dark chemical matter: compounds that were inactive in 100 or more assays

Dataset | Assays | Compounds overall | Dark matter | Activity threshold
Novartis | 234 | 803 990 | 112 872 (14.0%) | z-score is 2 or more
PubChem | 429 | 363 598 | 131 726 (36.2%) | standard (IC50 < 10 μM)

Figure: quality control of Novartis dataset
Wassermann A.M. et al., Nature Chemical Biology, 2015, 11, 958-966
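The definition above translates into a short check over per-compound assay outcomes. This is a minimal sketch, assuming each compound comes with a list of booleans (True where it was active by whatever activity threshold the dataset uses) and reading the definition as "tested in at least 100 assays and active in none". The function names and the `min_assays` parameter are hypothetical.

```python
def is_dark_matter(outcomes, min_assays=100):
    """True if the compound was tested in >= min_assays assays and never active.

    outcomes: list of booleans, one per assay, True meaning "active".
    """
    return len(outcomes) >= min_assays and not any(outcomes)


def dark_fraction(library, min_assays=100):
    """Fraction of dark compounds among those tested in >= min_assays assays.

    library maps a compound identifier to its list of assay outcomes.
    Compounds tested in fewer assays are excluded from the denominator
    (an assumption of this sketch).
    """
    eligible = [o for o in library.values() if len(o) >= min_assays]
    if not eligible:
        return 0.0
    dark = sum(1 for o in eligible if not any(o))
    return dark / len(eligible)
```

The percentages in the table are fractions of this kind: 112 872 of 803 990 is 14.0% and 131 726 of 363 598 is 36.2%.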
Dark matter of HTS libraries Compounds that are centers of clusters containing actives (green) or dark matter (black) Wassermann A.M. et al., Nature Chemical Biology, 2015, 11, 958-966
Dark matter of HTS libraries Figure panels: active vs. dark compounds. Wassermann A.M. et al., Nature Chemical Biology, 2015, 11, 958-966
Dark matter of HTS libraries Structural rules to discriminate dark chemical matter Wassermann A.M. et al., Nature Chemical Biology, 2015, 11, 958-966
Dark matter of HTS libraries Compounds were tested in 34 additional assays. Figure: hit rates and selectivity. Wassermann A.M. et al., Nature Chemical Biology, 2015, 11, 958-966
Dark matter of HTS libraries: conclusion
Dark matter compounds are:
- less potent
- more selective
A compound that was inactive in 100 assays can still be active in the next one; there is no correlation.
Dark chemical matter is a valuable source of potent and selective compounds, which should be tested at higher concentrations.
How to select compounds for screening? Human experts? Physico-chemical filters? Structural filters? Chemoinformatics?
Similarity and diversity
Similar property principle: structurally similar compounds tend to exhibit similar properties.
A subset of size N can be selected from a dataset of size M in $\binom{M}{N} = \frac{M!}{N!\,(M-N)!}$ ways.
This means there are ~10^13 ways to select 10 compounds from 100.
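The count quoted above can be checked directly with the binomial coefficient from the Python standard library. The wrapper name `n_subsets` is a hypothetical helper introduced only for readability.

```python
import math


def n_subsets(m, n):
    """Number of ways to choose n compounds out of m, i.e. M! / (N! (M - N)!)."""
    return math.comb(m, n)


# Selecting 10 compounds from 100.
print(n_subsets(100, 10))  # 17310309456440, about 1.7e13
```

Exhaustive evaluation of all subsets is therefore infeasible even for small libraries, which motivates heuristic selection algorithms.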
Diverse libraries
Similarity/dissimilarity measures:
- Dissimilarity = 1 - Similarity
- Euclidean distance
- Tanimoto
Diversity measures:
- Sum of pairwise dissimilarities/distances
Similarity is a property of a pair of compounds. Diversity is a property of a library of compounds.
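The measures named above can be written down in a few lines. This is a minimal sketch, assuming compounds are represented either as sets of "on" fingerprint bits (for Tanimoto) or as numeric descriptor vectors (for Euclidean distance). The function names, and the convention that two empty fingerprints have similarity 1, are assumptions of this example.

```python
import math
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0  # assumed convention for two empty fingerprints
    return len(a & b) / len(a | b)


def euclidean(x, y):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def diversity(library):
    """Sum of pairwise Tanimoto dissimilarities (1 - similarity) over a library."""
    return sum(1.0 - tanimoto(a, b) for a, b in combinations(library, 2))
```

Note how the code mirrors the distinction on the slide: `tanimoto` and `euclidean` take a pair of compounds, while `diversity` takes a whole library.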
Diverse libraries: basic algorithm
1. Select a compound and place it in the subset (randomly, least similar, etc.)
2. Calculate the dissimilarity between each remaining compound and the compounds in the subset
3. Choose as the next compound the one that is most dissimilar to the compounds in the subset
4. If fewer than N compounds have been selected, return to step 2
Properties:
- fast
- can be used with high-dimensional data
- tends to select outliers
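The four steps above can be sketched as a greedy loop. "Most dissimilar to the compounds in the subset" is interpreted here as the MaxMin criterion (maximize the distance to the nearest already selected compound); this is one common choice and an assumption of the sketch, as are the function and parameter names.

```python
def maxmin_select(items, n, distance, first=0):
    """Greedy dissimilarity-based selection of n items; returns their indices.

    items: list of compound representations
    distance: function (a, b) -> dissimilarity
    first: index of the seed compound (step 1)
    """
    if n <= 0 or not items:
        return []
    selected = [first]
    # Step 2: distance from every item to its nearest selected item.
    nearest = [distance(items[first], x) for x in items]
    while len(selected) < min(n, len(items)):
        # Step 3: pick the unselected item farthest from the subset.
        candidate = max(
            (i for i in range(len(items)) if i not in selected),
            key=lambda i: nearest[i],
        )
        selected.append(candidate)
        # Update nearest-selected distances with the new member (back to step 2).
        for i, x in enumerate(items):
            nearest[i] = min(nearest[i], distance(items[candidate], x))
    return selected
```

For example, with one-dimensional "descriptors" `[0.0, 0.1, 5.0, 10.0]` and absolute difference as the distance, selecting three items starting from index 0 picks the far end first and then the middle point, which illustrates the tendency to select outliers.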
Diverse libraries: clustering
Group compounds so that:
- compounds within a cluster are similar
- compounds from different clusters are dissimilar
Properties:
- good for high-dimensional data
- reveals natural clustering
- not suitable for big datasets
Diverse libraries: sphere exclusion
1. Define a dissimilarity threshold T
2. Select a compound from the data set and place it in the subset
3. Remove from the data set all compounds whose dissimilarity to the selected compound is < T
4. If compounds are left in the data set, return to step 2
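The sphere exclusion steps can be sketched as follows. T is treated here as a dissimilarity threshold (the radius of the excluded sphere), and the next center is simply the first remaining compound; a random or otherwise prioritized choice would work equally well. The function and parameter names are hypothetical.

```python
def sphere_exclusion(items, threshold, distance):
    """Select indices of items that are pairwise at least `threshold` apart.

    items: list of compound representations
    threshold: dissimilarity threshold T (sphere radius)
    distance: function (a, b) -> dissimilarity
    """
    remaining = list(range(len(items)))
    selected = []
    while remaining:                      # step 4: loop while compounds are left
        center = remaining.pop(0)         # step 2: take a compound as center
        selected.append(center)
        # Step 3: drop everything inside the sphere around the new center.
        remaining = [
            i for i in remaining
            if distance(items[center], items[i]) >= threshold
        ]
    return selected
```

Unlike the greedy algorithm with a fixed subset size N, the size of the resulting subset is not known in advance; it is determined by T and by the distribution of the data.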
Diverse libraries: cell-based
1. Split the whole space into cells
2. Select one (or more) compounds from each cell
Properties:
- fast
- works only in low-dimensional space: with 100 dimensions (descriptors) and 2 splits in each there are 2^100 ≈ 10^30 cells
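A cell-based selection can be sketched with equal-width binning of each descriptor. This is a minimal illustration, assuming known lower and upper bounds per dimension (with upper > lower) and keeping the first compound encountered in each occupied cell; all names are hypothetical.

```python
def cell_index(point, lows, highs, splits):
    """Map a descriptor vector to the tuple of bin indices of its cell."""
    idx = []
    for x, lo, hi in zip(point, lows, highs):
        width = (hi - lo) / splits           # assumes hi > lo
        k = int((x - lo) / width)
        idx.append(min(max(k, 0), splits - 1))  # clip values on or outside the bounds
    return tuple(idx)


def cell_based_select(points, lows, highs, splits):
    """Return indices of one compound (the first seen) per occupied cell."""
    chosen = {}
    for i, p in enumerate(points):
        chosen.setdefault(cell_index(p, lows, highs, splits), i)
    return sorted(chosen.values())


def n_cells(dimensions, splits):
    """Total number of cells: splits ** dimensions."""
    return splits ** dimensions
```

`n_cells(100, 2)` evaluates to 2**100, roughly 1.27e30, which is why the approach is practical only after reducing the descriptor space to a few dimensions.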
Library filters
- Physico-chemical filters: H-bond acceptors, lipophilicity, MW, H-bond donors
- Structural filters: metabolically labile, reactive, toxic, PAINS
- Diversity filters: diversity
PhD topic
Computationally guided de novo design of compounds with desired properties
Development of a platform for multi-objective optimization of compound properties (ADME, activity, selectivity, etc.) based on computational models (QSAR, pharmacophore, docking, etc.)
Workflow:
1. Input structure(s) to be optimized
2. Estimation of atom/fragment contributions based on QSAR models and docking
3. Transformation of structural motifs with a negative influence on the target property
4. Property prediction with QSAR model(s), docking, pharmacophore model(s), etc.
5. Decision module: does the compound satisfy the optimal criteria (profile)? Yes: selection, synthesis and testing. No: return for further optimization.
Thank you for your attention
Promiscuous compounds / frequent hitters