Computational Methods and Drug-Likeness
Benjamin Georgi and Philip Groth
Pharmacokinetics, winter semester 2003/2004

The Problem
Drug development in the pharmaceutical industry:
- Time: >8-12 years
- Costs: ~$800m
- Failure rate: >90%
Causes of failure (chart):
- PK: 39%
- Lack of efficacy: 30%
- Animal tox: 11%
- Adverse effects: 10%
- Business: 5%
- Other: 5%
Source: Kennedy (1997) DDT 2:436-444. Provided by Dr. Andreas Reichel, Schering AG.

Overview
- General Problem: drug-likeness, mathematical formulation and realization
- Molecular Properties: calculation and approximation
  - Countable properties
  - Complex properties
- Practical Application
  - Databases and data mining
- Methods
  - Counting methods
  - Knowledge-based methods
  - Computational methods: PCA, genetic algorithms, decision trees and others
- Summary, Conclusion and References

General Problem Summary
Drug-likeness:
- A profile of common chemical, physical and/or physiological properties of successful drugs.
- A positive match of a candidate molecule's properties with such a profile points to drug-likeness.
Problem:
- Mathematical formulation of property profiles.
- Specification of a scoring function.
- Algorithmic implementation.

Realization of Drug-likeness
Mathematics:
- N-dimensional space S spanned by molecule properties.
- Scoring function f: S → R (score).
- The score may express general drug-likeness, or it may approximate the value of a property correlated with a drug effect.
- Example: prediction of colon permeability from molecular weight and a structural property.
- Classification by scoring function.

Calculation and Approximation of Properties
We are looking for molecular properties which:
1. allow an assertion about the physiological behavior of the molecule,
2. give hints towards the qualification of a molecule as a lead or drug,
3. can be calculated efficiently or readily approximated.

Elementary/Countable Properties
- Molecular weight (MW)
- Functional groups
- H-bond donors and acceptors
- Rotatable bonds (single covalent bonds)

Complex Properties (1)
Molecular properties that cannot be deduced from a single structural feature. They are approximated with:
- heuristic formulas
- computational methods
- bioinformatics

Complex Properties (2)
LogP: lipophilicity of a molecule
Different schemes for calculation:
- MlogP: calculation with a rule-based function of 13 elementary properties
  MlogP = 1.244 * Num_CX^0.6 - 1.017 * Num_NO^0.9 + 0.406 * N_O_Prox - 0.145 * Num_UB^0.8 ...
- ClogP: calculation from the structure via fragmentation at isolated C atoms (combinatorial libraries)

Complex Properties (3)
Polar surface area (PSA): relative share of polar surface in the total molecule surface

Complex Properties (4)
Molecular Fingerprints: bit-vector representation of the molecule structure
Daylight method: patterns for paths of different lengths within the molecule
Example: OC=CN
- 0-bond paths: C, O, N
- 1-bond paths: OC, C=C, CN
- 2-bond paths: OC=C, C=CN
- 3-bond paths: OC=CN
Each pattern is used as a seed for a PRNG to obtain a set of bits.
The bits of all patterns are combined (logical OR) to yield the fingerprint.
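
The path-based scheme above can be sketched in Python as follows. This is a minimal illustration, not the Daylight implementation: the molecule is given as a hand-written atom list and bond dictionary, Python's `random.Random` stands in for the PRNG, and `n_bits` and `bits_per_pattern` are hypothetical parameters. Each path is written in only one direction (from the lower to the higher atom index), which is a simplification rather than a true canonical form.

```python
import random

def enumerate_paths(atoms, bonds, max_bonds=3):
    """Return the set of linear path patterns with up to max_bonds bonds."""
    neighbours = {i: [] for i in range(len(atoms))}
    for (a, b), symbol in bonds.items():
        neighbours[a].append((b, symbol))
        neighbours[b].append((a, symbol))

    patterns = set(atoms)  # 0-bond paths are the atom symbols themselves

    def extend(path, text):
        if len(path) - 1 == max_bonds:
            return
        for nxt, symbol in neighbours[path[-1]]:
            if nxt in path:  # paths never revisit an atom
                continue
            new_text = text + symbol + atoms[nxt]
            # Keep only one direction per path (simplification, see lead-in).
            if path[0] < nxt:
                patterns.add(new_text)
            extend(path + [nxt], new_text)

    for start in range(len(atoms)):
        extend([start], atoms[start])
    return patterns

def fingerprint(patterns, n_bits=64, bits_per_pattern=2):
    """Seed a PRNG with each pattern and OR the resulting bits together."""
    bits = 0
    for pattern in sorted(patterns):
        rng = random.Random(pattern)  # string seeds are deterministic
        for _ in range(bits_per_pattern):
            bits |= 1 << rng.randrange(n_bits)
    return bits

# OC=CN: atoms O, C, C, N; "" marks a single bond, "=" a double bond.
atoms = ["O", "C", "C", "N"]
bonds = {(0, 1): "", (1, 2): "=", (2, 3): ""}
patterns = enumerate_paths(atoms, bonds)
fp = fingerprint(patterns)
```

For OC=CN this yields the nine patterns listed on the slide. Because several patterns can hit the same bit, the fingerprint usually has fewer set bits than patterns times `bits_per_pattern`.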

Complex Properties (5)
Properties of combinatorial libraries: diversity, potency, selectivity
- Diversity: similarity measure among the components of the library.
- Potency: measure of the efficacy of a compound.
- Selectivity: measure of target specificity.
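
As an illustration of a similarity-based diversity measure, the sketch below computes the mean pairwise similarity of a small library. The Tanimoto coefficient on fingerprint bit sets is an assumption made here; the slides do not name a specific similarity measure. The library and its bit positions are made-up example data.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def mean_pairwise_similarity(library):
    """Average similarity over all unordered pairs; needs at least two members."""
    pairs = list(combinations(library, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

# Hypothetical library of three fingerprints (sets of on-bit positions).
library = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
similarity = mean_pairwise_similarity(library)
diversity = 1.0 - similarity  # low mean similarity means high diversity
```

A library of near-identical compounds gives a mean similarity close to 1, so `1 - similarity` is one simple way to turn the measure into a diversity score.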

Practice: Databases
- World Drug Index (WDI): 60,000 drugs
- Comprehensive Medicinal Chemistry database (CMC): 8,400 drugs with 3D models and properties
- Available Chemicals Database (ACD): 238,000 commercially available compounds

Practice: Structure Data Formats
SMILES / SMARTS: line notations for molecules; SMARTS patterns act like regular expressions for molecules
http://www.daylight.com/
Example: WDI entry for VIAGRA

Practice: Data Mining
[Figure: CMC screenshots]

Counting Methods
Pfizer Rule of Five
USAN library: 2,245 drug-like compounds extracted from the WDI database
Criteria:
- Existing USAN or INN name
- No polymers or peptides
- Clinical exposure
Rules: a compound is unlikely to be drug-like if it has
- more than 5 H-bond donors
- more than 10 H-bond acceptors
- MW > 500
- log P > 5
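
A counting method of this kind is easy to express in code. The sketch below uses the thresholds as published by Lipinski et al. (more than 5 H-bond donors, more than 10 H-bond acceptors, MW above 500, log P above 5). The function name and the example property values are hypothetical, and the property values are assumed to have been computed elsewhere.

```python
def rule_of_five_violations(hbd, hba, mw, logp):
    """Return the names of the Rule of Five criteria a compound violates."""
    checks = {
        "H-bond donors > 5": hbd > 5,
        "H-bond acceptors > 10": hba > 10,
        "MW > 500": mw > 500,
        "log P > 5": logp > 5,
    }
    return [name for name, violated in checks.items() if violated]

# Hypothetical property values for two candidate molecules.
small_molecule = rule_of_five_violations(hbd=2, hba=5, mw=350.0, logp=2.5)
large_molecule = rule_of_five_violations(hbd=6, hba=11, mw=650.0, logp=6.2)
```

Returning the list of violated criteria rather than a single yes/no answer leaves it to the caller to decide how many violations are tolerated.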

Knowledge-based Methods (1)
Approximation of target properties with simple functions of known properties.
- QSPR (Quantitative Structure-Property Relationship)
- QSAR (Quantitative Structure-Activity Relationship)
Examples:
- Caco-2 permeability: log Papp = 0.008 * MW - 0.043 * PSA - 5.165
- Human effective permeability: log Peff = -2.546 - 0.011 * PSA - 0.278 * HBD
- Blood-brain barrier permeability: logBB = -0.0148 * PSA + 0.152 * ClogP + 0.139
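
The three example relationships are simple linear functions and can be written down directly. The sketch below assumes the signs and the constant 5.165 as they appear in the published equations (minus signs before the PSA and HBD terms), with PSA in Å². The function names and the input values are hypothetical.

```python
def caco2_log_papp(mw, psa):
    """Caco-2 permeability estimate from molecular weight and PSA."""
    return 0.008 * mw - 0.043 * psa - 5.165

def human_log_peff(psa, hbd):
    """Human effective permeability estimate from PSA and H-bond donor count."""
    return -2.546 - 0.011 * psa - 0.278 * hbd

def log_bb(psa, clogp):
    """Blood-brain barrier permeability estimate from PSA and ClogP."""
    return -0.0148 * psa + 0.152 * clogp + 0.139

# Hypothetical compound: MW 300, PSA 60, 2 H-bond donors, ClogP 2.
papp = caco2_log_papp(mw=300.0, psa=60.0)   # approx. -5.345
peff = human_log_peff(psa=60.0, hbd=2)      # approx. -3.762
bb = log_bb(psa=50.0, clogp=2.0)            # approx. -0.297
```

In all three models a larger polar surface area lowers the predicted permeability, which is why PSA appears with a negative coefficient throughout.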

Knowledge-based Methods (2)
Filters for functional groups (SMARTS patterns)
Groups with toxic properties, undesired reactivity or undesired chemical properties
Example: [Figure]

Knowledge-based Methods (3)
Search for drug-like structure patterns (frameworks)
Data set: 5,120 CMC compounds
Framework definition: [Figure]
Result: 32 frameworks describe 50% of the data.

Knowledge-based Methods (4)
[Figure: frameworks]

Computational: Principal Components Analysis
- Each compound is characterized by a set of selected properties.
- A compound can be represented as a point in an n-dimensional space, where n is the total number of properties.
- PCA is used to project this high-dimensional space onto a three-dimensional space.
- The data set is reduced to three principal components, each of which is a linear combination of the property values.
- The result is three components that retain as much of the variability of the data as possible.

Computational: Genetic Algorithms (1)
Principle:
Given:
- Chromosome space X
- Fitness function f: X → R
Algorithm:
- Create a random initial population P ⊂ X
- Iterate until convergence:
  - compute the fitness values f(p), p ∈ P
  - discard chromosomes with insufficient fitness
  - apply evolutionary operations on P (mutation, crossover)
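
The loop above can be sketched in a few lines of Python. Several choices here are assumptions for illustration only: chromosomes are bit lists, the worse half of the population is discarded in each round, crossover is one-point, and a fixed number of generations replaces a real convergence test. The toy fitness function (the number of 1-bits) is a stand-in, not a drug-likeness score.

```python
import random

def run_ga(fitness, n_genes=20, pop_size=30, generations=40,
           mutation_rate=0.02, seed=0):
    """Maximize fitness over bit-list chromosomes with a simple GA."""
    rng = random.Random(seed)
    # Create a random initial population P.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Compute f(p) for all p in P and discard the worse half.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Refill the population by crossover and mutation.
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_genes)  # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)

# Toy fitness: count of 1-bits in the chromosome.
best = run_ga(fitness=sum)
```

Because the survivors are carried over unchanged, the best fitness in the population never decreases from one generation to the next.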

Computational: Genetic Algorithms (2)
Example: optimisation of combinatorial libraries (multi-property optimization)
Chromosome A = components of a combinatorial library
Fitness function:
  f(A) = w_d (1 - D(A)) + w_{f_1} f_1 + \dots
  D(A) = \frac{1}{N^2} \sum_i \sum_j sim(A_i, A_j)
  f_1 = \sqrt{\frac{1}{B} \sum_{b=1}^{B} (f_1(A)_b - X_b)^2}

Computational: Decision Trees
- Training with 3,500 molecules each from WDI and ACD.
- 70-80% classification success

Computational: Other Approaches
The generic nature of the problem setting allows for numerous different approaches:
- Simulated Annealing
- Neural Networks
- Bayesian Networks

Conclusion (1)
Advantages:
- faster and cheaper route to the goal
- rough classification of compounds available very quickly
- fewer experiments
- early indication of possible success
- high-throughput screening possible
- reduced failure rate in later stages of drug development

Conclusion (2)
Disadvantages:
- learning from known compounds may prevent innovation
- profiles derived from drugs may not be useful for finding new leads
- models are strongly simplified
- physiological interpretation is not always possible
- binary results of some computational methods are of little use for lead design beyond a first indication

References
- Clark DE, Pickett SD (2000) Computational methods for the prediction of 'drug-likeness'. Drug Discov Today 5(2), 49-58
- Walters WP, Murcko MA (2002) Prediction of 'drug-likeness'. Adv Drug Deliv Rev 54(3), 255-271
- Gillet VJ, Willett P, Bradshaw J, Green DVS (1999) Selecting Combinatorial Libraries to Optimise Diversity and Physical Properties. J Chem Inf Comput Sci 39, 169-177
- Lipinski CA, Lombardo F, Dominy BW, Feeney PJ (2001) Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev 46(1-3), 3-26